[FFmpeg-devel,v3,1/6] lavc/arm: dont assign hevc_qpel functions for non-multiple of 8 widths

Message ID 20220104052018.9541-1-jdek@itanimul.li
State New
Series [FFmpeg-devel,v3,1/6] lavc/arm: dont assign hevc_qpel functions for non-multiple of 8 widths

Checks

Context               Check    Description
andriy/make_x86       success  Make finished
andriy/make_fate_x86  success  Make fate finished
andriy/make_ppc       success  Make finished
andriy/make_fate_ppc  fail     Make fate failed

Commit Message

J. Dekker Jan. 4, 2022, 5:20 a.m. UTC
The assembly is written assuming that the width is a multiple of 8.

However, the real issue is that the functions were erroneously assigned to
the 2, 4, 6 & 12 widths. This behaviour never broke the decoder as
samples which trigger the functions for these widths have not been found
in the wild. This relies on the mappings in ff_hevc_pel_weight[].

Signed-off-by: J. Dekker <jdek@itanimul.li>
---
 libavcodec/arm/hevcdsp_init_neon.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

 This set has already been reviewed by Martin, sending to list for
 transparency.
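
For readers following the index arithmetic: the loop variable x in the patch is a block-width index, not a width in pixels. The sketch below assumes the index-to-width order 2, 4, 6, 8, 12, 16, 24, 32, 48, 64 for indices 0..9. Only the 2, 4, 6 and 12 entries are implied by the commit message and the skipped indices; the rest is an assumption about ff_hevc_pel_weight[]. The names pel_width_by_index, width_is_multiple_of_8 and patched_loop_assigns are hypothetical helpers, not FFmpeg API.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed index -> width order (hypothetical table; only the 2, 4, 6 and 12
 * entries are implied by the commit message). */
static const int pel_width_by_index[10] = { 2, 4, 6, 8, 12, 16, 24, 32, 48, 64 };

/* True when the width at index x satisfies the assembly's stated assumption
 * that the width is a multiple of 8. */
bool width_is_multiple_of_8(int x)
{
    assert(x >= 0 && x < 10);
    return pel_width_by_index[x] % 8 == 0;
}

/* Mirrors the patched loop:
 *     for (x = 3; x < 10; x++) { if (x == 4) continue; ... }
 * and reports whether index x still gets the NEON wrapper assigned. */
bool patched_loop_assigns(int x)
{
    return x >= 3 && x < 10 && x != 4;
}
```

Under that assumption the two predicates agree for every x in 0..9, which is the property the patch depends on.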

Comments

Andreas Rheinhardt Jan. 5, 2022, 6:46 a.m. UTC | #1
J. Dekker:
> The assembly is written assuming that the width is a multiple of 8.
> 
> However, the real issue is that the functions were erroneously assigned to
> the 2, 4, 6 & 12 widths. This behaviour never broke the decoder as
> samples which trigger the functions for these widths have not been found
> in the wild. This relies on the mappings in ff_hevc_pel_weight[].
> 
> Signed-off-by: J. Dekker <jdek@itanimul.li>
> ---
>  libavcodec/arm/hevcdsp_init_neon.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
>  This set has already been reviewed by Martin, sending to list for
>  transparency.
> 
> diff --git a/libavcodec/arm/hevcdsp_init_neon.c b/libavcodec/arm/hevcdsp_init_neon.c
> index 201a088dac..112edb5edd 100644
> --- a/libavcodec/arm/hevcdsp_init_neon.c
> +++ b/libavcodec/arm/hevcdsp_init_neon.c
> @@ -270,7 +270,8 @@ av_cold void ff_hevc_dsp_init_neon(HEVCDSPContext *c, const int bit_depth)
>          put_hevc_qpel_uw_neon[3][1]      = ff_hevc_put_qpel_uw_h1v3_neon_8;
>          put_hevc_qpel_uw_neon[3][2]      = ff_hevc_put_qpel_uw_h2v3_neon_8;
>          put_hevc_qpel_uw_neon[3][3]      = ff_hevc_put_qpel_uw_h3v3_neon_8;
> -        for (x = 0; x < 10; x++) {
> +        for (x = 3; x < 10; x++) {
> +            if (x == 4) continue;
>              c->put_hevc_qpel[x][1][0]         = ff_hevc_put_qpel_neon_wrapper;
>              c->put_hevc_qpel[x][0][1]         = ff_hevc_put_qpel_neon_wrapper;
>              c->put_hevc_qpel[x][1][1]         = ff_hevc_put_qpel_neon_wrapper;
> 

This patchset led to regressions; see e.g.
http://fate.ffmpeg.org/report.cgi?time=20220104162724&slot=aarch64-linux-qemu-ubuntu-gcc-4.8

- Andreas
Martin Storsjö Jan. 5, 2022, 8:30 a.m. UTC | #2
On Wed, 5 Jan 2022, Andreas Rheinhardt wrote:

> J. Dekker:
>> The assembly is written assuming that the width is a multiple of 8.
>>
>> However, the real issue is that the functions were erroneously assigned to
>> the 2, 4, 6 & 12 widths. This behaviour never broke the decoder as
>> samples which trigger the functions for these widths have not been found
>> in the wild. This relies on the mappings in ff_hevc_pel_weight[].
>>
>> Signed-off-by: J. Dekker <jdek@itanimul.li>
>> ---
>>  libavcodec/arm/hevcdsp_init_neon.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>>  This set has already been reviewed by Martin, sending to list for
>>  transparency.
>>
>> diff --git a/libavcodec/arm/hevcdsp_init_neon.c b/libavcodec/arm/hevcdsp_init_neon.c
>> index 201a088dac..112edb5edd 100644
>> --- a/libavcodec/arm/hevcdsp_init_neon.c
>> +++ b/libavcodec/arm/hevcdsp_init_neon.c
>> @@ -270,7 +270,8 @@ av_cold void ff_hevc_dsp_init_neon(HEVCDSPContext *c, const int bit_depth)
>>          put_hevc_qpel_uw_neon[3][1]      = ff_hevc_put_qpel_uw_h1v3_neon_8;
>>          put_hevc_qpel_uw_neon[3][2]      = ff_hevc_put_qpel_uw_h2v3_neon_8;
>>          put_hevc_qpel_uw_neon[3][3]      = ff_hevc_put_qpel_uw_h3v3_neon_8;
>> -        for (x = 0; x < 10; x++) {
>> +        for (x = 3; x < 10; x++) {
>> +            if (x == 4) continue;
>>              c->put_hevc_qpel[x][1][0]         = ff_hevc_put_qpel_neon_wrapper;
>>              c->put_hevc_qpel[x][0][1]         = ff_hevc_put_qpel_neon_wrapper;
>>              c->put_hevc_qpel[x][1][1]         = ff_hevc_put_qpel_neon_wrapper;
>>
>
> This patchset led to regressions; see e.g.
> http://fate.ffmpeg.org/report.cgi?time=20220104162724&slot=aarch64-linux-qemu-ubuntu-gcc-4.8

Indeed. I had only run fate-checkasm while reviewing it, assuming that it
had been fully tested with fate-hevc by the patch author.

Instead of reverting the full 6 patch set, it's enough to revert a couple 
patches out of it though (there's some cosmetic cleanup that we can keep 
in for now). But I'm afraid we should disable the preexisting 
ff_hevc_sao_band_filter_8x8_8_neon function too. It's currently only run 
for the [0] case, which I think corresponds to width <= 8. But if that 
case also must handle widths that aren't an even multiple of 8, we'd have 
a lingering bug that isn't exercised by fate-hevc.

// Martin
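
To illustrate why a width that is not a multiple of 8 matters for a routine written in 8-column steps, here is a scalar model. It is purely hypothetical and not derived from the NEON source: the function names are made up, and the real assembly may misbehave differently (for example by skipping the tail rather than overrunning it).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a kernel that always handles 8 columns per step:
 * returns how many columns it touches for a block of the given width. */
size_t columns_touched_by_8wide_kernel(size_t width)
{
    size_t touched = 0;
    for (size_t x = 0; x < width; x += 8)
        touched += 8;   /* a full 8-column step, even if fewer remain */
    return touched;
}

/* Columns touched beyond the requested width (0 when width is a multiple of 8). */
size_t columns_overrun(size_t width)
{
    return columns_touched_by_8wide_kernel(width) - width;
}
```

In this model widths 8 and 16 give an overrun of 0, while widths 6 and 12 give 2 and 4 columns respectively, which is the kind of case that a test suite only using multiple-of-8 widths would not exercise.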

Patch

diff --git a/libavcodec/arm/hevcdsp_init_neon.c b/libavcodec/arm/hevcdsp_init_neon.c
index 201a088dac..112edb5edd 100644
--- a/libavcodec/arm/hevcdsp_init_neon.c
+++ b/libavcodec/arm/hevcdsp_init_neon.c
@@ -270,7 +270,8 @@  av_cold void ff_hevc_dsp_init_neon(HEVCDSPContext *c, const int bit_depth)
         put_hevc_qpel_uw_neon[3][1]      = ff_hevc_put_qpel_uw_h1v3_neon_8;
         put_hevc_qpel_uw_neon[3][2]      = ff_hevc_put_qpel_uw_h2v3_neon_8;
         put_hevc_qpel_uw_neon[3][3]      = ff_hevc_put_qpel_uw_h3v3_neon_8;
-        for (x = 0; x < 10; x++) {
+        for (x = 3; x < 10; x++) {
+            if (x == 4) continue;
             c->put_hevc_qpel[x][1][0]         = ff_hevc_put_qpel_neon_wrapper;
             c->put_hevc_qpel[x][0][1]         = ff_hevc_put_qpel_neon_wrapper;
             c->put_hevc_qpel[x][1][1]         = ff_hevc_put_qpel_neon_wrapper;