diff mbox series

[FFmpeg-devel,1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow

Message ID OSZP286MB2173897FD4B90324B4B1AFAFCAF22@OSZP286MB2173.JPNP286.PROD.OUTLOOK.COM
State New
Headers show
Series [FFmpeg-devel,1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Wu Jianhua May 29, 2024, 3:38 p.m. UTC
From: Wu Jianhua <toqsxw@outlook.com>

Some tests fails with certain seeds

tests/checkasm/checkasm 2325607578 --test=vvc_alf
checkasm: using random seed 2325607578
AVX2:
    vvc_alf_filter_luma_120x20_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x24_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x28_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x32_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x36_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x40_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x44_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x48_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x52_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x56_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x60_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x64_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x68_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x72_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x76_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x80_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x84_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x88_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x92_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x96_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x100_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x104_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x108_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x112_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x116_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x120_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x124_12_avx2 (vvc_alf.c:104)
    vvc_alf_filter_luma_120x128_12_avx2 (vvc_alf.c:104)
  - vvc_alf.alf_filter   [FAILED]
  - vvc_alf.alf_classify [OK]
checkasm: 28 of 9216 tests have failed

Reported-by: James Almer <jamrial@gmail.com>
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
---
 libavcodec/x86/vvc/vvc_alf.asm | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Andreas Rheinhardt May 29, 2024, 4:23 p.m. UTC | #1
toqsxw@outlook.com:
> From: Wu Jianhua <toqsxw@outlook.com>
> 
> Some tests fails with certain seeds
> 
> tests/checkasm/checkasm 2325607578 --test=vvc_alf
> checkasm: using random seed 2325607578
> AVX2:
>     vvc_alf_filter_luma_120x20_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x24_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x28_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x32_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x36_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x40_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x44_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x48_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x52_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x56_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x60_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x64_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x68_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x72_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x76_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x80_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x84_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x88_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x92_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x96_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x100_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x104_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x108_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x112_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x116_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x120_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x124_12_avx2 (vvc_alf.c:104)
>     vvc_alf_filter_luma_120x128_12_avx2 (vvc_alf.c:104)
>   - vvc_alf.alf_filter   [FAILED]
>   - vvc_alf.alf_classify [OK]
> checkasm: 28 of 9216 tests have failed
> 
> Reported-by: James Almer <jamrial@gmail.com>
> Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
> ---
>  libavcodec/x86/vvc/vvc_alf.asm | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/libavcodec/x86/vvc/vvc_alf.asm b/libavcodec/x86/vvc/vvc_alf.asm
> index 71e821c27b..91f158bac9 100644
> --- a/libavcodec/x86/vvc/vvc_alf.asm
> +++ b/libavcodec/x86/vvc/vvc_alf.asm
> @@ -278,7 +278,9 @@ SECTION .text
>      psrad             m0, SHIFT + 3
>      psrad             m1, SHIFT + 3
>  %%shift_end:
> +%if ps == 1
>      packssdw          m0, m0, m1
> +%endif
>  %endmacro
>  
>  ; FILTER_VB(line)
> @@ -356,7 +358,18 @@ SECTION .text
>  
>      FILTER_VB         xq
>  
> +    ; sum += curr
> +%if ps == 1
>      paddw             m0, m2
> +%else
> +    vpunpcklqdq      m11, m2, m2
> +    vpunpckhqdq      m12, m2, m2
> +    vpunpcklwd       m11, m11, m14
> +    vpunpcklwd       m12, m12, m14
> +    paddd             m0, m11
> +    paddd             m1, m12
> +    packssdw          m0, m0, m1
> +%endif
>  
>      ; clip to pixel
>      CLIPW             m0, m14, m15

Can this happen with real inputs (like when called from the decoder)? If
not, then the test needs to be made more realistic.
Anyway, what is the performance impact of this?

- Andreas
Ronald S. Bultje May 29, 2024, 5:51 p.m. UTC | #2
Hi,

On Wed, May 29, 2024 at 11:38 AM <toqsxw@outlook.com> wrote:

> +%else
> +    vpunpcklqdq      m11, m2, m2
> +    vpunpckhqdq      m12, m2, m2
> +    vpunpcklwd       m11, m11, m14
> +    vpunpcklwd       m12, m12, m14
> +    paddd             m0, m11
> +    paddd             m1, m12
> +    packssdw          m0, m0, m1
> +%endif
>

punpcklqdq a, src, src
punpckhqdq b, src, src
punpcklwd a, a, zero
punpcklwd b, b, zero

is the same as

punpcklwd a, src, zero
punpckhwd b, src, zero

Also, the whole thing just emulates a saturated add. Can't you use paddsw
instead of paddw and be done with it? To add to Andreas' question: is
saturating here normatively required?

Ronald
Wu Jianhua May 29, 2024, 7:44 p.m. UTC | #3
Ronald S. Bultje:
> 发件人: Ronald S. Bultje <rsbultje@gmail.com>
> 发送时间: 2024年5月29日 10:51
> 收件人: FFmpeg development discussions and patches
> 抄送: James Almer; Wu Jianhua
> 主题: Re: [FFmpeg-devel] [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow
> 
> Hi,
> 
> On Wed, May 29, 2024 at 11:38 AM <toqsxw@outlook.com<mailto:toqsxw@outlook.com>> wrote:
> +%else
> +    vpunpcklqdq      m11, m2, m2
> +    vpunpckhqdq      m12, m2, m2
> +    vpunpcklwd       m11, m11, m14
> +    vpunpcklwd       m12, m12, m14
> +    paddd             m0, m11
> +    paddd             m1, m12
> +    packssdw          m0, m0, m1
> +%endif
> 
> punpcklqdq a, src, src
> punpckhqdq b, src, src
> punpcklwd a, a, zero
> punpcklwd b, b, zero
> 
> is the same as
> 
> punpcklwd a, src, zero
> punpckhwd b, src, zero

Thank you for pointing out this. This modification is really helpful for my improvement!


Andreas:
>Can this happen with real inputs (like when called from the decoder)? If
> not, then the test needs to be made more realistic.
> Anyway, what is the performance impact of this?

I didn't have a unit test, but the average FPS looks no change.

Ronald:
> Also, the whole thing just emulates a saturated add. Can't you use paddsw instead of paddw and be done with it? To add to Andreas' question: is saturating here normatively required?

We didn't have any sample that failed for this issue except for the checksum with specific seeds. I think we can keep not changing it until a real sample has something wrong. 

@Nuomi to get more details.
Ronald S. Bultje May 29, 2024, 8:56 p.m. UTC | #4
Hi,

On Wed, May 29, 2024 at 3:44 PM Wu Jianhua <toqsxw@outlook.com> wrote:

> Ronald S. Bultje:
> > On Wed, May 29, 2024 at 11:38 AM <toqsxw@outlook.com<mailto:
> toqsxw@outlook.com>> wrote:
> > +%else
> > +    vpunpcklqdq      m11, m2, m2
> > +    vpunpckhqdq      m12, m2, m2
> > +    vpunpcklwd       m11, m11, m14
> > +    vpunpcklwd       m12, m12, m14
> > +    paddd             m0, m11
> > +    paddd             m1, m12
> > +    packssdw          m0, m0, m1
> > +%endif
> >
> [..]
>
> Also, the whole thing just emulates a saturated add. Can't you use paddsw
> instead of paddw and be done with it? To add to Andreas' question: is
> saturating here normatively required?
>
> We didn't have any sample that failed for this issue except for the
> checksum with specific seeds. I think we can keep not changing it until a
> real sample has something wrong.
>
> @Nuomi to get more details.
>

I think "just" replacing paddw with paddsw is correct, since the input
pixels are 12bit (so they could be either unsigned or signed), the filtered
output is the result of packssdw (so signed words), and the desired output
is 12bit pixels anyway, anything greater than that is clipped to 12bit
range. So to me, it seems paddsw is a cheaper way to accomplish the same
thing.

Ronald
Wu Jianhua May 30, 2024, 4:31 p.m. UTC | #5
Ronald S. Bultje:
> 发件人: Ronald S. Bultje <rsbultje@gmail.com>
> 发送时间: 2024年5月29日 13:56
> 收件人: Wu Jianhua
> 抄送: FFmpeg development discussions and patches; Nuo Mi; James Almer
> 主题: Re: [FFmpeg-devel] [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow
> 
> Hi,
> 
> On Wed, May 29, 2024 at 3:44 PM Wu Jianhua <toqsxw@outlook.com<mailto:toqsxw@outlook.com>> wrote:
> Ronald S. Bultje:
>> On Wed, May 29, 2024 at 11:38 AM <toqsxw@outlook.com<mailto:toqsxw@outlook.com>> <mailto:toqsxw@outlook.com<mailto:toqsxw@outlook.com>>> wrote:
>> +%else
>> +    vpunpcklqdq      m11, m2, m2
>> +    vpunpckhqdq      m12, m2, m2
>> +    vpunpcklwd       m11, m11, m14
>> +    vpunpcklwd       m12, m12, m14
>> +    paddd             m0, m11
>> +    paddd             m1, m12
>>  +    packssdw          m0, m0, m1
>> +%endif
>
> [..]
> > Also, the whole thing just emulates a saturated add. Can't you use paddsw instead of paddw and be done with it? To add to Andreas' question: is >>  saturating here normatively required?
> 
> > We didn't have any sample that failed for this issue except for the checksum with specific seeds. I think we can keep not changing it until a real  sample has something wrong.
> 
> @Nuomi to get more details.
> 
> I think "just" replacing paddw with paddsw is correct, since the input pixels are 12bit (so they could be either unsigned or signed), the filtered output > is the result of packssdw (so signed words), and the desired output is 12bit pixels anyway, anything greater than that is clipped to 12bit range. So to > me, it seems paddsw is a cheaper way to accomplish the same thing.
> 
> Ronald

Hi Ronald,

Yes, it does. I've test paddsw and everything works well. It must be a cheaper way to get minimum performance loss.

And v2 sent.

Thanks for this.
Jianhua
diff mbox series

Patch

diff --git a/libavcodec/x86/vvc/vvc_alf.asm b/libavcodec/x86/vvc/vvc_alf.asm
index 71e821c27b..91f158bac9 100644
--- a/libavcodec/x86/vvc/vvc_alf.asm
+++ b/libavcodec/x86/vvc/vvc_alf.asm
@@ -278,7 +278,9 @@  SECTION .text
     psrad             m0, SHIFT + 3
     psrad             m1, SHIFT + 3
 %%shift_end:
+%if ps == 1
     packssdw          m0, m0, m1
+%endif
 %endmacro
 
 ; FILTER_VB(line)
@@ -356,7 +358,18 @@  SECTION .text
 
     FILTER_VB         xq
 
+    ; sum += curr
+%if ps == 1
     paddw             m0, m2
+%else
+    vpunpcklqdq      m11, m2, m2
+    vpunpckhqdq      m12, m2, m2
+    vpunpcklwd       m11, m11, m14
+    vpunpcklwd       m12, m12, m14
+    paddd             m0, m11
+    paddd             m1, m12
+    packssdw          m0, m0, m1
+%endif
 
     ; clip to pixel
     CLIPW             m0, m14, m15