diff mbox series

[FFmpeg-devel,2/2] checkasm/vc1dsp: check the not-in-place block content

Message ID 20240611145505.14934-2-remi@remlab.net
State New
Headers show
Series [FFmpeg-devel,1/2] lavc/vc1dsp: match C block content in inv_trans_8x4_rvv | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 fail Make fate failed
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Rémi Denis-Courmont June 11, 2024, 2:55 p.m. UTC
This seems to cause issues in FATE for 4x4 and 4x8 transforms. But then
again, FATE does not seem to care in the 8x4 case.

Note that AArch64 NEON code is known to fail this test.
---
 tests/checkasm/vc1dsp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

James Almer June 11, 2024, 3:08 p.m. UTC | #1
On 6/11/2024 11:55 AM, Rémi Denis-Courmont wrote:
> This seems to cause issues in FATE for 4x4 and 4x8 transforms. But then
> again, FATE does not seem to care in the 8x4 case.
> 
> Note that AArch64 NEON code is known to fail this test.
> ---
>   tests/checkasm/vc1dsp.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c
> index f18f0f8251..2cc6785a0c 100644
> --- a/tests/checkasm/vc1dsp.c
> +++ b/tests/checkasm/vc1dsp.c
> @@ -317,11 +317,13 @@ static void check_inv_trans_adding(void)
>               for (int j = 0; j < tests[t].height; ++j)
>                   for (int i = 0; i < tests[t].width; ++i) {
>                       int idx = j * 8 + i;
> -                    inv_trans_in1[idx] = inv_trans_in0[idx] = coeffs->d[j * tests[t].width + i];
> +                    inv_trans_in0[idx] = coeffs->d[j * tests[t].width + i];
>                   }
> +            memcpy(inv_trans_in1, inv_trans_in0, 8 * 8 * 2);

Is there any gain doing this?

>               call_ref(inv_trans_out0 + 24 + 8, 24, inv_trans_in0);
>               call_new(inv_trans_out1 + 24 + 8, 24, inv_trans_in1);
> -            if (memcmp(inv_trans_out0, inv_trans_out1, 10 * 24))
> +            if (memcmp(inv_trans_in0,   inv_trans_in1, 8 * 8 * 2) ||

nit: sizeof (int16_t)

> +                memcmp(inv_trans_out0, inv_trans_out1, 10 * 24))
>                   fail();
>               bench_new(inv_trans_out1 + 24 + 8, 24, inv_trans_in1 + 8);
>               av_free(coeffs);
Rémi Denis-Courmont June 11, 2024, 3:34 p.m. UTC | #2
Le tiistaina 11. kesäkuuta 2024, 18.08.51 EEST James Almer a écrit :
> On 6/11/2024 11:55 AM, Rémi Denis-Courmont wrote:
> > This seems to cause issues in FATE for 4x4 and 4x8 transforms. But then
> > again, FATE does not seem to care in the 8x4 case.
> > 
> > Note that AArch64 NEON code is known to fail this test.
> > ---
> > 
> >   tests/checkasm/vc1dsp.c | 6 ++++--
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c
> > index f18f0f8251..2cc6785a0c 100644
> > --- a/tests/checkasm/vc1dsp.c
> > +++ b/tests/checkasm/vc1dsp.c
> > @@ -317,11 +317,13 @@ static void check_inv_trans_adding(void)
> > 
> >               for (int j = 0; j < tests[t].height; ++j)
> >               
> >                   for (int i = 0; i < tests[t].width; ++i) {
> >                   
> >                       int idx = j * 8 + i;
> > 
> > -                    inv_trans_in1[idx] = inv_trans_in0[idx] = coeffs->d[j
> > * tests[t].width + i]; +                    inv_trans_in0[idx] =
> > coeffs->d[j * tests[t].width + i];> 
> >                   }
> > 
> > +            memcpy(inv_trans_in1, inv_trans_in0, 8 * 8 * 2);
> 
> Is there any gain doing this?

Checking that assembler does not write out of bounds in the later memcmp().

> >               call_ref(inv_trans_out0 + 24 + 8, 24, inv_trans_in0);
> >               call_new(inv_trans_out1 + 24 + 8, 24, inv_trans_in1);
> > 
> > -            if (memcmp(inv_trans_out0, inv_trans_out1, 10 * 24))
> > +            if (memcmp(inv_trans_in0,   inv_trans_in1, 8 * 8 * 2) ||
> 
> nit: sizeof (int16_t)

Yes well this code cannot be merged because we (at least I) do not know if it 
is correct, and it breaks AArch64.

> > +                memcmp(inv_trans_out0, inv_trans_out1, 10 * 24))
> > 
> >                   fail();
> >               
> >               bench_new(inv_trans_out1 + 24 + 8, 24, inv_trans_in1 + 8);
> >               av_free(coeffs);
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
diff mbox series

Patch

diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c
index f18f0f8251..2cc6785a0c 100644
--- a/tests/checkasm/vc1dsp.c
+++ b/tests/checkasm/vc1dsp.c
@@ -317,11 +317,13 @@  static void check_inv_trans_adding(void)
             for (int j = 0; j < tests[t].height; ++j)
                 for (int i = 0; i < tests[t].width; ++i) {
                     int idx = j * 8 + i;
-                    inv_trans_in1[idx] = inv_trans_in0[idx] = coeffs->d[j * tests[t].width + i];
+                    inv_trans_in0[idx] = coeffs->d[j * tests[t].width + i];
                 }
+            memcpy(inv_trans_in1, inv_trans_in0, 8 * 8 * 2);
             call_ref(inv_trans_out0 + 24 + 8, 24, inv_trans_in0);
             call_new(inv_trans_out1 + 24 + 8, 24, inv_trans_in1);
-            if (memcmp(inv_trans_out0, inv_trans_out1, 10 * 24))
+            if (memcmp(inv_trans_in0,   inv_trans_in1, 8 * 8 * 2) ||
+                memcmp(inv_trans_out0, inv_trans_out1, 10 * 24))
                 fail();
             bench_new(inv_trans_out1 + 24 + 8, 24, inv_trans_in1 + 8);
             av_free(coeffs);