Message ID | 20240611145505.14934-2-remi@remlab.net |
---|---|
State | New |
Headers | show |
Series | [FFmpeg-devel,1/2] lavc/vc1dsp: match C block content in inv_trans_8x4_rvv | expand |
Context | Check | Description |
---|---|---|
yinshiyou/make_loongarch64 | success | Make finished |
yinshiyou/make_fate_loongarch64 | fail | Make fate failed |
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
On 6/11/2024 11:55 AM, Rémi Denis-Courmont wrote: > This seems to cause issues in FATE for 4x4 and 4x8 transforms. But then > again, FATE does not seem to care in the 8x4 case. > > Note that AArch64 NEON code is known to fail this test. > --- > tests/checkasm/vc1dsp.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c > index f18f0f8251..2cc6785a0c 100644 > --- a/tests/checkasm/vc1dsp.c > +++ b/tests/checkasm/vc1dsp.c > @@ -317,11 +317,13 @@ static void check_inv_trans_adding(void) > for (int j = 0; j < tests[t].height; ++j) > for (int i = 0; i < tests[t].width; ++i) { > int idx = j * 8 + i; > - inv_trans_in1[idx] = inv_trans_in0[idx] = coeffs->d[j * tests[t].width + i]; > + inv_trans_in0[idx] = coeffs->d[j * tests[t].width + i]; > } > + memcpy(inv_trans_in1, inv_trans_in0, 8 * 8 * 2); Is there any gain doing this? > call_ref(inv_trans_out0 + 24 + 8, 24, inv_trans_in0); > call_new(inv_trans_out1 + 24 + 8, 24, inv_trans_in1); > - if (memcmp(inv_trans_out0, inv_trans_out1, 10 * 24)) > + if (memcmp(inv_trans_in0, inv_trans_in1, 8 * 8 * 2) || nit: sizeof (int16_t) > + memcmp(inv_trans_out0, inv_trans_out1, 10 * 24)) > fail(); > bench_new(inv_trans_out1 + 24 + 8, 24, inv_trans_in1 + 8); > av_free(coeffs);
Le tiistaina 11. kesäkuuta 2024, 18.08.51 EEST James Almer a écrit : > On 6/11/2024 11:55 AM, Rémi Denis-Courmont wrote: > > This seems to cause issues in FATE for 4x4 and 4x8 transforms. But then > > again, FATE does not seem to care in the 8x4 case. > > > > Note that AArch64 NEON code is known to fail this test. > > --- > > > > tests/checkasm/vc1dsp.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c > > index f18f0f8251..2cc6785a0c 100644 > > --- a/tests/checkasm/vc1dsp.c > > +++ b/tests/checkasm/vc1dsp.c > > @@ -317,11 +317,13 @@ static void check_inv_trans_adding(void) > > > > for (int j = 0; j < tests[t].height; ++j) > > > > for (int i = 0; i < tests[t].width; ++i) { > > > > int idx = j * 8 + i; > > > > - inv_trans_in1[idx] = inv_trans_in0[idx] = coeffs->d[j > > * tests[t].width + i]; + inv_trans_in0[idx] = > > coeffs->d[j * tests[t].width + i];> > > } > > > > + memcpy(inv_trans_in1, inv_trans_in0, 8 * 8 * 2); > > Is there any gain doing this? Checking that assembler does not write out of bounds in the later memcmp(). > > call_ref(inv_trans_out0 + 24 + 8, 24, inv_trans_in0); > > call_new(inv_trans_out1 + 24 + 8, 24, inv_trans_in1); > > > > - if (memcmp(inv_trans_out0, inv_trans_out1, 10 * 24)) > > + if (memcmp(inv_trans_in0, inv_trans_in1, 8 * 8 * 2) || > > nit: sizeof (int16_t) Yes well this code cannot be merged because we (at least I) do not know if it is correct, and it breaks AArch64. > > + memcmp(inv_trans_out0, inv_trans_out1, 10 * 24)) > > > > fail(); > > > > bench_new(inv_trans_out1 + 24 + 8, 24, inv_trans_in1 + 8); > > av_free(coeffs); > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c index f18f0f8251..2cc6785a0c 100644 --- a/tests/checkasm/vc1dsp.c +++ b/tests/checkasm/vc1dsp.c @@ -317,11 +317,13 @@ static void check_inv_trans_adding(void) for (int j = 0; j < tests[t].height; ++j) for (int i = 0; i < tests[t].width; ++i) { int idx = j * 8 + i; - inv_trans_in1[idx] = inv_trans_in0[idx] = coeffs->d[j * tests[t].width + i]; + inv_trans_in0[idx] = coeffs->d[j * tests[t].width + i]; } + memcpy(inv_trans_in1, inv_trans_in0, 8 * 8 * 2); call_ref(inv_trans_out0 + 24 + 8, 24, inv_trans_in0); call_new(inv_trans_out1 + 24 + 8, 24, inv_trans_in1); - if (memcmp(inv_trans_out0, inv_trans_out1, 10 * 24)) + if (memcmp(inv_trans_in0, inv_trans_in1, 8 * 8 * 2) || + memcmp(inv_trans_out0, inv_trans_out1, 10 * 24)) fail(); bench_new(inv_trans_out1 + 24 + 8, 24, inv_trans_in1 + 8); av_free(coeffs);