Message ID | 20240610192039.27012-1-remi@remlab.net |
---|---|
State | New |
Headers | show |
Series | [FFmpeg-devel] lavc/vc1dsp: match C block layout in inv_trans_4x8_rvv | expand |
Context | Check | Description |
---|---|---|
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
Hi, On Mon, Jun 10, 2024 at 3:20 PM Rémi Denis-Courmont <remi@remlab.net> wrote: > Although checkasm does not verify this, the decoder requires that the > transform updates the input block exactly like the C code does. > Would it be possible to update the checkasm test to verify this? Ronald
Le 11 juin 2024 15:09:42 GMT+03:00, "Ronald S. Bultje" <rsbultje@gmail.com> a écrit : >Hi, > >On Mon, Jun 10, 2024 at 3:20 PM Rémi Denis-Courmont <remi@remlab.net> wrote: > >> Although checkasm does not verify this, the decoder requires that the >> transform updates the input block exactly like the C code does. >> > >Would it be possible to update the checkasm test to verify this? In theory, it has to be possible. I can't do it for lack of knowledge about VC-1 though. James tried to make such a patch but it ended up failing both AArch64 and RVV 8x4 transforms. I don't know if this points to a latent bug or is a false positive. > >Ronald >_______________________________________________ >ffmpeg-devel mailing list >ffmpeg-devel@ffmpeg.org >https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > >To unsubscribe, visit link above, or email >ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
On 6/11/2024 9:19 AM, Rémi Denis-Courmont wrote: > > > Le 11 juin 2024 15:09:42 GMT+03:00, "Ronald S. Bultje" <rsbultje@gmail.com> a écrit : >> Hi, >> >> On Mon, Jun 10, 2024 at 3:20 PM Rémi Denis-Courmont <remi@remlab.net> wrote: >> >>> Although checkasm does not verify this, the decoder requires that the >>> transform updates the input block exactly like the C code does. >>> >> >> Would it be possible to update the checkasm test to verify this? > > In theory, it has to be possible. I can't do it for lack of knowledge about VC-1 though. James tried to make such a patch but it ended up failing both AArch64 and RVV 8x4 transforms. I don't know if this points to a latent bug or is a false positive. My patch added a memcmp of the block buffer. If it fails, then the asm implementation is probably writing more than it should. Can you see what bytes don't match? It might give you an idea of what store instruction is touching the wrong bytes. > >> >> Ronald >> _______________________________________________ >> ffmpeg-devel mailing list >> ffmpeg-devel@ffmpeg.org >> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel >> >> To unsubscribe, visit link above, or email >> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Le tiistaina 11. kesäkuuta 2024, 15.23.11 EEST James Almer a écrit : > On 6/11/2024 9:19 AM, Rémi Denis-Courmont wrote: > > Le 11 juin 2024 15:09:42 GMT+03:00, "Ronald S. Bultje" <rsbultje@gmail.com> a écrit : > >> Hi, > >> > >> On Mon, Jun 10, 2024 at 3:20 PM Rémi Denis-Courmont <remi@remlab.net> wrote: > >>> Although checkasm does not verify this, the decoder requires that the > >>> transform updates the input block exactly like the C code does. > >> > >> Would it be possible to update the checkasm test to verify this? > > > > In theory, it has to be possible. I can't do it for lack of knowledge > > about VC-1 though. James tried to make such a patch but it ended up > > failing both AArch64 and RVV 8x4 transforms. I don't know if this points > > to a latent bug or is a false positive. > My patch added a memcmp of the block buffer. If it fails, then the asm > implementation is probably writing more than it should. The patch has two bugs that end up cancelling one another. > Can you see what bytes don't match? It might give you an idea of what > store instruction is touching the wrong bytes. The RVV transform is failing because it stores different values in the right places. I will attach patches here, but it is entirely unclear to me if this is necessary at all; at least, it does not seem to fix anything in FATE, and it causes the pre-existing AArch64 functions to fail checkasm.
diff --git a/libavcodec/riscv/vc1dsp_rvv.S b/libavcodec/riscv/vc1dsp_rvv.S index c4517d54f5..860b0cc5b1 100644 --- a/libavcodec/riscv/vc1dsp_rvv.S +++ b/libavcodec/riscv/vc1dsp_rvv.S @@ -303,15 +303,24 @@ func ff_vc1_inv_trans_4x8_rvv, zve32x vlsseg4e16.v v0, (a2), a3 li t1, 3 jal t0, ff_vc1_inv_trans_4_rvv + vssseg4e16.v v0, (a2), a3 + vsetivli zero, 4, e16, mf2, ta, ma addi t1, a2, 1 * 8 * 2 - vse16.v v0, (a2) + vle16.v v0, (a2) addi t2, a2, 2 * 8 * 2 - vse16.v v1, (t1) + vle16.v v1, (t1) addi t3, a2, 3 * 8 * 2 - vse16.v v2, (t2) - vse16.v v3, (t3) - vsetivli zero, 4, e16, mf2, ta, ma - vlseg8e16.v v0, (a2) + vle16.v v2, (t2) + addi t4, a2, 4 * 8 * 2 + vle16.v v3, (t3) + addi t5, a2, 5 * 8 * 2 + vle16.v v4, (t4) + addi t6, a2, 6 * 8 * 2 + vle16.v v5, (t5) + addi t1, a2, 7 * 8 * 2 + vle16.v v6, (t6) + vle16.v v7, (t1) + jal t0, ff_vc1_inv_trans_8_rvv vadd.vi v4, v4, 1 add t0, a1, a0