diff mbox series

[FFmpeg-devel] lavc/vc1dsp: match C block layout in inv_trans_4x8_rvv

Message ID 20240610192039.27012-1-remi@remlab.net
State New
Headers show
Series [FFmpeg-devel] lavc/vc1dsp: match C block layout in inv_trans_4x8_rvv | expand

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Rémi Denis-Courmont June 10, 2024, 7:20 p.m. UTC
Although checkasm does not verify this, the decoder requires that the
transform updates the input block exactly like the C code does.

This fixes vc1-ism, vc1_ilaced_twomv, vc1_sa00040, vc1_sa10091,
vc1_sa10143, vc1_sa20021, vc1test_smm0005 and wmv3-drm-dec tests.
---
 libavcodec/riscv/vc1dsp_rvv.S | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

Comments

Ronald S. Bultje June 11, 2024, 12:09 p.m. UTC | #1
Hi,

On Mon, Jun 10, 2024 at 3:20 PM Rémi Denis-Courmont <remi@remlab.net> wrote:

> Although checkasm does not verify this, the decoder requires that the
> transform updates the input block exactly like the C code does.
>

Would it be possible to update the checkasm test to verify this?

Ronald
Rémi Denis-Courmont June 11, 2024, 12:19 p.m. UTC | #2
Le 11 juin 2024 15:09:42 GMT+03:00, "Ronald S. Bultje" <rsbultje@gmail.com> a écrit :
>Hi,
>
>On Mon, Jun 10, 2024 at 3:20 PM Rémi Denis-Courmont <remi@remlab.net> wrote:
>
>> Although checkasm does not verify this, the decoder requires that the
>> transform updates the input block exactly like the C code does.
>>
>
>Would it be possible to update the checkasm test to verify this?

In theory, it has to be possible. I can't do it for lack of knowledge about VC-1 though. James tried to make such a patch but it ended up failing both AArch64 and RVV 8x4 transforms. I don't know if this points to a latent bug or is a false positive.

>
>Ronald
>_______________________________________________
>ffmpeg-devel mailing list
>ffmpeg-devel@ffmpeg.org
>https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
>To unsubscribe, visit link above, or email
>ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
James Almer June 11, 2024, 12:23 p.m. UTC | #3
On 6/11/2024 9:19 AM, Rémi Denis-Courmont wrote:
> 
> 
> Le 11 juin 2024 15:09:42 GMT+03:00, "Ronald S. Bultje" <rsbultje@gmail.com> a écrit :
>> Hi,
>>
>> On Mon, Jun 10, 2024 at 3:20 PM Rémi Denis-Courmont <remi@remlab.net> wrote:
>>
>>> Although checkasm does not verify this, the decoder requires that the
>>> transform updates the input block exactly like the C code does.
>>>
>>
>> Would it be possible to update the checkasm test to verify this?
> 
> In theory, it has to be possible. I can't do it for lack of knowledge about VC-1 though. James tried to make such a patch but it ended up failing both AArch64 and RVV 8x4 transforms. I don't know if this points to a latent bug or is a false positive.

My patch added a memcmp of the block buffer. If it fails, then the asm 
implementation is probably writing more than it should.
Can you see what bytes don't match? It might give you an idea of what 
store instruction is touching the wrong bytes.

> 
>>
>> Ronald
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Rémi Denis-Courmont June 11, 2024, 2:53 p.m. UTC | #4
Le tiistaina 11. kesäkuuta 2024, 15.23.11 EEST James Almer a écrit :
> On 6/11/2024 9:19 AM, Rémi Denis-Courmont wrote:
> > Le 11 juin 2024 15:09:42 GMT+03:00, "Ronald S. Bultje" 
<rsbultje@gmail.com> a écrit :
> >> Hi,
> >> 
> >> On Mon, Jun 10, 2024 at 3:20 PM Rémi Denis-Courmont <remi@remlab.net> 
wrote:
> >>> Although checkasm does not verify this, the decoder requires that the
> >>> transform updates the input block exactly like the C code does.
> >> 
> >> Would it be possible to update the checkasm test to verify this?
> > 
> > In theory, it has to be possible. I can't do it for lack of knowledge
> > about VC-1 though. James tried to make such a patch but it ended up
> > failing both AArch64 and RVV 8x4 transforms. I don't know if this points
> > to a latent bug or is a false positive.
> My patch added a memcmp of the block buffer. If it fails, then the asm
> implementation is probably writing more than it should.

The patch has two bugs that end up cancelling one another.

> Can you see what bytes don't match? It might give you an idea of what
> store instruction is touching the wrong bytes.

The RVV transform is failing because it stores different values in the right 
places. I will attach patches here, but it is entirely unclear to me if this 
is necessary at all; at least, it does not seem to fix anything in FATE, and it 
causes the pre-existing AArch64 functions to fail checkasm.
diff mbox series

Patch

diff --git a/libavcodec/riscv/vc1dsp_rvv.S b/libavcodec/riscv/vc1dsp_rvv.S
index c4517d54f5..860b0cc5b1 100644
--- a/libavcodec/riscv/vc1dsp_rvv.S
+++ b/libavcodec/riscv/vc1dsp_rvv.S
@@ -303,15 +303,24 @@  func ff_vc1_inv_trans_4x8_rvv, zve32x
         vlsseg4e16.v v0, (a2), a3
         li           t1, 3
         jal          t0, ff_vc1_inv_trans_4_rvv
+        vssseg4e16.v v0, (a2), a3
+        vsetivli     zero, 4, e16, mf2, ta, ma
         addi         t1, a2, 1 * 8 * 2
-        vse16.v      v0, (a2)
+        vle16.v      v0, (a2)
         addi         t2, a2, 2 * 8 * 2
-        vse16.v      v1, (t1)
+        vle16.v      v1, (t1)
         addi         t3, a2, 3 * 8 * 2
-        vse16.v      v2, (t2)
-        vse16.v      v3, (t3)
-        vsetivli     zero, 4, e16, mf2, ta, ma
-        vlseg8e16.v  v0, (a2)
+        vle16.v      v2, (t2)
+        addi         t4, a2, 4 * 8 * 2
+        vle16.v      v3, (t3)
+        addi         t5, a2, 5 * 8 * 2
+        vle16.v      v4, (t4)
+        addi         t6, a2, 6 * 8 * 2
+        vle16.v      v5, (t5)
+        addi         t1, a2, 7 * 8 * 2
+        vle16.v      v6, (t6)
+        vle16.v      v7, (t1)
+
         jal          t0, ff_vc1_inv_trans_8_rvv
         vadd.vi      v4, v4, 1
         add          t0, a1, a0