Message ID | 20221025101920.40198-1-martin@martin.st
---|---
State | Accepted
Commit | cb803a0072cb98945dcd3f1660bd2a975650ce42
Series | [FFmpeg-devel] swscale: aarch64: Fix yuv2rgb with negative strides

Context | Check | Description
---|---|---
yinshiyou/make_loongarch64 | success | Make finished
yinshiyou/make_fate_loongarch64 | success | Make fate finished
On Tue, 25 Oct 2022, Martin Storsjö wrote:

> Treat the 32 bit stride registers as signed.
>
> Alternatively, we could make the stride arguments ptrdiff_t instead
> of int, and change all of the assembly to operate on these
> registers with their full 64 bit width, but that's a much larger
> and more intrusive change (and risks missing some operation, which
> would clamp the intermediates to 32 bit still).
>
> Fixes: https://trac.ffmpeg.org/ticket/9985
>
> Signed-off-by: Martin Storsjö <martin@martin.st>
> ---
>  libswscale/aarch64/yuv2rgb_neon.S | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/libswscale/aarch64/yuv2rgb_neon.S b/libswscale/aarch64/yuv2rgb_neon.S
> index f4b220fb60..f341268c5d 100644
> --- a/libswscale/aarch64/yuv2rgb_neon.S
> +++ b/libswscale/aarch64/yuv2rgb_neon.S
> @@ -118,8 +118,8 @@
>  .endm
>
>  .macro increment_yuv422p
> -        add             x6,  x6,  w7,  UXTW     // srcU += incU
> -        add             x13, x13, w14, UXTW     // srcV += incV
> +        add             x6,  x6,  w7,  SXTW     // srcU += incU
> +        add             x13, x13, w14, SXTW     // srcV += incV
>  .endm
>
>  .macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
> @@ -189,8 +189,8 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1
>          st4             {v16.8B,v17.8B,v18.8B,v19.8B}, [x2], #32
>          subs            w8, w8, #16             // width -= 16
>          b.gt            2b
> -        add             x2, x2, w3, UXTW        // dst += padding
> -        add             x4, x4, w5, UXTW        // srcY += paddingY
> +        add             x2, x2, w3, SXTW        // dst += padding
> +        add             x4, x4, w5, SXTW        // srcY += paddingY
>          increment_\ifmt
>          subs            w1, w1, #1              // height -= 1
>          b.gt            1b
> --
> 2.37.0 (Apple Git-136)

Will push later today, and backport to some older branches where relevant (a bit later).

// Martin
diff --git a/libswscale/aarch64/yuv2rgb_neon.S b/libswscale/aarch64/yuv2rgb_neon.S
index f4b220fb60..f341268c5d 100644
--- a/libswscale/aarch64/yuv2rgb_neon.S
+++ b/libswscale/aarch64/yuv2rgb_neon.S
@@ -118,8 +118,8 @@
 .endm

 .macro increment_yuv422p
-        add             x6,  x6,  w7,  UXTW     // srcU += incU
-        add             x13, x13, w14, UXTW     // srcV += incV
+        add             x6,  x6,  w7,  SXTW     // srcU += incU
+        add             x13, x13, w14, SXTW     // srcV += incV
 .endm

 .macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
@@ -189,8 +189,8 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1
         st4             {v16.8B,v17.8B,v18.8B,v19.8B}, [x2], #32
         subs            w8, w8, #16             // width -= 16
         b.gt            2b
-        add             x2, x2, w3, UXTW        // dst += padding
-        add             x4, x4, w5, UXTW        // srcY += paddingY
+        add             x2, x2, w3, SXTW        // dst += padding
+        add             x4, x4, w5, SXTW        // srcY += paddingY
         increment_\ifmt
         subs            w1, w1, #1              // height -= 1
         b.gt            1b
Treat the 32 bit stride registers as signed.

Alternatively, we could make the stride arguments ptrdiff_t instead
of int, and change all of the assembly to operate on these
registers with their full 64 bit width, but that's a much larger
and more intrusive change (and risks missing some operation, which
would clamp the intermediates to 32 bit still).

Fixes: https://trac.ffmpeg.org/ticket/9985

Signed-off-by: Martin Storsjö <martin@martin.st>
---
 libswscale/aarch64/yuv2rgb_neon.S | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)