[FFmpeg-devel] avcodec/x86/mathops: use constrained immediate operands

Message ID 20230715235832.64221-1-jamrial@gmail.com
State New
Series [FFmpeg-devel] avcodec/x86/mathops: use constrained immediate operands

Checks

Context                          Check    Description
yinshiyou/make_loongarch64       success  Make finished
yinshiyou/make_fate_loongarch64  success  Make fate finished
andriy/make_x86                  success  Make finished
andriy/make_fate_x86             success  Make fate finished

Commit Message

James Almer July 15, 2023, 11:58 p.m. UTC
Should fix assembling with binutils as >= 2.41

Signed-off-by: James Almer <jamrial@gmail.com>
---
This is IMO a big breakage. binutils' as has until now clipped these values on
its own, and never required the compiler to do it.

 libavcodec/x86/mathops.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Rémi Denis-Courmont July 16, 2023, 9:23 a.m. UTC | #1
On Sunday, 16 July 2023, 2:58:32 EEST, James Almer wrote:
> Should fix assembling with binutils as >= 2.41
> 
> Signed-off-by: James Almer <jamrial@gmail.com>
> ---
> This is IMO a big breakage. binutils' as has until now clipped these values
> on its own, and never required the compiler to do it.

TBH, silently clipping immediate constants sounds like a nasty bug that could 
cause really nasty surprises if somebody ever passes an out-of-range constant. 
This has happened to me many times, typically with incidentally out-of-range 
immediate offsets in loads/stores.

(...)

>      __asm__ ("shrl %1, %0\n\t"
>           : "+r" (a)
> -         : "ic" ((uint8_t)(-s))
> +         : "Ic" ((uint8_t)(-s))

Note that this is not equivalent. Now, if `s` is constant but out of range, 
the compiler will be required to fit it. And it does that by moving it into 
ECX. This is probably not what you want.

AFAICT, you should keep the constraint as it is, and fix the operand value 
instead by masking it, e.g.:

        if (__builtin_constant_p(s))
                __asm__ ("shrl %1, %0\n\t"
                        : "+r" (a)
                        : "i" ((-s) & 0x1f)
                );
        else
                __asm__ ("shrl %1, %0\n\t"
                        : "+r" (a)
                        : "c" (-s)
                );

(Not sure if the 0x1f mask is correct, but you get the idea.)

Nicolas George July 16, 2023, 9:42 a.m. UTC | #2
James Almer (12023-07-15):
> Should fix assembling with binutils as >= 2.41
> 
> Signed-off-by: James Almer <jamrial@gmail.com>
> ---
> This is IMO a big breakage. binutils' as has until now clipped these values on
> its own, and never required the compiler to do it.

I confirm it fixes the build failures on up-to-date Debian testing.

OTOH, I ran a benchmark (decoding some x264):

474134     mod
488751 orig
494359     mod
498554 orig
508958 orig
514246 orig
518160     mod
528427 orig
530223     mod
534762     mod
536415 orig
548434 orig
550789 orig
551716     mod
553951 orig
561754 orig
572688     mod
580254     mod
581205 orig
583856     mod
583939 orig
584748 orig
594143 orig
600681     mod
607596     mod
612757     mod
621033 orig
624567 orig
626346     mod
627309     mod
628242     mod
638344     mod

The numbers are the sum of the “user” column of the -benchmark_all
output, on an AMD Ryzen 3 3200U and Debian stable. The mod lines are
the runs where I disabled the two faulty functions.

They are all over the place, it is hard to be sure, but it seems to
indicate that, as you suspected, the benefit is not that big.
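
To make the comparison less of an eyeball exercise, the two groups can be summarised by their mean and median. The following is only a sketch: the arrays are copied from the list above, while the names `mod_runs`, `orig_runs`, `mean_of` and `median_of_sorted` are made up for this illustration and do not come from FFmpeg.

```c
#include <stddef.h>

/* Summed "user" times copied from the list above, in ascending order. */
const long mod_runs[16] = {
    474134, 494359, 518160, 530223, 534762, 551716, 572688, 580254,
    583856, 600681, 607596, 612757, 626346, 627309, 628242, 638344
};
const long orig_runs[16] = {
    488751, 498554, 508958, 514246, 528427, 536415, 548434, 550789,
    553951, 561754, 581205, 583939, 584748, 594143, 621033, 624567
};

/* Arithmetic mean of n samples. */
double mean_of(const long *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)v[i];
    return sum / (double)n;
}

/* Median of n samples that are already sorted in ascending order. */
double median_of_sorted(const long *v, size_t n)
{
    if (n % 2)
        return (double)v[n / 2];
    return ((double)v[n / 2 - 1] + (double)v[n / 2]) / 2.0;
}
```

Calling `mean_of(mod_runs, 16)` and `mean_of(orig_runs, 16)` gives one number per group; with a spread this wide, the difference should still be read with caution.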

Regards,
James Almer July 16, 2023, 11:55 a.m. UTC | #3
On 7/16/2023 6:23 AM, Rémi Denis-Courmont wrote:
> On Sunday, 16 July 2023, 2:58:32 EEST, James Almer wrote:
>> Should fix assembling with binutils as >= 2.41
>>
>> Signed-off-by: James Almer <jamrial@gmail.com>
>> ---
>> This is IMO a big breakage. binutils' as has until now clipped these values
>> on its own, and never required the compiler to do it.
> 
> TBH, silently clipping immediate constants sounds like a nasty bug that could
> cause really nasty surprises if somebody ever passes an out-of-range constant.

We're passing it out-of-range constants all right. I tried adding an 
av_assert0((uint8_t)(-s) <= 31) and most fate tests started failing.

> This has happened to me many times, typically with incidentally out-of-range
> immediate offsets in loads/stores.
> 
> (...)
> 
>>       __asm__ ("shrl %1, %0\n\t"
>>            : "+r" (a)
>> -         : "ic" ((uint8_t)(-s))
>> +         : "Ic" ((uint8_t)(-s))
> 
> Note that this is not equivalent. Now, if `s` is constant but out of range,
> the compiler will be required to fit it. And it does that by moving it into
> ECX. This is probably not what you want.
> 
> AFAICT, you should keep the constraint as it is, and fix the operand value
> instead by masking it, e.g.:
> 
>          if (__builtin_constant_p(s))
>                  __asm__ ("shrl %1, %0\n\t"
>                          : "+r" (a)
>                          : "i" ((-s) & 0x1f)
>                  );
>          else
>                  __asm__ ("shrl %1, %0\n\t"
>                          : "+r" (a)
>                          : "c" (-s)
>                  );
> 
> (Not sure if the 0x1f mask is correct, but you get the idea.)

It is, just tested.
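
As a side note on why the assert fails while the 0x1f mask still gives the right shift: for any s in 1..32, (uint8_t)(-s) lies in 224..255, far above 31, yet its low five bits equal 32 - s (modulo 32), and x86 32-bit shifts only look at the low five bits of the count. The sketch below checks that arithmetic in plain C. The helper names are invented for this example, and it assumes the intended shift amount is 32 - s.

```c
#include <stdint.h>

/* Shift count as the existing code passes it: (uint8_t)(-s). */
unsigned raw_count(int8_t s)
{
    return (uint8_t)(-s);
}

/* Shift count after the suggested 5-bit mask. */
unsigned masked_count(int8_t s)
{
    return (unsigned)(-s) & 0x1fu;
}

/* Returns 1 if, for every s in 1..32, the raw count is out of the 0..31
 * range while the masked count equals (32 - s) modulo 32; 0 otherwise. */
int mask_matches_32_minus_s(void)
{
    for (int s = 1; s <= 32; s++) {
        if (raw_count((int8_t)s) <= 31)
            return 0;
        if (masked_count((int8_t)s) != (unsigned)(32 - s) % 32u)
            return 0;
    }
    return 1;
}
```

For instance, `raw_count(1)` is 255 and `masked_count(1)` is 31, so an assembler that rejects immediates above 31 refuses the former but accepts the latter.
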
Rémi Denis-Courmont July 16, 2023, 12:23 p.m. UTC | #4
On Sunday, 16 July 2023, 14:55:43 EEST, James Almer wrote:
> On 7/16/2023 6:23 AM, Rémi Denis-Courmont wrote:
> > On Sunday, 16 July 2023, 2:58:32 EEST, James Almer wrote:
> >> Should fix assembling with binutils as >= 2.41
> >> 
> >> Signed-off-by: James Almer <jamrial@gmail.com>
> >> ---
> >> This is IMO a big breakage. binutils' as has until now clipped these
> >> values on its own, and never required the compiler to do it.
> > 
> > TBH, silently clipping immediate constants sounds like a nasty bug that
> > could cause really nasty surprises if somebody ever passes an
> > out-of-range constant.
> 
> We're passing it out-of-range constants all right. I tried adding an
> av_assert0((uint8_t)(-s) <= 31) and most fate tests started failing.

Well, yes. That's why recent binutils is complaining. It wouldn't if the 
constant values were always in range.

I'm not versed in the x86 subdomain of black magic, so I'm not sure if you 
imply that it was intentional that FFmpeg fed out-of-range values that would 
be cropped, or if it was unintentional. In the latter case, I think that the 
existing assembler constraint should actually be kept as it is precisely to 
detect errors, and the calling code path ought to be fixed instead.

Either way, changing "i" for "I" will generate suboptimal-looking code as I 
pointed out up-thread. If we don't even care about that, then we might as well 
shift in C code, AFAICT.
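
For illustration, shifting in C could look roughly like the sketch below. This is not the code from mathops.h: the function names are hypothetical, and the `& 31` mirrors the 5-bit count masking that x86 applies to 32-bit shifts, so the result matches what the inline asm computes. The signed variant relies on right shifts of negative values being arithmetic, which is implementation-defined in C but is what GCC and Clang do.

```c
#include <stdint.h>

/* Hypothetical C stand-in for NEG_USR32: logical right shift of a
 * by (-s) modulo 32, i.e. by 32 - s for s in 1..32. */
uint32_t neg_usr32_c(uint32_t a, int8_t s)
{
    return a >> ((unsigned)(-s) & 31u);
}

/* Hypothetical C stand-in for NEG_SSR32: same count, but shifting a
 * signed value. Assumes the compiler implements >> on negative
 * integers as an arithmetic (sign-propagating) shift. */
int32_t neg_ssr32_c(int32_t a, int8_t s)
{
    return a >> ((unsigned)(-s) & 31u);
}
```

Because the count is masked in C, no out-of-range immediate ever reaches the assembler, and the compiler remains free to pick an immediate or a register count on its own.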

> > This has happened to me many times, typically with incidentally
> > out-of-range immediate offsets in loads/stores.
> > 
> > (...)
> > 
> >>       __asm__ ("shrl %1, %0\n\t"
> >>            : "+r" (a)
> >> -         : "ic" ((uint8_t)(-s))
> >> +         : "Ic" ((uint8_t)(-s))
> > 
> > Note that this is not equivalent. Now, if `s` is constant but out of range,
> > the compiler will be required to fit it. And it does that by moving it into
> > ECX. This is probably not what you want.
> > 
> > AFAICT, you should keep the constraint as it is, and fix the operand value
> > instead by masking it, e.g.:
> > 
> >          if (__builtin_constant_p(s))
> >                  __asm__ ("shrl %1, %0\n\t"
> >                          : "+r" (a)
> >                          : "i" ((-s) & 0x1f)
> >                  );
> >          else
> >                  __asm__ ("shrl %1, %0\n\t"
> >                          : "+r" (a)
> >                          : "c" (-s)
> >                  );
> > 
> > (Not sure if the 0x1f mask is correct, but you get the idea.)
> 
> It is, just tested.

Patch

diff --git a/libavcodec/x86/mathops.h b/libavcodec/x86/mathops.h
index 6298f5ed19..a08c6193bf 100644
--- a/libavcodec/x86/mathops.h
+++ b/libavcodec/x86/mathops.h
@@ -39,7 +39,7 @@  static av_always_inline av_const int MULL(int a, int b, unsigned shift)
         "imull %3               \n\t"
         "shrdl %4, %%edx, %%eax \n\t"
         :"=a"(rt), "=d"(dummy)
-        :"a"(a), "rm"(b), "ci"((uint8_t)shift)
+        :"a"(a), "rm"(b), "cI"((uint8_t)shift)
     );
     return rt;
 }
@@ -115,16 +115,17 @@  __asm__ volatile(\
 static inline  int32_t NEG_SSR32( int32_t a, int8_t s){
     __asm__ ("sarl %1, %0\n\t"
          : "+r" (a)
-         : "ic" ((uint8_t)(-s))
+         : "Ic" ((uint8_t)(-s))
     );
     return a;
 }
 
 #define NEG_USR32 NEG_USR32
 static inline uint32_t NEG_USR32(uint32_t a, int8_t s){
+
     __asm__ ("shrl %1, %0\n\t"
          : "+r" (a)
-         : "ic" ((uint8_t)(-s))
+         : "Ic" ((uint8_t)(-s))
     );
     return a;
 }