diff mbox series

[FFmpeg-devel] lavu/common: Fix AV_CEIL_RSHIFT for unsigned LHS

Message ID 20241005223922.53888-1-post@frankplowman.com
State New
Headers show
Series [FFmpeg-devel] lavu/common: Fix AV_CEIL_RSHIFT for unsigned LHS | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished

Commit Message

Frank Plowman Oct. 5, 2024, 10:38 p.m. UTC
The first branch of this ternary expression was intended to avoid
having two shift operations in the case the RHS is not known at
compile time.  It only works if the LHS has a signed type however,
otherwise the result is invalid.

We could alternatively have different versions of AV_CEIL_RSHIFT for
different sizes of operand, then cast the LHS to the relevant signed
type.

Signed-off-by: Frank Plowman <post@frankplowman.com>
---
 libavutil/common.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Frank Plowman Oct. 14, 2024, 8:27 p.m. UTC | #1
Ping

Here is a demo of the issue: https://godbolt.org/z/hYnYvbcjE

On 05/10/2024 23:38, Frank Plowman wrote:
> The first branch of this ternary expression was intended to avoid
> having two shift operations in the case the RHS is not known at
> compile time.  It only works if the LHS has a signed type however,
> otherwise the result is invalid.
> 
> We could alternatively have different versions of AV_CEIL_RSHIFT for
> different sizes of operand, then cast the LHS to the relevant signed
> type.
> 
> Signed-off-by: Frank Plowman <post@frankplowman.com>
> ---
>  libavutil/common.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/libavutil/common.h b/libavutil/common.h
> index 3b830daf30..ec38752b64 100644
> --- a/libavutil/common.h
> +++ b/libavutil/common.h
> @@ -57,8 +57,7 @@
>  /* assume b>0 */
>  #define ROUNDED_DIV(a,b) (((a)>=0 ? (a) + ((b)>>1) : (a) - ((b)>>1))/(b))
>  /* Fast a/(1<<b) rounded toward +inf. Assume a>=0 and b>=0 */
> -#define AV_CEIL_RSHIFT(a,b) (!av_builtin_constant_p(b) ? -((-(a)) >> (b)) \
> -                                                       : ((a) + (1<<(b)) - 1) >> (b))
> +#define AV_CEIL_RSHIFT(a,b) (((a) + (1<<(b)) - 1) >> (b))
>  /* Backwards compat. */
>  #define FF_CEIL_RSHIFT AV_CEIL_RSHIFT
>
Michael Niedermayer Oct. 14, 2024, 10:26 p.m. UTC | #2
On Sat, Oct 05, 2024 at 03:38:05PM -0700, Frank Plowman wrote:
> The first branch of this ternary expression was intended to avoid
> having two shift operations in the case the RHS is not known at
> compile time.  It only works if the LHS has a signed type however,
> otherwise the result is invalid.

If the expression is faster, why not check if its signed ?

if its not faster, then the argument could be that its not
faster and removes complexity

thx

[...]
Frank Plowman Oct. 15, 2024, 8:23 p.m. UTC | #3
On 14/10/2024 23:26, Michael Niedermayer wrote:
> On Sat, Oct 05, 2024 at 03:38:05PM -0700, Frank Plowman wrote:
>> The first branch of this ternary expression was intended to avoid
>> having two shift operations in the case the RHS is not known at
>> compile time.  It only works if the LHS has a signed type however,
>> otherwise the result is invalid.
> 
> If the expression is faster, why not check if its signed ?
> 
> if its not faster, then the argument could be that its not
> faster and removes complexity
> 
> thx
> 

In a quick microbenchmark (clang 16, AArch64), the bithack is 10% faster
with -O0 but there is no significant difference with -O1 and up.

Cheers,
Frank Plowman Oct. 18, 2024, 5:48 p.m. UTC | #4
On 15/10/2024 21:23, Frank Plowman wrote:
> On 14/10/2024 23:26, Michael Niedermayer wrote:
>> On Sat, Oct 05, 2024 at 03:38:05PM -0700, Frank Plowman wrote:
>>> The first branch of this ternary expression was intended to avoid
>>> having two shift operations in the case the RHS is not known at
>>> compile time.  It only works if the LHS has a signed type however,
>>> otherwise the result is invalid.
>>
>> If the expression is faster, why not check if its signed ?
>>
>> if its not faster, then the argument could be that its not
>> faster and removes complexity
>>
>> thx
>>
> 
> In a quick microbenchmark (clang 16, AArch64), the bithack is 10% faster
> with -O0 but there is no significant difference with -O1 and up.

It has been drawn to my attention that this is causing a failure for
VVC_HDR_UHDTV1_OpenGOP_3840x2160_50fps_HLG10_mosaic from the DVB VVC
test suite, due to the macro's use at
libavcodec/cbs_h266_syntax_template.c:1206

In light of the lack of performance difference between the two
implementations and the simplicity of the proposed change, I think the
bithack should be removed.

Cheers,
Michael Niedermayer Oct. 20, 2024, 6:30 p.m. UTC | #5
On Fri, Oct 18, 2024 at 06:48:40PM +0100, Frank Plowman wrote:
> On 15/10/2024 21:23, Frank Plowman wrote:
> > On 14/10/2024 23:26, Michael Niedermayer wrote:
> >> On Sat, Oct 05, 2024 at 03:38:05PM -0700, Frank Plowman wrote:
> >>> The first branch of this ternary expression was intended to avoid
> >>> having two shift operations in the case the RHS is not known at
> >>> compile time.  It only works if the LHS has a signed type however,
> >>> otherwise the result is invalid.
> >>
> >> If the expression is faster, why not check if its signed ?
> >>
> >> if its not faster, then the argument could be that its not
> >> faster and removes complexity
> >>
> >> thx
> >>
> > 
> > In a quick microbenchmark (clang 16, AArch64), the bithack is 10% faster
> > with -O0 but there is no significant difference with -O1 and up.
> 
> It has been drawn to my attention that this is causing a failure for
> VVC_HDR_UHDTV1_OpenGOP_3840x2160_50fps_HLG10_mosaic from the DVB VVC
> test suite, due to the macro's use at
> libavcodec/cbs_h266_syntax_template.c:1206
> 

> In light of the lack of performance difference between the two
> implementations

Thats a bit of a jump from
"In a quick microbenchmark (clang 16, AArch64), the bithack is 10% faster
 with -O0 but there is no significant difference with -O1 and up."
to a general "lack of performance difference"

Also if there is a slowdown such slowdowns accumulate and multiple reports
suggest that FFmpeg has become slower over the years. We should not be sloppy
with changes that could negatively affect speed. Noone ever said (s)he liked
the slowdown FFmpeg had over the years

Your test covers clang and aarch, your suggested changes covers all
compilers and all arhcitectures.
Its alot of work to test the main affected cases, which i agree is annoying

But even if one ignores the question about speed loss. The change is
actually still introducing other issues

-#define AV_CEIL_RSHIFT(a,b) (!av_builtin_constant_p(b) ? -((-(a)) >> (b)) \
-                                                       : ((a) + (1<<(b)) - 1) >> (b))
+#define AV_CEIL_RSHIFT(a,b) (((a) + (1<<(b)) - 1) >> (b))

Here the argument was
-((-(a)) >> (b))
only works with signed numbers, ok true, even though iam not sure why
we would need to use this with unsigned numbers.

((a) + (1<<(b)) - 1) >> (b)
only works with ints
it fails with int64_t as (1<<(b)) can overflow
also it fails with many more cases
for example, something like this
AV_CEIL_RSHIFT(0x7FFFFFFF, 30)
will overflow (a) + (1<<(b)), this will be fine with -((-(a)) >> (b))

and if you just change that to (int64_t)1 then you will have a speed
loss on 32bit systems.

-((-(a)) >> (b))
works with a broader range of values

thx

[...]
diff mbox series

Patch

diff --git a/libavutil/common.h b/libavutil/common.h
index 3b830daf30..ec38752b64 100644
--- a/libavutil/common.h
+++ b/libavutil/common.h
@@ -57,8 +57,7 @@ 
 /* assume b>0 */
 #define ROUNDED_DIV(a,b) (((a)>=0 ? (a) + ((b)>>1) : (a) - ((b)>>1))/(b))
 /* Fast a/(1<<b) rounded toward +inf. Assume a>=0 and b>=0 */
-#define AV_CEIL_RSHIFT(a,b) (!av_builtin_constant_p(b) ? -((-(a)) >> (b)) \
-                                                       : ((a) + (1<<(b)) - 1) >> (b))
+#define AV_CEIL_RSHIFT(a,b) (((a) + (1<<(b)) - 1) >> (b))
 /* Backwards compat. */
 #define FF_CEIL_RSHIFT AV_CEIL_RSHIFT