Message ID | 20241005223922.53888-1-post@frankplowman.com |
---|---|
State | New |
Headers | show |
Series | [FFmpeg-devel] lavu/common: Fix AV_CEIL_RSHIFT for unsigned LHS | expand |
Context | Check | Description |
---|---|---|
yinshiyou/make_loongarch64 | success | Make finished |
yinshiyou/make_fate_loongarch64 | success | Make fate finished |
Ping Here is a demo of the issue: https://godbolt.org/z/hYnYvbcjE On 05/10/2024 23:38, Frank Plowman wrote: > The first branch of this ternary expression was intended to avoid > having two shift operations in the case the RHS is not known at > compile time. It only works if the LHS has a signed type however, > otherwise the result is invalid. > > We could alternatively have different versions of AV_CEIL_RSHIFT for > different sizes of operand, then cast the LHS to the relevant signed > type. > > Signed-off-by: Frank Plowman <post@frankplowman.com> > --- > libavutil/common.h | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/libavutil/common.h b/libavutil/common.h > index 3b830daf30..ec38752b64 100644 > --- a/libavutil/common.h > +++ b/libavutil/common.h > @@ -57,8 +57,7 @@ > /* assume b>0 */ > #define ROUNDED_DIV(a,b) (((a)>=0 ? (a) + ((b)>>1) : (a) - ((b)>>1))/(b)) > /* Fast a/(1<<b) rounded toward +inf. Assume a>=0 and b>=0 */ > -#define AV_CEIL_RSHIFT(a,b) (!av_builtin_constant_p(b) ? -((-(a)) >> (b)) \ > - : ((a) + (1<<(b)) - 1) >> (b)) > +#define AV_CEIL_RSHIFT(a,b) (((a) + (1<<(b)) - 1) >> (b)) > /* Backwards compat. */ > #define FF_CEIL_RSHIFT AV_CEIL_RSHIFT >
On Sat, Oct 05, 2024 at 03:38:05PM -0700, Frank Plowman wrote: > The first branch of this ternary expression was intended to avoid > having two shift operations in the case the RHS is not known at > compile time. It only works if the LHS has a signed type however, > otherwise the result is invalid. If the expression is faster, why not check if its signed ? if its not faster, then the argument could be that its not faster and removes complexity thx [...]
On 14/10/2024 23:26, Michael Niedermayer wrote: > On Sat, Oct 05, 2024 at 03:38:05PM -0700, Frank Plowman wrote: >> The first branch of this ternary expression was intended to avoid >> having two shift operations in the case the RHS is not known at >> compile time. It only works if the LHS has a signed type however, >> otherwise the result is invalid. > > If the expression is faster, why not check if its signed ? > > if its not faster, then the argument could be that its not > faster and removes complexity > > thx > In a quick microbenchmark (clang 16, AArch64), the bithack is 10% faster with -O0 but there is no significant difference with -O1 and up. Cheers,
On 15/10/2024 21:23, Frank Plowman wrote: > On 14/10/2024 23:26, Michael Niedermayer wrote: >> On Sat, Oct 05, 2024 at 03:38:05PM -0700, Frank Plowman wrote: >>> The first branch of this ternary expression was intended to avoid >>> having two shift operations in the case the RHS is not known at >>> compile time. It only works if the LHS has a signed type however, >>> otherwise the result is invalid. >> >> If the expression is faster, why not check if its signed ? >> >> if its not faster, then the argument could be that its not >> faster and removes complexity >> >> thx >> > > In a quick microbenchmark (clang 16, AArch64), the bithack is 10% faster > with -O0 but there is no significant difference with -O1 and up. It has been drawn to my attention that this is causing a failure for VVC_HDR_UHDTV1_OpenGOP_3840x2160_50fps_HLG10_mosaic from the DVB VVC test suite, due to the macro's use at libavcodec/cbs_h266_syntax_template.c:1206 In light of the lack of performance difference between the two implementations and the simplicity of the proposed change, I think the bithack should be removed. Cheers,
On Fri, Oct 18, 2024 at 06:48:40PM +0100, Frank Plowman wrote: > On 15/10/2024 21:23, Frank Plowman wrote: > > On 14/10/2024 23:26, Michael Niedermayer wrote: > >> On Sat, Oct 05, 2024 at 03:38:05PM -0700, Frank Plowman wrote: > >>> The first branch of this ternary expression was intended to avoid > >>> having two shift operations in the case the RHS is not known at > >>> compile time. It only works if the LHS has a signed type however, > >>> otherwise the result is invalid. > >> > >> If the expression is faster, why not check if its signed ? > >> > >> if its not faster, then the argument could be that its not > >> faster and removes complexity > >> > >> thx > >> > > > > In a quick microbenchmark (clang 16, AArch64), the bithack is 10% faster > > with -O0 but there is no significant difference with -O1 and up. > > It has been drawn to my attention that this is causing a failure for > VVC_HDR_UHDTV1_OpenGOP_3840x2160_50fps_HLG10_mosaic from the DVB VVC > test suite, due to the macro's use at > libavcodec/cbs_h266_syntax_template.c:1206 > > In light of the lack of performance difference between the two > implementations Thats a bit of a jump from "In a quick microbenchmark (clang 16, AArch64), the bithack is 10% faster with -O0 but there is no significant difference with -O1 and up." to a general "lack of performance difference" Also if there is a slowdown such slowdowns accumulate and multiple reports suggest that FFmpeg has become slower over the years. We should not be sloppy with changes that could negatively affect speed. Noone ever said (s)he liked the slowdown FFmpeg had over the years Your test covers clang and aarch, your suggested changes covers all compilers and all arhcitectures. Its alot of work to test the main affected cases, which i agree is annoying But even if one ignores the question about speed loss. The change is actually still introducing other issues -#define AV_CEIL_RSHIFT(a,b) (!av_builtin_constant_p(b) ? -((-(a)) >> (b)) \ - : ((a) + (1<<(b)) - 1) >> (b)) +#define AV_CEIL_RSHIFT(a,b) (((a) + (1<<(b)) - 1) >> (b)) Here the argument was -((-(a)) >> (b)) only works with signed numbers, ok true, even though iam not sure why we would need to use this with unsigned numbers. ((a) + (1<<(b)) - 1) >> (b) only works with ints it fails with int64_t as (1<<(b)) can overflow also it fails with many more cases for example, something like this AV_CEIL_RSHIFT(0x7FFFFFFF, 30) will overflow (a) + (1<<(b)), this will be fine with -((-(a)) >> (b)) and if you just change that to (int64_t)1 then you will have a speed loss on 32bit systems. -((-(a)) >> (b)) works with a broader range of values thx [...]
diff --git a/libavutil/common.h b/libavutil/common.h index 3b830daf30..ec38752b64 100644 --- a/libavutil/common.h +++ b/libavutil/common.h @@ -57,8 +57,7 @@ /* assume b>0 */ #define ROUNDED_DIV(a,b) (((a)>=0 ? (a) + ((b)>>1) : (a) - ((b)>>1))/(b)) /* Fast a/(1<<b) rounded toward +inf. Assume a>=0 and b>=0 */ -#define AV_CEIL_RSHIFT(a,b) (!av_builtin_constant_p(b) ? -((-(a)) >> (b)) \ - : ((a) + (1<<(b)) - 1) >> (b)) +#define AV_CEIL_RSHIFT(a,b) (((a) + (1<<(b)) - 1) >> (b)) /* Backwards compat. */ #define FF_CEIL_RSHIFT AV_CEIL_RSHIFT
The first branch of this ternary expression was intended to avoid having two shift operations in the case the RHS is not known at compile time. It only works if the LHS has a signed type however, otherwise the result is invalid. We could alternatively have different versions of AV_CEIL_RSHIFT for different sizes of operand, then cast the LHS to the relevant signed type. Signed-off-by: Frank Plowman <post@frankplowman.com> --- libavutil/common.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)