diff mbox series

[FFmpeg-devel] get_cabac_inline_x86: Don't inline if 32-bit clang on windows

Message ID 20210817153539.188484-1-ccom@randomderp.com
State Accepted
Commit 8990c5869e27fcd43b53045f87ba251f42e7d293
Headers show
Series [FFmpeg-devel] get_cabac_inline_x86: Don't inline if 32-bit clang on windows | expand

Checks

Context Check Description
andriy/x86_make success Make finished
andriy/x86_make_fate success Make fate finished
andriy/PPC64_make success Make finished
andriy/PPC64_make_fate success Make fate finished

Commit Message

Christopher Degawa Aug. 17, 2021, 3:35 p.m. UTC
Fixes https://trac.ffmpeg.org/ticket/8903

relevant https://github.com/msys2/MINGW-packages/discussions/9258

Signed-off-by: Christopher Degawa <ccom@randomderp.com>
---
 libavcodec/x86/cabac.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

James Almer Aug. 17, 2021, 4:01 p.m. UTC | #1
On 8/17/2021 12:35 PM, Christopher Degawa wrote:
> Fixes https://trac.ffmpeg.org/ticket/8903
> 
> relevant https://github.com/msys2/MINGW-packages/discussions/9258
> 
> Signed-off-by: Christopher Degawa <ccom@randomderp.com>
> ---
>   libavcodec/x86/cabac.h | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/libavcodec/x86/cabac.h b/libavcodec/x86/cabac.h
> index 53d74c541e..b046a56a6b 100644
> --- a/libavcodec/x86/cabac.h
> +++ b/libavcodec/x86/cabac.h
> @@ -177,8 +177,13 @@
>   
>   #if HAVE_7REGS && !BROKEN_COMPILER
>   #define get_cabac_inline get_cabac_inline_x86
> -static av_always_inline int get_cabac_inline_x86(CABACContext *c,
> -                                                 uint8_t *const state)
> +static
> +#if defined(_WIN32) && !defined(_WIN64) && defined(__clang__)

Can you do some benchmarks to see how not inlining this compares to 
simply disabling this code for this target? Because it sounds like you 
may want to add this case to the BROKEN_COMPILER macro, and not use this 
code at all.

> +av_noinline
> +#else
> +av_always_inline
> +#endif
> +int get_cabac_inline_x86(CABACContext *c, uint8_t *const state)
>   {
>       int bit, tmp;
>   #ifdef BROKEN_RELOCATIONS
>
Martin Storsjö Aug. 17, 2021, 4:05 p.m. UTC | #2
On Tue, 17 Aug 2021, Christopher Degawa wrote:

> Fixes https://trac.ffmpeg.org/ticket/8903
>
> relevant https://github.com/msys2/MINGW-packages/discussions/9258
>
> Signed-off-by: Christopher Degawa <ccom@randomderp.com>
> ---
> libavcodec/x86/cabac.h | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/x86/cabac.h b/libavcodec/x86/cabac.h
> index 53d74c541e..b046a56a6b 100644
> --- a/libavcodec/x86/cabac.h
> +++ b/libavcodec/x86/cabac.h
> @@ -177,8 +177,13 @@
>
> #if HAVE_7REGS && !BROKEN_COMPILER
> #define get_cabac_inline get_cabac_inline_x86
> -static av_always_inline int get_cabac_inline_x86(CABACContext *c,
> -                                                 uint8_t *const state)
> +static
> +#if defined(_WIN32) && !defined(_WIN64) && defined(__clang__)
> +av_noinline
> +#else
> +av_always_inline
> +#endif
> +int get_cabac_inline_x86(CABACContext *c, uint8_t *const state)
> {
>     int bit, tmp;
> #ifdef BROKEN_RELOCATIONS
> -- 
> 2.32.0

This looks good to me, and is a less intrusive fix for the issue than the 
one I submitted last year.

FWIW, the issue is avoided in some configurations by configuring with 
--cpu=i686, which disallows use of inline MMX/SSE like this, but with this 
fix one can keep all the asm enabled.

// Martin
Martin Storsjö Aug. 17, 2021, 4:08 p.m. UTC | #3
On Tue, 17 Aug 2021, James Almer wrote:

> On 8/17/2021 12:35 PM, Christopher Degawa wrote:
>> Fixes https://trac.ffmpeg.org/ticket/8903
>> 
>> relevant https://github.com/msys2/MINGW-packages/discussions/9258
>> 
>> Signed-off-by: Christopher Degawa <ccom@randomderp.com>
>> ---
>>   libavcodec/x86/cabac.h | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>> 
>> diff --git a/libavcodec/x86/cabac.h b/libavcodec/x86/cabac.h
>> index 53d74c541e..b046a56a6b 100644
>> --- a/libavcodec/x86/cabac.h
>> +++ b/libavcodec/x86/cabac.h
>> @@ -177,8 +177,13 @@
>>     #if HAVE_7REGS && !BROKEN_COMPILER
>>   #define get_cabac_inline get_cabac_inline_x86
>> -static av_always_inline int get_cabac_inline_x86(CABACContext *c,
>> -                                                 uint8_t *const state)
>> +static
>> +#if defined(_WIN32) && !defined(_WIN64) && defined(__clang__)
>
> Can you do some benchmarks to see how not inlining this compares to 
> simply disabling this code for this target? Because it sounds like you 
> may want to add this case to the BROKEN_COMPILER macro, and not use this 
> code at all.

FWIW, my patch for this issue last year was exactly to add this config 
combo to the broken compiler case - but I believe this is better. 
Benchmarks is of course always best. It'd be interesting also to measure 
the impact of not inlineing this in a configuration where it actually 
works as intended.

// Martin
Martin Storsjö Aug. 18, 2021, 10:01 a.m. UTC | #4
On Tue, 17 Aug 2021, James Almer wrote:

> On 8/17/2021 12:35 PM, Christopher Degawa wrote:
>> Fixes https://trac.ffmpeg.org/ticket/8903
>> 
>> relevant https://github.com/msys2/MINGW-packages/discussions/9258
>> 
>> Signed-off-by: Christopher Degawa <ccom@randomderp.com>
>> ---
>>   libavcodec/x86/cabac.h | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>> 
>> diff --git a/libavcodec/x86/cabac.h b/libavcodec/x86/cabac.h
>> index 53d74c541e..b046a56a6b 100644
>> --- a/libavcodec/x86/cabac.h
>> +++ b/libavcodec/x86/cabac.h
>> @@ -177,8 +177,13 @@
>>     #if HAVE_7REGS && !BROKEN_COMPILER
>>   #define get_cabac_inline get_cabac_inline_x86
>> -static av_always_inline int get_cabac_inline_x86(CABACContext *c,
>> -                                                 uint8_t *const state)
>> +static
>> +#if defined(_WIN32) && !defined(_WIN64) && defined(__clang__)
>
> Can you do some benchmarks to see how not inlining this compares to simply 
> disabling this code for this target? Because it sounds like you may want to 
> add this case to the BROKEN_COMPILER macro, and not use this code at all.

I tried benchmarking it, and in short, this patch seems to be the best 
solution.

I tested 3 configurations; with this patch (changing av_always_inline into 
av_noinline), setting BROKEN_COMPILER (skipping these inline asm 
functions) and configuring with --cpu=i686 (which means it passes 
-march=i686 to the compiler, which disallows the use of inline MMX/SSE). I 
benchmarked singlethreaded decoding of a high bitrate H264 clip (listing 
the lowest measured time out of 3 runs):

av_noinline: 90.94 seconds
BROKEN_COMPILER: 98.92 seconds
-march=i686: 94.63 seconds

(The fact that building with -march=i686 is faster than using some but not 
all inline MMX/SSE is a bit surprising.)

I also tested the same setup on x86_64 (on a different machine, with Apple 
Clang), where I tested the above and compare it with the default 
configuration using av_always_inline):

av_always_inline: 74.65 seconds
av_noinline: 73.74 seconds
BROKEN_COMPILER: 78.10 seconds

So av_noinline actually seems to be generally favourable here (and for 
some reason, actually a bit faster than the always_inline case, although 
I'm not sure if that bit is deterministic in general or not).


// Martin
James Almer Aug. 19, 2021, 6:40 p.m. UTC | #5
On 8/18/2021 7:01 AM, Martin Storsjö wrote:
> On Tue, 17 Aug 2021, James Almer wrote:
> 
>> On 8/17/2021 12:35 PM, Christopher Degawa wrote:
>>> Fixes https://trac.ffmpeg.org/ticket/8903
>>>
>>> relevant https://github.com/msys2/MINGW-packages/discussions/9258
>>>
>>> Signed-off-by: Christopher Degawa <ccom@randomderp.com>
>>> ---
>>>   libavcodec/x86/cabac.h | 9 +++++++--
>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/libavcodec/x86/cabac.h b/libavcodec/x86/cabac.h
>>> index 53d74c541e..b046a56a6b 100644
>>> --- a/libavcodec/x86/cabac.h
>>> +++ b/libavcodec/x86/cabac.h
>>> @@ -177,8 +177,13 @@
>>>     #if HAVE_7REGS && !BROKEN_COMPILER
>>>   #define get_cabac_inline get_cabac_inline_x86
>>> -static av_always_inline int get_cabac_inline_x86(CABACContext *c,
>>> -                                                 uint8_t *const state)
>>> +static
>>> +#if defined(_WIN32) && !defined(_WIN64) && defined(__clang__)
>>
>> Can you do some benchmarks to see how not inlining this compares to 
>> simply disabling this code for this target? Because it sounds like you 
>> may want to add this case to the BROKEN_COMPILER macro, and not use 
>> this code at all.
> 
> I tried benchmarking it, and in short, this patch seems to be the best 
> solution.
> 
> I tested 3 configurations; with this patch (changing av_always_inline 
> into av_noinline), setting BROKEN_COMPILER (skipping these inline asm 
> functions) and configuring with --cpu=i686 (which means it passes 
> -march=i686 to the compiler, which disallows the use of inline MMX/SSE). 
> I benchmarked singlethreaded decoding of a high bitrate H264 clip 
> (listing the lowest measured time out of 3 runs):
> 
> av_noinline: 90.94 seconds
> BROKEN_COMPILER: 98.92 seconds
> -march=i686: 94.63 seconds
> 
> (The fact that building with -march=i686 is faster than using some but 
> not all inline MMX/SSE is a bit surprising.)
> 
> I also tested the same setup on x86_64 (on a different machine, with 
> Apple Clang), where I tested the above and compare it with the default 
> configuration using av_always_inline):
> 
> av_always_inline: 74.65 seconds
> av_noinline: 73.74 seconds
> BROKEN_COMPILER: 78.10 seconds
> 
> So av_noinline actually seems to be generally favourable here (and for 
> some reason, actually a bit faster than the always_inline case, although 
> I'm not sure if that bit is deterministic in general or not).
> 
> 
> // Martin

Alright, LGTM then.
diff mbox series

Patch

diff --git a/libavcodec/x86/cabac.h b/libavcodec/x86/cabac.h
index 53d74c541e..b046a56a6b 100644
--- a/libavcodec/x86/cabac.h
+++ b/libavcodec/x86/cabac.h
@@ -177,8 +177,13 @@ 
 
 #if HAVE_7REGS && !BROKEN_COMPILER
 #define get_cabac_inline get_cabac_inline_x86
-static av_always_inline int get_cabac_inline_x86(CABACContext *c,
-                                                 uint8_t *const state)
+static
+#if defined(_WIN32) && !defined(_WIN64) && defined(__clang__)
+av_noinline
+#else
+av_always_inline
+#endif
+int get_cabac_inline_x86(CABACContext *c, uint8_t *const state)
 {
     int bit, tmp;
 #ifdef BROKEN_RELOCATIONS