[FFmpeg-devel] enable auto vectorization for gcc 7 and higher

Message ID 05a46152f1b2458ea326edd9cfb6d817@amazon.com
State Superseded

Checks

Context                          | Check   | Description
yinshiyou/commit_msg_loongarch64 | warning | The first line of the commit message must start with a context terminated by a colon and a space, for example "lavu/opt: " or "doc: ".
yinshiyou/make_loongarch64       | success | Make finished
yinshiyou/make_fate_loongarch64  | success | Make fate finished
andriy/commit_msg_x86            | warning | The first line of the commit message must start with a context terminated by a colon and a space, for example "lavu/opt: " or "doc: ".
andriy/make_fate_x86             | success | Make fate finished
andriy/make_x86                  | warning | New warnings during build

Commit Message

Swinney, Jonathan July 27, 2022, 5:34 p.m. UTC
I recognize that this patch is going to be somewhat controversial. I'm submitting it mostly to see what the opinions are and to evaluate options. I am working on improving performance for aarch64. On that architecture, there are fewer hand-written assembly implementations of hot functions than there are for x86_64, and allowing GCC to auto-vectorize yields noticeable improvements.

GCC vectorization has improved recently and it hasn't been evaluated on the mailing list for a few years. This is the latest discussion I found in my searches: http://ffmpeg.org/pipermail/ffmpeg-devel/2016-May/193977.html

If the community is not comfortable accepting a patch like this outright, would you be willing to accept a new option to the configure script, something like --enable-auto-vectorization?

Thanks!

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
---
 configure | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

James Almer July 27, 2022, 5:39 p.m. UTC | #1
On 7/27/2022 2:34 PM, Swinney, Jonathan wrote:
> I recognize that this patch is going to be somewhat controversial. I'm submitting it mostly to see what the opinions are and evaluate options. I am working on improving performance for aarch64. On that architecture, there are fewer hand written assembly implementations of hot functions than there are for x86_64 and allowing gcc to auto-vectorize yields noticeable improvements.
> 
> Gcc vectorization has improved recently and it hasn't been evaluated on the mailing list for a few years. This is the latest discussion I found in my searches: http://ffmpeg.org/pipermail/ffmpeg-devel/2016-May/193977.html

Every time this was done, it was inevitably reverted after complaints and
crash reports started piling up because gcc can't really handle all the
inline code our codebase has, among other things.

> 
> If the community is not comfortable accepting a patch like this outright, would you be willing to accept a new option to the configure script, something like --enable-auto-vectorization?

--extra-cflags can be used for this.

> 
> Thanks!
> 
> Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
> ---
>   configure | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/configure b/configure
> index 6629d14099..c63c9348ad 100755
> --- a/configure
> +++ b/configure
> @@ -7173,7 +7173,9 @@ if enabled icc; then
>               disable aligned_stack
>       fi
>   elif enabled gcc; then
> -    check_optflags -fno-tree-vectorize
> +    case $gcc_basever in
> +        2|2.*|3.*|4.*|5.*|6.*) check_optflags -fno-tree-vectorize ;;
> +    esac
>       check_cflags -Werror=format-security
>       check_cflags -Werror=implicit-function-declaration
>       check_cflags -Werror=missing-prototypes
Andreas Rheinhardt July 27, 2022, 5:49 p.m. UTC | #2
James Almer:
> On 7/27/2022 2:34 PM, Swinney, Jonathan wrote:
>> I recognize that this patch is going to be somewhat controversial. I'm
>> submitting it mostly to see what the opinions are and evaluate
>> options. I am working on improving performance for aarch64. On that
>> architecture, there are fewer hand written assembly implementations of
>> hot functions than there are for x86_64 and allowing gcc to
>> auto-vectorize yields noticeable improvements.
>>
>> Gcc vectorization has improved recently and it hasn't been evaluated
>> on the mailing list for a few years. This is the latest discussion I
>> found in my searches:
>> http://ffmpeg.org/pipermail/ffmpeg-devel/2016-May/193977.html
> 
> Every time this was done, it was inevitably reverted after complains and
> crash reports started piling up because gcc can't really handle all the
> inline code our codebase has, among other things.
> 
>>
>> If the community is not comfortable accepting a patch like this
>> outright, would you be willing to accept a new option to the configure
>> script, something like --enable-auto-vectorization?
> 
> --extra-cflags can be used for this.
> 

No, it can't, because what is given via --extra-cflags is inserted at
the start of CFLAGS, so that the automatically added -fno-tree-vectorize
overrides it.

- Andreas
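
[Editorial note] The ordering problem described above can be sketched in plain shell. This is only an illustration under stated assumptions: GCC is assumed to honour the last of two conflicting -f/-fno- switches, the helper effective_vectorize_flag is hypothetical, and the two flag strings are made-up stand-ins for what the user passes and what configure appends.

```shell
#!/bin/sh
# Hypothetical helper (not part of configure): print the last vectorizer
# switch found on a command line, i.e. the one expected to take effect.
effective_vectorize_flag() {
    last=unset
    for flag in "$@"; do
        case $flag in
            -ftree-vectorize|-fno-tree-vectorize) last=$flag ;;
        esac
    done
    echo "$last"
}

# Assumed stand-ins: the user's --extra-cflags come first, and the flag
# that configure adds automatically comes later on the command line.
extra_cflags="-ftree-vectorize"
configure_flags="-O3 -fno-tree-vectorize"

# Word splitting of the two strings is intentional here.
effective_vectorize_flag $extra_cflags $configure_flags
# prints: -fno-tree-vectorize
```

With the flags in this order, the user's -ftree-vectorize is silently cancelled, which is why --extra-cflags alone does not help here.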
Soft Works July 27, 2022, 6:54 p.m. UTC | #3
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Swinney, Jonathan
> Sent: Wednesday, July 27, 2022 7:35 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: [FFmpeg-devel] [PATCH] enable auto vectorization for gcc 7
> and higher
> 
> I recognize that this patch is going to be somewhat controversial.
> I'm submitting it mostly to see what the opinions are and evaluate
> options. I am working on improving performance for aarch64. On that
> architecture, there are fewer hand written assembly implementations
> of hot functions than there are for x86_64 and allowing gcc to auto-
> vectorize yields noticeable improvements.
> 
> Gcc vectorization has improved recently and it hasn't been evaluated
> on the mailing list for a few years. This is the latest discussion I
> found in my searches: http://ffmpeg.org/pipermail/ffmpeg-devel/2016-
> May/193977.html
> 
> If the community is not comfortable accepting a patch like this
> outright, would you be willing to accept a new option to the
> configure script, something like --enable-auto-vectorization?
> 
> Thanks!
> 
> Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
> ---
>  configure | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/configure b/configure
> index 6629d14099..c63c9348ad 100755
> --- a/configure
> +++ b/configure
> @@ -7173,7 +7173,9 @@ if enabled icc; then
>              disable aligned_stack
>      fi
>  elif enabled gcc; then
> -    check_optflags -fno-tree-vectorize
> +    case $gcc_basever in
> +        2|2.*|3.*|4.*|5.*|6.*) check_optflags -fno-tree-vectorize ;;
> +    esac
>      check_cflags -Werror=format-security
>      check_cflags -Werror=implicit-function-declaration
>      check_cflags -Werror=missing-prototypes
> --

LGTM - basically. I removed that flag about two years ago and have never
seen an issue (across Win/Linux/BSD and x86_64/armv7/aarch64).
But that has always been with quite recent versions of gcc, so I can't say
whether it's already safe with 7.x.

One exception I've seen was with an Android NDK build in gcc compatibility
mode, where I got a clang compilation error. But that's nothing of concern
I think.

sw
Hendrik Leppkes July 27, 2022, 8:41 p.m. UTC | #4
On Wed, Jul 27, 2022 at 7:39 PM James Almer <jamrial@gmail.com> wrote:
>
> On 7/27/2022 2:34 PM, Swinney, Jonathan wrote:
> > I recognize that this patch is going to be somewhat controversial. I'm submitting it mostly to see what the opinions are and evaluate options. I am working on improving performance for aarch64. On that architecture, there are fewer hand written assembly implementations of hot functions than there are for x86_64 and allowing gcc to auto-vectorize yields noticeable improvements.
> >
> > Gcc vectorization has improved recently and it hasn't been evaluated on the mailing list for a few years. This is the latest discussion I found in my searches: http://ffmpeg.org/pipermail/ffmpeg-devel/2016-May/193977.html
>
> Every time this was done, it was inevitably reverted after complains and
> crash reports started piling up because gcc can't really handle all the
> inline code our codebase has, among other things.
>

No need to wait for issues, I just tested, and the same issues still
persist that have existed for years with GCC now. They don't seem to
care to make it compatible with inline asm, which might be fair
enough, but it means it just can't work here.

In file included from libavcodec/cabac_functions.h:49,
                 from libavcodec/h264_cabac.c:36:
libavcodec/h264_cabac.c: In function 'ff_h264_decode_mb_cabac':
libavcodec/x86/cabac.h:199:5: error: 'asm' operand has impossible constraints

GCC 11.3, configure --cpu=haswell, mingw32

So this is a NACK. It just flat out breaks builds.

- Hendrik
Martin Storsjö July 27, 2022, 9:01 p.m. UTC | #5
On Wed, 27 Jul 2022, Hendrik Leppkes wrote:

> On Wed, Jul 27, 2022 at 7:39 PM James Almer <jamrial@gmail.com> wrote:
>>
>> On 7/27/2022 2:34 PM, Swinney, Jonathan wrote:
>>> I recognize that this patch is going to be somewhat controversial. I'm submitting it mostly to see what the opinions are and evaluate options. I am working on improving performance for aarch64. On that architecture, there are fewer hand written assembly implementations of hot functions than there are for x86_64 and allowing gcc to auto-vectorize yields noticeable improvements.
>>>
>>> Gcc vectorization has improved recently and it hasn't been evaluated on the mailing list for a few years. This is the latest discussion I found in my searches: http://ffmpeg.org/pipermail/ffmpeg-devel/2016-May/193977.html
>>
>> Every time this was done, it was inevitably reverted after complains and
>> crash reports started piling up because gcc can't really handle all the
>> inline code our codebase has, among other things.
>>
>
> No need to wait for issues, I just tested, and the same issues still
> persist that have existed for years with GCC now. They don't seem to
> care to make it compatible with inline asm, which might be fair
> enough, but it means it just can't work here.
>
> In file included from libavcodec/cabac_functions.h:49,
>                 from libavcodec/h264_cabac.c:36:
> libavcodec/h264_cabac.c: In function 'ff_h264_decode_mb_cabac':
> libavcodec/x86/cabac.h:199:5: error: 'asm' operand has impossible constraints

This particular bit of inline assembly has historically been very 
problematic in many configurations (although primarily on i386 I think) - 
see e.g. 8990c5869e27fcd43b53045f87ba251f42e7d293. Would something like 
that be enough for that build configuration to succeed, or are there many 
other cases that break?

// Martin
Hendrik Leppkes July 27, 2022, 9:07 p.m. UTC | #6
On Wed, Jul 27, 2022 at 11:02 PM Martin Storsjö <martin@martin.st> wrote:
>
> On Wed, 27 Jul 2022, Hendrik Leppkes wrote:
>
> > On Wed, Jul 27, 2022 at 7:39 PM James Almer <jamrial@gmail.com> wrote:
> >>
> >> On 7/27/2022 2:34 PM, Swinney, Jonathan wrote:
> >>> I recognize that this patch is going to be somewhat controversial. I'm submitting it mostly to see what the opinions are and evaluate options. I am working on improving performance for aarch64. On that architecture, there are fewer hand written assembly implementations of hot functions than there are for x86_64 and allowing gcc to auto-vectorize yields noticeable improvements.
> >>>
> >>> Gcc vectorization has improved recently and it hasn't been evaluated on the mailing list for a few years. This is the latest discussion I found in my searches: http://ffmpeg.org/pipermail/ffmpeg-devel/2016-May/193977.html
> >>
> >> Every time this was done, it was inevitably reverted after complains and
> >> crash reports started piling up because gcc can't really handle all the
> >> inline code our codebase has, among other things.
> >>
> >
> > No need to wait for issues, I just tested, and the same issues still
> > persist that have existed for years with GCC now. They don't seem to
> > care to make it compatible with inline asm, which might be fair
> > enough, but it means it just can't work here.
> >
> > In file included from libavcodec/cabac_functions.h:49,
> >                 from libavcodec/h264_cabac.c:36:
> > libavcodec/h264_cabac.c: In function 'ff_h264_decode_mb_cabac':
> > libavcodec/x86/cabac.h:199:5: error: 'asm' operand has impossible constraints
>
> This particular bit of inline assembly has historically been very
> problematic in many configurations (although primarily on i386 I think) -
> see e.g. 8990c5869e27fcd43b53045f87ba251f42e7d293. Would something like
> that be enough for that build configuration to succeed, or are there many
> other cases that break?
>

I can test tomorrow, but if we start influencing optimizer decisions
just to run another optimizer flag, such a change would need to be
backed with (positive!) performance numbers, and _very_ thorough
testing (as we all know, trying to prove that something is not an
issue is practically impossible, as the combinations are infinite)

- Hendrik
Andreas Rheinhardt July 27, 2022, 9:33 p.m. UTC | #7
Hendrik Leppkes:
> On Wed, Jul 27, 2022 at 7:39 PM James Almer <jamrial@gmail.com> wrote:
>>
>> On 7/27/2022 2:34 PM, Swinney, Jonathan wrote:
>>> I recognize that this patch is going to be somewhat controversial. I'm submitting it mostly to see what the opinions are and evaluate options. I am working on improving performance for aarch64. On that architecture, there are fewer hand written assembly implementations of hot functions than there are for x86_64 and allowing gcc to auto-vectorize yields noticeable improvements.
>>>
>>> Gcc vectorization has improved recently and it hasn't been evaluated on the mailing list for a few years. This is the latest discussion I found in my searches: http://ffmpeg.org/pipermail/ffmpeg-devel/2016-May/193977.html
>>
>> Every time this was done, it was inevitably reverted after complains and
>> crash reports started piling up because gcc can't really handle all the
>> inline code our codebase has, among other things.
>>
> 
> No need to wait for issues, I just tested, and the same issues still
> persist that have existed for years with GCC now. They don't seem to
> care to make it compatible with inline asm, which might be fair
> enough, but it means it just can't work here.
> 

Have the GCC devs been informed of this issue?

> In file included from libavcodec/cabac_functions.h:49,
>                  from libavcodec/h264_cabac.c:36:
> libavcodec/h264_cabac.c: In function 'ff_h264_decode_mb_cabac':
> libavcodec/x86/cabac.h:199:5: error: 'asm' operand has impossible constraints
> 
> GCC 11.3, configure --cpu=haswell, mingw32
> 
> So this is a NACK. It just flat out breaks builds.
> 
> - Hendrik
Soft Works July 28, 2022, 1:02 a.m. UTC | #8
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Hendrik Leppkes
> Sent: Wednesday, July 27, 2022 10:42 PM
> To: FFmpeg development discussions and patches <ffmpeg-
> devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH] enable auto vectorization for gcc
> 7 and higher
> 
> On Wed, Jul 27, 2022 at 7:39 PM James Almer <jamrial@gmail.com> wrote:
> >
> > On 7/27/2022 2:34 PM, Swinney, Jonathan wrote:
> > > I recognize that this patch is going to be somewhat controversial.
> > > I'm submitting it mostly to see what the opinions are and evaluate
> > > options. I am working on improving performance for aarch64. On that
> > > architecture, there are fewer hand written assembly implementations
> > > of hot functions than there are for x86_64 and allowing gcc to
> > > auto-vectorize yields noticeable improvements.
> > >
> > > Gcc vectorization has improved recently and it hasn't been evaluated
> > > on the mailing list for a few years. This is the latest discussion I
> > > found in my searches:
> > > http://ffmpeg.org/pipermail/ffmpeg-devel/2016-May/193977.html
> >
> > Every time this was done, it was inevitably reverted after complains and
> > crash reports started piling up because gcc can't really handle all the
> > inline code our codebase has, among other things.
> >
> 
> No need to wait for issues, I just tested, and the same issues still
> persist that have existed for years with GCC now. They don't seem to
> care to make it compatible with inline asm, which might be fair
> enough, but it means it just can't work here.
> 
> In file included from libavcodec/cabac_functions.h:49,
>                  from libavcodec/h264_cabac.c:36:
> libavcodec/h264_cabac.c: In function 'ff_h264_decode_mb_cabac':
> libavcodec/x86/cabac.h:199:5: error: 'asm' operand has impossible constraints

I wonder why it doesn't fail when I try the same on MINGW32:

gcc -I. -Isrc/ -D_FORTIFY_SOURCE=0 -D__USE_MINGW_ANSI_STDIO=1 -D_ISOC99_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -U__STRICT_ANSI__ -D__USE_MINGW_ANSI_STDIO=1 -D__printf__=__gnu_printf__ -D_POSIX_C_SOURCE=200112 -D_XOPEN_SOURCE=600 -DOPJ_STATIC -DZLIB_CONST -DHAVE_AV_CONFIG_H -DBUILDING_avcodec -mthreads -DLIBTWOLAME_STATIC -std=c11 -IV:/ffbuild/mas/local32/include -IV:/ffbuild/mas/msys64/mingw32/include -I/mingw32/include -IF:/ffbuild/mas/local32/include -DLIBARCHIVE_STATIC -Wdeclaration-after-statement -Wall -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wwrite-strings -Wtype-limits -Wundef -Wmissing-prototypes -Wstrict-prototypes -Wempty-body -Wno-parentheses -Wno-switch -Wno-format-zero-length -Wno-pointer-sign -Wno-unused-const-variable -Wno-bool-operation -Wno-char-subscripts -O3 -Werror=format-security -Werror=implicit-function-declaration -Werror=missing-prototypes -Werror=return-type -Werror=vla -Wformat -fdiagnostics-color=auto -Wno-maybe-uninitialized -ftree-vectorize -MMD -MF libavcodec/h264_cabac.d -MT libavcodec/h264_cabac.o -c -o libavcodec/h264_cabac.o src/libavcodec/h264_cabac.c

When I add garbage to line 199 in cabac.h, it errors, so I'm sure it
gets compiled. Same for the av_always_inline line above.

gcc version is 10.3.0. I wonder whether it's about some of the compiler 
flags that it doesn't error here, but I couldn't reproduce with various
combinations. Maybe you can spot a difference?


From my experience, tree-vectorize can provide considerable improvements
in certain cases, but I often had to rewrite the loops (primarily
simplifying them) until they actually got vectorized in the way I wanted.

Another conclusion from that work is that there's hardly any benefit
in using tree-vectorize in combination with -O3. When -O3 is specified,
gcc prefers loop unrolling over vectorization in the vast majority
of cases (and the result is often slower).
Even worse, loop unrolling cannot be disabled individually
(neither globally with -O3 -fno-unroll-loops nor locally with a
function __attribute__ or #pragma GCC optimize).

I ran a (small) number of tests with typical workloads to compare
-O2 and -O3, and I couldn't notice any relevant advantage of -O3.
The tests weren't exhaustive, and very likely one can find cases where
-O3 performs better, but the vectorization advantages on the other
side were actually relevant, so I chose to change all our builds
to -O2.

Looking at my notes I remember that I had tried a number of things
to control gcc optimizations at the function level. It didn't
work for me to activate vectorization optimizations (which are 
globally disabled), but maybe it works the other way round.
What you could try is either:

#pragma GCC optimize("no-tree-vectorize")

or 

#ifdef __GNUC__
    __attribute__((optimize("-fno-tree-vectorize")))
#endif

to decorate the function which errors on your side (or maybe
even the upstream caller).
Maybe this allows disabling vectorization locally for the
erroring case.

Best regards,
softworkz

PS: The observations I made were for x86_64 code in the 
context of ffmpeg compiled with gcc 10 (maybe 9) and analyzed
with Intel tools.
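
[Editorial note] One way to check which optimizer switches a given compiler toggles between -O2 and -O3 is to compare the output of -Q --help=optimizers at both levels, as the GCC manual suggests. The helper below is a hypothetical sketch, not something used in this thread; the name diff_opt_levels and the choice of exit status 2 for a missing compiler are assumptions, and the listing it prints depends entirely on the installed GCC version.

```shell
#!/bin/sh
# Hypothetical helper: list optimizer switches whose state differs between
# -O2 and -O3 for the given compiler (default: gcc).
diff_opt_levels() {
    cc=${1:-gcc}
    # Bail out with status 2 (an arbitrary choice) if the compiler is missing.
    command -v "$cc" >/dev/null 2>&1 || { echo "$cc not found" >&2; return 2; }
    tmp=$(mktemp -d) || return 1
    "$cc" -Q -O2 --help=optimizers > "$tmp/O2" 2>/dev/null
    "$cc" -Q -O3 --help=optimizers > "$tmp/O3" 2>/dev/null
    # diff exits non-zero when the files differ; that is the expected case.
    diff "$tmp/O2" "$tmp/O3" || true
    rm -rf "$tmp"
}

# Usage: show only the vectorizer-related differences, if gcc is available.
diff_opt_levels gcc | grep vectorize || echo "no vectorizer differences shown"
```

Running this with the same compiler used for the FFmpeg build gives a quick view of what -O3 changes before deciding whether an explicit -ftree-vectorize is worth testing.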
James Almer July 28, 2022, 1:05 a.m. UTC | #9
On 7/27/2022 10:02 PM, Soft Works wrote:
> 
>> -----Original Message-----
>> From: ffmpeg-devel<ffmpeg-devel-bounces@ffmpeg.org>  On Behalf Of
>> Hendrik Leppkes
>> Sent: Wednesday, July 27, 2022 10:42 PM
>> To: FFmpeg development discussions and patches <ffmpeg-
>> devel@ffmpeg.org>
>> Subject: Re: [FFmpeg-devel] [PATCH] enable auto vectorization for gcc
>> 7 and higher
>>
>> On Wed, Jul 27, 2022 at 7:39 PM James Almer <jamrial@gmail.com> wrote:
>>> On 7/27/2022 2:34 PM, Swinney, Jonathan wrote:
>>>> I recognize that this patch is going to be somewhat controversial.
>>>> I'm submitting it mostly to see what the opinions are and evaluate
>>>> options. I am working on improving performance for aarch64. On that
>>>> architecture, there are fewer hand written assembly implementations
>>>> of hot functions than there are for x86_64 and allowing gcc to
>>>> auto-vectorize yields noticeable improvements.
>>>>
>>>> Gcc vectorization has improved recently and it hasn't been evaluated
>>>> on the mailing list for a few years. This is the latest discussion I
>>>> found in my searches:
>>>> http://ffmpeg.org/pipermail/ffmpeg-devel/2016-May/193977.html
>>> Every time this was done, it was inevitably reverted after complains and
>>> crash reports started piling up because gcc can't really handle all the
>>> inline code our codebase has, among other things.
>>>
>> No need to wait for issues, I just tested, and the same issues still
>> persist that have existed for years with GCC now. They don't seem to
>> care to make it compatible with inline asm, which might be fair
>> enough, but it means it just can't work here.
>>
>> In file included from libavcodec/cabac_functions.h:49,
>>                   from libavcodec/h264_cabac.c:36:
>> libavcodec/h264_cabac.c: In function 'ff_h264_decode_mb_cabac':
>> libavcodec/x86/cabac.h:199:5: error: 'asm' operand has impossible constraints
> I wonder why it doesn't fail when I try the same on MINGW32:
> 
> gcc -I. -Isrc/ -D_FORTIFY_SOURCE=0 -D__USE_MINGW_ANSI_STDIO=1 -D_ISOC99_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -U__STRICT_ANSI__ -D__USE_MINGW_ANSI_STDIO=1 -D__printf__=__gnu_printf__ -D_POSIX_C_SOURCE=200112 -D_XOPEN_SOURCE=600 -DOPJ_STATIC -DZLIB_CONST -DHAVE_AV_CONFIG_H -DBUILDING_avcodec -mthreads -DLIBTWOLAME_STATIC -std=c11 -IV:/ffbuild/mas/local32/include -IV:/ffbuild/mas/msys64/mingw32/include -I/mingw32/include -IF:/ffbuild/mas/local32/include -DLIBARCHIVE_STATIC -Wdeclaration-after-statement -Wall -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wwrite-strings -Wtype-limits -Wundef -Wmissing-prototypes -Wstrict-prototypes -Wempty-body -Wno-parentheses -Wno-switch -Wno-format-zero-length -Wno-pointer-sign -Wno-unused-const-variable -Wno-bool-operation -Wno-char-subscripts -O3 -Werror=format-security -Werror=implicit-function-declaration -Werror=missing-prototypes -Werror=return-type -Werror=vla -Wformat -fdiagnostics-color=auto -Wno-maybe-uninitialized -ftree-vectorize -MMD -MF libavcodec/h264_cabac.d -MT libavcodec/h264_cabac.o -c -o libavcodec/h264_cabac.o src/libavcodec/h264_cabac.c

You didn't set CPU to haswell (Which will add -march=haswell to the 
command line).
Soft Works July 28, 2022, 1:10 a.m. UTC | #10
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> James Almer
> Sent: Thursday, July 28, 2022 3:05 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH] enable auto vectorization for gcc
> 7 and higher
> 
> You didn't set CPU to haswell (Which will add -march=haswell to the
> command line).

Yup, you're right - this way I get the same error as Hendrik. Thanks!

But then, when changing -O3 to -O2, it's compiling without
error again.

softworkz
Soft Works July 28, 2022, 1:15 a.m. UTC | #11
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Soft Works
> Sent: Thursday, July 28, 2022 3:11 AM
> To: FFmpeg development discussions and patches <ffmpeg-
> devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH] enable auto vectorization for gcc
> 7 and higher
> 
> 
> 
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> > James Almer
> > Sent: Thursday, July 28, 2022 3:05 AM
> > To: ffmpeg-devel@ffmpeg.org
> > Subject: Re: [FFmpeg-devel] [PATCH] enable auto vectorization for
> gcc
> > 7 and higher
> >
> > You didn't set CPU to haswell (Which will add -march=haswell to the
> > command line).
> 
> Yup, you're right - this way I get the same error as Hendrik. Thanks!
> 
> But then, when changing -O3 to -O2, it's compiling without
> error again.

Adding 

#pragma GCC optimize("no-tree-vectorize")

to get_cabac_inline_x86() allows compiling even with -O3
(the attribute approach doesn't seem to work).

softworkz
Patch

diff --git a/configure b/configure
index 6629d14099..c63c9348ad 100755
--- a/configure
+++ b/configure
@@ -7173,7 +7173,9 @@  if enabled icc; then
             disable aligned_stack
     fi
 elif enabled gcc; then
-    check_optflags -fno-tree-vectorize
+    case $gcc_basever in
+        2|2.*|3.*|4.*|5.*|6.*) check_optflags -fno-tree-vectorize ;;
+    esac
     check_cflags -Werror=format-security
     check_cflags -Werror=implicit-function-declaration
     check_cflags -Werror=missing-prototypes
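
[Editorial note] The version gate in the patch relies on shell glob patterns, so it may help to see which version strings they match. The sketch below copies the case patterns from the patch into a hypothetical function, wants_no_tree_vectorize, which is not part of configure; the sample version strings are illustrative only and say nothing about what a particular compiler actually reports in $gcc_basever.

```shell
#!/bin/sh
# Hypothetical helper (not part of configure): succeed when the version
# string matches the patch's patterns, i.e. when -fno-tree-vectorize
# would still be added.
wants_no_tree_vectorize() {
    case $1 in
        2|2.*|3.*|4.*|5.*|6.*) return 0 ;;
        *)                     return 1 ;;
    esac
}

# Illustrative version strings only.
for v in 4.9.2 6.3.0 7.5.0 10.3.0 11.3; do
    if wants_no_tree_vectorize "$v"; then
        echo "$v: keep -fno-tree-vectorize"
    else
        echo "$v: vectorizer left at the compiler default"
    fi
done
```

Because each pattern requires a literal dot after the major number, strings such as 10.3.0 or 20.1 are not caught by the 2.* pattern, so only majors 2 through 6 keep the flag.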