diff mbox series

[FFmpeg-devel] configure: reenable tree vectorization for GCC

Message ID MBJY2ob--3-2@lynne.ee
State New
Headers show
Series [FFmpeg-devel] configure: reenable tree vectorization for GCC | expand

Checks

Context Check Description
andriy/default pending
andriy/make success Make finished
andriy/make_fate success Make fate finished

Commit Message

Lynne July 3, 2020, noon UTC
Maybe they've fixed the FATE failures.
Subject: [PATCH] configure: reenable tree vectorization for GCC

Its been years, maybe they've fixed the FATE failures.
---
 configure | 1 -
 1 file changed, 1 deletion(-)

Comments

Hendrik Leppkes July 3, 2020, 12:53 p.m. UTC | #1
On Fri, Jul 3, 2020 at 2:00 PM Lynne <dev@lynne.ee> wrote:
>
> Maybe they've fixed the FATE failures.
>

Failures with old compilers are not going to get fixed retroactively.
If you want to enable this, you need to figure out which compilers
dont break and enable it conditionally. Otherwise, it'll remain off,
since its known to break in numerous configurations.

- Hendrik
James Almer July 3, 2020, 1:19 p.m. UTC | #2
On 7/3/2020 9:00 AM, Lynne wrote:
> Maybe they've fixed the FATE failures.

We have done this at least two times before, and all of them limited to
the newest compilers which supposedly were good. It always resulted in
compilations issues being reported sooner or later, and us having to
remove it again.

So no, lets keep it this way. The parts in the code that matter have
custom assembly after all.
Lynne July 3, 2020, 1:38 p.m. UTC | #3
Jul 3, 2020, 13:53 by h.leppkes@gmail.com:

> On Fri, Jul 3, 2020 at 2:00 PM Lynne <dev@lynne.ee> wrote:
>
>>
>> Maybe they've fixed the FATE failures.
>>
>
> Failures with old compilers are not going to get fixed retroactively.
> If you want to enable this, you need to figure out which compilers
> dont break and enable it conditionally. Otherwise, it'll remain off,
> since its known to break in numerous configurations.
>

Or we can push this and check FATE.
Paul B Mahol July 3, 2020, 2:18 p.m. UTC | #4
On 7/3/20, James Almer <jamrial@gmail.com> wrote:
> On 7/3/2020 9:00 AM, Lynne wrote:
>> Maybe they've fixed the FATE failures.
>
> We have done this at least two times before, and all of them limited to
> the newest compilers which supposedly were good. It always resulted in
> compilations issues being reported sooner or later, and us having to
> remove it again.
>
> So no, lets keep it this way. The parts in the code that matter have
> custom assembly after all.

This is very bad decision. There should be way to disable
vectorization on file by file basis.
Lynne July 3, 2020, 2:22 p.m. UTC | #5
Jul 3, 2020, 14:19 by jamrial@gmail.com:

> On 7/3/2020 9:00 AM, Lynne wrote:
>
>> Maybe they've fixed the FATE failures.
>>
>
> We have done this at least two times before, and all of them limited to
> the newest compilers which supposedly were good. It always resulted in
> compilations issues being reported sooner or later, and us having to
> remove it again.
>
> So no, lets keep it this way. The parts in the code that matter have
> custom assembly after all.
>

I'm fine with that, I just posted this to check people's opinions on it.
I do remember it giving a small boost to dirac golomb parsing, but
not as much as clang's default settings had.
Martin Storsjö July 3, 2020, 6:06 p.m. UTC | #6
On Fri, 3 Jul 2020, James Almer wrote:

> On 7/3/2020 9:00 AM, Lynne wrote:
>> Maybe they've fixed the FATE failures.
>
> We have done this at least two times before, and all of them limited to
> the newest compilers which supposedly were good. It always resulted in
> compilations issues being reported sooner or later, and us having to
> remove it again.

If there's issues with the latest compiler versions, it would be good if 
the issues would be reproduced, reduced and reported - otherwise the bugs 
probably won't get fixed ever.

In general, autovectorization does help quite notably in cases when 
there's no SIMD yet. At least on dav1d's DSP routines on aarch64, GCC's 
autovectorization does a better job than clang (without misoptimizations).

// Martin
Michael Niedermayer July 3, 2020, 8:42 p.m. UTC | #7
On Fri, Jul 03, 2020 at 02:00:02PM +0200, Lynne wrote:
> Maybe they've fixed the FATE failures.
> 

>  configure |    1 -
>  1 file changed, 1 deletion(-)
> b22153ad0a33560a4fb549b227efabeec0952542  0001-configure-reenable-tree-vectorization-for-GCC.patch
> From 70b1f1e9af83ff855fc4633d2f95eab7eb0173e1 Mon Sep 17 00:00:00 2001
> From: Lynne <dev@lynne.ee>
> Date: Fri, 3 Jul 2020 12:51:27 +0100
> Subject: [PATCH] configure: reenable tree vectorization for GCC
> 
> Its been years, maybe they've fixed the FATE failures.
> ---
>  configure | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/configure b/configure
> index bdfd731602..cf4ea6ea48 100755
> --- a/configure
> +++ b/configure
> @@ -6908,7 +6908,6 @@ if enabled icc; then
>              disable aligned_stack
>      fi
>  elif enabled gcc; then
> -    check_optflags -fno-tree-vectorize
>      check_cflags -Werror=format-security
>      check_cflags -Werror=implicit-function-declaration
>      check_cflags -Werror=missing-prototypes
> -- 
> 2.27.0.353.gb9a2d1a020
> 


Breaks with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04) [First and only version tested]
did not investigate if the affected code is free of UB or what is exactly
happening, just reporting that theres a segfault ...

./ffmpeg -i mm-short.mpg -pix_fmt yuv420p16be -vframes 1 file-yuv420p16be.pgmyuv
./ffmpeg -y -i file-yuv420p16be.pgmyuv -bitexact file-yuv420p16be.bmp
... 
Input #0, pgmyuv_pipe, from 'file-yuv420p16be.pgmyuv':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: pgmyuv, yuv420p16le, 720x576, 25 tbr, 25 tbn, 25 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (pgmyuv (native) -> bmp (native))
Press [q] to stop, [?] for help

Program received signal SIGSEGV, Segmentation fault.
pnm_decode_frame (avctx=<optimized out>, data=0x555557b936c0, got_frame=0x7fffffffd590, avpkt=<optimized out>) at libavcodec/pnmdec.c:230
230	                    v = av_be2ne16(((uint16_t *)s->bytestream)[j]);
(gdb) bt
#0  0x000055555601d210 in pnm_decode_frame (avctx=<optimized out>, data=0x555557b936c0, got_frame=0x7fffffffd590, avpkt=<optimized out>) at libavcodec/pnmdec.c:230
#1  0x0000555555c4cdb7 in decode_simple_internal (frame=0x555557b936c0, avctx=0x555557b85e80) at libavcodec/decode.c:342
#2  0x0000555555c4cdb7 in decode_simple_receive_frame (frame=<optimized out>, avctx=<optimized out>) at libavcodec/decode.c:538
#3  0x0000555555c4cdb7 in decode_receive_frame_internal (avctx=avctx@entry=0x555557b85e80, frame=0x555557b936c0) at libavcodec/decode.c:556
#4  0x0000555555c4d868 in avcodec_send_packet (avctx=0x555557b85e80, avpkt=0x7fffffffd640) at libavcodec/decode.c:614
#5  0x000055555573eb2b in decode (pkt=0x7fffffffd640, got_frame=0x7fffffffd780, frame=<optimized out>, avctx=0x555557b85e80) at fftools/ffmpeg.c:2217
#6  0x000055555573eb2b in decode_video (ist=0x555557b85cc0, pkt=<optimized out>, got_output=0x7fffffffd780, duration_pts=0x7fffffffd788, eof=0, decode_failed=0x7fffffffd784) at fftools/ffmpeg.c:2359
#7  0x000055555573f85c in process_input_packet (ist=0x555557b85cc0, pkt=0x7fffffffd970, no_eof=0) at fftools/ffmpeg.c:2600
#8  0x0000555555742250 in process_input (file_index=<optimized out>) at fftools/ffmpeg.c:4491
#9  0x0000555555742250 in transcode_step () at fftools/ffmpeg.c:4611
#10 0x0000555555742250 in transcode () at fftools/ffmpeg.c:4665
#11 0x000055555571bd8c in main (argc=<optimized out>, argv=<optimized out>) at fftools/ffmpeg.c:4870
(gdb) disassemble $rip-32, $rip+32
Dump of assembler code from 0x55555601d1f0 to 0x55555601d230:
   0x000055555601d1f0 <pnm_decode_frame+2640>:	(bad)  
   0x000055555601d1f1 <pnm_decode_frame+2641>:	xor    %edx,%edx
   0x000055555601d1f3 <pnm_decode_frame+2643>:	xor    %edi,%edi
   0x000055555601d1f5 <pnm_decode_frame+2645>:	sub    %ecx,%r14d
   0x000055555601d1f8 <pnm_decode_frame+2648>:	add    %rcx,%rcx
   0x000055555601d1fb <pnm_decode_frame+2651>:	lea    (%rsi,%rcx,1),%r11
   0x000055555601d1ff <pnm_decode_frame+2655>:	mov    %r14d,%r12d
   0x000055555601d202 <pnm_decode_frame+2658>:	add    %r8,%rcx
   0x000055555601d205 <pnm_decode_frame+2661>:	shr    $0x3,%r12d
   0x000055555601d209 <pnm_decode_frame+2665>:	nopl   0x0(%rax)
=> 0x000055555601d210 <pnm_decode_frame+2672>:	movdqa (%r11,%rdx,1),%xmm0
   0x000055555601d216 <pnm_decode_frame+2678>:	add    $0x1,%edi
   0x000055555601d219 <pnm_decode_frame+2681>:	movdqa %xmm0,%xmm2
   0x000055555601d21d <pnm_decode_frame+2685>:	movdqa %xmm0,%xmm1
   0x000055555601d221 <pnm_decode_frame+2689>:	punpcklwd %xmm3,%xmm2
   0x000055555601d225 <pnm_decode_frame+2693>:	punpckhwd %xmm3,%xmm1
   0x000055555601d229 <pnm_decode_frame+2697>:	pslld  $0x8,%xmm2
   0x000055555601d22e <pnm_decode_frame+2702>:	pslld  $0x8,%xmm1
End of assembler dump.


[...]
Steinar H. Gunderson July 3, 2020, 9:21 p.m. UTC | #8
On Fri, Jul 03, 2020 at 10:42:04PM +0200, Michael Niedermayer wrote:
> 230	                    v = av_be2ne16(((uint16_t *)s->bytestream)[j]);

Dereferencing an unaligned uint16_t* is UB, indeed. I've seen similar code
break on ARM, with no -ftree-vectorize.

/* Steinar */
Michael Niedermayer July 3, 2020, 10:16 p.m. UTC | #9
On Fri, Jul 03, 2020 at 11:21:55PM +0200, Steinar H. Gunderson wrote:
> On Fri, Jul 03, 2020 at 10:42:04PM +0200, Michael Niedermayer wrote:
> > 230	                    v = av_be2ne16(((uint16_t *)s->bytestream)[j]);
> 
> Dereferencing an unaligned uint16_t* is UB, indeed. I've seen similar code
> break on ARM, with no -ftree-vectorize.

yes, just posted a patch to fix the obvious such cases from pnmdec
this also fixes this tree vectorize segfault

thx

[...]
diff mbox series

Patch

diff --git a/configure b/configure
index bdfd731602..cf4ea6ea48 100755
--- a/configure
+++ b/configure
@@ -6908,7 +6908,6 @@  if enabled icc; then
             disable aligned_stack
     fi
 elif enabled gcc; then
-    check_optflags -fno-tree-vectorize
     check_cflags -Werror=format-security
     check_cflags -Werror=implicit-function-declaration
     check_cflags -Werror=missing-prototypes