Message ID | MBJY2ob--3-2@lynne.ee |
---|---|
State | New |
Headers | show |
Series | [FFmpeg-devel] configure: reenable tree vectorization for GCC | expand |
Context | Check | Description |
---|---|---|
andriy/default | pending | |
andriy/make | success | Make finished |
andriy/make_fate | success | Make fate finished |
On Fri, Jul 3, 2020 at 2:00 PM Lynne <dev@lynne.ee> wrote: > > Maybe they've fixed the FATE failures. > Failures with old compilers are not going to get fixed retroactively. If you want to enable this, you need to figure out which compilers dont break and enable it conditionally. Otherwise, it'll remain off, since its known to break in numerous configurations. - Hendrik
On 7/3/2020 9:00 AM, Lynne wrote:
> Maybe they've fixed the FATE failures.
We have done this at least two times before, and all of them limited to
the newest compilers which supposedly were good. It always resulted in
compilations issues being reported sooner or later, and us having to
remove it again.
So no, lets keep it this way. The parts in the code that matter have
custom assembly after all.
Jul 3, 2020, 13:53 by h.leppkes@gmail.com: > On Fri, Jul 3, 2020 at 2:00 PM Lynne <dev@lynne.ee> wrote: > >> >> Maybe they've fixed the FATE failures. >> > > Failures with old compilers are not going to get fixed retroactively. > If you want to enable this, you need to figure out which compilers > dont break and enable it conditionally. Otherwise, it'll remain off, > since its known to break in numerous configurations. > Or we can push this and check FATE.
On 7/3/20, James Almer <jamrial@gmail.com> wrote: > On 7/3/2020 9:00 AM, Lynne wrote: >> Maybe they've fixed the FATE failures. > > We have done this at least two times before, and all of them limited to > the newest compilers which supposedly were good. It always resulted in > compilations issues being reported sooner or later, and us having to > remove it again. > > So no, lets keep it this way. The parts in the code that matter have > custom assembly after all. This is very bad decision. There should be way to disable vectorization on file by file basis.
Jul 3, 2020, 14:19 by jamrial@gmail.com: > On 7/3/2020 9:00 AM, Lynne wrote: > >> Maybe they've fixed the FATE failures. >> > > We have done this at least two times before, and all of them limited to > the newest compilers which supposedly were good. It always resulted in > compilations issues being reported sooner or later, and us having to > remove it again. > > So no, lets keep it this way. The parts in the code that matter have > custom assembly after all. > I'm fine with that, I just posted this to check people's opinions on it. I do remember it giving a small boost to dirac golomb parsing, but not as much as clang's default settings had.
On Fri, 3 Jul 2020, James Almer wrote: > On 7/3/2020 9:00 AM, Lynne wrote: >> Maybe they've fixed the FATE failures. > > We have done this at least two times before, and all of them limited to > the newest compilers which supposedly were good. It always resulted in > compilations issues being reported sooner or later, and us having to > remove it again. If there's issues with the latest compiler versions, it would be good if the issues would be reproduced, reduced and reported - otherwise the bugs probably won't get fixed ever. In general, autovectorization does help quite notably in cases when there's no SIMD yet. At least on dav1d's DSP routines on aarch64, GCC's autovectorization does a better job than clang (without misoptimizations). // Martin
On Fri, Jul 03, 2020 at 02:00:02PM +0200, Lynne wrote: > Maybe they've fixed the FATE failures. > > configure | 1 - > 1 file changed, 1 deletion(-) > b22153ad0a33560a4fb549b227efabeec0952542 0001-configure-reenable-tree-vectorization-for-GCC.patch > From 70b1f1e9af83ff855fc4633d2f95eab7eb0173e1 Mon Sep 17 00:00:00 2001 > From: Lynne <dev@lynne.ee> > Date: Fri, 3 Jul 2020 12:51:27 +0100 > Subject: [PATCH] configure: reenable tree vectorization for GCC > > Its been years, maybe they've fixed the FATE failures. > --- > configure | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/configure b/configure > index bdfd731602..cf4ea6ea48 100755 > --- a/configure > +++ b/configure > @@ -6908,7 +6908,6 @@ if enabled icc; then > disable aligned_stack > fi > elif enabled gcc; then > - check_optflags -fno-tree-vectorize > check_cflags -Werror=format-security > check_cflags -Werror=implicit-function-declaration > check_cflags -Werror=missing-prototypes > -- > 2.27.0.353.gb9a2d1a020 > Breaks with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04) [First and only version tested] did not investigate if the affected code is free of UB or what is exactly happening, just reporting that theres a segfault ... ./ffmpeg -i mm-short.mpg -pix_fmt yuv420p16be -vframes 1 file-yuv420p16be.pgmyuv ./ffmpeg -y -i file-yuv420p16be.pgmyuv -bitexact file-yuv420p16be.bmp ... Input #0, pgmyuv_pipe, from 'file-yuv420p16be.pgmyuv': Duration: N/A, bitrate: N/A Stream #0:0: Video: pgmyuv, yuv420p16le, 720x576, 25 tbr, 25 tbn, 25 tbc Stream mapping: Stream #0:0 -> #0:0 (pgmyuv (native) -> bmp (native)) Press [q] to stop, [?] for help Program received signal SIGSEGV, Segmentation fault. pnm_decode_frame (avctx=<optimized out>, data=0x555557b936c0, got_frame=0x7fffffffd590, avpkt=<optimized out>) at libavcodec/pnmdec.c:230 230 v = av_be2ne16(((uint16_t *)s->bytestream)[j]); (gdb) bt #0 0x000055555601d210 in pnm_decode_frame (avctx=<optimized out>, data=0x555557b936c0, got_frame=0x7fffffffd590, avpkt=<optimized out>) at libavcodec/pnmdec.c:230 #1 0x0000555555c4cdb7 in decode_simple_internal (frame=0x555557b936c0, avctx=0x555557b85e80) at libavcodec/decode.c:342 #2 0x0000555555c4cdb7 in decode_simple_receive_frame (frame=<optimized out>, avctx=<optimized out>) at libavcodec/decode.c:538 #3 0x0000555555c4cdb7 in decode_receive_frame_internal (avctx=avctx@entry=0x555557b85e80, frame=0x555557b936c0) at libavcodec/decode.c:556 #4 0x0000555555c4d868 in avcodec_send_packet (avctx=0x555557b85e80, avpkt=0x7fffffffd640) at libavcodec/decode.c:614 #5 0x000055555573eb2b in decode (pkt=0x7fffffffd640, got_frame=0x7fffffffd780, frame=<optimized out>, avctx=0x555557b85e80) at fftools/ffmpeg.c:2217 #6 0x000055555573eb2b in decode_video (ist=0x555557b85cc0, pkt=<optimized out>, got_output=0x7fffffffd780, duration_pts=0x7fffffffd788, eof=0, decode_failed=0x7fffffffd784) at fftools/ffmpeg.c:2359 #7 0x000055555573f85c in process_input_packet (ist=0x555557b85cc0, pkt=0x7fffffffd970, no_eof=0) at fftools/ffmpeg.c:2600 #8 0x0000555555742250 in process_input (file_index=<optimized out>) at fftools/ffmpeg.c:4491 #9 0x0000555555742250 in transcode_step () at fftools/ffmpeg.c:4611 #10 0x0000555555742250 in transcode () at fftools/ffmpeg.c:4665 #11 0x000055555571bd8c in main (argc=<optimized out>, argv=<optimized out>) at fftools/ffmpeg.c:4870 (gdb) disassemble $rip-32, $rip+32 Dump of assembler code from 0x55555601d1f0 to 0x55555601d230: 0x000055555601d1f0 <pnm_decode_frame+2640>: (bad) 0x000055555601d1f1 <pnm_decode_frame+2641>: xor %edx,%edx 0x000055555601d1f3 <pnm_decode_frame+2643>: xor %edi,%edi 0x000055555601d1f5 <pnm_decode_frame+2645>: sub %ecx,%r14d 0x000055555601d1f8 <pnm_decode_frame+2648>: add %rcx,%rcx 0x000055555601d1fb <pnm_decode_frame+2651>: lea (%rsi,%rcx,1),%r11 0x000055555601d1ff <pnm_decode_frame+2655>: mov %r14d,%r12d 0x000055555601d202 <pnm_decode_frame+2658>: add %r8,%rcx 0x000055555601d205 <pnm_decode_frame+2661>: shr $0x3,%r12d 0x000055555601d209 <pnm_decode_frame+2665>: nopl 0x0(%rax) => 0x000055555601d210 <pnm_decode_frame+2672>: movdqa (%r11,%rdx,1),%xmm0 0x000055555601d216 <pnm_decode_frame+2678>: add $0x1,%edi 0x000055555601d219 <pnm_decode_frame+2681>: movdqa %xmm0,%xmm2 0x000055555601d21d <pnm_decode_frame+2685>: movdqa %xmm0,%xmm1 0x000055555601d221 <pnm_decode_frame+2689>: punpcklwd %xmm3,%xmm2 0x000055555601d225 <pnm_decode_frame+2693>: punpckhwd %xmm3,%xmm1 0x000055555601d229 <pnm_decode_frame+2697>: pslld $0x8,%xmm2 0x000055555601d22e <pnm_decode_frame+2702>: pslld $0x8,%xmm1 End of assembler dump. [...]
On Fri, Jul 03, 2020 at 10:42:04PM +0200, Michael Niedermayer wrote:
> 230 v = av_be2ne16(((uint16_t *)s->bytestream)[j]);
Dereferencing an unaligned uint16_t* is UB, indeed. I've seen similar code
break on ARM, with no -ftree-vectorize.
/* Steinar */
On Fri, Jul 03, 2020 at 11:21:55PM +0200, Steinar H. Gunderson wrote: > On Fri, Jul 03, 2020 at 10:42:04PM +0200, Michael Niedermayer wrote: > > 230 v = av_be2ne16(((uint16_t *)s->bytestream)[j]); > > Dereferencing an unaligned uint16_t* is UB, indeed. I've seen similar code > break on ARM, with no -ftree-vectorize. yes, just posted a patch to fix the obvious such cases from pnmdec this also fixes this tree vectorize segfault thx [...]
diff --git a/configure b/configure index bdfd731602..cf4ea6ea48 100755 --- a/configure +++ b/configure @@ -6908,7 +6908,6 @@ if enabled icc; then disable aligned_stack fi elif enabled gcc; then - check_optflags -fno-tree-vectorize check_cflags -Werror=format-security check_cflags -Werror=implicit-function-declaration check_cflags -Werror=missing-prototypes