Message ID | CABA=pqd1DysBzbvmjusNXg4NF7GQ_oeDAt50ci4DpX70rGE4kA@mail.gmail.com |
---|---|
State | Superseded |
On Fri, Jun 09, 2017 at 03:07:55PM +0300, Ivan Kalvachev wrote:
> On 6/9/17, Ivan Kalvachev <ikalvachev@gmail.com> wrote:
> > On 6/9/17, Michael Niedermayer <michael@niedermayer.cc> wrote:
> >> On Fri, Jun 09, 2017 at 01:36:07AM +0300, Ivan Kalvachev wrote:
> >>>  opus_pvq.c              |    9
> >>>  opus_pvq.h              |    5
> >>>  x86/Makefile            |    1
> >>>  x86/opus_dsp_init.c     |   47 +++
> >>>  x86/opus_pvq_search.asm |  597
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>  5 files changed, 657 insertions(+), 2 deletions(-)
> >>> 3b9648bea3f01dad2cf159382f0ffc2d992c84b2
> >>> 0001-SIMD-opus-pvq_search-implementation.patch
> >>> From 06dc798c302e90aa5b45bec5d8fbcd64ba4af076 Mon Sep 17 00:00:00 2001
> >>> From: Ivan Kalvachev <ikalvachev@gmail.com>
> >>> Date: Thu, 8 Jun 2017 22:24:33 +0300
> >>> Subject: [PATCH 1/3] SIMD opus pvq_search implementation.
> >>
> >> seems this breaks build with mingw64, didnt investigate but it
> >> fails with these errors:
> >>
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> collect2: error: ld returned 1 exit status
> >> collect2: error: ld returned 1 exit status
> >> make: *** [ffmpeg_g.exe] Error 1
> >> make: *** Waiting for unfinished jobs....
> >> make: *** [ffprobe_g.exe] Error 1
> >
> >
> > const_*_edge is used on only one place is the code.
> > Would you check if this patch fixes the issue.
>
>
> Sorry, the patch was not tested and the variable name was not correct.
> This one should be fine... I hope
>
> --- a/libavcodec/x86/opus_pvq_search.asm
> +++ b/libavcodec/x86/opus_pvq_search.asm
> @@ -419,7 +419,7 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
>      add   Nq, r4q    ; Nq = align(Nq, mmsize)
>      sub   rsp, Nq    ; allocate tmpX[Nq]
>
> -    movups  m3, [const_align_abs_edge-mmsize+r4q] ; this is
> the bit mask for the padded read at the end of the input
> +    movups  m3, [const_float_abs_mask+32-mmsize+r4q] ; this
> is the bit mask for the padded read at the end of the input
>
>      lea   r4q, [Nq-mmsize] ; Nq is rounded up (aligned
> up) to mmsize, so r4q can't become negative here, unless N=0.
>      movups  m2, [inXq + r4q]

doesnt help

libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
make: *** [ffmpeg_g.exe] Error 1
make: *** Waiting for unfinished jobs....
make: *** [ffprobe_g.exe] Error 1

maybe your distribution has a cross compiler for mingw ?
would make it much easier for you to test

[...]
--- a/libavcodec/x86/opus_pvq_search.asm
+++ b/libavcodec/x86/opus_pvq_search.asm
@@ -419,7 +419,7 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
     add   Nq, r4q    ; Nq = align(Nq, mmsize)
     sub   rsp, Nq    ; allocate tmpX[Nq]
 
-    movups  m3, [const_align_abs_edge-mmsize+r4q] ; this is the bit mask for the padded read at the end of the input
+    movups  m3, [const_float_abs_mask+32-mmsize+r4q] ; this is the bit mask for the padded read at the end of the input
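
A note on the linker errors quoted in this thread. `R_X86_64_32` is a 32-bit absolute relocation: the linker stores the symbol address plus the addend in a 32-bit field and reports "relocation truncated to fit" when that sum does not fit. The patch above renames the symbol but keeps the `[symbol + register]` addressing form, which would presumably produce the same relocation type and would explain why the error only changes its symbol name. The C sketch below models just that range check. The function names and both addresses are hypothetical and chosen for illustration; they are not taken from the FFmpeg build or from the linker's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models the range check for a 32-bit absolute relocation such as
 * R_X86_64_32: the stored value is S + A (symbol address plus addend)
 * and it is zero-extended at run time, so it must fit in 32 unsigned bits.
 * Hypothetical helper name, not a real linker API. */
static bool reloc_abs32_fits(uint64_t symbol_addr, int64_t addend)
{
    uint64_t value = symbol_addr + (uint64_t)addend; /* wraps modulo 2^64 */
    return value <= UINT32_MAX;
}

/* Hypothetical addresses, assumed only to show both outcomes. */
static void reloc_abs32_examples(void)
{
    const uint64_t low_addr  = 0x00401000ULL;  /* below 4 GiB */
    const uint64_t high_addr = 0x140001000ULL; /* above 4 GiB */

    /* A symbol placed below 4 GiB can be referenced absolutely. */
    assert(reloc_abs32_fits(low_addr, 32));

    /* A symbol placed above 4 GiB cannot, whatever the addend or name. */
    assert(!reloc_abs32_fits(high_addr, 32));
    assert(!reloc_abs32_fits(high_addr, 0));
}
```

Calling `reloc_abs32_examples()` from a test program runs the three checks. If this reading is right, the fix would have to change the addressing form (for example, loading the table address into a register first) rather than the symbol being referenced.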