[FFmpeg-devel,WIP] Opus Piramid Vector Quantization Search in x86 SIMD asm

Message ID CABA=pqd1DysBzbvmjusNXg4NF7GQ_oeDAt50ci4DpX70rGE4kA@mail.gmail.com
State Superseded

Commit Message

Ivan Kalvachev June 9, 2017, 12:07 p.m. UTC
On 6/9/17, Ivan Kalvachev <ikalvachev@gmail.com> wrote:
> On 6/9/17, Michael Niedermayer <michael@niedermayer.cc> wrote:
>> On Fri, Jun 09, 2017 at 01:36:07AM +0300, Ivan Kalvachev wrote:
>>>  opus_pvq.c              |    9
>>>  opus_pvq.h              |    5
>>>  x86/Makefile            |    1
>>>  x86/opus_dsp_init.c     |   47 +++
>>>  x86/opus_pvq_search.asm |  597
>>> ++++++++++++++++++++++++++++++++++++++++++++++++
>>>  5 files changed, 657 insertions(+), 2 deletions(-)
>>> 3b9648bea3f01dad2cf159382f0ffc2d992c84b2
>>> 0001-SIMD-opus-pvq_search-implementation.patch
>>> From 06dc798c302e90aa5b45bec5d8fbcd64ba4af076 Mon Sep 17 00:00:00 2001
>>> From: Ivan Kalvachev <ikalvachev@gmail.com>
>>> Date: Thu, 8 Jun 2017 22:24:33 +0300
>>> Subject: [PATCH 1/3] SIMD opus pvq_search implementation.
>>
>> seems this breaks build with mingw64, didnt investigate but it
>> fails with these errors:
>>
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> collect2: error: ld returned 1 exit status
>> collect2: error: ld returned 1 exit status
>> make: *** [ffmpeg_g.exe] Error 1
>> make: *** Waiting for unfinished jobs....
>> make: *** [ffprobe_g.exe] Error 1
>
>
> const_*_edge is used in only one place in the code.
> Would you check whether this patch fixes the issue?
>
Sorry, the patch was not tested and the variable name was not correct.
This one should be fine... I hope

         lea         r4q,  [Nq-mmsize]   ; Nq is rounded up (aligned up) to mmsize, so r4q can't become negative here, unless N=0.
         movups      m2,   [inXq + r4q]

Comments

Michael Niedermayer June 10, 2017, 1:27 a.m. UTC | #1
On Fri, Jun 09, 2017 at 03:07:55PM +0300, Ivan Kalvachev wrote:
> On 6/9/17, Ivan Kalvachev <ikalvachev@gmail.com> wrote:
> > On 6/9/17, Michael Niedermayer <michael@niedermayer.cc> wrote:
> >> On Fri, Jun 09, 2017 at 01:36:07AM +0300, Ivan Kalvachev wrote:
> >>>  opus_pvq.c              |    9
> >>>  opus_pvq.h              |    5
> >>>  x86/Makefile            |    1
> >>>  x86/opus_dsp_init.c     |   47 +++
> >>>  x86/opus_pvq_search.asm |  597
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>  5 files changed, 657 insertions(+), 2 deletions(-)
> >>> 3b9648bea3f01dad2cf159382f0ffc2d992c84b2
> >>> 0001-SIMD-opus-pvq_search-implementation.patch
> >>> From 06dc798c302e90aa5b45bec5d8fbcd64ba4af076 Mon Sep 17 00:00:00 2001
> >>> From: Ivan Kalvachev <ikalvachev@gmail.com>
> >>> Date: Thu, 8 Jun 2017 22:24:33 +0300
> >>> Subject: [PATCH 1/3] SIMD opus pvq_search implementation.
> >>
> >> seems this breaks build with mingw64, didnt investigate but it
> >> fails with these errors:
> >>
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
> >> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> >> collect2: error: ld returned 1 exit status
> >> collect2: error: ld returned 1 exit status
> >> make: *** [ffmpeg_g.exe] Error 1
> >> make: *** Waiting for unfinished jobs....
> >> make: *** [ffprobe_g.exe] Error 1
> >
> >
> > const_*_edge is used in only one place in the code.
> > Would you check whether this patch fixes the issue?
> >
> Sorry, the patch was not tested and the variable name was not correct.
> This one should be fine... I hope
> 
> --- a/libavcodec/x86/opus_pvq_search.asm
> +++ b/libavcodec/x86/opus_pvq_search.asm
> @@ -419,7 +419,7 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
>          add         Nq,   r4q           ; Nq = align(Nq, mmsize)
>          sub         rsp,  Nq            ; allocate tmpX[Nq]
> 
> -        movups      m3,   [const_align_abs_edge-mmsize+r4q] ; this is the bit mask for the padded read at the end of the input
> +        movups      m3,   [const_float_abs_mask+32-mmsize+r4q] ; this is the bit mask for the padded read at the end of the input
> 
>          lea         r4q,  [Nq-mmsize]   ; Nq is rounded up (aligned up) to mmsize, so r4q can't become negative here, unless N=0.
>          movups      m2,   [inXq + r4q]

doesnt help

libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48): relocation truncated to fit: R_X86_64_32 against `const_float_abs_mask'
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
make: *** [ffmpeg_g.exe] Error 1
make: *** Waiting for unfinished jobs....
make: *** [ffprobe_g.exe] Error 1

Maybe your distribution has a cross compiler for mingw? That would
make it much easier for you to test.

[...]

Patch

--- a/libavcodec/x86/opus_pvq_search.asm
+++ b/libavcodec/x86/opus_pvq_search.asm
@@ -419,7 +419,7 @@  cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
         add         Nq,   r4q           ; Nq = align(Nq, mmsize)
         sub         rsp,  Nq            ; allocate tmpX[Nq]

-        movups      m3,   [const_align_abs_edge-mmsize+r4q] ; this is the bit mask for the padded read at the end of the input
+        movups      m3,   [const_float_abs_mask+32-mmsize+r4q] ; this is the bit mask for the padded read at the end of the input
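
For readers following the mask load above, here is a minimal C sketch of what the sliding-window read appears to compute. It rests on assumptions the thread does not confirm: that const_float_abs_mask is 8 dwords of 0x7fffffff immediately followed by 8 zero dwords (which is what the "+32" in the replacement line implies), and that r4q holds the number of padding bytes added when N is aligned up to mmsize. The names mask_table, edge_mask and MMSIZE are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes per SIMD register: 16 for SSE, 32 for AVX. Assumed for this sketch. */
#define MMSIZE 16

/* Assumed table layout: 8 abs-value masks (32 bytes) followed by 8 zero dwords. */
static const uint32_t mask_table[16] = {
    0x7fffffffu, 0x7fffffffu, 0x7fffffffu, 0x7fffffffu,
    0x7fffffffu, 0x7fffffffu, 0x7fffffffu, 0x7fffffffu,
    0, 0, 0, 0, 0, 0, 0, 0
};

/*
 * Mirror of the address expression [table + 32 - mmsize + r4q].
 * pad_bytes plays the role of r4q and must be in the range 0..MMSIZE.
 * The window slides across the boundary between the masks and the zeros,
 * so the last pad_bytes bytes of the result are zero and the rest keep
 * every bit except the float sign bit.
 */
static void edge_mask(uint32_t out[MMSIZE / 4], size_t pad_bytes)
{
    const uint8_t *base = (const uint8_t *)mask_table;
    memcpy(out, base + 32 - MMSIZE + pad_bytes, MMSIZE);
}
```

With MMSIZE set to 16 and pad_bytes set to 8 (two padding floats), the result is { 0x7fffffff, 0x7fffffff, 0, 0 }: ANDing the last input vector with it takes the absolute value of the two valid lanes and clears the two padded ones. The sketch covers only the mask arithmetic; it says nothing about the R_X86_64_32 relocation error reported in the thread, which concerns how the table address is encoded in the instruction rather than which bytes are read.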