Message ID | CABA=pqdTJMwEynAn6onJ599M-b101NNWgv1x9rNLU8HYtgfN9A@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Sun, Jun 11, 2017 at 11:34 AM, Ivan Kalvachev <ikalvachev@gmail.com> wrote: > On 6/10/17, James Darnley <james.darnley@gmail.com> wrote: >> On 2017-06-09 13:41, Ivan Kalvachev wrote: >>> On 6/9/17, Michael Niedermayer <michael@niedermayer.cc> wrote: >>>> seems this breaks build with mingw64, didnt investigate but it >>>> fails with these errors: >>>> >>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d): >>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd): >>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1): >>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48): >>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d): >>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd): >>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1): >>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48): >>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>> collect2: error: ld returned 1 exit status >>>> collect2: error: ld returned 1 exit status >>>> make: *** [ffmpeg_g.exe] Error 1 >>>> make: *** Waiting for unfinished jobs.... >>>> make: *** [ffprobe_g.exe] Error 1 >>> >>> >>> const_*_edge is used on only one place is the code. >>> Would you check if this patch fixes the issue. >>> >>> I expected that the addresses would be pre-calculated >>> by n/yasm as one value and indexed >>> relative to the section start. >>> Instead it seems that each entry is represented with >>> its own address and offset from it. >>> Since the offset is negative it uses all 64 bits and >>> it makes difference if it is truncated to 32 bits. >>> >>> Same issue could happen with clang tools. >> >> The problem is with the relative addressing. You need to load the real >> address first before you can offset with another register at runtime. So >> something like: >> >>> mov reg1, [read_only_const] > lea ? >>> mova mmreg, [reg1 + reg2] > > . > OK, Getting mingw for my distro is problem > and compiling one myself would take a bit more effort/time. > So I'm posting a patch that "should" work. > ====== > --- a/libavcodec/x86/opus_pvq_search.asm > +++ b/libavcodec/x86/opus_pvq_search.asm > @@ -406,7 +406,7 @@ align 16 > ; uint32 N - Number of vector elements. Must be 0 < N < 8192 > ; > %macro PVQ_FAST_SEARCH 0 > -cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N > +cglobal pvq_search,4,6,8, mmsize, inX, outY, K, N > %define tmpX rsp > > ; movsxdifnidn Nq, Nd > @@ -419,7 +419,12 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N > add Nq, r4q ; Nq = align(Nq, mmsize) > sub rsp, Nq ; allocate tmpX[Nq] > > +%ifdef PIC > + lea r5q, [const_align_abs_edge] ; rip+const > + movups m3, [r5q+r4q-mmsize] ; this is > the bit mask for the padded read at the end of the input > +%else > movups m3, [const_align_abs_edge-mmsize+r4q] ; this is > the bit mask for the padded read at the end of the input > +%endif > > lea r4q, [Nq-mmsize] ; Nq is rounded up (aligned > up) to mmsize, so r4q can't become negative here, unless N=0. > movups m2, [inXq + r4q] > ====== > > What I find surprising is that PIC is enabled only on Windows and does not seem > to depend on CONFIG_PIC, so textrels are used all over assembly code. > Do I miss something? > Win32 code is always PIC, independent of what ffmpeg configure thinks it should be. :) Using the PIC define seems to be the appropriate thing, thats what the other asm code does. - Hendrik
On 6/11/17, Hendrik Leppkes <h.leppkes@gmail.com> wrote: > On Sun, Jun 11, 2017 at 11:34 AM, Ivan Kalvachev <ikalvachev@gmail.com> > wrote: >> On 6/10/17, James Darnley <james.darnley@gmail.com> wrote: >>> On 2017-06-09 13:41, Ivan Kalvachev wrote: >>>> On 6/9/17, Michael Niedermayer <michael@niedermayer.cc> wrote: >>>>> seems this breaks build with mingw64, didnt investigate but it >>>>> fails with these errors: >>>>> >>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d): >>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd): >>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1): >>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48): >>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d): >>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd): >>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1): >>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48): >>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge' >>>>> collect2: error: ld returned 1 exit status >>>>> collect2: error: ld returned 1 exit status >>>>> make: *** [ffmpeg_g.exe] Error 1 >>>>> make: *** Waiting for unfinished jobs.... >>>>> make: *** [ffprobe_g.exe] Error 1 >>>> >>>> >>>> const_*_edge is used on only one place is the code. >>>> Would you check if this patch fixes the issue. >>>> >>>> I expected that the addresses would be pre-calculated >>>> by n/yasm as one value and indexed >>>> relative to the section start. >>>> Instead it seems that each entry is represented with >>>> its own address and offset from it. >>>> Since the offset is negative it uses all 64 bits and >>>> it makes difference if it is truncated to 32 bits. >>>> >>>> Same issue could happen with clang tools. >>> >>> The problem is with the relative addressing. You need to load the real >>> address first before you can offset with another register at runtime. So >>> something like: >>> >>>> mov reg1, [read_only_const] >> lea ? >>>> mova mmreg, [reg1 + reg2] >> >> . >> OK, Getting mingw for my distro is problem >> and compiling one myself would take a bit more effort/time. >> So I'm posting a patch that "should" work. >> ====== >> --- a/libavcodec/x86/opus_pvq_search.asm >> +++ b/libavcodec/x86/opus_pvq_search.asm >> @@ -406,7 +406,7 @@ align 16 >> ; uint32 N - Number of vector elements. Must be 0 < N < 8192 >> ; >> %macro PVQ_FAST_SEARCH 0 >> -cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N >> +cglobal pvq_search,4,6,8, mmsize, inX, outY, K, N >> %define tmpX rsp >> >> ; movsxdifnidn Nq, Nd >> @@ -419,7 +419,12 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N >> add Nq, r4q ; Nq = align(Nq, mmsize) >> sub rsp, Nq ; allocate tmpX[Nq] >> >> +%ifdef PIC >> + lea r5q, [const_align_abs_edge] ; rip+const >> + movups m3, [r5q+r4q-mmsize] ; this is >> the bit mask for the padded read at the end of the input >> +%else >> movups m3, [const_align_abs_edge-mmsize+r4q] ; this is >> the bit mask for the padded read at the end of the input >> +%endif >> >> lea r4q, [Nq-mmsize] ; Nq is rounded up (aligned >> up) to mmsize, so r4q can't become negative here, unless N=0. >> movups m2, [inXq + r4q] >> ====== >> >> What I find surprising is that PIC is enabled only on Windows and does not >> seem >> to depend on CONFIG_PIC, so textrels are used all over assembly code. >> Do I miss something? >> > > Win32 code is always PIC, independent of what ffmpeg configure thinks > it should be. :) > Using the PIC define seems to be the appropriate thing, thats what the > other asm code does. > Yes, but my query is more about why CONFIG_PIC does not enable asm PIC too? After all gcc -fpic is supposed to generate code without texrel, isn't it?
On 2017-06-11 11:34, Ivan Kalvachev wrote: > On 6/10/17, James Darnley <james.darnley@gmail.com> wrote: >> On 2017-06-09 13:41, Ivan Kalvachev wrote: >>> >>> const_*_edge is used on only one place is the code. >>> Would you check if this patch fixes the issue. >>> >>> I expected that the addresses would be pre-calculated >>> by n/yasm as one value and indexed >>> relative to the section start. >>> Instead it seems that each entry is represented with >>> its own address and offset from it. >>> Since the offset is negative it uses all 64 bits and >>> it makes difference if it is truncated to 32 bits. >>> >>> Same issue could happen with clang tools. >> >> The problem is with the relative addressing. You need to load the real >> address first before you can offset with another register at runtime. So >> something like: >> >>> mov reg1, [read_only_const] > lea ? Yes, sorry about that. > ====== > --- a/libavcodec/x86/opus_pvq_search.asm > +++ b/libavcodec/x86/opus_pvq_search.asm > @@ -406,7 +406,7 @@ align 16 > ; uint32 N - Number of vector elements. Must be 0 < N < 8192 > ; > %macro PVQ_FAST_SEARCH 0 > -cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N > +cglobal pvq_search,4,6,8, mmsize, inX, outY, K, N > %define tmpX rsp > > ; movsxdifnidn Nq, Nd > @@ -419,7 +419,12 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N > add Nq, r4q ; Nq = align(Nq, mmsize) > sub rsp, Nq ; allocate tmpX[Nq] > > +%ifdef PIC > + lea r5q, [const_align_abs_edge] ; rip+const > + movups m3, [r5q+r4q-mmsize] ; this is the bit mask for the padded read at the end of the input > +%else > movups m3, [const_align_abs_edge-mmsize+r4q] ; this is the bit mask for the padded read at the end of the input > +%endif > > lea r4q, [Nq-mmsize] ; Nq is rounded up (aligned up) to mmsize, so r4q can't become negative here, unless N=0. > movups m2, [inXq + r4q] > ====== FYI that diff works just fine, when I correct the line wrapping. I can compile and run on Win64. This is not a comment about the original patch.
====== --- a/libavcodec/x86/opus_pvq_search.asm +++ b/libavcodec/x86/opus_pvq_search.asm @@ -406,7 +406,7 @@ align 16 ; uint32 N - Number of vector elements. Must be 0 < N < 8192 ; %macro PVQ_FAST_SEARCH 0 -cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N +cglobal pvq_search,4,6,8, mmsize, inX, outY, K, N %define tmpX rsp ; movsxdifnidn Nq, Nd @@ -419,7 +419,12 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N add Nq, r4q ; Nq = align(Nq, mmsize) sub rsp, Nq ; allocate tmpX[Nq] +%ifdef PIC + lea r5q, [const_align_abs_edge] ; rip+const + movups m3, [r5q+r4q-mmsize] ; this is the bit mask for the padded read at the end of the input +%else movups m3, [const_align_abs_edge-mmsize+r4q] ; this is the bit mask for the padded read at the end of the input +%endif