[FFmpeg-devel,6/8] x86inc: reduce difference to x264 upstream

Submitted by James Darnley on Oct. 30, 2017, 1:08 p.m.

Details

Message ID 20171030130835.28327-7-jdarnley@obe.tv
State New
Headers show

Commit Message

James Darnley Oct. 30, 2017, 1:08 p.m.
These changes were commited to x264 in b568a256 "Experimental nasm
support"
---
 libavutil/x86/x86inc.asm | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Michael Niedermayer Oct. 31, 2017, 3:30 a.m.
On Mon, Oct 30, 2017 at 02:08:33PM +0100, James Darnley wrote:
> These changes were commited to x264 in b568a256 "Experimental nasm
> support"
> ---
>  libavutil/x86/x86inc.asm | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)

breaks build:

libavcodec/x86/rv34dsp.asm:211: error: parser: instruction expected
libavcodec/x86/rv34dsp.asm:211: error: label or instruction expected at start of line
libavcodec/x86/rv34dsp.asm:213: error: parser: instruction expected
libavcodec/x86/rv34dsp.asm:213: error: label or instruction expected at start of line
make: *** [libavcodec/x86/rv34dsp.o] Error 1
make: *** Waiting for unfinished jobs....

NASM version 2.10.09 compiled on Dec 29 2013


[...]
James Darnley Nov. 6, 2017, 7:56 p.m.
On 2017-10-31 04:30, Michael Niedermayer wrote:
> On Mon, Oct 30, 2017 at 02:08:33PM +0100, James Darnley wrote:
>> These changes were commited to x264 in b568a256 "Experimental nasm
>> support"
>> ---
>>  libavutil/x86/x86inc.asm | 16 ++++++++++++++--
>>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> breaks build:
> 
> libavcodec/x86/rv34dsp.asm:211: error: parser: instruction expected
> libavcodec/x86/rv34dsp.asm:211: error: label or instruction expected at start of line
> libavcodec/x86/rv34dsp.asm:213: error: parser: instruction expected
> libavcodec/x86/rv34dsp.asm:213: error: label or instruction expected at start of line
> make: *** [libavcodec/x86/rv34dsp.o] Error 1
> make: *** Waiting for unfinished jobs....
> 
> NASM version 2.10.09 compiled on Dec 29 2013

This is very weird.  I have reproduced it on 2.13.01 on my server but I
couldn't on my laptop.  After a `make distclean` I can.  I think that is
the bug with nasm not listing includes as dependencies so I wasn't
rebuilding those objects.

> libavcodec/x86/rv34dsp.asm:211: error: parser: instruction expected
> libavcodec/x86/rv34dsp.asm:173: ... from macro `RV34_IDCT_DC_ADD' defined here
> libavutil/x86/x86inc.asm:707: ... from macro `cglobal' defined here
> libavutil/x86/x86inc.asm:733: ... from macro `cglobal_internal' defined here
> libavcodec/x86/rv34dsp.asm:211: error: label or instruction expected at start of line
> libavcodec/x86/rv34dsp.asm:173: ... from macro `RV34_IDCT_DC_ADD' defined here
> libavutil/x86/x86inc.asm:707: ... from macro `cglobal' defined here
> libavutil/x86/x86inc.asm:733: ... from macro `cglobal_internal' defined here
> libavcodec/x86/rv34dsp.asm:213: error: parser: instruction expected
> libavcodec/x86/rv34dsp.asm:173: ... from macro `RV34_IDCT_DC_ADD' defined here
> libavutil/x86/x86inc.asm:707: ... from macro `cglobal' defined here
> libavutil/x86/x86inc.asm:733: ... from macro `cglobal_internal' defined here
> libavcodec/x86/rv34dsp.asm:213: error: label or instruction expected at start of line
> libavcodec/x86/rv34dsp.asm:173: ... from macro `RV34_IDCT_DC_ADD' defined here
> libavutil/x86/x86inc.asm:707: ... from macro `cglobal' defined here
> libavutil/x86/x86inc.asm:733: ... from macro `cglobal_internal' defined here

Line 733 is the align command to align the start of the function.  I
can't see why it fails here but not on any other function in that file
or any other file.

After some cutting and pasting there is something about the mmxext
function on line 159.  Cutting it out with the following diff, and
building it will succeed.

> diff --git a/libavcodec/x86/rv34dsp.asm b/libavcodec/x86/rv34dsp.asm
> index 692b4acfcd..3448f82e75 100644
> --- a/libavcodec/x86/rv34dsp.asm
> +++ b/libavcodec/x86/rv34dsp.asm
> @@ -156,17 +156,6 @@ cglobal rv34_idct_dc_add, 3, 3
>      packuswb     %2, %2
>      movd         %1, %2
>  %endmacro
> -INIT_MMX mmxext
> -cglobal rv34_idct_add, 3,3,0, d, s, b
> -    ROW_TRANSFORM       bq
> -    COL_TRANSFORM     [dq], mm0, [pw_col_coeffs+ 0], [pw_col_coeffs+ 8]
> -    mova               mm0, [pw_col_coeffs+ 0]
> -    COL_TRANSFORM  [dq+sq], mm4, mm0, [pw_col_coeffs+ 8]
> -    mova               mm4, [pw_col_coeffs+ 8]
> -    lea                 dq, [dq + 2*sq]
> -    COL_TRANSFORM     [dq], mm6, mm0, mm4
> -    COL_TRANSFORM  [dq+sq], mm7, mm0, mm4
> -    ret
> 
>  ; ff_rv34_idct_dc_add_sse4(uint8_t *dst, int stride, int dc);
>  %macro RV34_IDCT_DC_ADD 0

I honestly have no idea why that causes a problem for the following
function.  I thought it might be the lower-case ret rather than the
upper-case RET macro but changing it doesn't change the error.
James Almer Nov. 6, 2017, 8:15 p.m.
On 11/6/2017 4:56 PM, James Darnley wrote:
> On 2017-10-31 04:30, Michael Niedermayer wrote:
>> On Mon, Oct 30, 2017 at 02:08:33PM +0100, James Darnley wrote:
>>> These changes were commited to x264 in b568a256 "Experimental nasm
>>> support"
>>> ---
>>>  libavutil/x86/x86inc.asm | 16 ++++++++++++++--
>>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> breaks build:
>>
>> libavcodec/x86/rv34dsp.asm:211: error: parser: instruction expected
>> libavcodec/x86/rv34dsp.asm:211: error: label or instruction expected at start of line
>> libavcodec/x86/rv34dsp.asm:213: error: parser: instruction expected
>> libavcodec/x86/rv34dsp.asm:213: error: label or instruction expected at start of line
>> make: *** [libavcodec/x86/rv34dsp.o] Error 1
>> make: *** Waiting for unfinished jobs....
>>
>> NASM version 2.10.09 compiled on Dec 29 2013
> 
> This is very weird.  I have reproduced it on 2.13.01 on my server but I
> couldn't on my laptop.  After a `make distclean` I can.  I think that is
> the bug with nasm not listing includes as dependencies so I wasn't
> rebuilding those objects.
> 
>> libavcodec/x86/rv34dsp.asm:211: error: parser: instruction expected
>> libavcodec/x86/rv34dsp.asm:173: ... from macro `RV34_IDCT_DC_ADD' defined here
>> libavutil/x86/x86inc.asm:707: ... from macro `cglobal' defined here
>> libavutil/x86/x86inc.asm:733: ... from macro `cglobal_internal' defined here
>> libavcodec/x86/rv34dsp.asm:211: error: label or instruction expected at start of line
>> libavcodec/x86/rv34dsp.asm:173: ... from macro `RV34_IDCT_DC_ADD' defined here
>> libavutil/x86/x86inc.asm:707: ... from macro `cglobal' defined here
>> libavutil/x86/x86inc.asm:733: ... from macro `cglobal_internal' defined here
>> libavcodec/x86/rv34dsp.asm:213: error: parser: instruction expected
>> libavcodec/x86/rv34dsp.asm:173: ... from macro `RV34_IDCT_DC_ADD' defined here
>> libavutil/x86/x86inc.asm:707: ... from macro `cglobal' defined here
>> libavutil/x86/x86inc.asm:733: ... from macro `cglobal_internal' defined here
>> libavcodec/x86/rv34dsp.asm:213: error: label or instruction expected at start of line
>> libavcodec/x86/rv34dsp.asm:173: ... from macro `RV34_IDCT_DC_ADD' defined here
>> libavutil/x86/x86inc.asm:707: ... from macro `cglobal' defined here
>> libavutil/x86/x86inc.asm:733: ... from macro `cglobal_internal' defined here
> 
> Line 733 is the align command to align the start of the function.  I
> can't see why it fails here but not on any other function in that file
> or any other file.
> 
> After some cutting and pasting there is something about the mmxext
> function on line 159.  Cutting it out with the following diff, and
> building it will succeed.
> 
>> diff --git a/libavcodec/x86/rv34dsp.asm b/libavcodec/x86/rv34dsp.asm
>> index 692b4acfcd..3448f82e75 100644
>> --- a/libavcodec/x86/rv34dsp.asm
>> +++ b/libavcodec/x86/rv34dsp.asm
>> @@ -156,17 +156,6 @@ cglobal rv34_idct_dc_add, 3, 3
>>      packuswb     %2, %2
>>      movd         %1, %2
>>  %endmacro
>> -INIT_MMX mmxext
>> -cglobal rv34_idct_add, 3,3,0, d, s, b
>> -    ROW_TRANSFORM       bq
>> -    COL_TRANSFORM     [dq], mm0, [pw_col_coeffs+ 0], [pw_col_coeffs+ 8]
>> -    mova               mm0, [pw_col_coeffs+ 0]
>> -    COL_TRANSFORM  [dq+sq], mm4, mm0, [pw_col_coeffs+ 8]
>> -    mova               mm4, [pw_col_coeffs+ 8]
>> -    lea                 dq, [dq + 2*sq]
>> -    COL_TRANSFORM     [dq], mm6, mm0, mm4
>> -    COL_TRANSFORM  [dq+sq], mm7, mm0, mm4
>> -    ret
>>
>>  ; ff_rv34_idct_dc_add_sse4(uint8_t *dst, int stride, int dc);
>>  %macro RV34_IDCT_DC_ADD 0
> 
> I honestly have no idea why that causes a problem for the following
> function.  I thought it might be the lower-case ret rather than the
> upper-case RET macro but changing it doesn't change the error.

Try compiling with

make libavcodec/x86/rv34dsp.dbg.asm DBG=1
make libavcodec/x86/rv34dsp.dbg.o DBG=1

To bypass all the macro obfuscation and get errors pointing to the exact
problematic line in the preprocessed .dbg.asm file
James Darnley Nov. 6, 2017, 8:28 p.m.
On 2017-11-06 21:15, James Almer wrote:
> On 11/6/2017 4:56 PM, James Darnley wrote:
>> Line 733 is the align command to align the start of the function.  I
>> can't see why it fails here but not on any other function in that file
>> or any other file.
>>
>> After some cutting and pasting there is something about the mmxext
>> function on line 159.  Cutting it out with the following diff, and
>> building it will succeed.
>>
>>> ...
>>
>> I honestly have no idea why that causes a problem for the following
>> function.  I thought it might be the lower-case ret rather than the
>> upper-case RET macro but changing it doesn't change the error.
> 
> Try compiling with
> 
> make libavcodec/x86/rv34dsp.dbg.asm DBG=1
> make libavcodec/x86/rv34dsp.dbg.o DBG=1
> 
> To bypass all the macro obfuscation and get errors pointing to the exact
> problematic line in the preprocessed .dbg.asm file

WTF?!  Both those lines run without error.
James Almer Nov. 6, 2017, 8:40 p.m.
On 11/6/2017 5:28 PM, James Darnley wrote:
> On 2017-11-06 21:15, James Almer wrote:
>> On 11/6/2017 4:56 PM, James Darnley wrote:
>>> Line 733 is the align command to align the start of the function.  I
>>> can't see why it fails here but not on any other function in that file
>>> or any other file.
>>>
>>> After some cutting and pasting there is something about the mmxext
>>> function on line 159.  Cutting it out with the following diff, and
>>> building it will succeed.
>>>
>>>> ...
>>>
>>> I honestly have no idea why that causes a problem for the following
>>> function.  I thought it might be the lower-case ret rather than the
>>> upper-case RET macro but changing it doesn't change the error.
>>
>> Try compiling with
>>
>> make libavcodec/x86/rv34dsp.dbg.asm DBG=1
>> make libavcodec/x86/rv34dsp.dbg.o DBG=1
>>
>> To bypass all the macro obfuscation and get errors pointing to the exact
>> problematic line in the preprocessed .dbg.asm file
> 
> WTF?!  Both those lines run without error.

The first one is supposed to. It's just pre processes the asm file and
creates a new one. The second command should fail, though, assuming
assembling the normal libavcodec/x86/rv34dsp.o object also fails.

Patch hide | download patch | download mbox

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index f3dd2b788a..10e4603a65 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -113,6 +113,10 @@ 
     %endif
 %endmacro
 
+%ifdef __NASM_VER__
+    %use smartalign
+%endif
+
 ; Macros to eliminate most code duplication between x86_32 and x86_64:
 ; Currently this works only for leaf functions which load all their arguments
 ; into registers at the start, and make no other use of the stack. Luckily that
@@ -857,9 +861,17 @@  BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
     %endif
 
     %if ARCH_X86_64 || cpuflag(sse2)
-        CPUNOP amdnop
+        %ifdef __NASM_VER__
+            ALIGNMODE p6
+        %else
+            CPU amdnop
+        %endif
     %else
-        CPUNOP basicnop
+        %ifdef __NASM_VER__
+            ALIGNMODE nop
+        %else
+            CPU basicnop
+        %endif
     %endif
 %endmacro