[FFmpeg-devel,4/5] x86/opusdsp: implement FMA3 accelerated postfilter and deemphasis

Submitted by Lynne on April 1, 2019, 12:13 p.m.

Details

Message ID LbNoKBb--B-1@lynne.ee
State New
Headers show

Commit Message

Lynne April 1, 2019, 12:13 p.m.
Mar 31, 2019, 11:49 PM by jamrial@gmail.com:

> A float ret value needs to be in xmm0, and you swapped m0 with m2 on
> Win64. This is the source of the fate failure.
>

Attached a patch to fix this.

> %if WIN64
> -    SWAP 0, 2
> -%endif
> +    shufps m0, m2, 0
> +%else
>      shufps m0, m0, 0
> +%endif
> %endif

Comments

James Almer April 1, 2019, 2:59 p.m.
On 4/1/2019 9:13 AM, Lynne wrote:
> Mar 31, 2019, 11:49 PM by jamrial@gmail.com:
> 
>> A float ret value needs to be in xmm0, and you swapped m0 with m2 on
>> Win64. This is the source of the fate failure.
>>
> Attached a patch to fix this.
> 
>> %if WIN64
>> -    SWAP 0, 2
>> -%endif
>> +    shufps m0, m2, 0
>> +%else
>>       shufps m0, m0, 0
>> +%endif
>> %endif
> 
> 0001-x86-opusdsp-fix-WIN64-return-value.patch
> 
> From 98e93b6f322a2a9dee7499fe87b74cf50a33b022 Mon Sep 17 00:00:00 2001
> From: Lynne <dev@lynne.ee>
> Date: Mon, 1 Apr 2019 13:06:34 +0100
> Subject: [PATCH] x86/opusdsp: fix WIN64 return value
> 
> ---
>  libavcodec/x86/opusdsp.asm | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/libavcodec/x86/opusdsp.asm b/libavcodec/x86/opusdsp.asm
> index ed65614e06..a7bff4f0b3 100644
> --- a/libavcodec/x86/opusdsp.asm
> +++ b/libavcodec/x86/opusdsp.asm
> @@ -40,9 +40,10 @@ cglobal opus_deemphasis, 4, 4, 8, out, in, coeff, len
>      VBROADCASTSS m0, coeffm
>  %else
>  %if WIN64
> -    SWAP 0, 2
> -%endif
> +    shufps m0, m2, 0

No, like this shufps will interleave the values from m2 and m0. The
correct instruction would be

shufps m0, m2, m2, 0

Where m2 is used for both input arguments, and the VEX encoding lets us
use another reg as output.
Tested and pushed it with that change. Thanks.
Hendrik Leppkes April 2, 2019, 12:16 p.m.
On Mon, Apr 1, 2019 at 4:59 PM James Almer <jamrial@gmail.com> wrote:
>
> On 4/1/2019 9:13 AM, Lynne wrote:
> > Mar 31, 2019, 11:49 PM by jamrial@gmail.com:
> >
> >> A float ret value needs to be in xmm0, and you swapped m0 with m2 on
> >> Win64. This is the source of the fate failure.
> >>
> > Attached a patch to fix this.
> >
> >> %if WIN64
> >> -    SWAP 0, 2
> >> -%endif
> >> +    shufps m0, m2, 0
> >> +%else
> >>       shufps m0, m0, 0
> >> +%endif
> >> %endif
> >
> > 0001-x86-opusdsp-fix-WIN64-return-value.patch
> >
> > From 98e93b6f322a2a9dee7499fe87b74cf50a33b022 Mon Sep 17 00:00:00 2001
> > From: Lynne <dev@lynne.ee>
> > Date: Mon, 1 Apr 2019 13:06:34 +0100
> > Subject: [PATCH] x86/opusdsp: fix WIN64 return value
> >
> > ---
> >  libavcodec/x86/opusdsp.asm | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/libavcodec/x86/opusdsp.asm b/libavcodec/x86/opusdsp.asm
> > index ed65614e06..a7bff4f0b3 100644
> > --- a/libavcodec/x86/opusdsp.asm
> > +++ b/libavcodec/x86/opusdsp.asm
> > @@ -40,9 +40,10 @@ cglobal opus_deemphasis, 4, 4, 8, out, in, coeff, len
> >      VBROADCASTSS m0, coeffm
> >  %else
> >  %if WIN64
> > -    SWAP 0, 2
> > -%endif
> > +    shufps m0, m2, 0
>
> No, like this shufps will interleave the values from m2 and m0. The
> correct instruction would be
>
> shufps m0, m2, m2, 0
>
> Where m2 is used for both input arguments, and the VEX encoding lets us
> use another reg as output.
> Tested and pushed it with that change. Thanks.

Opus tests still fail on win32, both mingw and msvc.

- Hendrik
James Almer April 2, 2019, 2:39 p.m.
On 4/2/2019 9:16 AM, Hendrik Leppkes wrote:
> On Mon, Apr 1, 2019 at 4:59 PM James Almer <jamrial@gmail.com> wrote:
>>
>> On 4/1/2019 9:13 AM, Lynne wrote:
>>> Mar 31, 2019, 11:49 PM by jamrial@gmail.com:
>>>
>>>> A float ret value needs to be in xmm0, and you swapped m0 with m2 on
>>>> Win64. This is the source of the fate failure.
>>>>
>>> Attached a patch to fix this.
>>>
>>>> %if WIN64
>>>> -    SWAP 0, 2
>>>> -%endif
>>>> +    shufps m0, m2, 0
>>>> +%else
>>>>       shufps m0, m0, 0
>>>> +%endif
>>>> %endif
>>>
>>> 0001-x86-opusdsp-fix-WIN64-return-value.patch
>>>
>>> From 98e93b6f322a2a9dee7499fe87b74cf50a33b022 Mon Sep 17 00:00:00 2001
>>> From: Lynne <dev@lynne.ee>
>>> Date: Mon, 1 Apr 2019 13:06:34 +0100
>>> Subject: [PATCH] x86/opusdsp: fix WIN64 return value
>>>
>>> ---
>>>  libavcodec/x86/opusdsp.asm | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/libavcodec/x86/opusdsp.asm b/libavcodec/x86/opusdsp.asm
>>> index ed65614e06..a7bff4f0b3 100644
>>> --- a/libavcodec/x86/opusdsp.asm
>>> +++ b/libavcodec/x86/opusdsp.asm
>>> @@ -40,9 +40,10 @@ cglobal opus_deemphasis, 4, 4, 8, out, in, coeff, len
>>>      VBROADCASTSS m0, coeffm
>>>  %else
>>>  %if WIN64
>>> -    SWAP 0, 2
>>> -%endif
>>> +    shufps m0, m2, 0
>>
>> No, like this shufps will interleave the values from m2 and m0. The
>> correct instruction would be
>>
>> shufps m0, m2, m2, 0
>>
>> Where m2 is used for both input arguments, and the VEX encoding lets us
>> use another reg as output.
>> Tested and pushed it with that change. Thanks.
> 
> Opus tests still fail on win32, both mingw and msvc.
> 
> - Hendrik

x86_32 regardless of OS was crashing, because of unaligned access.
Should be fixed.

Patch hide | download patch | download mbox

From 98e93b6f322a2a9dee7499fe87b74cf50a33b022 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Mon, 1 Apr 2019 13:06:34 +0100
Subject: [PATCH] x86/opusdsp: fix WIN64 return value

---
 libavcodec/x86/opusdsp.asm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libavcodec/x86/opusdsp.asm b/libavcodec/x86/opusdsp.asm
index ed65614e06..a7bff4f0b3 100644
--- a/libavcodec/x86/opusdsp.asm
+++ b/libavcodec/x86/opusdsp.asm
@@ -40,9 +40,10 @@  cglobal opus_deemphasis, 4, 4, 8, out, in, coeff, len
     VBROADCASTSS m0, coeffm
 %else
 %if WIN64
-    SWAP 0, 2
-%endif
+    shufps m0, m2, 0
+%else
     shufps m0, m0, 0
+%endif
 %endif
 
     movaps m4, [tab_st]
-- 
2.20.1