Message ID | LbNoKBb--B-1@lynne.ee |
---|---|
State | New |
Headers | show |
On 4/1/2019 9:13 AM, Lynne wrote: > Mar 31, 2019, 11:49 PM by jamrial@gmail.com: > >> A float ret value needs to be in xmm0, and you swapped m0 with m2 on >> Win64. This is the source of the fate failure. >> > Attached a patch to fix this. > >> %if WIN64 >> - SWAP 0, 2 >> -%endif >> + shufps m0, m2, 0 >> +%else >> shufps m0, m0, 0 >> +%endif >> %endif > > 0001-x86-opusdsp-fix-WIN64-return-value.patch > > From 98e93b6f322a2a9dee7499fe87b74cf50a33b022 Mon Sep 17 00:00:00 2001 > From: Lynne <dev@lynne.ee> > Date: Mon, 1 Apr 2019 13:06:34 +0100 > Subject: [PATCH] x86/opusdsp: fix WIN64 return value > > --- > libavcodec/x86/opusdsp.asm | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/libavcodec/x86/opusdsp.asm b/libavcodec/x86/opusdsp.asm > index ed65614e06..a7bff4f0b3 100644 > --- a/libavcodec/x86/opusdsp.asm > +++ b/libavcodec/x86/opusdsp.asm > @@ -40,9 +40,10 @@ cglobal opus_deemphasis, 4, 4, 8, out, in, coeff, len > VBROADCASTSS m0, coeffm > %else > %if WIN64 > - SWAP 0, 2 > -%endif > + shufps m0, m2, 0 No, like this shufps will interleave the values from m2 and m0. The correct instruction would be shufps m0, m2, m2, 0 Where m2 is used for both input arguments, and the VEX encoding lets us use another reg as output. Tested and pushed it with that change. Thanks.
On Mon, Apr 1, 2019 at 4:59 PM James Almer <jamrial@gmail.com> wrote: > > On 4/1/2019 9:13 AM, Lynne wrote: > > Mar 31, 2019, 11:49 PM by jamrial@gmail.com: > > > >> A float ret value needs to be in xmm0, and you swapped m0 with m2 on > >> Win64. This is the source of the fate failure. > >> > > Attached a patch to fix this. > > > >> %if WIN64 > >> - SWAP 0, 2 > >> -%endif > >> + shufps m0, m2, 0 > >> +%else > >> shufps m0, m0, 0 > >> +%endif > >> %endif > > > > 0001-x86-opusdsp-fix-WIN64-return-value.patch > > > > From 98e93b6f322a2a9dee7499fe87b74cf50a33b022 Mon Sep 17 00:00:00 2001 > > From: Lynne <dev@lynne.ee> > > Date: Mon, 1 Apr 2019 13:06:34 +0100 > > Subject: [PATCH] x86/opusdsp: fix WIN64 return value > > > > --- > > libavcodec/x86/opusdsp.asm | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/libavcodec/x86/opusdsp.asm b/libavcodec/x86/opusdsp.asm > > index ed65614e06..a7bff4f0b3 100644 > > --- a/libavcodec/x86/opusdsp.asm > > +++ b/libavcodec/x86/opusdsp.asm > > @@ -40,9 +40,10 @@ cglobal opus_deemphasis, 4, 4, 8, out, in, coeff, len > > VBROADCASTSS m0, coeffm > > %else > > %if WIN64 > > - SWAP 0, 2 > > -%endif > > + shufps m0, m2, 0 > > No, like this shufps will interleave the values from m2 and m0. The > correct instruction would be > > shufps m0, m2, m2, 0 > > Where m2 is used for both input arguments, and the VEX encoding lets us > use another reg as output. > Tested and pushed it with that change. Thanks. Opus tests still fail on win32, both mingw and msvc. - Hendrik
On 4/2/2019 9:16 AM, Hendrik Leppkes wrote: > On Mon, Apr 1, 2019 at 4:59 PM James Almer <jamrial@gmail.com> wrote: >> >> On 4/1/2019 9:13 AM, Lynne wrote: >>> Mar 31, 2019, 11:49 PM by jamrial@gmail.com: >>> >>>> A float ret value needs to be in xmm0, and you swapped m0 with m2 on >>>> Win64. This is the source of the fate failure. >>>> >>> Attached a patch to fix this. >>> >>>> %if WIN64 >>>> - SWAP 0, 2 >>>> -%endif >>>> + shufps m0, m2, 0 >>>> +%else >>>> shufps m0, m0, 0 >>>> +%endif >>>> %endif >>> >>> 0001-x86-opusdsp-fix-WIN64-return-value.patch >>> >>> From 98e93b6f322a2a9dee7499fe87b74cf50a33b022 Mon Sep 17 00:00:00 2001 >>> From: Lynne <dev@lynne.ee> >>> Date: Mon, 1 Apr 2019 13:06:34 +0100 >>> Subject: [PATCH] x86/opusdsp: fix WIN64 return value >>> >>> --- >>> libavcodec/x86/opusdsp.asm | 5 +++-- >>> 1 file changed, 3 insertions(+), 2 deletions(-) >>> >>> diff --git a/libavcodec/x86/opusdsp.asm b/libavcodec/x86/opusdsp.asm >>> index ed65614e06..a7bff4f0b3 100644 >>> --- a/libavcodec/x86/opusdsp.asm >>> +++ b/libavcodec/x86/opusdsp.asm >>> @@ -40,9 +40,10 @@ cglobal opus_deemphasis, 4, 4, 8, out, in, coeff, len >>> VBROADCASTSS m0, coeffm >>> %else >>> %if WIN64 >>> - SWAP 0, 2 >>> -%endif >>> + shufps m0, m2, 0 >> >> No, like this shufps will interleave the values from m2 and m0. The >> correct instruction would be >> >> shufps m0, m2, m2, 0 >> >> Where m2 is used for both input arguments, and the VEX encoding lets us >> use another reg as output. >> Tested and pushed it with that change. Thanks. > > Opus tests still fail on win32, both mingw and msvc. > > - Hendrik x86_32 regardless of OS was crashing, because of unaligned access. Should be fixed.
From 98e93b6f322a2a9dee7499fe87b74cf50a33b022 Mon Sep 17 00:00:00 2001 From: Lynne <dev@lynne.ee> Date: Mon, 1 Apr 2019 13:06:34 +0100 Subject: [PATCH] x86/opusdsp: fix WIN64 return value --- libavcodec/x86/opusdsp.asm | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/libavcodec/x86/opusdsp.asm b/libavcodec/x86/opusdsp.asm index ed65614e06..a7bff4f0b3 100644 --- a/libavcodec/x86/opusdsp.asm +++ b/libavcodec/x86/opusdsp.asm @@ -40,9 +40,10 @@ cglobal opus_deemphasis, 4, 4, 8, out, in, coeff, len VBROADCASTSS m0, coeffm %else %if WIN64 - SWAP 0, 2 -%endif + shufps m0, m2, 0 +%else shufps m0, m0, 0 +%endif %endif movaps m4, [tab_st] -- 2.20.1