Message ID | LdE4fv9--3-1@lynne.ee |
---|---|
State | New |
On 4/24/19, Lynne <dev@lynne.ee> wrote:
> Has a slight speedup.
> Can't be carried over to aarch64, since it has no shufps-like instruction.
>

On what CPU? And by how much?

On 4/25/19, Paul B Mahol <onemda@gmail.com> wrote:
> On 4/24/19, Lynne <dev@lynne.ee> wrote:
>> Has a slight speedup.
>> Can't be carried over to aarch64, since it has no shufps-like
>> instruction.
>>
>
> On what CPU? And by how much?
>

Patch should generally be OK if output does not change.

Is this code covered by FATE?

Apr 25, 2019, 6:44 PM by onemda@gmail.com:

> On 4/25/19, Paul B Mahol <onemda@gmail.com> wrote:
>
>> On 4/24/19, Lynne <dev@lynne.ee> wrote:
>>
>>> Has a slight speedup.
>>> Can't be carried over to aarch64, since it has no shufps-like
>>> instruction.
>>>
>>
>> On what CPU? And by how much?
>>
>
> Patch should generally be OK if output does not change.
>
> Is this code covered by FATE?
>

Yes, fate-opus.
CPU is Skylake, speedup was about 30 decicycles (10512 -> 10482), low enough it could just be noise. The patch just removes some redundant tables.

On 4/25/2019 2:51 PM, Lynne wrote:
> Apr 25, 2019, 6:44 PM by onemda@gmail.com:
>
>> On 4/25/19, Paul B Mahol <onemda@gmail.com> wrote:
>>
>>> On 4/24/19, Lynne <dev@lynne.ee> wrote:
>>>
>>>> Has a slight speedup.
>>>> Can't be carried over to aarch64, since it has no shufps-like
>>>> instruction.
>>>>
>>>
>>> On what CPU? And by how much?
>>>
>>
>> Patch should generally be OK if output does not change.
>>
>> Is this code covered by FATE?
>>
>
> Yes, fate-opus.
> CPU is Skylake, speedup was about 30 decicycles (10512 -> 10482), low enough it could just be noise. The patch just removes some redundant tables.

Pushed, thanks.

From 6b09809a05c0d0bd916f7f6de5b205965bf4b69a Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 24 Apr 2019 12:19:48 +0100
Subject: [PATCH] x86/opusdsp: replace loads with shuffles

Has a slight speedup.
Can't be carried over to aarch64, since it has no shufps-like instruction.
---
 libavcodec/x86/opusdsp.asm | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/libavcodec/x86/opusdsp.asm b/libavcodec/x86/opusdsp.asm
index 6c99821b89..f5d206a8b1 100644
--- a/libavcodec/x86/opusdsp.asm
+++ b/libavcodec/x86/opusdsp.asm
@@ -24,9 +24,6 @@ SECTION_RODATA

 ; 0.85..^1 0.85..^2 0.85..^3 0.85..^4
 tab_st: dd 0x3f599a00, 0x3f38f671, 0x3f1d382a, 0x3f05a32f
-tab_x0: dd 0x0, 0x3f599a00, 0x3f599a00, 0x3f599a00
-tab_x1: dd 0x0, 0x0, 0x3f38f671, 0x3f38f671
-tab_x2: dd 0x0, 0x0, 0x0, 0x3f1d382a

 SECTION .text

@@ -45,9 +42,9 @@ cglobal opus_deemphasis, 4, 4, 8, out, in, coeff, len
 %endif

     movaps m4, [tab_st]
-    movaps m5, [tab_x0]
-    movaps m6, [tab_x1]
-    movaps m7, [tab_x2]
+    VBROADCASTSS m5, m4
+    shufps m6, m4, m4, q1111
+    shufps m7, m4, m4, q2222

 .loop:
     movaps m1, [inq] ; x0, x1, x2, x3
-- 
2.20.1
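
For readers unfamiliar with shufps, here is a rough scalar model in C of what the three replacement instructions compute. It is only a sketch, not code from the patch. The helper names (`shufps_model`, `float_from_bits`, `build_coeff_vectors`) are hypothetical. It assumes that x86inc's `q1111` and `q2222` expand to the immediates 0x55 and 0xAA, and that `VBROADCASTSS` with a register source behaves like a lane-0 shuffle on plain SSE. The removed tables held zeros in their leading lanes, while the shuffles fill every lane. Whether that matters depends on the rest of opus_deemphasis, which the diff does not show.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of SSE shufps: result lanes 0-1 are picked from a and
 * lanes 2-3 from b, each by one 2-bit field of imm8. */
static void shufps_model(float dst[4], const float a[4], const float b[4],
                         uint8_t imm8)
{
    float tmp[4]; /* temporary so dst may alias a or b */

    assert(dst && a && b);
    tmp[0] = a[(imm8 >> 0) & 3];
    tmp[1] = a[(imm8 >> 2) & 3];
    tmp[2] = b[(imm8 >> 4) & 3];
    tmp[3] = b[(imm8 >> 6) & 3];
    memcpy(dst, tmp, sizeof(tmp));
}

/* Reinterpret an IEEE-754 bit pattern, as written in the dd table, as a float. */
static float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* Hypothetical helper: derive m5, m6, m7 from tab_st the way the patched
 * code does, instead of loading three extra tables from memory. */
static void build_coeff_vectors(float m5[4], float m6[4], float m7[4])
{
    const float tab_st[4] = {
        float_from_bits(0x3f599a00), float_from_bits(0x3f38f671),
        float_from_bits(0x3f1d382a), float_from_bits(0x3f05a32f),
    };

    shufps_model(m5, tab_st, tab_st, 0x00); /* assumed VBROADCASTSS: lane 0 everywhere */
    shufps_model(m6, tab_st, tab_st, 0x55); /* assumed q1111: lane 1 everywhere */
    shufps_model(m7, tab_st, tab_st, 0xAA); /* assumed q2222: lane 2 everywhere */
}
```

Calling `build_coeff_vectors` fills each output with a single coefficient from `tab_st` repeated across all four lanes. That is why the separate `tab_x0`, `tab_x1` and `tab_x2` loads can be dropped.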