[FFmpeg-devel] x86/opusdsp: replace loads with shuffles

Submitted by Lynne on April 24, 2019, 11:24 a.m.

Details

Message ID LdE4fv9--3-1@lynne.ee
State New

Commit Message

Lynne April 24, 2019, 11:24 a.m.
Has a slight speedup.
Can't be carried over to aarch64, since it has no shufps-like instruction.

Comments

Paul B Mahol April 25, 2019, 5:41 p.m.
On 4/24/19, Lynne <dev@lynne.ee> wrote:
> Has a slight speedup.
> Can't be carried over to aarch64, since it has no shufps-like instruction.
>
>

On what CPU? And by how much?

Paul B Mahol April 25, 2019, 5:44 p.m.
On 4/25/19, Paul B Mahol <onemda@gmail.com> wrote:
> On 4/24/19, Lynne <dev@lynne.ee> wrote:
>> Has a slight speedup.
>> Can't be carried over to aarch64, since it has no shufps-like
>> instruction.
>>
>>
>
> On what CPU? And by how much?
>

Patch should generally be OK if output does not change.

Is this code covered by FATE?

Lynne April 25, 2019, 5:51 p.m.
Apr 25, 2019, 6:44 PM by onemda@gmail.com:

> On 4/25/19, Paul B Mahol <onemda@gmail.com> wrote:
>
>> On 4/24/19, Lynne <dev@lynne.ee> wrote:
>>
>>> Has a slight speedup.
>>> Can't be carried over to aarch64, since it has no shufps-like
>>> instruction.
>>>
>>
>> On what CPU? And by how much?
>>
>
> Patch should generally be OK if output does not change.
>
> Is this code covered by FATE?
>

Yes, fate-opus.
CPU is Skylake, speedup was about 30 decicycles (10512 -> 10482), low enough it could just be noise. The patch just removes some redundant tables.
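
To make the table removal concrete, here is a minimal scalar sketch in C of what the three replacement instructions in the patch compute from tab_st. It is an illustration, not FFmpeg code: the names xmm_t, shufps_model and lane_as_float are hypothetical, the immediates 0x55 and 0xAA are what x86inc's q1111 and q2222 are assumed to expand to, and VBROADCASTSS is modelled as a shuffle with selector 0. Note that the sketch yields full broadcasts, whereas the removed tab_x0/tab_x1/tab_x2 had zeros in their leading lanes; whether that difference matters depends on the loop body, which the diff does not show.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One 128-bit SSE register, modelled as four 32-bit lanes (lane 0 is the lowest). */
typedef struct { uint32_t lane[4]; } xmm_t;

/* Bit patterns copied from tab_st in the patch (0.85^1 .. 0.85^4 as floats). */
static const xmm_t tab_st = {{ 0x3f599a00, 0x3f38f671, 0x3f1d382a, 0x3f05a32f }};

/* Assumed expansions of the x86inc shuffle constants: two selector bits per lane. */
enum { Q0000 = 0x00, Q1111 = 0x55, Q2222 = 0xAA };

/* Scalar model of SHUFPS dst, a, b, imm8: the two low result lanes are picked
 * from a, the two high result lanes from b. */
static xmm_t shufps_model(xmm_t a, xmm_t b, unsigned imm8)
{
    xmm_t r;
    assert(imm8 <= 0xFF);
    r.lane[0] = a.lane[(imm8 >> 0) & 3];
    r.lane[1] = a.lane[(imm8 >> 2) & 3];
    r.lane[2] = b.lane[(imm8 >> 4) & 3];
    r.lane[3] = b.lane[(imm8 >> 6) & 3];
    return r;
}

/* Reinterpret one lane's bits as a float (assumes 32-bit IEEE 754 floats). */
static float lane_as_float(xmm_t v, int i)
{
    float f;
    memcpy(&f, &v.lane[i], sizeof f);
    return f;
}
```

Calling shufps_model(tab_st, tab_st, Q1111) models "shufps m6, m4, m4, q1111": every lane of the result holds lane 1 of tab_st, so one table load can feed m5, m6 and m7.
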

James Almer April 26, 2019, 11:40 p.m.
On 4/25/2019 2:51 PM, Lynne wrote:
> 
> 
> 
> Apr 25, 2019, 6:44 PM by onemda@gmail.com:
> 
>> On 4/25/19, Paul B Mahol <onemda@gmail.com> wrote:
>>
>>> On 4/24/19, Lynne <dev@lynne.ee> wrote:
>>>
>>>> Has a slight speedup.
>>>> Can't be carried over to aarch64, since it has no shufps-like
>>>> instruction.
>>>>
>>>
>>> On what CPU? And by how much?
>>>
>>
>> Patch should generally be OK if output does not change.
>>
>> Is this code covered by FATE?
>>
> 
> Yes, fate-opus.
> CPU is Skylake, speedup was about 30 decicycles (10512 -> 10482), low enough it could just be noise. The patch just removes some redundant tables.

Pushed, thanks.

Patch

From 6b09809a05c0d0bd916f7f6de5b205965bf4b69a Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Wed, 24 Apr 2019 12:19:48 +0100
Subject: [PATCH] x86/opusdsp: replace loads with shuffles

Has a slight speedup.
Can't be carried over to aarch64, since it has no shufps-like instruction.
---
 libavcodec/x86/opusdsp.asm | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/libavcodec/x86/opusdsp.asm b/libavcodec/x86/opusdsp.asm
index 6c99821b89..f5d206a8b1 100644
--- a/libavcodec/x86/opusdsp.asm
+++ b/libavcodec/x86/opusdsp.asm
@@ -24,9 +24,6 @@  SECTION_RODATA
 
          ; 0.85..^1    0.85..^2    0.85..^3    0.85..^4
 tab_st: dd 0x3f599a00, 0x3f38f671, 0x3f1d382a, 0x3f05a32f
-tab_x0: dd 0x0,        0x3f599a00, 0x3f599a00, 0x3f599a00
-tab_x1: dd 0x0,        0x0,        0x3f38f671, 0x3f38f671
-tab_x2: dd 0x0,        0x0,        0x0,        0x3f1d382a
 
 SECTION .text
 
@@ -45,9 +42,9 @@  cglobal opus_deemphasis, 4, 4, 8, out, in, coeff, len
 %endif
 
     movaps m4, [tab_st]
-    movaps m5, [tab_x0]
-    movaps m6, [tab_x1]
-    movaps m7, [tab_x2]
+    VBROADCASTSS m5, m4
+    shufps m6, m4, m4, q1111
+    shufps m7, m4, m4, q2222
 
 .loop:
     movaps  m1, [inq]                ; x0, x1, x2, x3
-- 
2.20.1