
[FFmpeg-devel,3/4] x86util: import MOVHL macro

Message ID 20170213124417.25808-3-jdarnley@obe.tv
State Accepted
Commit 7627df15d411a69f236b4650e88b1ab911f38efc

Commit Message

James Darnley Feb. 13, 2017, 12:44 p.m. UTC
Originally committed to x264 in 1637239a by Henrik Gramner, who has
agreed to relicense it under the LGPL.  Original commit message follows.

    x86: Avoid some bypass delays and false dependencies

    A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
    between int and float domains, so try to avoid that if possible.
---
 libavutil/x86/x86util.asm | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Henrik Gramner Feb. 14, 2017, 4:21 p.m. UTC | #1
On Mon, Feb 13, 2017 at 1:44 PM, James Darnley <jdarnley@obe.tv> wrote:
> Originally committed to x264 in 1637239a by Henrik Gramner, who has
> agreed to relicense it under the LGPL.  Original commit message follows.
>
>     x86: Avoid some bypass delays and false dependencies
>
>     A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
>     between int and float domains, so try to avoid that if possible.

Not sure if I see the point in copying the original commit message in
this case, but the patch itself is OK.

James Darnley Feb. 15, 2017, 3:55 p.m. UTC | #2
On 2017-02-14 17:21, Henrik Gramner wrote:
> Not sure if I see the point in copying the original commit message in
> this case, but the patch itself is OK.

It provides a nice explanation of why the patch/macro is useful.

Patch

diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
index c063436e0a..1408f0a176 100644
--- a/libavutil/x86/x86util.asm
+++ b/libavutil/x86/x86util.asm
@@ -876,3 +876,15 @@ 
     psrlq   %1, 8*(%2)
 %endif
 %endmacro
+
+%macro MOVHL 2 ; dst, src
+%ifidn %1, %2
+    punpckhqdq %1, %2
+%elif cpuflag(avx)
+    punpckhqdq %1, %2, %2
+%elif cpuflag(sse4)
+    pshufd     %1, %2, q3232 ; pshufd is slow on some older CPUs, so only use it on more modern ones
+%else
+    movhlps    %1, %2        ; may cause an int/float domain transition and has a dependency on dst
+%endif
+%endmacro
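To make the branches of MOVHL easier to compare, here is a small C sketch using SSE2 intrinsics. It is not part of the patch, and the helper names (`qword_lane`, `movhl_punpckhqdq`, `movhl_pshufd`, `movhl_movhlps`, `movhl_variants_agree`) are hypothetical. It models what each instruction leaves in the destination register.

Every variant puts the high qword of `src` into the low qword of `dst`, and that is all the macro guarantees. They differ in the high qword. `punpckhqdq` and `pshufd ... q3232` duplicate `src`'s high qword. `movhlps` keeps the old high qword of `dst`, which is the dependency on `dst` noted in the patch comment. `movhlps` also operates on float vectors, so integer code must cast into and out of the float domain. Note that `pshufd` itself only needs SSE2; according to the patch's comment, the `sse4` check exists only because `pshufd` is slow on some older CPUs.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in SSE's _mm_movehl_ps */
#include <stdint.h>

/* Hypothetical helper: return 64-bit lane i (0 = low, 1 = high) of v. */
static uint64_t qword_lane(__m128i v, int i)
{
    uint64_t q[2];
    _mm_storeu_si128((__m128i *)q, v); /* unaligned store, no alignment needed */
    return q[i];
}

/* Models "punpckhqdq src, src" (in-place form) and the AVX form
 * "vpunpckhqdq dst, src, src": both qwords become src's high qword,
 * and the old dst value is never read. */
static __m128i movhl_punpckhqdq(__m128i src)
{
    return _mm_unpackhi_epi64(src, src);
}

/* Models "pshufd dst, src, q3232": dword order {2,3,2,3} (imm 0xEE),
 * so both qwords again equal src's high qword. */
static __m128i movhl_pshufd(__m128i src)
{
    return _mm_shuffle_epi32(src, _MM_SHUFFLE(3, 2, 3, 2));
}

/* Models "movhlps dst, src": low qword = src's high qword, high qword =
 * the OLD high qword of dst, so the result depends on dst. The casts are
 * needed because movhlps is a float-domain instruction. */
static __m128i movhl_movhlps(__m128i dst, __m128i src)
{
    return _mm_castps_si128(_mm_movehl_ps(_mm_castsi128_ps(dst),
                                          _mm_castsi128_ps(src)));
}

/* Returns 1 if every variant puts src's high qword into the low qword. */
static int movhl_variants_agree(__m128i dst, __m128i src)
{
    uint64_t want = qword_lane(src, 1);
    return qword_lane(movhl_punpckhqdq(src), 0) == want &&
           qword_lane(movhl_pshufd(src), 0) == want &&
           qword_lane(movhl_movhlps(dst, src), 0) == want;
}
```

With `src = _mm_set_epi64x(0x1111111111111111, 0x2222222222222222)` and `dst = _mm_set_epi64x(0x3333333333333333, 0x4444444444444444)`, all three results have `0x1111111111111111` in the low qword. Only the `movhlps` model still carries `0x3333333333333333` (from `dst`) in its high qword.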