Message ID | 20170213124417.25808-3-jdarnley@obe.tv |
---|---|
State | Accepted |
Commit | 7627df15d411a69f236b4650e88b1ab911f38efc |
On Mon, Feb 13, 2017 at 1:44 PM, James Darnley <jdarnley@obe.tv> wrote:
> Originally committed to x264 in 1637239a by Henrik Gramner who has
> agreed to re-license it as LGPL. Original commit message follows.
>
> x86: Avoid some bypass delays and false dependencies
>
> A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
> between int and float domains, so try to avoid that if possible.

Not sure if I see the point in copying the original commit message in
this case, but the patch itself is OK.
On 2017-02-14 17:21, Henrik Gramner wrote:
> On Mon, Feb 13, 2017 at 1:44 PM, James Darnley <jdarnley@obe.tv> wrote:
>> Originally committed to x264 in 1637239a by Henrik Gramner who has
>> agreed to re-license it as LGPL. Original commit message follows.
>>
>> x86: Avoid some bypass delays and false dependencies
>>
>> A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
>> between int and float domains, so try to avoid that if possible.
>
> Not sure if I see the point in copying the original commit message in
> this case, but the patch itself is OK.

It provides a nice "why", explaining how the patch/macro is useful.
diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
index c063436e0a..1408f0a176 100644
--- a/libavutil/x86/x86util.asm
+++ b/libavutil/x86/x86util.asm
@@ -876,3 +876,15 @@
     psrlq  %1, 8*(%2)
 %endif
 %endmacro
+
+%macro MOVHL 2 ; dst, src
+%ifidn %1, %2
+    punpckhqdq %1, %2
+%elif cpuflag(avx)
+    punpckhqdq %1, %2, %2
+%elif cpuflag(sse4)
+    pshufd     %1, %2, q3232 ; pshufd is slow on some older CPUs, so only use it on more modern ones
+%else
+    movhlps    %1, %2        ; may cause an int/float domain transition and has a dependency on dst
+%endif
+%endmacro
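
For readers less familiar with these instructions, the branches of MOVHL can be compared with a small lane-level model. This is only an illustrative sketch in Python, not part of the patch: a register is assumed to be a list of four 32-bit lanes (index 0 is the lowest), the function names are hypothetical stand-ins that mirror the instruction mnemonics, and `Q3232` assumes the x86inc convention in which the digits of `q3232` name the source dwords from highest to lowest destination lane.

```python
# Illustrative model only: a 128-bit register is a list of four 32-bit lanes,
# index 0 being the lowest lane. These helpers are not a real API.

def punpckhqdq(a, b):
    """Low 64 bits = high 64 bits of a; high 64 bits = high 64 bits of b."""
    return [a[2], a[3], b[2], b[3]]

def pshufd(src, imm8):
    """Destination lane i takes the source lane selected by imm8 bits 2*i+1..2*i."""
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]

def movhlps(dst, src):
    """Low 64 bits = high 64 bits of src; the high 64 bits of dst are kept."""
    return [src[2], src[3], dst[2], dst[3]]

# Assumed x86inc encoding of q3232: lanes (3, 2, 3, 2) from high to low.
Q3232 = (3 << 6) | (2 << 4) | (3 << 2) | 2

src = [10, 11, 12, 13]
dst = [20, 21, 22, 23]

same_reg = punpckhqdq(src, src)   # models both punpckhqdq branches (operands equal)
shuffled = pshufd(src, Q3232)     # models the sse4 branch
fallback = movhlps(dst, src)      # models the final fallback branch

print(same_reg)
print(shuffled)
print(fallback)
```

In this model every branch leaves the high half of src in the low 64 bits of the result, which is the part callers of MOVHL rely on. Only the movhlps fallback carries over the old upper half of dst, which corresponds to the "dependency on dst" noted in the macro's comment.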