diff mbox

[FFmpeg-devel,RFC] lavu/x86/pixelutils: Call emms before returning.

Message ID 201610022253.17825.cehoyos@ag.or.at
State Rejected
Headers show

Commit Message

Carl Eugen Hoyos Oct. 2, 2016, 8:53 p.m. UTC
Hi!

The functions in libavutil/x86/pixelutils.asm are exported 
to the library users if I understand the code correctly.
I suspect it can be expected that the MMX state is reset 
after returning.

Fixes the pixelutils fate test with musl on x86-32.

Please comment, Carl Eugen
From f16831e6dc0f36c8f290d927bf51142c64fe8afc Mon Sep 17 00:00:00 2001
From: Carl Eugen Hoyos <cehoyos@ag.or.at>
Date: Sun, 2 Oct 2016 22:45:51 +0200
Subject: [PATCH] lavu/x86/pixelutils: Call emms before returning.

The functions are exported and the library user can
expect that the MMX state is reset after returning.
---
 libavutil/x86/pixelutils.asm |    5 +++++
 1 file changed, 5 insertions(+)

Comments

Hendrik Leppkes Oct. 2, 2016, 9:04 p.m. UTC | #1
On Sun, Oct 2, 2016 at 10:53 PM, Carl Eugen Hoyos <cehoyos@ag.or.at> wrote:
> Hi!
>
> The functions in libavutil/x86/pixelutils.asm are exported
> to the library users if I understand the code correctly.
> I suspect it can be expected that the MMX state is reset
> after returning.
>
> Fixes the pixelutils fate test with musl on x86-32.
>
> Please comment, Carl Eugen
>

These functions are called in tight loops, and emms is then called in
the user-code after the loop. Calling emms inside the DSP function
would result in a performance degredation.

If a test doesn't call emms after using it, then that should be fixed.
And the docs should perhaps be amended to make note of that. Many DSP
functions work like that, to allow using them in tight loops without
the overhead.

- Hendrik
Michael Niedermayer Oct. 2, 2016, 9:06 p.m. UTC | #2
On Sun, Oct 02, 2016 at 10:53:17PM +0200, Carl Eugen Hoyos wrote:
> Hi!
> 
> The functions in libavutil/x86/pixelutils.asm are exported 
> to the library users if I understand the code correctly.
> I suspect it can be expected that the MMX state is reset 
> after returning.
> 
> Fixes the pixelutils fate test with musl on x86-32.
> 
> Please comment, Carl Eugen

>  pixelutils.asm |    5 +++++
>  1 file changed, 5 insertions(+)
> a7b729f55ec6170b1c4b1959a4be7530926368ce  0001-lavu-x86-pixelutils-Call-emms-before-returning.patch
> From f16831e6dc0f36c8f290d927bf51142c64fe8afc Mon Sep 17 00:00:00 2001
> From: Carl Eugen Hoyos <cehoyos@ag.or.at>
> Date: Sun, 2 Oct 2016 22:45:51 +0200
> Subject: [PATCH] lavu/x86/pixelutils: Call emms before returning.
> 
> The functions are exported and the library user can
> expect that the MMX state is reset after returning.
> ---
>  libavutil/x86/pixelutils.asm |    5 +++++
>  1 file changed, 5 insertions(+)

I think the performance impact of this on some CPUs is too large
to call this at a 8x8 and 16x16 granularity

[...]
diff mbox

Patch

diff --git a/libavutil/x86/pixelutils.asm b/libavutil/x86/pixelutils.asm
index 7af3007..092eb6e 100644
--- a/libavutil/x86/pixelutils.asm
+++ b/libavutil/x86/pixelutils.asm
@@ -61,6 +61,7 @@  cglobal pixelutils_sad_8x8, 4,4,0, src1, stride1, src2, stride2
     paddw       m6, m0
     movd        eax, m6
     movzx       eax, ax
+    emms
     RET
 
 ;-------------------------------------------------------------------------------
@@ -81,6 +82,7 @@  cglobal pixelutils_sad_8x8, 4,4,0, src1, stride1, src2, stride2
     lea         src2q, [src2q + 2*stride2q]
 %endrep
     movd        eax, m2
+    emms
     RET
 
 ;-------------------------------------------------------------------------------
@@ -101,6 +103,7 @@  cglobal pixelutils_sad_16x16, 4,4,0, src1, stride1, src2, stride2
     add         src2q, stride2q
 %endrep
     movd        eax, m2
+    emms
     RET
 
 ;-------------------------------------------------------------------------------
@@ -131,6 +134,7 @@  cglobal pixelutils_sad_16x16, 4,4,5, src1, stride1, src2, stride2
     movhlps     m0, m4
     paddw       m4, m0
     movd        eax, m4
+    emms
     RET
 
 ;-------------------------------------------------------------------------------
@@ -158,6 +162,7 @@  cglobal pixelutils_sad_%1_16x16, 4,4,3, src1, stride1, src2, stride2
     movhlps     m0, m2
     paddw       m2, m0
     movd        eax, m2
+    emms
     RET
 %endmacro