diff mbox series

[FFmpeg-devel,1/4] avutil/x86/pixelutils: Empty MMX state in ff_pixelutils_sad_8x8_mmxext

Message ID GV1SPRMB0033280C6F96AACB191058008FA7A@GV1SPRMB0033.EURP250.PROD.OUTLOOK.COM
State Accepted
Commit 5b85ca5317f28a73eadf10d604d2a7421b84bed8
Headers show
Series [FFmpeg-devel,1/4] avutil/x86/pixelutils: Empty MMX state in ff_pixelutils_sad_8x8_mmxext | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Andreas Rheinhardt Nov. 1, 2023, 9:45 a.m. UTC
We currently mostly do not empty the MMX state in our MMX
DSP functions; instead we only do so before code that might
be using x87 code. This is a violation of the System V i386 ABI
(and maybe of other ABIs, too):
"The CPU shall be in x87 mode upon entry to a function. Therefore,
every function that uses the MMX registers is required to issue an
emms or femms instruction after using MMX registers, before returning
or calling another function." (See 2.2.1 in [1])
This patch does not intend to change all these functions to abide
by the ABI; it only does so for ff_pixelutils_sad_8x8_mmxext, as this
function can by called by external users, because it is exported
via the pixelutils API. Without this, the following fragment will
assert (on x86/x64):
    uint8_t src1[8 * 8], src2[8 * 8];
    av_pixelutils_sad_fn fn = av_pixelutils_get_sad_fn(3, 3, 0, NULL);
    fn(src1, 8, src2, 8);
    av_assert0_fpu();

[1]: https://raw.githubusercontent.com/wiki/hjl-tools/x86-psABI/intel386-psABI-1.1.pdf

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavutil/x86/pixelutils.asm | 1 +
 1 file changed, 1 insertion(+)

Comments

Henrik Gramner Nov. 1, 2023, 11:02 a.m. UTC | #1
On Wed, Nov 1, 2023 at 10:44 AM Andreas Rheinhardt
<andreas.rheinhardt@outlook.com> wrote:
>  libavutil/x86/pixelutils.asm | 1 +
>  1 file changed, 1 insertion(+)

IIRC the emms instructions is quite slow on many systems, so if this
is the only pixelutils function using mmx it's probably better to just
rewrite it to use SSE2 instead (even if that means only using the
lower half of xmm registers).
Andreas Rheinhardt Nov. 1, 2023, 11:23 a.m. UTC | #2
Henrik Gramner via ffmpeg-devel:
> On Wed, Nov 1, 2023 at 10:44 AM Andreas Rheinhardt
> <andreas.rheinhardt@outlook.com> wrote:
>>  libavutil/x86/pixelutils.asm | 1 +
>>  1 file changed, 1 insertion(+)
> 
> IIRC the emms instructions is quite slow on many systems, so if this
> is the only pixelutils function using mmx it's probably better to just
> rewrite it to use SSE2 instead (even if that means only using the
> lower half of xmm registers).

Fixing it by rewriting it in a way that avoids MMX altogether is
possible, but this doesn't change the fact that all public functions
absolutely need to abide by the ABI.

- Andreas
Andreas Rheinhardt Nov. 3, 2023, 11:15 a.m. UTC | #3
Andreas Rheinhardt:
> We currently mostly do not empty the MMX state in our MMX
> DSP functions; instead we only do so before code that might
> be using x87 code. This is a violation of the System V i386 ABI
> (and maybe of other ABIs, too):
> "The CPU shall be in x87 mode upon entry to a function. Therefore,
> every function that uses the MMX registers is required to issue an
> emms or femms instruction after using MMX registers, before returning
> or calling another function." (See 2.2.1 in [1])
> This patch does not intend to change all these functions to abide
> by the ABI; it only does so for ff_pixelutils_sad_8x8_mmxext, as this
> function can by called by external users, because it is exported
> via the pixelutils API. Without this, the following fragment will
> assert (on x86/x64):
>     uint8_t src1[8 * 8], src2[8 * 8];
>     av_pixelutils_sad_fn fn = av_pixelutils_get_sad_fn(3, 3, 0, NULL);
>     fn(src1, 8, src2, 8);
>     av_assert0_fpu();
> 
> [1]: https://raw.githubusercontent.com/wiki/hjl-tools/x86-psABI/intel386-psABI-1.1.pdf
> 
> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
> ---
>  libavutil/x86/pixelutils.asm | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/libavutil/x86/pixelutils.asm b/libavutil/x86/pixelutils.asm
> index fbe9b45971..0bcccb51f5 100644
> --- a/libavutil/x86/pixelutils.asm
> +++ b/libavutil/x86/pixelutils.asm
> @@ -43,6 +43,7 @@ cglobal pixelutils_sad_8x8, 4,4,0, src1, stride1, src2, stride2
>      lea         src2q, [src2q + 2*stride2q]
>  %endrep
>      movd        eax, m2
> +    emms
>      RET
>  
>  ;-------------------------------------------------------------------------------

Will apply this patchset tomorrow unless there are objections.

- Andreas
diff mbox series

Patch

diff --git a/libavutil/x86/pixelutils.asm b/libavutil/x86/pixelutils.asm
index fbe9b45971..0bcccb51f5 100644
--- a/libavutil/x86/pixelutils.asm
+++ b/libavutil/x86/pixelutils.asm
@@ -43,6 +43,7 @@  cglobal pixelutils_sad_8x8, 4,4,0, src1, stride1, src2, stride2
     lea         src2q, [src2q + 2*stride2q]
 %endrep
     movd        eax, m2
+    emms
     RET
 
 ;-------------------------------------------------------------------------------