[FFmpeg-devel,v2] libswscale/x86/yuv2rgb: Add missing EMMS

Message ID DB8P193MB06968DB8968EA73A417B29BCAFD62@DB8P193MB0696.EURP193.PROD.OUTLOOK.COM
State New
Series [FFmpeg-devel,v2] libswscale/x86/yuv2rgb: Add missing EMMS

Checks

Context                          Check    Description
yinshiyou/make_loongarch64       success  Make finished
yinshiyou/make_fate_loongarch64  success  Make fate finished
andriy/make_x86                  success  Make finished
andriy/make_fate_x86             success  Make fate finished

Commit Message

Mario Hros June 26, 2024, 5:54 p.m. UTC
The previous rewrite from inline assembly to NASM (commit e934194) omitted
the EMMS instruction required to return the x87 FPU to a usable state.
EMMS is needed only for the MMX and Extended MMX (MMXEXT) variants, which
use 8-byte registers (mmsize == 8).

Signed-off-by: Mario Hros <k3x-devel@outlook.com>
---
 libswscale/x86/yuv_2_rgb.asm | 4 ++++
 1 file changed, 4 insertions(+)
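
Background on why the missing EMMS matters: the MMX registers alias the x87 register stack, and any MMX instruction other than EMMS marks all eight x87 registers as in use. A later x87 load then hits a stack overflow until EMMS marks them empty again. The sketch below is a toy C model of the x87 tag word, not code from FFmpeg or from this patch; the type and function names (X87Model, model_mmx_op, model_emms, model_fld, x87_usable_after_mmx) are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the x87 tag word: 2 bits per register,
 * 00 = valid, 11 = empty. MMX registers alias the x87
 * registers, so both instruction sets share these tags. */
typedef struct {
    uint16_t tag_word; /* 0xFFFF means all eight registers are empty */
    unsigned top;      /* top-of-stack index, 0..7 */
} X87Model;

static void model_init(X87Model *f)
{
    f->tag_word = 0xFFFF;
    f->top = 0;
}

/* Any MMX instruction except EMMS marks every register valid
 * and resets the top-of-stack pointer to 0. */
static void model_mmx_op(X87Model *f)
{
    f->tag_word = 0x0000;
    f->top = 0;
}

/* EMMS marks every register empty again. */
static void model_emms(X87Model *f)
{
    f->tag_word = 0xFFFF;
}

/* FLD pushes a value: TOP is decremented modulo 8 and the new
 * top register must be empty. Returns false on stack overflow;
 * real hardware would raise an invalid-operation exception or,
 * with the exception masked, store a NaN. */
static bool model_fld(X87Model *f)
{
    unsigned new_top = (f->top + 7) & 7;
    unsigned tag = (f->tag_word >> (2 * new_top)) & 3;

    if (tag != 3)
        return false;

    /* Clear the two tag bits: the register is now valid. */
    f->tag_word = (uint16_t)(f->tag_word & ~(3u << (2 * new_top)));
    f->top = new_top;
    return true;
}

/* Hypothetical helper: model an MMX routine, optionally ending
 * in EMMS, followed by a single x87 load in the caller. */
static bool x87_usable_after_mmx(bool with_emms)
{
    X87Model f;

    model_init(&f);
    model_mmx_op(&f);
    if (with_emms)
        model_emms(&f);
    return model_fld(&f);
}
```

In this model, x87_usable_after_mmx(false) reports a failed load and x87_usable_after_mmx(true) reports success, which mirrors what the added emms before RET is meant to guarantee for callers that use x87 floating point afterwards.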

Comments

Ramiro Polla June 26, 2024, 6:55 p.m. UTC | #1
Hi,

On Wed, Jun 26, 2024 at 8:03 PM Mario Hros <k3x-devel@outlook.com> wrote:
> The previous rewrite from inline assembly to NASM (commit e934194) omitted
> the EMMS instruction required to return the x87 FPU to a usable state.
> EMMS is needed only for the MMX and Extended MMX (MMXEXT) variants, which
> use 8-byte registers (mmsize == 8).

Sorry I didn't catch this thread earlier. I sent a patch to outright
remove the mmx/mmxext code (thread "swscale/yuv2rgb/x86: remove
mmx/mmxext yuv2rgb functions"):
https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2024-June/329785.html

The C code should be faster in most cases, or have very similar
performance to the mmx/mmxext code. Is this not the case for you?

Ramiro

Patch

diff --git a/libswscale/x86/yuv_2_rgb.asm b/libswscale/x86/yuv_2_rgb.asm
index e3470fd9ad..5926133af8 100644
--- a/libswscale/x86/yuv_2_rgb.asm
+++ b/libswscale/x86/yuv_2_rgb.asm
@@ -354,6 +354,10 @@  add imageq, 8 * depth * time_num
 add indexq, 4 * time_num
 js .loop0
 
+%if mmsize == 8
+emms
+%endif
+
 RET
 
 %endmacro