diff mbox series

[FFmpeg-devel,1/2] avutil/cpu_internal: Fix check for SSE2SLOW

Message ID DB6PR0101MB22140E7649E0651A89B1D4238FAD9@DB6PR0101MB2214.eurprd01.prod.exchangelabs.com
State Accepted
Commit ac322ec21418a7510815eda198c978d43d5cd507
Headers show
Series [FFmpeg-devel,1/2] avutil/cpu_internal: Fix check for SSE2SLOW | expand

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Andreas Rheinhardt June 15, 2022, 5:33 a.m. UTC
For SSE2 and SSE3, there are four states that the two flags
involved (AV_CPU_FLAG_SSE[23] and AV_CPU_FLAG_SSE[23]SLOW) can convey.
When ordered from worst to best they are:
1. both flags unset (SSE[23] unavailable)
2. the slow flag set, the ordinary flag unset (this is designed
for cases where SSE2 is available, but so slow that MMX(EXT)/SSE
code is usually faster)
3. both flags set (SSE2 is available, but there might be scenarios
where MMX(EXT)/SSE code is faster)
4. the ordinary flag set, the slow flag unset (this is the normal case)

The ordinary macros for checking cpuflags return true
in the latter two cases; the fast macros only return true for
the latter case. Yet the macros to check for slow currently
only return true in case three.

This seems unintended. In fact, the only uses of the slow macros
are all of the form
if (EXTERNAL_SSE2(cpu_flags) || EXTERNAL_SSE2_SLOW(cpu_flags))
where the check for EXTERNAL_SSE2_SLOW is completely redundant.
Even more importantly, it is not what was intended. Before
6369ba3c9cc74becfaad2a8882dff3dd3e7ae3c0, the checks passed
in cases 2 to 4. Said commit changed this to something that
only passes for the third case. Commits
7fb758cd8ed08e4a37f10e25003953d13c68b8cd and
c1913064e38cb338039f29c280a0dacc3fd1e451 restored the old behaviour,
yet merging 4efab89332ea39a77145e8b15562b981d9dbde68 (in commit
ac774cfa571734c49c26e2d3387adccff8957ff8) broke this again
by changing it to what it is now.*

This commit changes the macros to make the slow macros check
whether a specific instruction is supported, even if slow.
This restores the intended meaning to all uses of the SLOW macros
and is generally more natural.

*: Libav only checks for EXTERNAL_SSE2_SLOW, i.e. for the third case
only.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavutil/cpu_internal.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff mbox series

Patch

diff --git a/libavutil/cpu_internal.h b/libavutil/cpu_internal.h
index e207b2d480..650d47fc96 100644
--- a/libavutil/cpu_internal.h
+++ b/libavutil/cpu_internal.h
@@ -30,12 +30,15 @@ 
     (HAVE_ ## cpuext ## suffix && ((flags) & AV_CPU_FLAG_ ## cpuext) && \
      !((flags) & AV_CPU_FLAG_ ## slow_cpuext ## SLOW))
 
+#define CPUEXT_SUFFIX_SLOW(flags, suffix, cpuext)                       \
+    (HAVE_ ## cpuext ## suffix &&                                       \
+     ((flags) & (AV_CPU_FLAG_ ## cpuext | AV_CPU_FLAG_ ## cpuext ## SLOW)))
+
 #define CPUEXT_SUFFIX_SLOW2(flags, suffix, cpuext, slow_cpuext)         \
     (HAVE_ ## cpuext ## suffix && ((flags) & AV_CPU_FLAG_ ## cpuext) && \
-     ((flags) & AV_CPU_FLAG_ ## slow_cpuext ## SLOW))
+     ((flags) & (AV_CPU_FLAG_ ## slow_cpuext | AV_CPU_FLAG_ ## slow_cpuext ## SLOW)))
 
 #define CPUEXT_SUFFIX_FAST(flags, suffix, cpuext) CPUEXT_SUFFIX_FAST2(flags, suffix, cpuext, cpuext)
-#define CPUEXT_SUFFIX_SLOW(flags, suffix, cpuext) CPUEXT_SUFFIX_SLOW2(flags, suffix, cpuext, cpuext)
 
 #define CPUEXT(flags, cpuext) CPUEXT_SUFFIX(flags, , cpuext)
 #define CPUEXT_FAST(flags, cpuext) CPUEXT_SUFFIX_FAST(flags, , cpuext)