Message ID | 20240227222030.51301-1-martin@martin.st |
---|---|
State | Accepted |
Commit | e30369bc1c683aeab6ea74bc37b4ae77b03f79b5 |
Series | [FFmpeg-devel] aarch64: Use regular hwcaps flags instead of HWCAP_CPUID for CPU feature detection on Linux |
Context | Check | Description |
---|---|---|
yinshiyou/make_loongarch64 | success | Make finished |
yinshiyou/make_fate_loongarch64 | success | Make fate finished |
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
On Wed, 28 Feb 2024, Martin Storsjö wrote:

> The CPU feature detection was added in
> 493fcde50a84cb23854335bcb0e55c6f383d55db, using HWCAP_CPUID.
>
> The argument for using that, was that HWCAP_CPUID was added much
> earlier in the kernel (in Linux v4.11), while the HWCAP flags for
> individual features were added much later. And if compiling with
> older userland headers that lack the bits for e.g. HWCAP_I8MM, we
> wouldn't be able to detect that feature.
>
> (In practice, e.g. Ubuntu 20.04 lacks HWCAP_I8MM in userland
> headers, but the toolchain does support assembling such
> instructions).
>
> However, while the flag HWCAP_I8MM was added only in Linux v5.10,
> any CPU with that feature is most likely running a kernel that is
> newer than that as well. So by using HWCAP_CPUID, we could detect
> that feature on kernels between v4.11 and v5.10, but that is a
> quite unlikely case in practice.
>
> By using regular hwcaps flags, the code is much simplified, and
> doesn't rely on inline assembly to read the cpu id registers.
>
> And instead of requiring the userland headers to provide the
> definitions of the hwcap flags, provide our own definitions of the
> constants (they are fixed constants anyway), with names not conflicting
> with the ones from system headers. This avoids a number of ifdefs, and
> allows detecting these features even if building with userland headers
> that don't contain these definitions yet.
>
> Also, slightly older versions of QEMU, e.g. 6.2 in Ubuntu 22.04,
> do expose these features via HWCAP flags, but the emulated cpuid
> registers are missing the bits for exposing e.g. I8MM.
> ---
>  libavutil/aarch64/cpu.c | 30 ++++++++----------------------
>  1 file changed, 8 insertions(+), 22 deletions(-)

Will apply on Monday, if there's no objections.

// Martin
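The hwcaps approach described in the commit message can be sketched in isolation as below. This is only an illustration, not the code from the patch: the bit positions (bit 20 of AT_HWCAP for ASIMDDP, bit 13 of AT_HWCAP2 for I8MM) are taken from the patch, while the SKETCH_* names, the flag values and the helper flags_from_hwcaps() are hypothetical. The function takes the two hwcap words as parameters so that it can be exercised on any host; on aarch64 Linux they would presumably come from getauxval(AT_HWCAP) and getauxval(AT_HWCAP2).

```c
/* Locally defined hwcap bits, so the build does not depend on the
 * userland headers providing them. Bit positions are from the patch;
 * the SKETCH_ prefix is hypothetical and only avoids name clashes. */
#define SKETCH_HWCAP_ASIMDDP (1UL << 20)
#define SKETCH_HWCAP2_I8MM   (1UL << 13)

/* Hypothetical stand-ins for AV_CPU_FLAG_DOTPROD and AV_CPU_FLAG_I8MM;
 * the real values are defined in libavutil and not shown in the patch. */
enum {
    SKETCH_FLAG_DOTPROD = 1 << 0,
    SKETCH_FLAG_I8MM    = 1 << 1
};

/* Translate the raw AT_HWCAP / AT_HWCAP2 words into feature flags. */
static int flags_from_hwcaps(unsigned long hwcap, unsigned long hwcap2)
{
    int flags = 0;

    if (hwcap & SKETCH_HWCAP_ASIMDDP)
        flags |= SKETCH_FLAG_DOTPROD;
    if (hwcap2 & SKETCH_HWCAP2_I8MM)
        flags |= SKETCH_FLAG_I8MM;

    return flags;
}
```

Keeping the bit tests separate from the getauxval() calls is a choice made for this sketch only; the patch itself does both inside detect_flags().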
diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index f27fef3992..7a05391343 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -24,34 +24,20 @@
 #include <stdint.h>
 #include <sys/auxv.h>
 
-#define get_cpu_feature_reg(reg, val) \
-        __asm__("mrs %0, " #reg : "=r" (val))
+#define HWCAP_AARCH64_ASIMDDP (1 << 20)
+#define HWCAP2_AARCH64_I8MM   (1 << 13)
 
 static int detect_flags(void)
 {
     int flags = 0;
 
-#if defined(HWCAP_CPUID) && HAVE_INLINE_ASM
     unsigned long hwcap = getauxval(AT_HWCAP);
-    // We can check for DOTPROD and I8MM using HWCAP_ASIMDDP and
-    // HWCAP2_I8MM too, avoiding to read the CPUID registers (which triggers
-    // a trap, handled by the kernel). However the HWCAP_* defines for these
-    // extensions are added much later than HWCAP_CPUID, so the userland
-    // headers might lack support for them even if the binary later is run
-    // on hardware that does support it (and where the kernel might support
-    // HWCAP_CPUID).
-    // See https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html
-    if (hwcap & HWCAP_CPUID) {
-        uint64_t tmp;
-
-        get_cpu_feature_reg(ID_AA64ISAR0_EL1, tmp);
-        if (((tmp >> 44) & 0xf) == 0x1)
-            flags |= AV_CPU_FLAG_DOTPROD;
-        get_cpu_feature_reg(ID_AA64ISAR1_EL1, tmp);
-        if (((tmp >> 52) & 0xf) == 0x1)
-            flags |= AV_CPU_FLAG_I8MM;
-    }
-#endif
+    unsigned long hwcap2 = getauxval(AT_HWCAP2);
+
+    if (hwcap & HWCAP_AARCH64_ASIMDDP)
+        flags |= AV_CPU_FLAG_DOTPROD;
+    if (hwcap2 & HWCAP2_AARCH64_I8MM)
+        flags |= AV_CPU_FLAG_I8MM;
 
     return flags;
 }
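For comparison, the field decoding that the removed lines performed on the ID registers can be written as a small pure function. This is a rough sketch that mirrors the shifts and the equality test against 0x1 from the removed code (bits 44..47 of ID_AA64ISAR0_EL1 for DOTPROD, bits 52..55 of ID_AA64ISAR1_EL1 for I8MM). The names id_reg_field(), has_dotprod() and has_i8mm() are hypothetical, and the register values are passed in as plain integers instead of being read with an mrs instruction.

```c
#include <stdint.h>

/* Extract the 4-bit field that starts at bit `shift` of an ID register. */
static unsigned id_reg_field(uint64_t reg, unsigned shift)
{
    return (unsigned)((reg >> shift) & 0xf);
}

/* Same test as the removed code: field at bits 44..47 of the
 * ID_AA64ISAR0_EL1 value equals 0x1. */
static int has_dotprod(uint64_t isar0)
{
    return id_reg_field(isar0, 44) == 0x1;
}

/* Same test as the removed code: field at bits 52..55 of the
 * ID_AA64ISAR1_EL1 value equals 0x1. */
static int has_i8mm(uint64_t isar1)
{
    return id_reg_field(isar1, 52) == 0x1;
}
```

Seen side by side with the added lines of the diff, this shows what the patch trades away: two register reads and field extractions are replaced by two bit tests on values the kernel already provides through the auxiliary vector.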