From patchwork Tue Feb 27 22:20:30 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Martin_Storsj=C3=B6?=
X-Patchwork-Id: 46610
From: =?utf-8?q?Martin_Storsj=C3=B6?=
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 28 Feb 2024 00:20:30 +0200
Message-Id: <20240227222030.51301-1-martin@martin.st>
X-Mailer: git-send-email 2.39.3 (Apple Git-145)
Subject: [FFmpeg-devel] [PATCH] aarch64: Use regular hwcaps flags instead of
 HWCAP_CPUID for CPU feature detection on Linux

The CPU feature detection was added in
493fcde50a84cb23854335bcb0e55c6f383d55db, using HWCAP_CPUID.

The argument for using it was that HWCAP_CPUID was added much
earlier in the kernel (in Linux v4.11), while the HWCAP flags for
individual features were added much later. And if compiling with
older userland headers that lack the bits for e.g. HWCAP2_I8MM, we
wouldn't be able to detect that feature. (In practice, e.g. Ubuntu
20.04 lacks HWCAP2_I8MM in userland headers, but the toolchain does
support assembling such instructions.)

However, while the flag HWCAP2_I8MM was added only in Linux v5.10,
any CPU with that feature is most likely running a kernel that is
newer than that as well. So by using HWCAP_CPUID, we could detect
that feature on kernels between v4.11 and v5.10, but that is quite
an unlikely case in practice.

By using regular hwcaps flags, the code is much simplified, and
doesn't rely on inline assembly to read the CPU ID registers.

And instead of requiring the userland headers to provide the
definitions of the hwcap flags, provide our own definitions of the
constants (they are fixed constants anyway), with names not
conflicting with the ones from system headers. This avoids a number
of ifdefs, and allows detecting these features even if building with
userland headers that don't contain these definitions yet.

Also, slightly older versions of QEMU, e.g. 6.2 in Ubuntu 22.04, do
expose these features via HWCAP flags, but the emulated cpuid
registers are missing the bits for exposing e.g. I8MM.
---
 libavutil/aarch64/cpu.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index f27fef3992..7a05391343 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -24,34 +24,20 @@
 #include
 #include
 
-#define get_cpu_feature_reg(reg, val) \
-        __asm__("mrs %0, " #reg : "=r" (val))
+#define HWCAP_AARCH64_ASIMDDP (1 << 20)
+#define HWCAP2_AARCH64_I8MM   (1 << 13)
 
 static int detect_flags(void)
 {
     int flags = 0;
 
-#if defined(HWCAP_CPUID) && HAVE_INLINE_ASM
     unsigned long hwcap = getauxval(AT_HWCAP);
-    // We can check for DOTPROD and I8MM using HWCAP_ASIMDDP and
-    // HWCAP2_I8MM too, avoiding to read the CPUID registers (which triggers
-    // a trap, handled by the kernel). However the HWCAP_* defines for these
-    // extensions are added much later than HWCAP_CPUID, so the userland
-    // headers might lack support for them even if the binary later is run
-    // on hardware that does support it (and where the kernel might support
-    // HWCAP_CPUID).
-    // See https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html
-    if (hwcap & HWCAP_CPUID) {
-        uint64_t tmp;
-
-        get_cpu_feature_reg(ID_AA64ISAR0_EL1, tmp);
-        if (((tmp >> 44) & 0xf) == 0x1)
-            flags |= AV_CPU_FLAG_DOTPROD;
-        get_cpu_feature_reg(ID_AA64ISAR1_EL1, tmp);
-        if (((tmp >> 52) & 0xf) == 0x1)
-            flags |= AV_CPU_FLAG_I8MM;
-    }
-#endif
+    unsigned long hwcap2 = getauxval(AT_HWCAP2);
+
+    if (hwcap & HWCAP_AARCH64_ASIMDDP)
+        flags |= AV_CPU_FLAG_DOTPROD;
+    if (hwcap2 & HWCAP2_AARCH64_I8MM)
+        flags |= AV_CPU_FLAG_I8MM;
 
     return flags;
 }
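
For anyone who wants to try the approach outside of the FFmpeg tree, here
is a minimal standalone sketch of the detection scheme the patch switches
to. The two hwcap bit values are copied from the patch. Everything else is
an assumption made for illustration: the SKETCH_FLAG_* constants (these
are not FFmpeg's AV_CPU_FLAG_* values), the names flags_from_hwcaps() and
detect_flags_sketch(), and the fallback that returns 0 when not built for
Linux on aarch64, which is only there so the file builds elsewhere.

```c
#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP, AT_HWCAP2 */
#endif

/* Local copies of the arm64 hwcap bits, with the values used in the patch,
 * under names that do not clash with the system headers. */
#define HWCAP_AARCH64_ASIMDDP (1UL << 20)
#define HWCAP2_AARCH64_I8MM   (1UL << 13)

/* Hypothetical flag values for this sketch only. */
#define SKETCH_FLAG_DOTPROD (1 << 0)
#define SKETCH_FLAG_I8MM    (1 << 1)

/* Pure mapping from the two auxiliary vector words to feature flags.
 * Split out from the getauxval() calls so it can be exercised on any host. */
static int flags_from_hwcaps(unsigned long hwcap, unsigned long hwcap2)
{
    int flags = 0;

    if (hwcap & HWCAP_AARCH64_ASIMDDP)
        flags |= SKETCH_FLAG_DOTPROD;
    if (hwcap2 & HWCAP2_AARCH64_I8MM)
        flags |= SKETCH_FLAG_I8MM;

    return flags;
}

/* Query the kernel-provided hwcaps. The bit positions above only have
 * this meaning on aarch64, so other targets report no features. */
static int detect_flags_sketch(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(AT_HWCAP2)
    return flags_from_hwcaps(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#else
    return 0;
#endif
}
```

Keeping the bit-to-flag mapping separate from the getauxval() calls is a
choice made for this sketch, not something the patch does; it only makes
the mapping easy to check on a machine that is not aarch64.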