From patchwork Tue Sep 17 12:14:15 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Martin_Storsj=C3=B6?=
X-Patchwork-Id: 51631
From: =?utf-8?q?Martin_Storsj=C3=B6?=
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 17 Sep 2024 15:14:15 +0300
Message-Id: <20240917121419.610349-2-martin@martin.st>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240917121419.610349-1-martin@martin.st>
References: <20240917121419.610349-1-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 2/5] configure: Add detection of assembler support for SVE/SVE2
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

It turns out that recent versions of MS armasm64 do support some
SVE instructions, but not all of them. Test for one of the
instructions that it currently doesn't support.
---
Just as a disclaimer, I'm not currently actively planning on writing
SVE/SVE2 optimizations. However, related projects such as x264 and
dav1d do have a few functions using these extensions, so we might
just as well add the framework support for these features in ffmpeg,
as functions needing this support will come sooner or later anyway.

In the related projects, there's no real use of longer vectors
(as there's very little such HW available anyway), but SVE gives
widening loads (used in a couple of places in x264) and 16 bit dot
products (used in dav1d), which can be useful with 128 bit vectors.
---
 configure               | 14 +++++++++++++-
 ffbuild/arch.mak        |  2 ++
 libavutil/aarch64/asm.S | 18 ++++++++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index da36419f2d..d05c4a5a51 100755
--- a/configure
+++ b/configure
@@ -466,6 +466,8 @@ Optimization options (experts only):
   --disable-neon           disable NEON optimizations
   --disable-dotprod        disable DOTPROD optimizations
   --disable-i8mm           disable I8MM optimizations
+  --disable-sve            disable SVE optimizations
+  --disable-sve2           disable SVE2 optimizations
   --disable-inline-asm     disable use of inline assembly
   --disable-x86asm         disable use of standalone x86 assembly
   --disable-mipsdsp        disable MIPS DSP ASE R1 optimizations
@@ -2163,6 +2165,8 @@ ARCH_EXT_LIST_ARM="
     vfp
     vfpv3
     setend
+    sve
+    sve2
 "
 
 ARCH_EXT_LIST_MIPS="
@@ -2435,6 +2439,8 @@ TOOLCHAIN_FEATURES="
     as_arch_directive
     as_archext_dotprod_directive
     as_archext_i8mm_directive
+    as_archext_sve_directive
+    as_archext_sve2_directive
     as_dn_directive
     as_fpu_directive
     as_func
@@ -2755,6 +2761,8 @@ vfpv3_deps="vfp"
 setend_deps="arm"
 dotprod_deps="aarch64 neon"
 i8mm_deps="aarch64 neon"
+sve_deps="aarch64 neon"
+sve2_deps="aarch64 neon sve"
 
 map 'eval ${v}_inline_deps=inline_asm' $ARCH_EXT_LIST_ARM
 
@@ -6223,9 +6231,11 @@ if enabled aarch64; then
     # internal assembler in clang 3.3 does not support this instruction
     enabled neon && check_insn neon 'ext v0.8B, v0.8B, v1.8B, #1'
 
-    archext_list="dotprod i8mm"
+    archext_list="dotprod i8mm sve sve2"
     enabled dotprod && check_archext_insn dotprod 'udot v0.4s, v0.16b, v0.16b'
     enabled i8mm    && check_archext_insn i8mm    'usdot v0.4s, v0.16b, v0.16b'
+    enabled sve     && check_archext_insn sve     'whilelt p0.s, x0, x1'
+    enabled sve2    && check_archext_insn sve2    'sqrdmulh z0.s, z0.s, z0.s'
 
     # Disable the main feature (e.g. HAVE_NEON) if neither inline nor external
     # assembly support the feature out of the box. Skip this for the features
@@ -7913,6 +7923,8 @@ if enabled aarch64; then
     echo "NEON enabled              ${neon-no}"
     echo "DOTPROD enabled           ${dotprod-no}"
     echo "I8MM enabled              ${i8mm-no}"
+    echo "SVE enabled               ${sve-no}"
+    echo "SVE2 enabled              ${sve2-no}"
 fi
 if enabled arm; then
     echo "ARMv5TE enabled           ${armv5te-no}"
diff --git a/ffbuild/arch.mak b/ffbuild/arch.mak
index 3fc40e5e5d..af71aacfd2 100644
--- a/ffbuild/arch.mak
+++ b/ffbuild/arch.mak
@@ -3,6 +3,8 @@ OBJS-$(HAVE_ARMV6)     += $(ARMV6-OBJS)         $(ARMV6-OBJS-yes)
 OBJS-$(HAVE_ARMV8)     += $(ARMV8-OBJS)         $(ARMV8-OBJS-yes)
 OBJS-$(HAVE_VFP)       += $(VFP-OBJS)           $(VFP-OBJS-yes)
 OBJS-$(HAVE_NEON)      += $(NEON-OBJS)          $(NEON-OBJS-yes)
+OBJS-$(HAVE_SVE)       += $(SVE-OBJS)           $(SVE-OBJS-yes)
+OBJS-$(HAVE_SVE2)      += $(SVE2-OBJS)          $(SVE2-OBJS-yes)
 
 OBJS-$(HAVE_MIPSFPU)   += $(MIPSFPU-OBJS)       $(MIPSFPU-OBJS-yes)
 OBJS-$(HAVE_MIPSDSP)   += $(MIPSDSP-OBJS)       $(MIPSDSP-OBJS-yes)
diff --git a/libavutil/aarch64/asm.S b/libavutil/aarch64/asm.S
index 1840f9fb01..50ce7d4dfd 100644
--- a/libavutil/aarch64/asm.S
+++ b/libavutil/aarch64/asm.S
@@ -56,8 +56,26 @@
 #define DISABLE_I8MM
 #endif
 
+#if HAVE_AS_ARCHEXT_SVE_DIRECTIVE
+#define ENABLE_SVE  .arch_extension sve
+#define DISABLE_SVE .arch_extension nosve
+#else
+#define ENABLE_SVE
+#define DISABLE_SVE
+#endif
+
+#if HAVE_AS_ARCHEXT_SVE2_DIRECTIVE
+#define ENABLE_SVE2  .arch_extension sve2
+#define DISABLE_SVE2 .arch_extension nosve2
+#else
+#define ENABLE_SVE2
+#define DISABLE_SVE2
+#endif
+
 DISABLE_DOTPROD
 DISABLE_I8MM
+DISABLE_SVE
+DISABLE_SVE2
 
 /* Support macros for