From patchwork Sat Jun 8 11:37:16 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 49701
From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 8 Jun 2024 14:37:16 +0300
Message-ID: <20240608113717.1677043-4-remi@remlab.net>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240608113717.1677043-1-remi@remlab.net>
References: <20240608113717.1677043-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 4/4] lavu/riscv: use Zbb CLZ/CTZ/CLZW/CTZW at run-time
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and
 patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: TczilfcgtMLN

        Zbb static      Zbb dynamic     I baseline
clz     0.668032642     1.336072283     19.552376803
clzl    0.668092643     1.336181786     26.110855571
ctz     1.336208533     3.340209702     26.054869008
ctzl    1.336247784     3.340362457     26.055266290

(seconds for 1 billion iterations on a SiFive-U74 core)
---
 libavutil/riscv/intmath.h | 101 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/libavutil/riscv/intmath.h b/libavutil/riscv/intmath.h
index 1f0afbc81d..3e7ab864c5 100644
--- a/libavutil/riscv/intmath.h
+++ b/libavutil/riscv/intmath.h
@@ -73,6 +73,107 @@ static av_always_inline av_const int av_clip_intp2_rvi(int a, int p)
 }
 
 #if defined (__GNUC__) || defined (__clang__)
+static inline av_const int ff_ctz_rv(int x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+    if (!__builtin_constant_p(x) &&
+        __builtin_expect(ff_rv_zbb_support(), true)) {
+        int y;
+
+        __asm__ (
+            ".option push\n"
+            ".option arch, +zbb\n"
+#if __riscv_xlen >= 64
+            "ctzw %0, %1\n"
+#else
+            "ctz %0, %1\n"
+#endif
+            ".option pop" : "=r" (y) : "r" (x));
+        if (y > 32)
+            __builtin_unreachable();
+        return y;
+    }
+#endif
+    return __builtin_ctz(x);
+}
+#define ff_ctz ff_ctz_rv
+
+static inline av_const int ff_ctzll_rv(long long x)
+{
+#if HAVE_RV && !defined(__riscv_zbb) && __riscv_xlen == 64
+    if (!__builtin_constant_p(x) &&
+        __builtin_expect(ff_rv_zbb_support(), true)) {
+        int y;
+
+        __asm__ (
+            ".option push\n"
+            ".option arch, +zbb\n"
+            "ctz %0, %1\n"
+            ".option pop" : "=r" (y) : "r" (x));
+        if (y > 64)
+            __builtin_unreachable();
+        return y;
+    }
+#endif
+    return __builtin_ctzll(x);
+}
+#define ff_ctzll ff_ctzll_rv
+
+static inline av_const int ff_clz_rv(int x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+    if (!__builtin_constant_p(x) &&
+        __builtin_expect(ff_rv_zbb_support(), true)) {
+        int y;
+
+        __asm__ (
+            ".option push\n"
+            ".option arch, +zbb\n"
+#if __riscv_xlen >= 64
+            "clzw %0, %1\n"
+#else
+            "clz %0, %1\n"
+#endif
+            ".option pop" : "=r" (y) : "r" (x));
+        if (y > 32)
+            __builtin_unreachable();
+        return y;
+    }
+#endif
+    return __builtin_clz(x);
+}
+#define ff_clz ff_clz_rv
+
+#if __riscv_xlen == 64
+static inline av_const int ff_clzll_rv(long long x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+    if (!__builtin_constant_p(x) &&
+        __builtin_expect(ff_rv_zbb_support(), true)) {
+        int y;
+
+        __asm__ (
+            ".option push\n"
+            ".option arch, +zbb\n"
+            "clz %0, %1\n"
+            ".option pop" : "=r" (y) : "r" (x));
+        if (y > 64)
+            __builtin_unreachable();
+        return y;
+    }
+#endif
+    return __builtin_clzll(x);
+}
+#define ff_clz ff_clz_rv
+#endif
+
+static inline av_const int ff_log2_rv(unsigned int x)
+{
+    return 31 - ff_clz_rv(x | 1);
+}
+#define ff_log2 ff_log2_rv
+#define ff_log2_16bit ff_log2_rv
+
 static inline av_const int av_popcount_rv(unsigned int x)
 {
 #if HAVE_RV && !defined(__riscv_zbb)