From patchwork Sat Jun 8 11:37:13 2024
From: Rémi Denis-Courmont
Date: Sat, 8 Jun 2024 14:37:13 +0300
Subject: [FFmpeg-devel] [PATCH 1/4] riscv: probe for Zbb extension at load time

Due to hysterical raisins, most RISC-V Linux distributions target a RV64GC
baseline excluding the Bit-manipulation ISA extensions, most notably:
 - Zba: address generation extension and
 - Zbb: basic bit manipulation extension.

Most CPUs that would make sense to run FFmpeg on support Zba and Zbb
(including the current FATE runner), so it makes sense to optimise for
them. In fact, a large chunk of existing assembler optimisations relies on
Zba and/or Zbb.

Since we cannot patch shared library code, the next best thing is to carry
a flag initialised at load time and check it on an as-needed basis. This
results in an overhead of 3 instructions on isolated use, e.g.:

    1: AUIPC rd, %pcrel_hi(ff_rv_zbb_supported)
       LBU   rd, %pcrel_lo(1b)(rd)
       BEQZ  rd, non_Zbb_fallback_code
       // Zbb code here

The C compiler will typically load the flag ahead of time to reduce
latency, and can also keep it around if Zbb is used multiple times in a
single optimisation scope. For this to work, the flag symbol must be
hidden; otherwise the access degrades into a GOT look-up (needed to
support interposition):

    1: AUIPC rd, GOT_OFFSET_HI
       LD    rd, GOT_OFFSET_LO(rd)
       LBU   rd, (rd)
       BEQZ  rd, non_Zbb_fallback_code
       // Zbb code here

This patch adds code to provision the flag in libraries using bit
manipulation functions from libavutil: byte swapping, bit weight
(population count) and counting leading or trailing zeroes.
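The load-time flag scheme described above can be modelled in portable GNU C. This is a minimal sketch under stated assumptions, not the patch's code. `demo_probe_extension()` is a hypothetical stand-in for `(av_get_cpu_flags() & AV_CPU_FLAG_RVB_BASIC) != 0`. `__builtin_bswap32` stands in for the inline REV8 assembly. All `demo_*` names are made up for illustration. With those substitutions, the example builds and runs on any GCC or Clang target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical probe: stands in for the av_get_cpu_flags() query.
 * This sketch does not inspect the CPU; it pretends the extension exists. */
static bool demo_probe_extension(void)
{
    return true;
}

/* A single byte with hidden visibility: in a shared library, the access
 * stays PC-relative (AUIPC + LBU) instead of going through the GOT. */
__attribute__((visibility("hidden"))) unsigned char demo_ext_supported = 0;

/* Run by the loader when the object is loaded, i.e. before main(). */
__attribute__((constructor))
static void demo_init_flag(void)
{
    demo_ext_supported = demo_probe_extension();
}

/* Portable byte swap used as the baseline fallback path. */
static inline uint32_t demo_bswap32_fallback(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u) |
           ((x << 8) & 0xff0000u) | (x << 24);
}

static inline uint32_t demo_bswap32(uint32_t x)
{
    /* Check the load-time flag; the fast path is predicted taken. */
    if (__builtin_expect(demo_ext_supported, 1))
        return __builtin_bswap32(x); /* stands in for the REV8 fast path */
    return demo_bswap32_fallback(x);
}
```

Because the flag is written once, before any caller runs, readers never need synchronisation. The compiler is free to hoist the load and reuse it across several checks in the same function.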
---
 libavcodec/riscv/Makefile      |  2 ++
 libavcodec/riscv/cpu_common.c  |  1 +
 libavdevice/riscv/Makefile     |  1 +
 libavdevice/riscv/cpu_common.c |  1 +
 libavfilter/riscv/Makefile     |  2 ++
 libavfilter/riscv/cpu_common.c |  1 +
 libavformat/riscv/Makefile     |  1 +
 libavformat/riscv/cpu_common.c |  1 +
 libavutil/riscv/Makefile       |  3 ++-
 libavutil/riscv/cpu.h          | 14 ++++++++++++++
 libavutil/riscv/cpu_common.c   | 33 +++++++++++++++++++++++++++++++++
 libswscale/riscv/Makefile      |  2 ++
 libswscale/riscv/cpu_common.c  |  1 +
 tests/ref/fate/source          |  5 +++++
 14 files changed, 67 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/riscv/cpu_common.c
 create mode 100644 libavdevice/riscv/Makefile
 create mode 100644 libavdevice/riscv/cpu_common.c
 create mode 100644 libavfilter/riscv/cpu_common.c
 create mode 100644 libavformat/riscv/Makefile
 create mode 100644 libavformat/riscv/cpu_common.c
 create mode 100644 libavutil/riscv/cpu_common.c
 create mode 100644 libswscale/riscv/cpu_common.c

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 590655f829..c180223141 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -77,3 +77,5 @@ RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o \
                                   riscv/vp9_mc_rvv.o
 OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o
 RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o
+
+SHLIBOBJS += riscv/cpu_common.o
diff --git a/libavcodec/riscv/cpu_common.c b/libavcodec/riscv/cpu_common.c
new file mode 100644
index 0000000000..17c9b392c9
--- /dev/null
+++ b/libavcodec/riscv/cpu_common.c
@@ -0,0 +1 @@
+#include "libavutil/riscv/cpu_common.c"
diff --git a/libavdevice/riscv/Makefile b/libavdevice/riscv/Makefile
new file mode 100644
index 0000000000..52857aacba
--- /dev/null
+++ b/libavdevice/riscv/Makefile
@@ -0,0 +1 @@
+SHLIBOBJS += riscv/cpu_common.o
diff --git a/libavdevice/riscv/cpu_common.c b/libavdevice/riscv/cpu_common.c
new file mode 100644
index 0000000000..17c9b392c9
--- /dev/null
+++ b/libavdevice/riscv/cpu_common.c
@@ -0,0 +1 @@
+#include "libavutil/riscv/cpu_common.c"
diff --git a/libavfilter/riscv/Makefile b/libavfilter/riscv/Makefile
index 277dde2aed..14a4470d96 100644
--- a/libavfilter/riscv/Makefile
+++ b/libavfilter/riscv/Makefile
@@ -1,2 +1,4 @@
 OBJS-$(CONFIG_AFIR_FILTER) += riscv/af_afir_init.o
 RVV-OBJS-$(CONFIG_AFIR_FILTER) += riscv/af_afir_rvv.o
+
+SHLIBOBJS += riscv/cpu_common.o
diff --git a/libavfilter/riscv/cpu_common.c b/libavfilter/riscv/cpu_common.c
new file mode 100644
index 0000000000..17c9b392c9
--- /dev/null
+++ b/libavfilter/riscv/cpu_common.c
@@ -0,0 +1 @@
+#include "libavutil/riscv/cpu_common.c"
diff --git a/libavformat/riscv/Makefile b/libavformat/riscv/Makefile
new file mode 100644
index 0000000000..52857aacba
--- /dev/null
+++ b/libavformat/riscv/Makefile
@@ -0,0 +1 @@
+SHLIBOBJS += riscv/cpu_common.o
diff --git a/libavformat/riscv/cpu_common.c b/libavformat/riscv/cpu_common.c
new file mode 100644
index 0000000000..17c9b392c9
--- /dev/null
+++ b/libavformat/riscv/cpu_common.c
@@ -0,0 +1 @@
+#include "libavutil/riscv/cpu_common.c"
diff --git a/libavutil/riscv/Makefile b/libavutil/riscv/Makefile
index 7e9a51194b..5db4c432d9 100644
--- a/libavutil/riscv/Makefile
+++ b/libavutil/riscv/Makefile
@@ -1,7 +1,8 @@
 OBJS += riscv/float_dsp_init.o \
         riscv/fixed_dsp_init.o \
         riscv/lls_init.o \
-        riscv/cpu.o
+        riscv/cpu.o \
+        riscv/cpu_common.o
 RVV-OBJS += riscv/float_dsp_rvv.o \
             riscv/fixed_dsp_rvv.o \
             riscv/lls_rvv.o
diff --git a/libavutil/riscv/cpu.h b/libavutil/riscv/cpu.h
index af1440f626..bb8e08aa14 100644
--- a/libavutil/riscv/cpu.h
+++ b/libavutil/riscv/cpu.h
@@ -24,8 +24,22 @@
 #include "config.h"
 #include
 #include
+#include "libavutil/attributes_internal.h"
 #include "libavutil/cpu.h"
 
+#ifndef __riscv_zbb
+extern attribute_visibility_hidden unsigned char ff_rv_zbb_supported;
+#endif
+
+static inline av_const bool ff_rv_zbb_support(void)
+{
+#ifndef __riscv_zbb
+    return ff_rv_zbb_supported;
+#else
+    return true;
+#endif
+}
+
 #if HAVE_RVV
 /**
  * Returns the vector size in bytes (always a power of two and at least 4).
diff --git a/libavutil/riscv/cpu_common.c b/libavutil/riscv/cpu_common.c
new file mode 100644
index 0000000000..3ecf95809b
--- /dev/null
+++ b/libavutil/riscv/cpu_common.c
@@ -0,0 +1,33 @@
+/*
+ * Copyright © 2024 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/cpu.h"
+
+#ifndef __riscv_zbb
+unsigned char ff_rv_zbb_supported = 0;
+
+#ifdef __ELF__
+__attribute__((constructor))
+static void probe_zbb(void)
+{
+    ff_rv_zbb_supported = (av_get_cpu_flags() & AV_CPU_FLAG_RVB_BASIC) != 0;
+}
+#endif
+#endif
diff --git a/libswscale/riscv/Makefile b/libswscale/riscv/Makefile
index 48afaf62aa..ea324bdc5f 100644
--- a/libswscale/riscv/Makefile
+++ b/libswscale/riscv/Makefile
@@ -1,3 +1,5 @@
 OBJS += riscv/rgb2rgb.o
 RV-OBJS += riscv/rgb2rgb_rvb.o
 RVV-OBJS += riscv/rgb2rgb_rvv.o
+
+SHLIBOBJS += riscv/cpu_common.o
diff --git a/libswscale/riscv/cpu_common.c b/libswscale/riscv/cpu_common.c
new file mode 100644
index 0000000000..17c9b392c9
--- /dev/null
+++ b/libswscale/riscv/cpu_common.c
@@ -0,0 +1 @@
+#include "libavutil/riscv/cpu_common.c"
diff --git a/tests/ref/fate/source b/tests/ref/fate/source
index a3beb35093..0abeff8036 100644
--- a/tests/ref/fate/source
+++ b/tests/ref/fate/source
@@ -3,17 +3,22 @@ libavcodec/file_open.c
 libavcodec/interplayacm.c
 libavcodec/log2_tab.c
 libavcodec/reverse.c
+libavcodec/riscv/cpu_common.c
 libavdevice/file_open.c
 libavdevice/reverse.c
+libavdevice/riscv/cpu_common.c
 libavfilter/file_open.c
 libavfilter/log2_tab.c
+libavfilter/riscv/cpu_common.c
 libavformat/bitstream.c
 libavformat/file_open.c
 libavformat/golomb_tab.c
 libavformat/log2_tab.c
 libavformat/rangecoder_dec.c
+libavformat/riscv/cpu_common.c
 libswresample/log2_tab.c
 libswscale/log2_tab.c
+libswscale/riscv/cpu_common.c
 tools/uncoded_frame.c
 tools/yuvcmp.c
 Headers without standard inclusion guards:

From patchwork Sat Jun 8 11:37:14 2024
From: Rémi Denis-Courmont
Date: Sat, 8 Jun 2024 14:37:14 +0300
Subject: [FFmpeg-devel] [PATCH 2/4] lavu/riscv: use Zbb REV8 at run-time

This adds run-time support for using Zbb REV8 for 32- and 64-bit byte-wise
swaps. The result is about five times slower than when targeting Zbb
statically, but still a lot faster than the default bespoke C code or a
call to the GCC run-time functions.

For 16-bit swaps, however, run-time detection is unsurprisingly a lot
worse, so this sticks to the baseline. In fact, even using REV8
statically does not seem to be beneficial in that case.
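On RV64, `av_bswap32_rv()` applies the full-width REV8 to a register holding a 32-bit value. It then shifts the result right by `__riscv_xlen - 32`. The host-side model below is plain C and needs no RISC-V hardware. It shows why the shift recovers the 32-bit swap regardless of what the upper half of the register holds. The `model_*` names and the software byte-reversal loop are made up for illustration; they are not FFmpeg functions.

```c
#include <stdint.h>

/* Model of REV8 on a 64-bit register: byte i moves to byte 7 - i. */
static uint64_t model_rev8_64(uint64_t r)
{
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        out = (out << 8) | (r & 0xff);
        r >>= 8;
    }
    return out;
}

/* Model of the RV64 path of av_bswap32_rv(): the 32-bit value sits in the
 * low half of the register, and the high half may hold anything (typically
 * the sign extension). REV8 moves the four low bytes, reversed, into the
 * high half. The logical right shift by 64 - 32 brings them back down and
 * discards whatever ended up in the low half. */
static uint32_t model_bswap32_via_rev8(uint32_t x, uint32_t upper_bits)
{
    uint64_t reg = ((uint64_t)upper_bits << 32) | x;
    return (uint32_t)(model_rev8_64(reg) >> (64 - 32));
}
```

The same reasoning gives a shift of zero for `av_bswap64_rv()`, where the operand already fills the whole register.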
            Zbb static    Zbb dynamic   I baseline
bswap16:    0.668184765   3.340764069    0.668029012
bswap32:    0.668174014   3.340763319    9.353855435
bswap64:    0.668221765   3.340496313   14.698672283
(seconds for 1 billion iterations on a SiFive-U74 core)
---
 libavutil/riscv/bswap.h | 44 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/libavutil/riscv/bswap.h b/libavutil/riscv/bswap.h
index ce75de974e..886893e241 100644
--- a/libavutil/riscv/bswap.h
+++ b/libavutil/riscv/bswap.h
@@ -22,11 +22,51 @@
 #include
 #include "config.h"
 #include "libavutil/attributes.h"
+#include "libavutil/riscv/cpu.h"
 
 #if defined (__GNUC__) || defined (__clang__)
 #define av_bswap16 __builtin_bswap16
-#define av_bswap32 __builtin_bswap32
-#define av_bswap64 __builtin_bswap64
+
+static av_always_inline av_const uint32_t av_bswap32_rv(uint32_t x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+    if (!__builtin_constant_p(x) &&
+        __builtin_expect(ff_rv_zbb_support(), 1)) {
+        uintptr_t y;
+
+        __asm__ (
+            ".option push\n"
+            ".option arch, +zbb\n"
+            "rev8 %0, %1\n"
+            ".option pop" : "=r" (y) : "r" (x));
+        return y >> (__riscv_xlen - 32);
+    }
+#endif
+    return __builtin_bswap32(x);
+}
+#define av_bswap32 av_bswap32_rv
+
+#if __riscv_xlen >= 64
+static av_always_inline av_const uint64_t av_bswap64_rv(uint64_t x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+    if (!__builtin_constant_p(x) &&
+        __builtin_expect(ff_rv_zbb_support(), 1)) {
+        uintptr_t y;
+
+        __asm__ (
+            ".option push\n"
+            ".option arch, +zbb\n"
+            "rev8 %0, %1\n"
+            ".option pop" : "=r" (y) : "r" (x));
+        return y >> (__riscv_xlen - 64);
+    }
+#endif
+    return __builtin_bswap64(x);
+}
+#define av_bswap64 av_bswap64_rv
+#endif
+
 #endif
 
 #endif /* AVUTIL_RISCV_BSWAP_H */

From patchwork Sat Jun 8 11:37:15 2024
From: Rémi Denis-Courmont
Date: Sat, 8 Jun 2024 14:37:15 +0300
Subject: [FFmpeg-devel] [PATCH 3/4] lavu/riscv: use Zbb CPOP/CPOPW at run-time

             Zbb static    Zbb dynamic   I baseline
popcount     1.336129286   3.469067758   20.146362909
popcountl    1.336322291   3.340292968   20.224829821
(seconds for 1 billion iterations on a SiFive-U74 core)
---
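This patch derives `av_parity()` from the same CPOP/CPOPW result by keeping its lowest bit (`return y & 1;`). It also tells the compiler that the count cannot exceed 32 (or 64) by guarding `__builtin_unreachable()`. The sketch below models both ideas portably. `demo_popcount32()` is a software loop standing in for CPOPW. The XOR-folding variant serves only as an independent cross-check. All `demo_*` names are hypothetical.

```c
#include <stdint.h>

/* Software population count (clear the lowest set bit until none remain);
 * stands in for the CPOPW instruction in this sketch. */
static int demo_popcount32(uint32_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Same value-range hint as the patch: lets the compiler assume 0..32. */
static int demo_popcount32_hinted(uint32_t x)
{
    int y = demo_popcount32(x);
    if (y > 32)
        __builtin_unreachable();
    return y;
}

/* Parity is the low bit of the population count. */
static int demo_parity32(uint32_t x)
{
    return demo_popcount32(x) & 1;
}

/* Independent formulation: fold the word onto itself with XOR. */
static int demo_parity32_xor(uint32_t x)
{
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1;
}
```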
 libavutil/riscv/intmath.h | 73 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 69 insertions(+), 4 deletions(-)

diff --git a/libavutil/riscv/intmath.h b/libavutil/riscv/intmath.h
index ae9ee7775b..1f0afbc81d 100644
--- a/libavutil/riscv/intmath.h
+++ b/libavutil/riscv/intmath.h
@@ -1,4 +1,6 @@
 /*
+ * Copyright © 2022-2024 Rémi Denis-Courmont.
+ *
  * This file is part of FFmpeg.
  *
  * FFmpeg is free software; you can redistribute it and/or
@@ -23,6 +25,7 @@
 
 #include "config.h"
 #include "libavutil/attributes.h"
+#include "libavutil/riscv/cpu.h"
 
 /*
  * The compiler is forced to sign-extend the result anyhow, so it is faster to
@@ -70,12 +73,74 @@ static av_always_inline av_const int av_clip_intp2_rvi(int a, int p)
 }
 
 #if defined (__GNUC__) || defined (__clang__)
-#define av_popcount __builtin_popcount
-#if (__riscv_xlen >= 64)
-#define av_popcount64 __builtin_popcountl
+static inline av_const int av_popcount_rv(unsigned int x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+    if (!__builtin_constant_p(x) &&
+        __builtin_expect(ff_rv_zbb_support(), true)) {
+        int y;
+
+        __asm__ (
+            ".option push\n"
+            ".option arch, +zbb\n"
+#if __riscv_xlen >= 64
+            "cpopw %0, %1\n"
 #else
-#define av_popcount64 __builtin_popcountll
+            "cpop %0, %1\n"
+#endif
+            ".option pop" : "=r" (y) : "r" (x));
+        if (y > 32)
+            __builtin_unreachable();
+        return y;
+    }
+#endif
+    return __builtin_popcount(x);
+}
+#define av_popcount av_popcount_rv
+
+static inline av_const int av_popcount64_rv(uint64_t x)
+{
+#if HAVE_RV && !defined(__riscv_zbb) && __riscv_xlen >= 64
+    if (!__builtin_constant_p(x) &&
+        __builtin_expect(ff_rv_zbb_support(), true)) {
+        int y;
+
+        __asm__ (
+            ".option push\n"
+            ".option arch, +zbb\n"
+            "cpop %0, %1\n"
+            ".option pop" : "=r" (y) : "r" (x));
+        if (y > 64)
+            __builtin_unreachable();
+        return y;
+    }
 #endif
+    return __builtin_popcountll(x);
+}
+#define av_popcount64 av_popcount64_rv
+
+static inline av_const int av_parity_rv(unsigned int x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+    if (!__builtin_constant_p(x) &&
+        __builtin_expect(ff_rv_zbb_support(), true)) {
+        int y;
+
+        __asm__ (
+            ".option push\n"
+            ".option arch, +zbb\n"
+#if __riscv_xlen >= 64
+            "cpopw %0, %1\n"
+#else
+            "cpop %0, %1\n"
+#endif
+            ".option pop" : "=r" (y) : "r" (x));
+        return y & 1;
+    }
+#endif
+    return __builtin_parity(x);
+}
+#define av_parity av_parity_rv
 #endif
 
 #endif /* AVUTIL_RISCV_INTMATH_H */

From patchwork Sat Jun 8 11:37:16 2024
From: Rémi Denis-Courmont
Date: Sat, 8 Jun 2024 14:37:16 +0300
Subject: [FFmpeg-devel] [PATCH 4/4] lavu/riscv: use Zbb CLZ/CTZ/CLZW/CTZW at run-time

        Zbb static    Zbb dynamic   I baseline
clz     0.668032642   1.336072283   19.552376803
clzl    0.668092643   1.336181786   26.110855571
ctz     1.336208533   3.340209702   26.054869008
ctzl    1.336247784   3.340362457   26.055266290
(seconds for 1 billion iterations on a SiFive-U74 core)
---
 libavutil/riscv/intmath.h | 101 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/libavutil/riscv/intmath.h b/libavutil/riscv/intmath.h
index 1f0afbc81d..3e7ab864c5 100644
--- a/libavutil/riscv/intmath.h
+++ b/libavutil/riscv/intmath.h
@@ -73,6 +73,107 @@ static av_always_inline av_const int av_clip_intp2_rvi(int a, int p)
 }
 
 #if defined (__GNUC__) || defined (__clang__)
+static inline av_const int ff_ctz_rv(int x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+    if (!__builtin_constant_p(x) &&
+        __builtin_expect(ff_rv_zbb_support(), true)) {
+        int y;
+
+        __asm__ (
+            ".option push\n"
+            ".option arch, +zbb\n"
+#if __riscv_xlen >= 64
+            "ctzw %0, %1\n"
+#else
+            "ctz %0, %1\n"
+#endif
+            ".option pop" : "=r" (y) : "r" (x));
+        if (y > 32)
+            __builtin_unreachable();
+        return y;
+    }
+#endif
+    return __builtin_ctz(x);
+}
+#define ff_ctz ff_ctz_rv
+
+static inline av_const int ff_ctzll_rv(long long x)
+{
+#if HAVE_RV && !defined(__riscv_zbb) && __riscv_xlen == 64
+    if (!__builtin_constant_p(x) &&
+        __builtin_expect(ff_rv_zbb_support(), true)) {
+        int y;
+
+        __asm__ (
+            ".option push\n"
+            ".option arch, +zbb\n"
+            "ctz %0, %1\n"
+            ".option pop" : "=r" (y) : "r" (x));
+        if (y > 64)
+            __builtin_unreachable();
+        return y;
+    }
+#endif
+    return __builtin_ctzll(x);
+}
+#define ff_ctzll ff_ctzll_rv
+
+static inline av_const int ff_clz_rv(int x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+    if (!__builtin_constant_p(x) &&
+        __builtin_expect(ff_rv_zbb_support(), true)) {
+        int y;
+
+        __asm__ (
+            ".option push\n"
+            ".option arch, +zbb\n"
+#if __riscv_xlen >= 64
+            "clzw %0, %1\n"
+#else
+            "clz %0, %1\n"
+#endif
+            ".option pop" : "=r" (y) : "r" (x));
+        if (y > 32)
+            __builtin_unreachable();
+        return y;
+    }
+#endif
+    return __builtin_clz(x);
+}
+#define ff_clz ff_clz_rv
+
+#if __riscv_xlen == 64
+static inline av_const int ff_clzll_rv(long long x)
+{
+#if HAVE_RV && !defined(__riscv_zbb)
+    if (!__builtin_constant_p(x) &&
+        __builtin_expect(ff_rv_zbb_support(), true)) {
+        int y;
+
+        __asm__ (
+            ".option push\n"
+            ".option arch, +zbb\n"
+            "clz %0, %1\n"
+            ".option pop" : "=r" (y) : "r" (x));
+        if (y > 64)
+            __builtin_unreachable();
+        return y;
+    }
+#endif
+    return __builtin_clzll(x);
+}
+#define ff_clzll ff_clzll_rv
+#endif
+
+static inline av_const int ff_log2_rv(unsigned int x)
+{
+    return 31 - ff_clz_rv(x | 1);
+}
+#define ff_log2 ff_log2_rv
+#define ff_log2_16bit ff_log2_rv
+
 static inline av_const int av_popcount_rv(unsigned int x)
 {
 #if HAVE_RV && !defined(__riscv_zbb)
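This patch builds `ff_log2()` on top of CLZ. Zbb's CLZ/CLZW and CTZ/CTZW are defined for a zero input: they return the operand width, for example 32 for CLZW. By contrast, GCC's `__builtin_clz(0)` and `__builtin_ctz(0)` are undefined. `ff_log2_rv()` sidesteps the question by OR-ing in 1, so `ff_log2(0)` yields 0. The portable model below uses software loops standing in for the instructions and hypothetical `demo_*` names.

```c
#include <stdint.h>

/* Count leading zeros with Zbb CLZW semantics: 32 for a zero input. */
static int demo_clz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count trailing zeros with Zbb CTZW semantics: 32 for a zero input. */
static int demo_ctz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while (!(x & 1u)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* floor(log2(x)) in the style of ff_log2_rv(): OR-ing in 1 keeps the
 * argument non-zero, so the result for 0 is 0 rather than -1. */
static int demo_log2(uint32_t x)
{
    return 31 - demo_clz32(x | 1);
}
```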