From patchwork Sun Oct 2 11:54:58 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 38514
From: remi@remlab.net
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 2 Oct 2022 14:54:58 +0300
Message-Id: <20221002115501.17996-1-remi@remlab.net>
In-Reply-To: <2650188.mvXUDI8C0e@basile.remlab.net>
References: <2650188.mvXUDI8C0e@basile.remlab.net>
Subject: [FFmpeg-devel] [PATCH 1/4] lavu/riscv: CPU flag for the Zbb extension

From: Rémi Denis-Courmont

Unfortunately, it is common, and will remain so, that the bit
manipulation extensions are not enabled at compilation time. It is
official policy for Debian ports in general (though they do not
officially support RISC-V yet) to stick to the minimal target baseline,
which does not include the B extension or even its Zbb subset.

For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or even
plain C code (MIN, MAX, MINU, MAXU), run-time detection seems
impractical. But at least it can work for the byte-swap DSP functions.
---
 libavutil/cpu.c           | 1 +
 libavutil/cpu.h           | 1 +
 libavutil/riscv/cpu.c     | 6 ++++++
 tests/checkasm/checkasm.c | 1 +
 4 files changed, 9 insertions(+)

diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index 5818fd9c1c..2c5f7f4958 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -188,6 +188,7 @@ int av_parse_cpu_caps(unsigned *flags, const char *s)
         { "rvv-f32", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F32 }, .unit = "flags" },
         { "rvv-i64", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I64 }, .unit = "flags" },
         { "rvv", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F64 }, .unit = "flags" },
+        { "rvb-basic",NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB_BASIC }, .unit = "flags" },
 #endif
         { NULL },
     };
diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index 18f42af015..8fa5ea9199 100644
--- a/libavutil/cpu.h
+++ b/libavutil/cpu.h
@@ -86,6 +86,7 @@
 #define AV_CPU_FLAG_RVV_F32 (1 << 4) ///< Vectors of float's */
 #define AV_CPU_FLAG_RVV_I64 (1 << 5) ///< Vectors of 64-bit int's */
 #define AV_CPU_FLAG_RVV_F64 (1 << 6) ///< Vectors of double's
+#define AV_CPU_FLAG_RVB_BASIC (1 << 7) ///< Basic bit-manipulations
 
 /**
  * Return the flags which specify extensions supported by the CPU.
diff --git a/libavutil/riscv/cpu.c b/libavutil/riscv/cpu.c
index e234201395..a9263dbb78 100644
--- a/libavutil/riscv/cpu.c
+++ b/libavutil/riscv/cpu.c
@@ -40,6 +40,8 @@ int ff_get_cpu_flags_riscv(void)
         ret |= AV_CPU_FLAG_RVF;
     if (hwcap & HWCAP_RV('D'))
         ret |= AV_CPU_FLAG_RVD;
+    if (hwcap & HWCAP_RV('B'))
+        ret |= AV_CPU_FLAG_RVB_BASIC;
 
     /* The V extension implies all Zve* functional subsets */
     if (hwcap & HWCAP_RV('V'))
@@ -57,6 +59,10 @@ int ff_get_cpu_flags_riscv(void)
 #endif
 #endif
 
+#ifdef __riscv_zbb
+    ret |= AV_CPU_FLAG_RVB_BASIC;
+#endif
+
     /* If RV-V is enabled statically at compile-time, check the details. */
 #ifdef __riscv_vectors
     ret |= AV_CPU_FLAG_RVV_I32;
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 90dd7e4634..421bd096c5 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -240,6 +240,7 @@ static const struct {
     { "RVVf32", "rvv_f32", AV_CPU_FLAG_RVV_F32 },
     { "RVVi64", "rvv_i64", AV_CPU_FLAG_RVV_I64 },
     { "RVVf64", "rvv_f64", AV_CPU_FLAG_RVV_F64 },
+    { "RVBbasic", "rvb_b", AV_CPU_FLAG_RVB_BASIC },
 #elif ARCH_MIPS
     { "MMI", "mmi", AV_CPU_FLAG_MMI },
     { "MSA", "msa", AV_CPU_FLAG_MSA },

From patchwork Sun Oct 2 11:54:59 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 38515
From: remi@remlab.net
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 2 Oct 2022 14:54:59 +0300
Message-Id: <20221002115501.17996-2-remi@remlab.net>
In-Reply-To: <2650188.mvXUDI8C0e@basile.remlab.net>
References: <2650188.mvXUDI8C0e@basile.remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/4] lavc/bswapdsp: RISC-V B bswap_buf

From: Rémi Denis-Courmont

Simply taking the Zbb REV8 instruction into use in a simple loop gives
some significant savings:

bswap_buf_c: 1081.0
bswap_buf_rvb_b: 771.0

But we can also use the 64-bit REV8 as a pseudo-SIMD instruction with
just one additional shift, and one fewer load, effectively doubling the
bandwidth. Consequently, this patch is useful even if the compile-time
target has Zbb enabled for C code:

bswap_buf_c: 1081.0
bswap_buf_rvb_b: 341.0 (this patch)

On the other hand, this approach fails miserably for bswap16_buf as the
ratio of shifts and stores becomes unfavorable compared to naïve C:

bswap16_buf_c: 1542.0
bswap16_buf_rvb_b: 1803.7

Unrolling to process 128 bits (4 samples) at a time actually worsens
performance ever so slightly:

bswap_buf_c: 1081.0
bswap_buf_rvb_b: 408.5
---
 libavcodec/bswapdsp.c            |  4 +-
 libavcodec/bswapdsp.h            |  1 +
 libavcodec/riscv/Makefile        |  2 +
 libavcodec/riscv/bswapdsp_init.c | 38 ++++++++++++++++++
 libavcodec/riscv/bswapdsp_rvb.S  | 68 ++++++++++++++++++++++++++++++++
 5 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/riscv/bswapdsp_init.c
 create mode 100644 libavcodec/riscv/bswapdsp_rvb.S

diff --git a/libavcodec/bswapdsp.c b/libavcodec/bswapdsp.c
index 4c4ea10acc..f0ea2b55c5 100644
--- a/libavcodec/bswapdsp.c
+++ b/libavcodec/bswapdsp.c
@@ -51,7 +51,9 @@ av_cold void ff_bswapdsp_init(BswapDSPContext *c)
     c->bswap_buf = bswap_buf;
     c->bswap16_buf = bswap16_buf;
 
-#if ARCH_X86
+#if ARCH_RISCV
+    ff_bswapdsp_init_riscv(c);
+#elif ARCH_X86
     ff_bswapdsp_init_x86(c);
 #endif
 }
diff --git a/libavcodec/bswapdsp.h b/libavcodec/bswapdsp.h
index 4d19092254..6f4db66115 100644
--- a/libavcodec/bswapdsp.h
+++ b/libavcodec/bswapdsp.h
@@ -27,6 +27,7 @@ typedef struct BswapDSPContext {
 } BswapDSPContext;
 
 void ff_bswapdsp_init(BswapDSPContext *c);
+void ff_bswapdsp_init_riscv(BswapDSPContext *c);
 void ff_bswapdsp_init_x86(BswapDSPContext *c);
 
 #endif /* AVCODEC_BSWAPDSP_H */
diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 0fb2c81c75..db4384bca7 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -3,6 +3,8 @@ RVV-OBJS-$(CONFIG_AAC_DECODER) += riscv/aacpsdsp_rvv.o
 OBJS-$(CONFIG_AUDIODSP) += riscv/audiodsp_init.o \
                            riscv/audiodsp_rvf.o
 RVV-OBJS-$(CONFIG_AUDIODSP) += riscv/audiodsp_rvv.o
+OBJS-$(CONFIG_BSWAPDSP) += riscv/bswapdsp_init.o \
+                           riscv/bswapdsp_rvb.o
 OBJS-$(CONFIG_FMTCONVERT) += riscv/fmtconvert_init.o
 RVV-OBJS-$(CONFIG_FMTCONVERT) += riscv/fmtconvert_rvv.o
 OBJS-$(CONFIG_IDCTDSP) += riscv/idctdsp_init.o
diff --git a/libavcodec/riscv/bswapdsp_init.c b/libavcodec/riscv/bswapdsp_init.c
new file mode 100644
index 0000000000..701dbeaaa6
--- /dev/null
+++ b/libavcodec/riscv/bswapdsp_init.c
@@ -0,0 +1,38 @@
+/*
+ * Copyright © 2022 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <stdint.h>
+
+#include "config.h"
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavcodec/bswapdsp.h"
+
+void ff_bswap32_buf_rvb(uint32_t *dst, const uint32_t *src, int len);
+
+av_cold void ff_bswapdsp_init_riscv(BswapDSPContext *c)
+{
+#if (__riscv_xlen >= 64)
+    int cpu_flags = av_get_cpu_flags();
+
+    if (cpu_flags & AV_CPU_FLAG_RVB_BASIC)
+        c->bswap_buf = ff_bswap32_buf_rvb;
+#endif
+}
diff --git a/libavcodec/riscv/bswapdsp_rvb.S b/libavcodec/riscv/bswapdsp_rvb.S
new file mode 100644
index 0000000000..91b47bf82d
--- /dev/null
+++ b/libavcodec/riscv/bswapdsp_rvb.S
@@ -0,0 +1,68 @@
+/*
+ * Copyright © 2022 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+#include "libavutil/riscv/asm.S"
+
+#if (__riscv_xlen >= 64)
+func ff_bswap32_buf_rvb, zbb
+        andi    t0, a1, 4
+        beqz    t0, 1f
+        /* Align a1 (input) to 64-bit */
+        lwu     t0, (a1)
+        addi    a0, a0, 4
+        rev8    t0, t0
+        addi    a2, a2, -1
+        srli    t0, t0, __riscv_xlen - 32
+        addi    a1, a1, 4
+        sw      t0, -4(a0)
+1:
+        andi    a3, a2, -2
+        sh2add  a2, a2, a0
+        beqz    a3, 3f
+        sh2add  a3, a3, a0
+2:      /* 2 elements (64 bits) at a time on a 64-bit boundary */
+        ld      t0, (a1)
+        addi    a0, a0, 8
+        rev8    t0, t0
+#if (__riscv_xlen == 64)
+        srli    t2, t0, 32
+        sw      t0, -4(a0)
+#else
+        srli    t1, t0, __riscv_xlen - 64
+        srli    t2, t0, __riscv_xlen - 32
+        sw      t1, -4(a0)
+#endif
+        addi    a1, a1, 8
+        sw      t2, -8(a0)
+        bne     a0, a3, 2b
+3:
+        beq     a0, a2, 5f
+4:      /* Process last element */
+        lwu     t0, (a1)
+        addi    a0, a0, 4
+        rev8    t0, t0
+        addi    a1, a1, 4
+        srli    t0, t0, __riscv_xlen - 32
+        sw      t0, -4(a0)
+5:
+        ret
+endfunc
+#endif

From patchwork Sun Oct 2 11:55:00 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 38516
From: remi@remlab.net
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 2 Oct 2022 14:55:00 +0300
Message-Id: <20221002115501.17996-3-remi@remlab.net>
In-Reply-To: <2650188.mvXUDI8C0e@basile.remlab.net>
References: <2650188.mvXUDI8C0e@basile.remlab.net>
Subject: [FFmpeg-devel] [PATCH 3/4] lavc/bswapdsp: RISC-V V bswap_buf

From: Rémi Denis-Courmont

---
 libavcodec/riscv/Makefile        |  1 +
 libavcodec/riscv/bswapdsp_init.c |  7 ++++-
 libavcodec/riscv/bswapdsp_rvv.S  | 45 ++++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/riscv/bswapdsp_rvv.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index db4384bca7..b94901ce8d 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -5,6 +5,7 @@ OBJS-$(CONFIG_AUDIODSP) += riscv/audiodsp_init.o \
 RVV-OBJS-$(CONFIG_AUDIODSP) += riscv/audiodsp_rvv.o
 OBJS-$(CONFIG_BSWAPDSP) += riscv/bswapdsp_init.o \
                            riscv/bswapdsp_rvb.o
+RVV-OBJS-$(CONFIG_BSWAPDSP) += riscv/bswapdsp_rvv.o
 OBJS-$(CONFIG_FMTCONVERT) += riscv/fmtconvert_init.o
 RVV-OBJS-$(CONFIG_FMTCONVERT) += riscv/fmtconvert_rvv.o
 OBJS-$(CONFIG_IDCTDSP) += riscv/idctdsp_init.o
diff --git a/libavcodec/riscv/bswapdsp_init.c b/libavcodec/riscv/bswapdsp_init.c
index 701dbeaaa6..c17b6b75bb 100644
--- a/libavcodec/riscv/bswapdsp_init.c
+++ b/libavcodec/riscv/bswapdsp_init.c
@@ -26,13 +26,18 @@
 #include "libavcodec/bswapdsp.h"
 
 void ff_bswap32_buf_rvb(uint32_t *dst, const uint32_t *src, int len);
+void ff_bswap32_buf_rvv(uint32_t *dst, const uint32_t *src, int len);
 
 av_cold void ff_bswapdsp_init_riscv(BswapDSPContext *c)
 {
-#if (__riscv_xlen >= 64)
     int cpu_flags = av_get_cpu_flags();
 
+#if (__riscv_xlen >= 64)
     if (cpu_flags & AV_CPU_FLAG_RVB_BASIC)
         c->bswap_buf = ff_bswap32_buf_rvb;
 #endif
+#if HAVE_RVV
+    if (cpu_flags & AV_CPU_FLAG_RVV_I32)
+        c->bswap_buf = ff_bswap32_buf_rvv;
+#endif
 }
diff --git a/libavcodec/riscv/bswapdsp_rvv.S b/libavcodec/riscv/bswapdsp_rvv.S
new file mode 100644
index 0000000000..7ea747b3ce
--- /dev/null
+++ b/libavcodec/riscv/bswapdsp_rvv.S
@@ -0,0 +1,45 @@
+/*
+ * Copyright © 2022 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+#include "libavutil/riscv/asm.S"
+
+func ff_bswap32_buf_rvv, zve32x
+        li      t4, 4
+        addi    t1, a0, 1
+        addi    t2, a0, 2
+        addi    t3, a0, 3
+1:
+        vsetvli    t0, a2, e8, m1, ta, ma
+        vlseg4e8.v v8, (a1)
+        sub        a2, a2, t0
+        sh2add     a1, t0, a1
+        vsse8.v    v8, (t3), t4
+        sh2add     t3, t0, t3
+        vsse8.v    v9, (t2), t4
+        sh2add     t2, t0, t2
+        vsse8.v    v10, (t1), t4
+        sh2add     t1, t0, t1
+        vsse8.v    v11, (a0), t4
+        sh2add     a0, t0, a0
+        bnez       a2, 1b
+
+        ret
+endfunc

From patchwork Sun Oct 2 11:55:01 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 38517
From: remi@remlab.net
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 2 Oct 2022 14:55:01 +0300
Message-Id: <20221002115501.17996-4-remi@remlab.net>
In-Reply-To: <2650188.mvXUDI8C0e@basile.remlab.net>
References: <2650188.mvXUDI8C0e@basile.remlab.net>
Subject: [FFmpeg-devel] [PATCH 4/4] lavc/bswapdsp: RISC-V V bswap16_buf

From: Rémi Denis-Courmont

---
 libavcodec/riscv/bswapdsp_init.c |  5 ++++-
 libavcodec/riscv/bswapdsp_rvv.S  | 17 +++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/libavcodec/riscv/bswapdsp_init.c b/libavcodec/riscv/bswapdsp_init.c
index c17b6b75bb..abe84ec1f7 100644
--- a/libavcodec/riscv/bswapdsp_init.c
+++ b/libavcodec/riscv/bswapdsp_init.c
@@ -27,6 +27,7 @@
 
 void ff_bswap32_buf_rvb(uint32_t *dst, const uint32_t *src, int len);
 void ff_bswap32_buf_rvv(uint32_t *dst, const uint32_t *src, int len);
+void ff_bswap16_buf_rvv(uint16_t *dst, const uint16_t *src, int len);
 
 av_cold void ff_bswapdsp_init_riscv(BswapDSPContext *c)
 {
@@ -37,7 +38,9 @@ av_cold void ff_bswapdsp_init_riscv(BswapDSPContext *c)
         c->bswap_buf = ff_bswap32_buf_rvb;
 #endif
 #if HAVE_RVV
-    if (cpu_flags & AV_CPU_FLAG_RVV_I32)
+    if (cpu_flags & AV_CPU_FLAG_RVV_I32) {
         c->bswap_buf = ff_bswap32_buf_rvv;
+        c->bswap16_buf = ff_bswap16_buf_rvv;
+    }
 #endif
 }
diff --git a/libavcodec/riscv/bswapdsp_rvv.S b/libavcodec/riscv/bswapdsp_rvv.S
index 7ea747b3ce..ef2999c1be 100644
--- a/libavcodec/riscv/bswapdsp_rvv.S
+++ b/libavcodec/riscv/bswapdsp_rvv.S
@@ -43,3 +43,20 @@ func ff_bswap32_buf_rvv, zve32x
 
         ret
 endfunc
+
+func ff_bswap16_buf_rvv, zve32x
+        li      t2, 2
+        addi    t1, a0, 1
+1:
+        vsetvli    t0, a2, e8, m1, ta, ma
+        vlseg2e8.v v8, (a1)
+        sub        a2, a2, t0
+        sh1add     a1, t0, a1
+        vsse8.v    v8, (t1), t2
+        sh1add     t1, t0, t1
+        vsse8.v    v9, (a0), t2
+        sh1add     a0, t0, a0
+        bnez       a2, 1b
+
+        ret
+endfunc
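
Note for readers who do not read RISC-V assembler: the pseudo-SIMD idea
from patch 2/4 (one 64-bit REV8 swapping two 32-bit words at once) can be
modelled in portable C. This is only a rough sketch and not part of the
series. The names rev8_64(), bswap32_buf_pairs() and
bswap32_buf_pairs_selfcheck() are hypothetical; rev8_64() stands in for
the RV64 Zbb REV8 instruction, and the 64-bit value is assembled from two
words with shifts, where the assembler does a single aligned LD.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in (assumption) for RV64 Zbb REV8: reverse all 8 bytes. */
uint64_t rev8_64(uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

/* Byte-swap each 32-bit word, handling two words per 64-bit reversal. */
void bswap32_buf_pairs(uint32_t *dst, const uint32_t *src, int len)
{
    int i = 0;

    for (; i + 1 < len; i += 2) {
        /* Models a little-endian 64-bit load: src[i] low, src[i + 1] high. */
        uint64_t v = (uint64_t)src[i] | ((uint64_t)src[i + 1] << 32);

        v = rev8_64(v);
        /* The two halves have traded places, each one byte-swapped. */
        dst[i]     = (uint32_t)(v >> 32);
        dst[i + 1] = (uint32_t)v;
    }
    if (i < len) /* odd tail: zero-extended word, reverse, keep the top half */
        dst[i] = (uint32_t)(rev8_64((uint64_t)src[i]) >> 32);
}

/* Small usage example: swaps three words and checks the results. */
void bswap32_buf_pairs_selfcheck(void)
{
    const uint32_t in[3] = { 0x11223344u, 0xAABBCCDDu, 0x01020304u };
    uint32_t out[3] = { 0 };

    bswap32_buf_pairs(out, in, 3);
    assert(out[0] == 0x44332211u);
    assert(out[1] == 0xDDCCBBAAu);
    assert(out[2] == 0x04030201u);
}
```

The extra shift per pair is the cost mentioned in the commit message. For
16-bit samples the same trick would need three shifts and four stores per
64-bit load, which is consistent with the slowdown reported for
bswap16_buf.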