From patchwork Wed Oct 5 16:12:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= X-Patchwork-Id: 38565 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:4d9:b0:9c:f4b:4e41 with SMTP id 25csp700802pzd; Wed, 5 Oct 2022 09:13:13 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4GKVXdLN7QfMdiyO2yTPnVPUbRDjCtuh8A+UGYJc06KIxt/LNRROkWgFmW4KG/M3JRtlej X-Received: by 2002:a17:907:a48:b0:77c:51b0:5aeb with SMTP id be8-20020a1709070a4800b0077c51b05aebmr280971ejc.61.1664986393388; Wed, 05 Oct 2022 09:13:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1664986393; cv=none; d=google.com; s=arc-20160816; b=BTK4l18IJkMPOQG9r8dyueJZpZkrv8Cvg7zwNwe/+jaulMt0ThojRGWjDOuSlcHDax 15hZf3kKJC4jbADG3/YNDY9D/x+HTscQ6RSY+ok0ujZXx/XrV7thlgByPzZyuQMPbqaR 54ObxDHK4yveGpD1OYKMDJ5hKoUyJHXnOTWZHagO7SZ0ycJ1E9UbcAzLHxCDCmHpi59z 8uoRgss8vES8zgT8Q/jOpzRw7rJyKijrC2l/DS+wYnwMiiiIxuyUwHkdgDG8IS9KCDhJ TTRSPuOJn9w0KMRBapfaGtm/xaZbA9co97sbOTcLQugP9TUQ1m1RP0ZH9uWD8uvThCfe J6iw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:delivered-to; bh=VkDnBKdOgPAoKxZAQF9kBoC8ZfmCz7PhGx6izdKb5aQ=; b=DM8cgeI8gxxpFsGme7OHDp2OAjTBmZ9rMbtZM+iPbWBmiyKlOsUcMpFytg96+/3hh2 saXOoFGEEem/gXX+hDeMd/BqbSehenACK+8r/T/h5rpGKHxiTlQ5M7yDXJtjQlwK/68r GwfHwQdnDZ3NJLfLot7VVEMW1cUWwqbDKfZWyb6r09maZn3+MWgcdhXR8gY912E0Bxtr r8tYsTNC7T13pVfQCUpKJ8UJhffJbs1wrYFPKPUOxXbdFGnQkkHnislKdVIhrcB+V21A lM9wdReZB78+yX5CbQXt+30OoKMrPaPWAtw6Hnspe+N65zZgx6kQ0BuiwHvoL8mthA+t XH7Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: 
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id cw3-20020a170906478300b0076f077cec04si12361808ejc.365.2022.10.05.09.13.12; Wed, 05 Oct 2022 09:13:13 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 176D268BD20; Wed, 5 Oct 2022 19:13:01 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from ursule.remlab.net (vps-a2bccee9.vps.ovh.net [51.75.19.47]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id D201D68BD12 for ; Wed, 5 Oct 2022 19:12:57 +0300 (EEST) Received: from basile.remlab.net (localhost [IPv6:::1]) by ursule.remlab.net (Postfix) with ESMTP id 2B37DC000A for ; Wed, 5 Oct 2022 19:12:57 +0300 (EEST) From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= To: ffmpeg-devel@ffmpeg.org Date: Wed, 5 Oct 2022 19:12:53 +0300 Message-Id: <20221005161256.27612-1-remi@remlab.net> X-Mailer: git-send-email 2.37.2 In-Reply-To: <12083658.O9o76ZdvQC@basile.remlab.net> References: <12083658.O9o76ZdvQC@basile.remlab.net> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 1/4] lavc/opusdsp: RISC-V V (128-bit) postfilter X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: KtPHUnfm53Hz This is implemented for a vector size of 128-bit. 
Since the scalar product in the inner loop covers 5 samples, or 160 bits, we need a group multiplier of 2. To avoid reconfiguring the vector type, the outer loop, which loads multiple input samples, sticks to the same multiplier. Consequently, the outer loop loads 8 samples per iteration. This is safe since the minimum period of the CELT codec is 15 samples. The same code would also work, albeit needlessly inefficiently, with a vector length of 256 bits. A proper implementation will follow instead. --- libavcodec/opusdsp.c | 2 ++ libavcodec/opusdsp.h | 1 + libavcodec/riscv/Makefile | 2 ++ libavcodec/riscv/opusdsp_init.c | 42 ++++++++++++++++++++++++ libavcodec/riscv/opusdsp_rvv.S | 57 +++++++++++++++++++++++++++++++++ 5 files changed, 104 insertions(+) create mode 100644 libavcodec/riscv/opusdsp_init.c create mode 100644 libavcodec/riscv/opusdsp_rvv.S diff --git a/libavcodec/opusdsp.c b/libavcodec/opusdsp.c index badcfcc884..0764d712e4 100644 --- a/libavcodec/opusdsp.c +++ b/libavcodec/opusdsp.c @@ -58,6 +58,8 @@ av_cold void ff_opus_dsp_init(OpusDSP *ctx) #if ARCH_AARCH64 ff_opus_dsp_init_aarch64(ctx); +#elif ARCH_RISCV + ff_opus_dsp_init_riscv(ctx); #elif ARCH_X86 ff_opus_dsp_init_x86(ctx); #endif diff --git a/libavcodec/opusdsp.h b/libavcodec/opusdsp.h index 3ea3d14bf0..c2a301e832 100644 --- a/libavcodec/opusdsp.h +++ b/libavcodec/opusdsp.h @@ -30,5 +30,6 @@ void ff_opus_dsp_init(OpusDSP *ctx); void ff_opus_dsp_init_x86(OpusDSP *ctx); void ff_opus_dsp_init_aarch64(OpusDSP *ctx); +void ff_opus_dsp_init_riscv(OpusDSP *ctx); #endif /* AVCODEC_OPUSDSP_H */ diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile index eae87ea231..965942f4df 100644 --- a/libavcodec/riscv/Makefile +++ b/libavcodec/riscv/Makefile @@ -12,6 +12,8 @@ OBJS-$(CONFIG_FMTCONVERT) += riscv/fmtconvert_init.o RVV-OBJS-$(CONFIG_FMTCONVERT) += riscv/fmtconvert_rvv.o OBJS-$(CONFIG_IDCTDSP) += riscv/idctdsp_init.o RVV-OBJS-$(CONFIG_IDCTDSP) += riscv/idctdsp_rvv.o +OBJS-$(CONFIG_OPUS_DECODER) += 
riscv/opusdsp_init.o +RVV-OBJS-$(CONFIG_OPUS_DECODER) += riscv/opusdsp_rvv.o OBJS-$(CONFIG_PIXBLOCKDSP) += riscv/pixblockdsp_init.o \ riscv/pixblockdsp_rvi.o RVV-OBJS-$(CONFIG_PIXBLOCKDSP) += riscv/pixblockdsp_rvv.o diff --git a/libavcodec/riscv/opusdsp_init.c b/libavcodec/riscv/opusdsp_init.c new file mode 100644 index 0000000000..f1d2c871e3 --- /dev/null +++ b/libavcodec/riscv/opusdsp_init.c @@ -0,0 +1,42 @@ +/* + * Copyright © 2022 Rémi Denis-Courmont. + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "config.h" + +#include "libavutil/attributes.h" +#include "libavutil/cpu.h" +#include "libavutil/riscv/cpu.h" +#include "libavcodec/opusdsp.h" + +void ff_opus_postfilter_rvv_128(float *data, int period, float *g, int len); + +av_cold void ff_opus_dsp_init_riscv(OpusDSP *d) +{ +#if HAVE_RVV + int flags = av_get_cpu_flags(); + + if (flags & AV_CPU_FLAG_RVV_F32) + switch (ff_get_rv_vlenb()) { + case 16: + d->postfilter = ff_opus_postfilter_rvv_128; + break; + } +#endif +} diff --git a/libavcodec/riscv/opusdsp_rvv.S b/libavcodec/riscv/opusdsp_rvv.S new file mode 100644 index 0000000000..79b46696cd --- /dev/null +++ b/libavcodec/riscv/opusdsp_rvv.S @@ -0,0 +1,57 @@ +/* + * Copyright © 2022 Rémi Denis-Courmont. 
+ * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/riscv/asm.S" + +func ff_opus_postfilter_rvv_128, zve32f + addi a1, a1, 2 + slli a1, a1, 2 + lw t1, 4(a2) + vsetivli zero, 3, e32, m1, ta, ma + vle32.v v24, (a2) + sub a1, a0, a1 // a1 = &x4 = &data[-(period + 2)] + vsetivli zero, 5, e32, m2, ta, ma + vslide1up.vx v8, v24, t1 + lw t2, 8(a2) + vle32.v v16, (a1) + vslide1up.vx v24, v8, t2 // v24 = { g[2], g[1], g[0], g[1], g[2] } +2: + vsetvli t0, a3, e32, m2, ta, ma + vle32.v v0, (a0) + sub a3, a3, t0 +3: + vsetivli zero, 5, e32, m2, ta, ma + lw t2, 20(a1) + vfmul.vv v8, v24, v16 + addi a0, a0, 4 + vslide1down.vx v16, v16, t2 + addi a1, a1, 4 + vfredusum.vs v0, v8, v0 + vsetvli zero, t0, e32, m2, ta, ma + vmv.x.s t1, v0 + addi t0, t0, -1 + vslide1down.vx v0, v0, zero + sw t1, -4(a0) + bnez t0, 3b + + bnez a3, 2b + + ret +endfunc From patchwork Wed Oct 5 16:12:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= X-Patchwork-Id: 38564 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:4d9:b0:9c:f4b:4e41 with SMTP id 25csp700747pzd; Wed, 5 Oct 2022 09:13:03 -0700 (PDT) X-Google-Smtp-Source: 
AMsMyM4k9oz0dmOnqMM9UEsEGpiHcoQmNaqWZbq7CD4Aq+QxWT9bGDfM0MWY8yDkk15+HMkFPiPU X-Received: by 2002:a17:907:160d:b0:782:bc5d:162e with SMTP id hb13-20020a170907160d00b00782bc5d162emr275398ejc.291.1664986383338; Wed, 05 Oct 2022 09:13:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1664986383; cv=none; d=google.com; s=arc-20160816; b=pfvjjvLY5wHkn9OsN99O/3VMt13MdueC+mdtJoGVG24Mtj7d3lgusFVwXMmqzwxN2+ UpkwtAO6w8Uf8+1BHd829unvifNedtyQiEvNIx5Snz3WyZc2OzQR207983L0yxsBslQ/ RgDAvBdu2XMGkPh6oujHsejOADMUoJ494rFXSYs1fnote7o/T2J9rFsZMkxdNETpjZiU 1iO+/okut8RsyVydlDqmwBOUtzcK71wTShsR3MSORGvskKzs/UJozHX9aJaHcryTwvnn faVvyhNOKNtFoSycrAoTrw+9bP0QRan4W8tuc4vKVVdqMfO0DbMv+LDIAAmdOXgkPFru SStg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:delivered-to; bh=sLsjrN7p4tQkxgFGoPWacca7rv/HMum5I5NMbeO9VA8=; b=f+WWbOdr3eUiV//UWCesyLwlXqMZfUDltFC63n77xNRpc3UE8zAAyvAifU1+UzfhdT fOevkCTtt/B6yabsooLiINJ2ROmsd+/8TAz5x9cfjBCdBqF+NAWpH4SeEh+oAfcet5ob uhx+f99C9LirDepoDH/LlSjCnHQ/rKzDNLRpDTt6rJqkwfkn58+xzQHUPug1w1Hv1JQ5 ve2AogKTaeHbx5zEFy3rdvq5GU96QsXacfoDrg41IaEH4b0CSgf8R8jkkMibwggPTrS6 43cb1BKw6rHxUjXJR70H9906FZRxrPzYsrHvCvUP5fPt11os09UZ66MZUbCPBvlR4Jlq BuJw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id hq11-20020a1709073f0b00b007417e9a2c71si14965292ejc.352.2022.10.05.09.13.02; Wed, 05 Oct 2022 09:13:03 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 28BDB68BD15; Wed, 5 Oct 2022 19:13:00 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from ursule.remlab.net (vps-a2bccee9.vps.ovh.net [51.75.19.47]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id C185B68BD11 for ; Wed, 5 Oct 2022 19:12:57 +0300 (EEST) Received: from basile.remlab.net (localhost [IPv6:::1]) by ursule.remlab.net (Postfix) with ESMTP id 5E571C0099 for ; Wed, 5 Oct 2022 19:12:57 +0300 (EEST) From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= To: ffmpeg-devel@ffmpeg.org Date: Wed, 5 Oct 2022 19:12:54 +0300 Message-Id: <20221005161256.27612-2-remi@remlab.net> X-Mailer: git-send-email 2.37.2 In-Reply-To: <12083658.O9o76ZdvQC@basile.remlab.net> References: <12083658.O9o76ZdvQC@basile.remlab.net> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 2/4] lavu/riscv: helper macro for VTYPE encoding X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: hsIaSxKhzxfl In most cases, the vector type (VTYPE) for the RISC-V Vector extension is supplied as an immediate value, with either of the VSETVLI or VSETIVLI instructions. 
There is however a third instruction VSETVL which takes the vector type from a general purpose register. That is so the type can be selected at run-time. This introduces a macro to load a (valid) vector type into a register. The syntax follows that of VSETVLI and VSETIVLI, with element size, group multiplier, then tail and mask policies. --- libavutil/riscv/asm.S | 75 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 75 insertions(+) diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S index ffa0bd9068..6ca74f263a 100644 --- a/libavutil/riscv/asm.S +++ b/libavutil/riscv/asm.S @@ -92,3 +92,78 @@ shnadd 3, \rd, \rs1, \rs2 .endm #endif + + /* Convenience macro to load a Vector type (vtype) as immediate */ + .macro lvtypei rd, e, m=m1, tp=tu, mp=mu + + .ifc \e,e8 + .equ ei, 0 + .else + .ifc \e,e16 + .equ ei, 8 + .else + .ifc \e,e32 + .equ ei, 16 + .else + .ifc \e,e64 + .equ ei, 24 + .else + .error "Unknown element type" + .endif + .endif + .endif + .endif + + .ifc \m,m1 + .equ mi, 0 + .else + .ifc \m,m2 + .equ mi, 1 + .else + .ifc \m,m4 + .equ mi, 2 + .else + .ifc \m,m8 + .equ mi, 3 + .else + .ifc \m,mf8 + .equ mi, 5 + .else + .ifc \m,mf4 + .equ mi, 6 + .else + .ifc \m,mf2 + .equ mi, 7 + .else + .error "Unknown multiplier" + .equ mi, 3 + .endif + .endif + .endif + .endif + .endif + .endif + .endif + + .ifc \tp,tu + .equ tpi, 0 + .else + .ifc \tp,ta + .equ tpi, 64 + .else + .error "Unknown tail policy" + .endif + .endif + + .ifc \mp,mu + .equ mpi, 0 + .else + .ifc \mp,ma + .equ mpi, 128 + .else + .error "Unknown mask policy" + .endif + .endif + + li \rd, (ei | mi | tpi | mpi) + .endm From patchwork Wed Oct 5 16:12:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= X-Patchwork-Id: 38566 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:4d9:b0:9c:f4b:4e41 with SMTP id 25csp700898pzd; Wed, 5 Oct 2022 09:13:24 -0700 (PDT) 
X-Google-Smtp-Source: AMsMyM5HPgmzAkGjkSR+WDynXCgw8pwUO0R+cfPIlzmoYjUz5AxN+P3G+S7yeVMBfeUZUqPajRbG X-Received: by 2002:a05:6402:2549:b0:452:8292:b610 with SMTP id l9-20020a056402254900b004528292b610mr460510edb.199.1664986404059; Wed, 05 Oct 2022 09:13:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1664986404; cv=none; d=google.com; s=arc-20160816; b=1Au6c1KUwkMAeklzJzk8w/YR+YuBYMkotvLdJ1/pTuHcTy/SPWya7StQOvFbOp1K4l a4kb6m1d6dXdcXR3esTD7TCu2kvez6aa8CLbJyzP22w8o+TsNfYobS1RChkX2nJXyJkw YrRKKYobM59VSGPWwpBKZOMZp867bRYCdSwF+S6p+j8RcDcnqjdpgFmEi3g1grQ9B69Q Yu2k2IuWupC0OI0yGIESoY7lSK+wK67q2b6VKRFGvMTRz16MsOaoXNZ9hNR3MRIt2Dpn 3JHEAW4n0333V7OHRGW3wlAZithJcPBUhLo1Pz2BY5ZOQu07oXNnxpnI1k1Xtnsoe2a8 QcNQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:delivered-to; bh=HPqt4qZScekHfqdo91Dv0uMXKSEKH6dqxdP2SnuYzq0=; b=uIHNihSKM2F5s+ZqZSqtkPOfBvgvy/2qZ3n8agdFE1k5fxmpMHNk+jiXXG2zLtMh40 UNdF6cqapPFRY8qTdlV9PJpNeEw6IHewOe7DWoujf4vQx2/cQ7CG4TX+R94y6BbatZrI fQ/4WHu4wo/X0+kHVhSFxQdG4h9VztglA10JroyoZ7E676SU/r47eB83e+ugN+xkjy03 E7NFqKx/1+5X0U/KrHg00pc+JnasRxkcE0GVwqJ3F3DDE3Yr3Jeiyx6QKWQ+2kNVwl/M +cpqPqfmMiflAbV4cbeZNfl9EWl82l4JbGkcVu3bEAmO1D1nVc25rZBXKTodesWPAMZy 4NTQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id z16-20020a05640235d000b004574154f09asi412341edc.529.2022.10.05.09.13.23; Wed, 05 Oct 2022 09:13:24 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id F1DAB68BD24; Wed, 5 Oct 2022 19:13:01 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from ursule.remlab.net (vps-a2bccee9.vps.ovh.net [51.75.19.47]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id DB99868BD13 for ; Wed, 5 Oct 2022 19:12:57 +0300 (EEST) Received: from basile.remlab.net (localhost [IPv6:::1]) by ursule.remlab.net (Postfix) with ESMTP id 911C8C00AF for ; Wed, 5 Oct 2022 19:12:57 +0300 (EEST) From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= To: ffmpeg-devel@ffmpeg.org Date: Wed, 5 Oct 2022 19:12:55 +0300 Message-Id: <20221005161256.27612-3-remi@remlab.net> X-Mailer: git-send-email 2.37.2 In-Reply-To: <12083658.O9o76ZdvQC@basile.remlab.net> References: <12083658.O9o76ZdvQC@basile.remlab.net> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 3/4] lavc/opusdsp: RISC-V V (256-bit) postfilter X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: RL83YfNtrnhy This adds a variant of the postfilter for use with 256-bit vectors. 
As a single vector is then large enough to perform the scalar product, the group multiplier is reduced to just one at run-time. The vector type, which now differs between variants, is passed via a register. Unfortunately, there is no VSETIVL instruction, so the constant vector length (5) also needs to be passed via a register. --- libavcodec/riscv/opusdsp_init.c | 4 ++++ libavcodec/riscv/opusdsp_rvv.S | 16 ++++++++++++---- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/libavcodec/riscv/opusdsp_init.c b/libavcodec/riscv/opusdsp_init.c index f1d2c871e3..e6f9505f77 100644 --- a/libavcodec/riscv/opusdsp_init.c +++ b/libavcodec/riscv/opusdsp_init.c @@ -26,6 +26,7 @@ #include "libavcodec/opusdsp.h" void ff_opus_postfilter_rvv_128(float *data, int period, float *g, int len); +void ff_opus_postfilter_rvv_256(float *data, int period, float *g, int len); av_cold void ff_opus_dsp_init_riscv(OpusDSP *d) { @@ -37,6 +38,9 @@ av_cold void ff_opus_dsp_init_riscv(OpusDSP *d) case 16: d->postfilter = ff_opus_postfilter_rvv_128; break; + case 32: + d->postfilter = ff_opus_postfilter_rvv_256; + break; } #endif } diff --git a/libavcodec/riscv/opusdsp_rvv.S b/libavcodec/riscv/opusdsp_rvv.S index 79b46696cd..243c9a5e52 100644 --- a/libavcodec/riscv/opusdsp_rvv.S +++ b/libavcodec/riscv/opusdsp_rvv.S @@ -21,30 +21,38 @@ #include "libavutil/riscv/asm.S" func ff_opus_postfilter_rvv_128, zve32f + lvtypei a5, e32, m2, ta, ma + j 1f +endfunc + +func ff_opus_postfilter_rvv_256, zve32f + lvtypei a5, e32, m1, ta, ma +1: + li a4, 5 addi a1, a1, 2 slli a1, a1, 2 lw t1, 4(a2) vsetivli zero, 3, e32, m1, ta, ma vle32.v v24, (a2) sub a1, a0, a1 // a1 = &x4 = &data[-(period + 2)] - vsetivli zero, 5, e32, m2, ta, ma + vsetvl zero, a4, a5 vslide1up.vx v8, v24, t1 lw t2, 8(a2) vle32.v v16, (a1) vslide1up.vx v24, v8, t2 // v24 = { g[2], g[1], g[0], g[1], g[2] } 2: - vsetvli t0, a3, e32, m2, ta, ma + vsetvl t0, a3, a5 vle32.v v0, (a0) sub a3, a3, t0 3: - vsetivli zero, 5, e32, m2, ta, ma + vsetvl zero, a4, a5 lw t2, 20(a1) 
vfmul.vv v8, v24, v16 addi a0, a0, 4 vslide1down.vx v16, v16, t2 addi a1, a1, 4 vfredusum.vs v0, v8, v0 - vsetvli zero, t0, e32, m2, ta, ma + vsetvl zero, t0, a5 vmv.x.s t1, v0 addi t0, t0, -1 vslide1down.vx v0, v0, zero From patchwork Wed Oct 5 16:12:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= X-Patchwork-Id: 38567 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:4d9:b0:9c:f4b:4e41 with SMTP id 25csp700961pzd; Wed, 5 Oct 2022 09:13:34 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5SuWUCgDfNph9RKG6nIYODKRsZ0SEEtSq7jf4lu0APZ/oDL+MAtwXAldIuoER4KNMQjqls X-Received: by 2002:a17:907:7629:b0:776:a147:8524 with SMTP id jy9-20020a170907762900b00776a1478524mr290673ejc.632.1664986414000; Wed, 05 Oct 2022 09:13:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1664986413; cv=none; d=google.com; s=arc-20160816; b=ActBtd/ARB4CuPG0tUseY0g+FRevJKgmDrT8zMljdf1WpPgO2nDtapoifQAqceXTC3 x/ZXC7GgVaWnQ6qQEXcNOERIWofmA6BClVE5/FfgI22ufaEywx0q5hOGNxgwTHXoAFWK jcNxtnIeY+DZyfllqVLHLh42zSodnAAId/JTmIIf5Aob1sxuqIxBYJQq76GR2YisD7uw BB6Zha9DpHfQh6VtHQ8cJIH8Q8pGki/KayazALXgQ9TkXWt20BN4OoutmFVSkBbeYntq ufPLHBHUzHZVthOG5XmUMqfyQ1feG/wdKlAe4IjYakmJORu2QimdnzWpbGJP5vruKhZ/ TE7w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:delivered-to; bh=lTZ0UgkTSYuyxb+xY6rwflC76Bx9IE8zsWlaAxxGTUU=; b=foj7iyqQbykpRKQmwdcUqAlG1unKdOQehdP5mcwI0DIfuJueufTzSG+4UleSeK0X3K h7N8aV7y0SzFQ+Fe54QJ3PPkfBuoEzpRz/Ng1Ug/Cxd23hsLmjjgEtmCPyfmK2YkRZ88 vDc4Yp9BO/sVDF3Lv9m/7zC/lW9ODAEEK7j72saVDWTpbmOpj/BVyExLSxRU6zfXNhCP cMNBy6UpIXW0vjqh8SdBjnrovmp+vY0lJwVGAKmzki1CduyC2PZgs4dt71VwT/SmamHZ iO2asHJwvLQ2rU6WUPh26wSdgbCRhoyL9C9cLQrt8jWqpmYpklRrXyKdQ3kxGyPyyHuH 
wCjA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id u11-20020a50a40b000000b00458485463desi14215163edb.606.2022.10.05.09.13.33; Wed, 05 Oct 2022 09:13:33 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id D8FC068BD28; Wed, 5 Oct 2022 19:13:02 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from ursule.remlab.net (vps-a2bccee9.vps.ovh.net [51.75.19.47]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 27A6068BD15 for ; Wed, 5 Oct 2022 19:12:58 +0300 (EEST) Received: from basile.remlab.net (localhost [IPv6:::1]) by ursule.remlab.net (Postfix) with ESMTP id C417BC00B0 for ; Wed, 5 Oct 2022 19:12:57 +0300 (EEST) From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= To: ffmpeg-devel@ffmpeg.org Date: Wed, 5 Oct 2022 19:12:56 +0300 Message-Id: <20221005161256.27612-4-remi@remlab.net> X-Mailer: git-send-email 2.37.2 In-Reply-To: <12083658.O9o76ZdvQC@basile.remlab.net> References: <12083658.O9o76ZdvQC@basile.remlab.net> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 4/4] lavc/opusdsp: RISC-V V (512-bit) postfilter X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and 
patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: gqVyEuepQ+S3 This adds a variant of the postfilter for use with 512-bit vectors. Half a vector is enough to perform the scalar product. Normally, a whole vector would be used anyhow, since fractional multipliers are no faster than the unit multiplier. But in this particular function, a full vector makes up 16 samples, which would be loaded at each iteration of the outer loop. The minimum guaranteed CELT postfilter period is only 15. Accounting for the edges, we can only safely preload up to 13 samples. The fractional multiplier is thus used to cap the selected vector length to a safe value of 8 elements or 256 bits. Likewise, we have the 1024-bit variant with the quarter multiplier. In theory, a 2048-bit one would be possible with the eighth multiplier, but that length is not even defined in the specifications as of yet, nor is it supported by any emulator, let alone actual hardware. --- libavcodec/riscv/opusdsp_init.c | 8 ++++++++ libavcodec/riscv/opusdsp_rvv.S | 10 ++++++++++ 2 files changed, 18 insertions(+) diff --git a/libavcodec/riscv/opusdsp_init.c b/libavcodec/riscv/opusdsp_init.c index e6f9505f77..d564cca50c 100644 --- a/libavcodec/riscv/opusdsp_init.c +++ b/libavcodec/riscv/opusdsp_init.c @@ -27,6 +27,8 @@ void ff_opus_postfilter_rvv_128(float *data, int period, float *g, int len); void ff_opus_postfilter_rvv_256(float *data, int period, float *g, int len); +void ff_opus_postfilter_rvv_512(float *data, int period, float *g, int len); +void ff_opus_postfilter_rvv_1024(float *data, int period, float *g, int len); av_cold void ff_opus_dsp_init_riscv(OpusDSP *d) { @@ -41,6 +43,12 @@ av_cold void ff_opus_dsp_init_riscv(OpusDSP *d) case 32: d->postfilter = ff_opus_postfilter_rvv_256; break; + case 64: + d->postfilter = ff_opus_postfilter_rvv_512; + break; + case 128: + d->postfilter = ff_opus_postfilter_rvv_1024; + break; } #endif } diff --git a/libavcodec/riscv/opusdsp_rvv.S 
b/libavcodec/riscv/opusdsp_rvv.S index 243c9a5e52..b3d23a9de5 100644 --- a/libavcodec/riscv/opusdsp_rvv.S +++ b/libavcodec/riscv/opusdsp_rvv.S @@ -25,6 +25,16 @@ func ff_opus_postfilter_rvv_128, zve32f j 1f endfunc +func ff_opus_postfilter_rvv_512, zve32f + lvtypei a5, e32, mf2, ta, ma + j 1f +endfunc + +func ff_opus_postfilter_rvv_1024, zve32f + lvtypei a5, e32, mf4, ta, ma + j 1f +endfunc + func ff_opus_postfilter_rvv_256, zve32f lvtypei a5, e32, m1, ta, ma 1:
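
The relation between the four entry points and the 8-sample outer loop can be checked with a little arithmetic. The following is only a rough C sketch, not part of the series: vtype_encode(), vlmax() and enum lmul are hypothetical names made up for illustration. The bit layout follows the constants used by the lvtypei macro (multiplier in bits 2:0, element size in bits 5:3, tail-agnostic at bit 6, mask-agnostic at bit 7), and the maximum vector length is taken as VLEN / SEW * LMUL.

```c
#include <assert.h>
#include <stdint.h>

/* Group multiplier field values (vtype bits 2:0). These mirror the
 * constants that the lvtypei macro assigns to "mi". */
enum lmul {
    LMUL_M1  = 0, LMUL_M2  = 1, LMUL_M4  = 2, LMUL_M8 = 3,
    LMUL_MF8 = 5, LMUL_MF4 = 6, LMUL_MF2 = 7
};

/* Hypothetical helper (illustration only): compute the same value that
 * "lvtypei rd, eN, mX, tp, mp" would load into rd. */
static uint32_t vtype_encode(unsigned sew_bits, enum lmul lmul, int ta, int ma)
{
    unsigned vsew;

    switch (sew_bits) {
    case 8:  vsew = 0; break;
    case 16: vsew = 1; break;
    case 32: vsew = 2; break;
    case 64: vsew = 3; break;
    default:
        assert(!"unknown element size");
        return UINT32_MAX;
    }
    /* vsew sits in bits 5:3, tail policy in bit 6, mask policy in bit 7 */
    return (uint32_t)lmul | (vsew << 3)
         | ((ta ? 1u : 0u) << 6) | ((ma ? 1u : 0u) << 7);
}

/* Hypothetical helper: largest vector length, in elements, for a given
 * register width, element size and group multiplier. */
static unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, enum lmul lmul)
{
    unsigned per_reg = vlen_bits / sew_bits; /* elements per register */

    switch (lmul) {
    case LMUL_M1:  return per_reg;
    case LMUL_M2:  return per_reg * 2;
    case LMUL_M4:  return per_reg * 4;
    case LMUL_M8:  return per_reg * 8;
    case LMUL_MF2: return per_reg / 2;
    case LMUL_MF4: return per_reg / 4;
    case LMUL_MF8: return per_reg / 8;
    }
    return 0; /* reserved encoding */
}
```

With these helpers, vlmax() gives 8 elements for each of the pairs used by the series: 128-bit with m2, 256-bit with m1, 512-bit with mf2 and 1024-bit with mf4. By contrast, mf2 on a 1024-bit vector would give 16 elements, above the 13-sample bound from the commit message.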