From patchwork Mon Apr 29 19:21:43 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 48372
From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 29 Apr 2024 22:21:43 +0300
Message-ID: <20240429192144.84571-1-remi@remlab.net>
X-Mailer: git-send-email 2.43.0
Subject: [FFmpeg-devel] [PATCH 1/2] lavc/ac3dsp: R-V V sum_square_butterfly_int32
List-Id: FFmpeg development discussions and patches

ac3_sum_square_bufferfly_int32_c:       61.0
ac3_sum_square_bufferfly_int32_rvv_i64: 14.7
---
 libavcodec/riscv/ac3dsp_init.c |  6 +++++
 libavcodec/riscv/ac3dsp_rvv.S  | 41 ++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
index b9e14d56ca..be5e153fac 100644
--- a/libavcodec/riscv/ac3dsp_init.c
+++ b/libavcodec/riscv/ac3dsp_init.c
@@ -28,6 +28,8 @@
 
 void ff_extract_exponents_rvb(uint8_t *exp, int32_t *coef, int nb_coefs);
 void ff_float_to_fixed24_rvv(int32_t *dst, const float *src, size_t len);
+void ff_sum_square_butterfly_int32_rvv(int64_t *, const int32_t *,
+                                       const int32_t *, int);
 
 av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
 {
@@ -39,6 +41,10 @@ av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
             c->extract_exponents = ff_extract_exponents_rvb;
         if (flags & AV_CPU_FLAG_RVV_F32)
             c->float_to_fixed24 = ff_float_to_fixed24_rvv;
+# if __riscv_xlen >= 64
+        if (flags & AV_CPU_FLAG_RVV_I64)
+            c->sum_square_butterfly_int32 = ff_sum_square_butterfly_int32_rvv;
+# endif
     }
 #endif
 }
diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
index b8d32c4677..dd0b4cd797 100644
--- a/libavcodec/riscv/ac3dsp_rvv.S
+++ b/libavcodec/riscv/ac3dsp_rvv.S
@@ -37,3 +37,44 @@ func ff_float_to_fixed24_rvv, zve32f
 
         ret
 endfunc
+
+#if __riscv_xlen >= 64
+func ff_sum_square_butterfly_int32_rvv, zve64x
+        vsetvli     t0, zero, e64, m8, ta, ma
+        vmv.v.x     v0, zero
+        vmv.v.x     v8, zero
+1:
+        vsetvli     t0, a3, e32, m2, tu, ma
+        vle32.v     v16, (a1)
+        sub         a3, a3, t0
+        vle32.v     v20, (a2)
+        sh2add      a1, t0, a1
+        vadd.vv     v24, v16, v20
+        sh2add      a2, t0, a2
+        vsub.vv     v28, v16, v20
+        vwmacc.vv   v0, v16, v16
+        vwmacc.vv   v4, v20, v20
+        vwmacc.vv   v8, v24, v24
+        vwmacc.vv   v12, v28, v28
+        bnez        a3, 1b
+
+        vsetvli     t0, zero, e64, m4, ta, ma
+        vmv.s.x     v16, zero
+        vmv.s.x     v17, zero
+        vredsum.vs  v16, v0, v16
+        vmv.s.x     v18, zero
+        vredsum.vs  v17, v4, v17
+        vmv.s.x     v19, zero
+        vredsum.vs  v18, v8, v18
+        vmv.x.s     t0, v16
+        vredsum.vs  v19, v12, v19
+        vmv.x.s     t1, v17
+        sd          t0, (a0)
+        vmv.x.s     t2, v18
+        sd          t1, 8(a0)
+        vmv.x.s     t3, v19
+        sd          t2, 16(a0)
+        sd          t3, 24(a0)
+        ret
+endfunc
+#endif

From patchwork Mon Apr 29 19:21:44 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 48373
From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 29 Apr 2024 22:21:44 +0300
Message-ID: <20240429192144.84571-2-remi@remlab.net>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240429192144.84571-1-remi@remlab.net>
References: <20240429192144.84571-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/2] lavc/ac3dsp: R-V V sum_square_butterfly_float
List-Id: FFmpeg development discussions and patches

As we do not need to widen accumulators to 64 bits, we effectively get
double capacity for unrolling compared to the integer function. This
explains the slightly better performance gains.

ac3_sum_square_bufferfly_float_c:       65.2
ac3_sum_square_bufferfly_float_rvv_f32: 12.2
---
 libavcodec/riscv/ac3dsp_init.c |  6 +++++-
 libavcodec/riscv/ac3dsp_rvv.S  | 39 ++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
index be5e153fac..e120aa2dce 100644
--- a/libavcodec/riscv/ac3dsp_init.c
+++ b/libavcodec/riscv/ac3dsp_init.c
@@ -30,6 +30,8 @@ void ff_extract_exponents_rvb(uint8_t *exp, int32_t *coef, int nb_coefs);
 void ff_float_to_fixed24_rvv(int32_t *dst, const float *src, size_t len);
 void ff_sum_square_butterfly_int32_rvv(int64_t *, const int32_t *,
                                        const int32_t *, int);
+void ff_sum_square_butterfly_float_rvv(float *, const float *,
+                                       const float *, int);
 
 av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
 {
@@ -39,8 +41,10 @@ av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
     if (flags & AV_CPU_FLAG_RVB_ADDR) {
         if (flags & AV_CPU_FLAG_RVB_BASIC)
             c->extract_exponents = ff_extract_exponents_rvb;
-        if (flags & AV_CPU_FLAG_RVV_F32)
+        if (flags & AV_CPU_FLAG_RVV_F32) {
             c->float_to_fixed24 = ff_float_to_fixed24_rvv;
+            c->sum_square_butterfly_float = ff_sum_square_butterfly_float_rvv;
+        }
 # if __riscv_xlen >= 64
         if (flags & AV_CPU_FLAG_RVV_I64)
             c->sum_square_butterfly_int32 = ff_sum_square_butterfly_int32_rvv;
diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
index dd0b4cd797..397e000ab0 100644
--- a/libavcodec/riscv/ac3dsp_rvv.S
+++ b/libavcodec/riscv/ac3dsp_rvv.S
@@ -78,3 +78,42 @@ func ff_sum_square_butterfly_int32_rvv, zve64x
         ret
 endfunc
 #endif
+
+func ff_sum_square_butterfly_float_rvv, zve32f
+        vsetvli     t0, zero, e32, m8, ta, ma
+        vmv.v.x     v0, zero
+        vmv.v.x     v8, zero
+1:
+        vsetvli     t0, a3, e32, m4, tu, ma
+        vle32.v     v16, (a1)
+        sub         a3, a3, t0
+        vle32.v     v20, (a2)
+        sh2add      a1, t0, a1
+        vfadd.vv    v24, v16, v20
+        sh2add      a2, t0, a2
+        vfsub.vv    v28, v16, v20
+        vfmacc.vv   v0, v16, v16
+        vfmacc.vv   v4, v20, v20
+        vfmacc.vv   v8, v24, v24
+        vfmacc.vv   v12, v28, v28
+        bnez        a3, 1b
+
+        vsetvli     t0, zero, e32, m4, ta, ma
+        vmv.s.x     v16, zero
+        vmv.s.x     v17, zero
+        vfredsum.vs v16, v0, v16
+        vmv.s.x     v18, zero
+        vfredsum.vs v17, v4, v17
+        vmv.s.x     v19, zero
+        vfredsum.vs v18, v8, v18
+        vfmv.f.s    ft0, v16
+        vfredsum.vs v19, v12, v19
+        vfmv.f.s    ft1, v17
+        fsw         ft0, (a0)
+        vfmv.f.s    ft2, v18
+        fsw         ft1, 4(a0)
+        vfmv.f.s    ft3, v19
+        fsw         ft2, 8(a0)
+        fsw         ft3, 12(a0)
+        ret
+endfunc

From patchwork Wed May 1 06:40:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 48436
From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 1 May 2024 09:40:08 +0300
Message-ID: <20240501064008.16802-1-remi@remlab.net>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240429192144.84571-1-remi@remlab.net>
References: <20240429192144.84571-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 3/3] lavc/ac3dsp: R-V V min_exponents
List-Id: FFmpeg development discussions and patches

T-Head C908:
ac3_exponent_min_reuse0_c:          7.5
ac3_exponent_min_reuse0_rvv_i32:    7.5
ac3_exponent_min_reuse1_c:       1820.7
ac3_exponent_min_reuse1_rvv_i32:  102.5
ac3_exponent_min_reuse2_c:       3088.5
ac3_exponent_min_reuse2_rvv_i32:  138.7
ac3_exponent_min_reuse3_c:       5073.7
ac3_exponent_min_reuse3_rvv_i32:  174.7
ac3_exponent_min_reuse4_c:       4624.2
ac3_exponent_min_reuse4_rvv_i32:  204.2
ac3_exponent_min_reuse5_c:       5138.7
ac3_exponent_min_reuse5_rvv_i32:  238.0
---
 libavcodec/riscv/ac3dsp_init.c |  4 ++++
 libavcodec/riscv/ac3dsp_rvv.S  | 22 ++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
index e120aa2dce..c7c375273d 100644
--- a/libavcodec/riscv/ac3dsp_init.c
+++ b/libavcodec/riscv/ac3dsp_init.c
@@ -26,6 +26,7 @@
 #include "libavutil/cpu.h"
 #include "libavcodec/ac3dsp.h"
 
+void ff_ac3_exponent_min_rvv(uint8_t *exp, int, int);
 void ff_extract_exponents_rvb(uint8_t *exp, int32_t *coef, int nb_coefs);
 void ff_float_to_fixed24_rvv(int32_t *dst, const float *src, size_t len);
 void ff_sum_square_butterfly_int32_rvv(int64_t *, const int32_t *,
@@ -38,6 +39,9 @@ av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
 #if HAVE_RV
     int flags = av_get_cpu_flags();
 
+    if (flags & AV_CPU_FLAG_RVV_I32)
+        c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
+
     if (flags & AV_CPU_FLAG_RVB_ADDR) {
         if (flags & AV_CPU_FLAG_RVB_BASIC)
             c->extract_exponents = ff_extract_exponents_rvb;
diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
index 397e000ab0..1b5f67a9ec 100644
--- a/libavcodec/riscv/ac3dsp_rvv.S
+++ b/libavcodec/riscv/ac3dsp_rvv.S
@@ -21,6 +21,28 @@
 #include "config.h"
 #include "libavutil/riscv/asm.S"
 
+func ff_ac3_exponent_min_rvv, zve32x
+        beqz        a1, 3f
+1:
+        vsetvli     t2, a2, e8, m8, ta, ma
+        vle8.v      v8, (a0)
+        addi        t0, a0, 256
+        sub         a2, a2, t2
+        mv          t1, a1
+2:
+        vle8.v      v16, (t0)
+        addi        t1, t1, -1
+        vminu.vv    v8, v8, v16
+        addi        t0, t0, 256
+        bnez        t1, 2b
+
+        vse8.v      v8, (a0)
+        add         a0, a0, t2
+        bnez        a2, 1b
+3:
+        ret
+endfunc
+
 func ff_float_to_fixed24_rvv, zve32f
         li          t1, 1 << 24
         fcvt.s.w    f0, t1
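
For anyone checking the vector code against scalar behaviour, here is a minimal C sketch of what the three routines in this series appear to compute. It is inferred only from the assembly and prototypes in the patches and is not FFmpeg's reference code. The `ref_` function names are hypothetical. The parameter names `num_reuse_blocks` and `nb_coefs` are assumptions, since the prototype only declares `(uint8_t *exp, int, int)`. The 256-byte row stride is taken from the `addi t0, a0, 256` instruction. The float version sums in a different order from the vector reduction, so low-order bits may differ on real data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of ff_sum_square_butterfly_int32_rvv:
 * sum[0..3] = sums of a^2, b^2, (a+b)^2, (a-b)^2 in 64-bit accumulators. */
static void ref_sum_square_butterfly_int32(int64_t sum[4], const int32_t *a,
                                           const int32_t *b, int len)
{
    sum[0] = sum[1] = sum[2] = sum[3] = 0;
    for (int i = 0; i < len; i++) {
        /* vadd.vv / vsub.vv run at e32, so the sum and difference wrap to
         * 32 bits (the conversion back to int32_t relies on the usual
         * two's-complement behaviour of common compilers). */
        int32_t md = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
        int32_t sd = (int32_t)((uint32_t)a[i] - (uint32_t)b[i]);

        /* vwmacc.vv: 32x32 -> 64-bit multiply-accumulate */
        sum[0] += (int64_t)a[i] * a[i];
        sum[1] += (int64_t)b[i] * b[i];
        sum[2] += (int64_t)md * md;
        sum[3] += (int64_t)sd * sd;
    }
}

/* Hypothetical scalar model of ff_sum_square_butterfly_float_rvv.
 * No widening is needed: accumulators stay 32-bit floats. */
static void ref_sum_square_butterfly_float(float sum[4], const float *a,
                                           const float *b, int len)
{
    sum[0] = sum[1] = sum[2] = sum[3] = 0.0f;
    for (int i = 0; i < len; i++) {
        float md = a[i] + b[i];
        float sd = a[i] - b[i];

        sum[0] += a[i] * a[i];
        sum[1] += b[i] * b[i];
        sum[2] += md * md;
        sum[3] += sd * sd;
    }
}

/* Hypothetical scalar model of ff_ac3_exponent_min_rvv: each of the first
 * nb_coefs bytes becomes the unsigned minimum of itself and the bytes at
 * the same offset in the following num_reuse_blocks rows of 256 bytes. */
static void ref_ac3_exponent_min(uint8_t *exp, int num_reuse_blocks,
                                 int nb_coefs)
{
    if (!num_reuse_blocks)          /* mirrors "beqz a1, 3f" */
        return;
    for (int i = 0; i < nb_coefs; i++) {
        uint8_t min_exp = exp[i];

        for (int blk = 1; blk <= num_reuse_blocks; blk++) {
            uint8_t next = exp[i + 256 * (size_t)blk];
            if (next < min_exp)
                min_exp = next;
        }
        exp[i] = min_exp;
    }
}
```

The int32 sketch also shows why that function needs 64-bit elements (AV_CPU_FLAG_RVV_I64) and half the LMUL of the float one: every 32-bit product widens to 64 bits, whereas the float accumulators stay at 32 bits, which matches the remark in the 2/2 commit message.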