From patchwork Fri Sep 29 16:26:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 44025
From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 29 Sep 2023 19:26:02 +0300
Message-Id: <20230929162605.80421-2-remi@remlab.net>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230929162605.80421-1-remi@remlab.net>
References: <20230929162605.80421-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/5] lavc/aacpsdsp: unroll R-V V stereo interpolate
List-Id: FFmpeg development discussions and patches

---
 libavcodec/riscv/aacpsdsp_rvv.S | 46 ++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/libavcodec/riscv/aacpsdsp_rvv.S b/libavcodec/riscv/aacpsdsp_rvv.S
index b85a5cc92c..1a92fed515 100644
--- a/libavcodec/riscv/aacpsdsp_rvv.S
+++ b/libavcodec/riscv/aacpsdsp_rvv.S
@@ -223,7 +223,7 @@ func ff_ps_hybrid_synthesis_deint_rvv, zve32x
 endfunc
 
 func ff_ps_stereo_interpolate_rvv, zve32f
-        vsetvli     t0, zero, e32, m1, ta, ma
+        vsetvli     t0, zero, e32, m2, ta, ma
         vid.v       v24
         flw         ft0, (a2)
         vadd.vi     v24, v24, 1 // v24[i] = i + 1
@@ -232,43 +232,43 @@ func ff_ps_stereo_interpolate_rvv, zve32f
         flw         ft2, 8(a2)
         vfmv.v.f    v16, ft0
         flw         ft3, 12(a2)
-        vfmv.v.f    v17, ft1
+        vfmv.v.f    v18, ft1
         flw         ft0, (a3)
-        vfmv.v.f    v18, ft2
+        vfmv.v.f    v20, ft2
         flw         ft1, 4(a3)
-        vfmv.v.f    v19, ft3
+        vfmv.v.f    v22, ft3
         flw         ft2, 8(a3)
         flw         ft3, 12(a3)
         fcvt.s.wu   ft4, t0 // (float)(vlenb / sizeof (float))
         vfmacc.vf   v16, ft0, v24 // h0 += (i + 1) * h0_step
         fmul.s      ft0, ft0, ft4
-        vfmacc.vf   v17, ft1, v24
+        vfmacc.vf   v18, ft1, v24
         fmul.s      ft1, ft1, ft4
-        vfmacc.vf   v18, ft2, v24
+        vfmacc.vf   v20, ft2, v24
         fmul.s      ft2, ft2, ft4
-        vfmacc.vf   v19, ft3, v24
+        vfmacc.vf   v22, ft3, v24
         fmul.s      ft3, ft3, ft4
 1:
-        vsetvli     t0, a4, e32, m1, ta, ma
-        vlseg2e32.v v8, (a0) // v8:l_re, v9:l_im
+        vsetvli     t0, a4, e32, m2, ta, ma
+        vlseg2e32.v v0, (a0) // v0:l_re, v2:l_im
         sub         a4, a4, t0
-        vlseg2e32.v v10, (a1) // v10:r_re, v11:r_im
-        vfmul.vv    v12, v8, v16
-        vfmul.vv    v13, v9, v16
-        vfmul.vv    v14, v8, v17
-        vfmul.vv    v15, v9, v17
-        vfmacc.vv   v12, v10, v18
-        vfmacc.vv   v13, v11, v18
-        vfmacc.vv   v14, v10, v19
-        vfmacc.vv   v15, v11, v19
-        vsseg2e32.v v12, (a0)
+        vlseg2e32.v v4, (a1) // v4:r_re, v6:r_im
+        vfmul.vv    v8, v0, v16
+        vfmul.vv    v10, v2, v16
+        vfmul.vv    v12, v0, v18
+        vfmul.vv    v14, v2, v18
+        vfmacc.vv   v8, v4, v20
+        vfmacc.vv   v10, v6, v20
+        vfmacc.vv   v12, v4, v22
+        vfmacc.vv   v14, v6, v22
+        vsseg2e32.v v8, (a0)
         sh3add      a0, t0, a0
-        vsseg2e32.v v14, (a1)
+        vsseg2e32.v v12, (a1)
         sh3add      a1, t0, a1
         vfadd.vf    v16, v16, ft0 // h0 += (vlenb / sizeof (float)) * h0_step
-        vfadd.vf    v17, v17, ft1
-        vfadd.vf    v18, v18, ft2
-        vfadd.vf    v19, v19, ft3
+        vfadd.vf    v18, v18, ft1
+        vfadd.vf    v20, v20, ft2
+        vfadd.vf    v22, v22, ft3
         bnez        a4, 1b
 
         ret
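
For review purposes, the arithmetic performed by the loop in ff_ps_stereo_interpolate_rvv can be modelled in scalar C. The sketch below is read off the instructions and comments in the diff only; the function name `stereo_interpolate_model` and its signature are hypothetical and are not FFmpeg API. It assumes `a2` points at four coefficients and `a3` at their four per-sample steps, as the `flw` offsets suggest. The coefficient for sample n is written in closed form as h + (n + 1) * h_step, whereas the assembly adds a whole vector length of steps per iteration, so floating-point rounding may differ slightly.

```c
#include <assert.h>

/*
 * Hypothetical scalar model of the vector loop (not FFmpeg code).
 * l, r:    interleaved complex samples, [n][0] = re, [n][1] = im
 * h:       four starting coefficients (v16, v18, v20, v22 in the new code)
 * h_step:  per-sample increment of each coefficient
 */
static void stereo_interpolate_model(float (*l)[2], float (*r)[2],
                                     const float h[4], const float h_step[4],
                                     int len)
{
    assert(len >= 0);

    for (int n = 0; n < len; n++) {
        /* Lane i of the setup code holds h + (i + 1) * h_step. */
        const float k  = (float)(n + 1);
        const float h0 = h[0] + k * h_step[0];
        const float h1 = h[1] + k * h_step[1];
        const float h2 = h[2] + k * h_step[2];
        const float h3 = h[3] + k * h_step[3];

        /* Read all inputs first: the outputs overwrite them in place. */
        const float l_re = l[n][0], l_im = l[n][1];
        const float r_re = r[n][0], r_im = r[n][1];

        /* vfmul.vv by h0/h1, then vfmacc.vv by h2/h3. */
        l[n][0] = l_re * h0 + r_re * h2;
        l[n][1] = l_im * h0 + r_im * h2;
        r[n][0] = l_re * h1 + r_re * h3;
        r[n][1] = l_im * h1 + r_im * h3;
    }
}
```

The model does not depend on the register grouping, which is the point of the patch: switching from m1 to m2 should change only how many samples each iteration handles, not the values produced.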