From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 17 Jul 2023 22:03:12 +0300
Message-Id: <20230717190312.33395-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH] lavc/audiodsp: rework RISC-V V scalar product

Take vector reduction out of the loop and unroll.

Before:
audiodsp.scalarproduct_int16_c:       12321.0
audiodsp.scalarproduct_int16_rvv_i32:  4175.7

After:
audiodsp.scalarproduct_int16_c:       12320.5
audiodsp.scalarproduct_int16_rvv_i32:  1230.2
---
 libavcodec/riscv/audiodsp_rvv.S | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/libavcodec/riscv/audiodsp_rvv.S b/libavcodec/riscv/audiodsp_rvv.S
index af1e07bef9..f7eba2114f 100644
--- a/libavcodec/riscv/audiodsp_rvv.S
+++ b/libavcodec/riscv/audiodsp_rvv.S
@@ -21,21 +21,22 @@
 #include "libavutil/riscv/asm.S"
 
 func ff_scalarproduct_int16_rvv, zve32x
-        vsetivli    zero, 1, e32, m1, ta, ma
-        vmv.s.x     v8, zero
+        vsetvli     t0, zero, e32, m8, ta, ma
+        vmv.v.x     v8, zero
+        vmv.s.x     v0, zero
 1:
-        vsetvli     t0, a2, e16, m1, ta, ma
+        vsetvli     t0, a2, e16, m4, tu, ma
         vle16.v     v16, (a0)
         sub         a2, a2, t0
         vle16.v     v24, (a1)
         sh1add      a0, t0, a0
-        vwmul.vv    v0, v16, v24
+        vwmacc.vv   v8, v16, v24
         sh1add      a1, t0, a1
-        vsetvli     zero, t0, e32, m2, ta, ma
-        vredsum.vs  v8, v0, v8
         bnez        a2, 1b
 
-        vmv.x.s     a0, v8
+        vsetvli     t0, zero, e32, m8, ta, ma
+        vredsum.vs  v0, v8, v0
+        vmv.x.s     a0, v0
         ret
 endfunc
 
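
For readers less familiar with RVV, the restructuring can be modelled in
plain C. This is only a rough sketch and not code from the patch: LANES is
a hypothetical stand-in for the hardware vector length, both function names
are made up for illustration, and the unsigned wrap-around accumulation is
an assumption chosen to keep the C well defined. The first function reduces
the products of every chunk inside the loop (the old vwmul.vv + vredsum.vs
shape). The second keeps one accumulator per lane, as vwmacc.vv does, and
reduces once after the loop. A short final chunk touches only the first vl
lanes, which is the effect the "tu" (tail undisturbed) policy has on the
accumulator register group.

```c
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8 }; /* hypothetical stand-in for the vector length */

/* Old shape: multiply one chunk, then reduce it inside the loop. */
static int32_t dot_reduce_in_loop(const int16_t *a, const int16_t *b,
                                  size_t len)
{
    uint32_t sum = 0;

    while (len > 0) {
        size_t vl = len < LANES ? len : LANES;
        uint32_t prod[LANES];

        for (size_t i = 0; i < vl; i++)   /* widening multiply */
            prod[i] = (uint32_t)((int32_t)a[i] * b[i]);
        for (size_t i = 0; i < vl; i++)   /* per-iteration reduction */
            sum += prod[i];

        a += vl;
        b += vl;
        len -= vl;
    }
    return (int32_t)sum;
}

/* New shape: per-lane multiply-accumulate, single reduction at the end. */
static int32_t dot_reduce_once(const int16_t *a, const int16_t *b,
                               size_t len)
{
    uint32_t acc[LANES] = { 0 };          /* one accumulator per lane */
    uint32_t sum = 0;

    while (len > 0) {
        size_t vl = len < LANES ? len : LANES;

        /* Lanes at index vl and above keep their previous contents. */
        for (size_t i = 0; i < vl; i++)
            acc[i] += (uint32_t)((int32_t)a[i] * b[i]);

        a += vl;
        b += vl;
        len -= vl;
    }
    for (size_t i = 0; i < LANES; i++)    /* the only reduction */
        sum += acc[i];
    return (int32_t)sum;
}
```

Both functions return the same value for any input. The difference is that
the second performs a single reduction no matter how many chunks the loop
processes.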