From patchwork Wed May 15 17:47:12 2024
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 48903
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 15 May 2024 20:47:12 +0300
Message-ID: <20240515174712.17701-1-remi@remlab.net>
X-Mailer: git-send-email 2.43.0
Subject: [FFmpeg-devel] [PATCHv2 2/2] lavc/flacdsp: optimise RVV vector type for lpc16

This calculates the optimal vector type value at
run-time based on the hardware vector length and the FLAC LPC
prediction order. In this particular case, the additional computation
is easily amortised over the loop iterations:

T-Head C908:
        C  V before  V after
 1   48.0     214.7     95.2
 2   64.7     214.2     94.7
 3   79.7     213.5     94.5
 4   96.2     196.5     94.2 #
 5  111.0     195.7    118.5
 6  127.0     211.2    102.0
 7  143.7     194.2    101.5
 8  175.7     193.2    101.2 #
 9  176.2     224.2    126.0
10  191.5     192.0    125.5
11  224.5     191.2    124.7
12  223.0     190.2    124.2
13  239.2     189.5    123.7
14  253.7     188.7    139.5
15  286.2     188.0    122.7
16  284.0     187.0    122.5 #
17  300.2     186.5    186.5
18  314.0     185.5    185.7
19  329.7     184.7    185.0
20  343.0     184.2    184.2
21  358.7     199.2    183.7
22  371.7     182.7    182.7
23  387.5     181.7    182.0
24  400.7     181.0    181.2
25  431.5     180.2    196.5
26  443.7     195.5    196.0
27  459.0     178.7    196.2
28  470.7     177.7    194.2
29  470.0     177.0    193.5
30  481.2     176.2    176.5
31  496.2     175.5    175.7
32  507.2     174.7    191.0 #

# Power of two boundary.

With 128-bit vectors, improvements are expected for the first two
test cases only. For the other two, there is overhead, but it is below
noise. Improvements should be easier to observe with prediction orders
of 8 or less, or on hardware with larger vector sizes.

The same optimisation strategy should be applicable to LPC32 (and
work-in-progress LPC33), but is left as a future exercise.
---
 libavcodec/riscv/flacdsp_init.c |  2 +-
 libavcodec/riscv/flacdsp_rvv.S  | 12 ++++++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/libavcodec/riscv/flacdsp_init.c b/libavcodec/riscv/flacdsp_init.c
index 4f1652dbe7..735aec0691 100644
--- a/libavcodec/riscv/flacdsp_init.c
+++ b/libavcodec/riscv/flacdsp_init.c
@@ -71,7 +71,7 @@ av_cold void ff_flacdsp_init_riscv(FLACDSPContext *c, enum AVSampleFormat fmt,
     if ((flags & AV_CPU_FLAG_RVV_I32) && (flags & AV_CPU_FLAG_RVB_ADDR)) {
         int vlenb = ff_get_rv_vlenb();
 
-        if (vlenb >= 16)
+        if ((flags & AV_CPU_FLAG_RVB_BASIC) && vlenb >= 16)
             c->lpc16 = ff_flac_lpc16_rvv;
 
 # if (__riscv_xlen >= 64)
diff --git a/libavcodec/riscv/flacdsp_rvv.S b/libavcodec/riscv/flacdsp_rvv.S
index 6287faa260..45608dfd47 100644
--- a/libavcodec/riscv/flacdsp_rvv.S
+++ b/libavcodec/riscv/flacdsp_rvv.S
@@ -20,8 +20,16 @@
 
 #include "libavutil/riscv/asm.S"
 
-func ff_flac_lpc16_rvv, zve32x
-        vsetvli zero, a2, e32, m8, ta, ma
+func ff_flac_lpc16_rvv, zve32x, zbb
+        csrr    t0, vlenb
+        addi    t2, a2, -1
+        clz     t0, t0
+        clz     t2, t2
+        addi    t0, t0, VTYPE_E32 | VTYPE_M8 | VTYPE_TA | VTYPE_MA
+        li      t1, VTYPE_E32 | VTYPE_M1 | VTYPE_TA | VTYPE_MA
+        sub     t0, t0, t2 // t0 += log2(next_power_of_two(len) / vlenb) + 1
+        max     t0, t0, t1
+        vsetvl  zero, a2, t0
         vle32.v v8, (a1)
         sub     a4, a4, a2
         vle32.v v16, (a0)
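
For readers following the arithmetic, the vtype selection in the
flacdsp_rvv.S hunk can be modelled in C roughly as below. This is only a
sketch and not part of the patch: clz64() and lpc16_vtype() are
hypothetical helper names, the VTYPE_* values are written out here from
the vtype layout in the RISC-V V specification (vlmul in bits 2:0, vsew
in bits 5:3, vta in bit 6, vma in bit 7) rather than taken from asm.S,
and XLEN is assumed to be 64.

```c
#include <stdint.h>

/* vtype field values, assumed from the RISC-V V 1.0 vtype layout. */
#define VTYPE_M1   0          /* vlmul = 000, LMUL = 1 */
#define VTYPE_M8   3          /* vlmul = 011, LMUL = 8 */
#define VTYPE_E32  (2 << 3)   /* vsew  = 010, SEW  = 32 */
#define VTYPE_TA   (1 << 6)   /* tail agnostic */
#define VTYPE_MA   (1 << 7)   /* mask agnostic */

/* Portable stand-in for the Zbb clz instruction on RV64; clz(0) is 64. */
static int clz64(uint64_t x)
{
    int n = 0;

    if (x == 0)
        return 64;
    while (!(x >> 63)) {
        x <<= 1;
        n++;
    }
    return n;
}

/*
 * Hypothetical C model of the instruction sequence in the patch:
 * start from the M8 encoding, then step LMUL down by one for every
 * power of two by which the register group would exceed the rounded-up
 * prediction order, and never go below M1.
 */
static int64_t lpc16_vtype(uint64_t vlenb, uint64_t order)
{
    const int64_t m8 = VTYPE_E32 | VTYPE_M8 | VTYPE_TA | VTYPE_MA;
    const int64_t m1 = VTYPE_E32 | VTYPE_M1 | VTYPE_TA | VTYPE_MA;
    int64_t vtype = m8 + clz64(vlenb) - clz64(order - 1);

    return vtype > m1 ? vtype : m1; /* the "max" instruction */
}
```

Under these assumptions, with vlenb = 16 (128-bit vectors) the model
selects LMUL = 1 for orders 1 to 4, LMUL = 2 for orders 5 to 8, LMUL = 4
for orders 9 to 16 and LMUL = 8 for orders 17 to 32, which lines up with
the rows marked "#" in the benchmark table.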