From patchwork Wed May 15 17:47:12 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 48903
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 15 May 2024 20:47:12 +0300
Message-ID: <20240515174712.17701-1-remi@remlab.net>
X-Mailer: git-send-email 2.43.0
Subject: [FFmpeg-devel] [PATCHv2 2/2] lavc/flacdsp: optimise RVV vector type for lpc16

This calculates the optimal vector type value at
run-time based on the hardware vector length and the FLAC LPC prediction
order. In this particular case, the additional computation is easily
amortised over the loop iterations:

T-Head C908:
         C     V before  V after
 1     48.0     214.7      95.2
 2     64.7     214.2      94.7
 3     79.7     213.5      94.5
 4     96.2     196.5      94.2  #
 5    111.0     195.7     118.5
 6    127.0     211.2     102.0
 7    143.7     194.2     101.5
 8    175.7     193.2     101.2  #
 9    176.2     224.2     126.0
10    191.5     192.0     125.5
11    224.5     191.2     124.7
12    223.0     190.2     124.2
13    239.2     189.5     123.7
14    253.7     188.7     139.5
15    286.2     188.0     122.7
16    284.0     187.0     122.5  #
17    300.2     186.5     186.5
18    314.0     185.5     185.7
19    329.7     184.7     185.0
20    343.0     184.2     184.2
21    358.7     199.2     183.7
22    371.7     182.7     182.7
23    387.5     181.7     182.0
24    400.7     181.0     181.2
25    431.5     180.2     196.5
26    443.7     195.5     196.0
27    459.0     178.7     196.2
28    470.7     177.7     194.2
29    470.0     177.0     193.5
30    481.2     176.2     176.5
31    496.2     175.5     175.7
32    507.2     174.7     191.0  #

# Power of two boundary.

With 128-bit vectors, improvements are expected for the first two
test cases only. For the other two, there is overhead but below noise.
Improvements should be better observable with prediction order of 8
and less, or on hardware with larger vector sizes.

The same optimisation strategy should be applicable to LPC32
(and work-in-progress LPC33), but is left as a future exercise.
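As a rough illustration of the run-time vector type selection described
above, the following C model mirrors the arithmetic of the new lpc16
prologue. It is only a sketch under stated assumptions: the VTYPE_* values
are written out from the RVV 1.0 vtype layout rather than copied from
asm.S, XLEN is assumed to be 64, and clz64(), lpc16_vtype() and lmul_of()
are hypothetical helper names that do not exist in FFmpeg.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed vtype encodings (RVV 1.0: vlmul in bits 2:0, vsew in bits 5:3,
 * vta in bit 6, vma in bit 7). Not copied from libavutil/riscv/asm.S. */
enum {
    VTYPE_M1  = 0,       /* vlmul = 0b000, LMUL = 1 */
    VTYPE_M8  = 3,       /* vlmul = 0b011, LMUL = 8 */
    VTYPE_E32 = 2 << 3,  /* vsew  = 0b010, SEW = 32 */
    VTYPE_TA  = 1 << 6,
    VTYPE_MA  = 1 << 7,
};

/* Leading zero count on 64 bits; returns 64 for 0, like Zbb clz on RV64. */
static int clz64(uint64_t x)
{
    int n = 0;

    if (x == 0)
        return 64;
    while (!(x >> 63)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical model of the csrr/clz/addi/sub/max sequence in the patch. */
static long lpc16_vtype(uint64_t vlenb, uint64_t len)
{
    const long m8 = VTYPE_E32 | VTYPE_M8 | VTYPE_TA | VTYPE_MA;
    const long m1 = VTYPE_E32 | VTYPE_M1 | VTYPE_TA | VTYPE_MA;
    long vtype;

    assert(vlenb >= 16 && (vlenb & (vlenb - 1)) == 0);
    assert(len >= 1 && len <= 2 * vlenb); /* must fit in e32, m8 */

    /* Start from m8 and step the LMUL field down once for every halving
     * of next_power_of_two(len) that the hardware vector length allows. */
    vtype = clz64(vlenb) + m8 - clz64(len - 1);
    return vtype > m1 ? vtype : m1; /* never go below m1 */
}

/* Decode an integer LMUL (m1..m8) from a vtype value. */
static int lmul_of(long vtype)
{
    return 1 << (vtype & 7);
}
```

In this model, vlenb = 16 yields m1 for orders 1 to 4, m2 for 5 to 8,
m4 for 9 to 16 and m8 for 17 to 32, which lines up with the power of two
boundaries marked in the table.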
---
 libavcodec/riscv/flacdsp_init.c |  2 +-
 libavcodec/riscv/flacdsp_rvv.S  | 12 ++++++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/libavcodec/riscv/flacdsp_init.c b/libavcodec/riscv/flacdsp_init.c
index 4f1652dbe7..735aec0691 100644
--- a/libavcodec/riscv/flacdsp_init.c
+++ b/libavcodec/riscv/flacdsp_init.c
@@ -71,7 +71,7 @@ av_cold void ff_flacdsp_init_riscv(FLACDSPContext *c, enum AVSampleFormat fmt,
     if ((flags & AV_CPU_FLAG_RVV_I32) && (flags & AV_CPU_FLAG_RVB_ADDR)) {
         int vlenb = ff_get_rv_vlenb();
 
-        if (vlenb >= 16)
+        if ((flags & AV_CPU_FLAG_RVB_BASIC) && vlenb >= 16)
             c->lpc16 = ff_flac_lpc16_rvv;
 
 # if (__riscv_xlen >= 64)
diff --git a/libavcodec/riscv/flacdsp_rvv.S b/libavcodec/riscv/flacdsp_rvv.S
index 6287faa260..45608dfd47 100644
--- a/libavcodec/riscv/flacdsp_rvv.S
+++ b/libavcodec/riscv/flacdsp_rvv.S
@@ -20,8 +20,16 @@
 
 #include "libavutil/riscv/asm.S"
 
-func ff_flac_lpc16_rvv, zve32x
-        vsetvli zero, a2, e32, m8, ta, ma
+func ff_flac_lpc16_rvv, zve32x, zbb
+        csrr    t0, vlenb
+        addi    t2, a2, -1
+        clz     t0, t0
+        clz     t2, t2
+        addi    t0, t0, VTYPE_E32 | VTYPE_M8 | VTYPE_TA | VTYPE_MA
+        li      t1, VTYPE_E32 | VTYPE_M1 | VTYPE_TA | VTYPE_MA
+        sub     t0, t0, t2 // t0 += log2(next_power_of_two(len) / vlenb) + 1
+        max     t0, t0, t1
+        vsetvl  zero, a2, t0
         vle32.v v8, (a1)
         sub     a4, a4, a2
         vle32.v v16, (a0)

From patchwork Wed May 15 20:16:19 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 48905
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 15 May 2024 23:16:19 +0300
Message-ID: <20240515201619.22348-1-remi@remlab.net>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240515174712.17701-1-remi@remlab.net>
References: <20240515174712.17701-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 3/3] lavc/flacdsp: optimise RVV vector type for lpc32

This is pretty much the same as for lpc16, though it only improves
half as large prediction orders.
With 128-bit vectors, this gives:

         C      V old    V new
 1     69.2     181.5     95.5
 2    107.7     180.7     95.2
 3    145.5     180.0    103.5
 4    183.0     179.2    102.7
 5    220.7     178.5    128.0
 6    257.7     194.0    127.5
 7    294.5     193.7    126.7
 8    331.0     193.0    126.5

Larger prediction orders see no significant changes at that size.

The code is pretty ugly, so clean-up suggestions are most welcome.
---
 libavcodec/riscv/flacdsp_init.c | 15 ++++++++-------
 libavcodec/riscv/flacdsp_rvv.S  | 25 ++++++++++++++++++++-----
 2 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/libavcodec/riscv/flacdsp_init.c b/libavcodec/riscv/flacdsp_init.c
index 735aec0691..830ae36534 100644
--- a/libavcodec/riscv/flacdsp_init.c
+++ b/libavcodec/riscv/flacdsp_init.c
@@ -71,17 +71,18 @@ av_cold void ff_flacdsp_init_riscv(FLACDSPContext *c, enum AVSampleFormat fmt,
     if ((flags & AV_CPU_FLAG_RVV_I32) && (flags & AV_CPU_FLAG_RVB_ADDR)) {
         int vlenb = ff_get_rv_vlenb();
 
-        if ((flags & AV_CPU_FLAG_RVB_BASIC) && vlenb >= 16)
+        if ((flags & AV_CPU_FLAG_RVB_BASIC) && vlenb >= 16) {
             c->lpc16 = ff_flac_lpc16_rvv;
 
 # if (__riscv_xlen >= 64)
-        if (flags & AV_CPU_FLAG_RVV_I64) {
-            if (vlenb > 16)
-                c->lpc32 = ff_flac_lpc32_rvv_simple;
-            else
-                c->lpc32 = ff_flac_lpc32_rvv;
-        }
+            if (flags & AV_CPU_FLAG_RVV_I64) {
+                if (vlenb > 16)
+                    c->lpc32 = ff_flac_lpc32_rvv_simple;
+                else
+                    c->lpc32 = ff_flac_lpc32_rvv;
+            }
 # endif
+        }
 
         c->wasted32 = ff_flac_wasted32_rvv;
 
diff --git a/libavcodec/riscv/flacdsp_rvv.S b/libavcodec/riscv/flacdsp_rvv.S
index 7d83909335..b292c15c8c 100644
--- a/libavcodec/riscv/flacdsp_rvv.S
+++ b/libavcodec/riscv/flacdsp_rvv.S
@@ -20,6 +20,12 @@
 
 #include "libavutil/riscv/asm.S"
 
+        .macro  vnarrow rd, rs
+        xori    \rd, \rs, 4
+        addi    \rd, \rd, -9
+        xori    \rd, \rd, 4
+        .endm
+
 func ff_flac_lpc16_rvv, zve32x, zbb
         csrr    t0, vlenb
         addi    t2, a2, -1
@@ -83,22 +89,31 @@ func ff_flac_lpc32_rvv, zve64x
         ret
 endfunc
 
-func ff_flac_lpc32_rvv_simple, zve64x
-        vsetivli zero, 1, e64, m1, ta, ma
+func ff_flac_lpc32_rvv_simple, zve64x, zbb
+        csrr    t0, vlenb
+        addi    t2, a2, -1
+        clz     t0, t0
+        clz     t2, t2
+        addi    t0, t0, (VTYPE_E64 | VTYPE_M8 | VTYPE_TA | VTYPE_MA) + 1
+        li      t1, VTYPE_E64 | VTYPE_M1 | VTYPE_TA | VTYPE_MA
+        sub     t0, t0, t2 // t0 += log2(next_power_of_two(len) / vlenb) - 1
+        max     t3, t0, t1
+        vnarrow t2, t3
+        vsetvl  zero, a2, t3 // e64
         vmv.s.x v0, zero
-        vsetvli zero, a2, e32, m4, ta, ma
+        vsetvl  zero, zero, t2 // e32
         vle32.v v8, (a1)
         sub     a4, a4, a2
         vle32.v v16, (a0)
         sh2add  a0, a2, a0
 1:
         vwmul.vv v24, v8, v16
-        vsetvli zero, zero, e64, m8, ta, ma
+        vsetvl  zero, zero, t3 // e64
         vredsum.vs v24, v24, v0
         lw      t0, (a0)
         addi    a4, a4, -1
         vmv.x.s t1, v24
-        vsetvli zero, zero, e32, m4, ta, ma
+        vsetvl  zero, zero, t2 // e32
         sra     t1, t1, a3
         add     t0, t0, t1
         vslide1down.vx v16, v16, t0
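
For reviewers puzzling over the vnarrow macro, here is a small C model of
its three instructions. It is a sketch, not part of the patch: the VTYPE_*
values are assumed from the RVV 1.0 vtype layout (vlmul in bits 2:0 as a
3-bit signed log2, vsew in bits 5:3), and the C function name vnarrow() is
simply borrowed from the macro for illustration.

```c
#include <assert.h>

/* Assumed vtype encodings per RVV 1.0, not copied from asm.S. */
enum {
    VTYPE_M1  = 0,
    VTYPE_M2  = 1,
    VTYPE_M4  = 2,
    VTYPE_M8  = 3,
    VTYPE_MF2 = 7,       /* vlmul = 0b111, LMUL = 1/2 */
    VTYPE_E32 = 2 << 3,
    VTYPE_E64 = 3 << 3,
    VTYPE_TA  = 1 << 6,
    VTYPE_MA  = 1 << 7,
};

/* Halve both SEW and LMUL of a vtype value, e.g. e64,m8 -> e32,m4. */
static long vnarrow(long vtype)
{
    /* Flipping bit 2 turns the signed 3-bit vlmul field into a biased one,
     * so that m1 (0) becomes 4 and can be decremented without borrowing
     * from the vsew field. */
    long r = vtype ^ 4;

    /* -8 lowers vsew by one step, -1 lowers the biased vlmul by one step. */
    r -= 9;

    /* Flip bit 2 back to restore the signed vlmul encoding. */
    return r ^ 4;
}
```

With these encodings, e64,m1 maps to e32,mf2, which is why the XOR pair is
needed: a plain subtraction of 9 from e64,m1 would borrow into the vsew
field instead of wrapping vlmul to the fractional range.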