From patchwork Thu May 16 16:48:37 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 48931
From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 16 May 2024 19:48:37 +0300
Message-ID: <20240516164840.19025-1-remi@remlab.net>
X-Mailer: git-send-email 2.43.0
Subject: [FFmpeg-devel] [PATCHv4 1/4] lavu/riscv: add assembler macros for
 adjusting vector LMUL

vtype_vli computes the VTYPE value with the optimal LMUL for a given
element width, tail and mask policies and a run-time vector length.
vtype_ivli does the same, but with a compile-time constant vector
length.

vwtypei and vntypei can be used to widen or narrow a VTYPE value for
use in mixed-width vector-optimised functions.
---
 libavutil/riscv/asm.S | 166 +++++++++++++++++++++++++++++-------------
 1 file changed, 117 insertions(+), 49 deletions(-)

diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S
index 14be5055f5..1e6358dcb5 100644
--- a/libavutil/riscv/asm.S
+++ b/libavutil/riscv/asm.S
@@ -96,77 +96,145 @@
         .endm
 #endif
 
-        /* Convenience macro to load a Vector type (vtype) as immediate */
-        .macro  lvtypei rd, e, m=m1, tp=tu, mp=mu
+#if defined (__riscv_v_elen)
+# define RV_V_ELEN __riscv_v_elen
+#else
+/* Run-time detection of the V extension implies ELEN >= 64. */
+# define RV_V_ELEN 64
+#endif
+#if RV_V_ELEN == 32
+# define VSEW_MAX 2
+#else
+# define VSEW_MAX 3
+#endif
 
-        .ifc    \e,e8
-        .equ    ei, 0
+        .macro  parse_vtype ew, tp, mp
+        .ifc    \ew,e8
+        .equ    vsew, 0
         .else
-        .ifc    \e,e16
-        .equ    ei, 8
+        .ifc    \ew,e16
+        .equ    vsew, 1
         .else
-        .ifc    \e,e32
-        .equ    ei, 16
+        .ifc    \ew,e32
+        .equ    vsew, 2
         .else
-        .ifc    \e,e64
-        .equ    ei, 24
+        .ifc    \ew,e64
+        .equ    vsew, 3
         .else
-        .error  "Unknown element type"
+        .error  "Unknown element width \ew"
         .endif
         .endif
         .endif
         .endif
 
-        .ifc    \m,m1
-        .equ    mi, 0
-        .else
-        .ifc    \m,m2
-        .equ    mi, 1
-        .else
-        .ifc    \m,m4
-        .equ    mi, 2
+        .ifc    \tp,tu
+        .equ    tp, 0
         .else
-        .ifc    \m,m8
-        .equ    mi, 3
+        .ifc    \tp,ta
+        .equ    tp, 1
         .else
-        .ifc    \m,mf8
-        .equ    mi, 5
-        .else
-        .ifc    \m,mf4
-        .equ    mi, 6
-        .else
-        .ifc    \m,mf2
-        .equ    mi, 7
-        .else
-        .error  "Unknown multiplier"
-        .equ    mi, 3
-        .endif
-        .endif
-        .endif
-        .endif
-        .endif
+        .error  "Unknown tail policy \tp"
         .endif
         .endif
 
-        .ifc    \tp,tu
-        .equ    tpi, 0
+        .ifc    \mp,mu
+        .equ    mp, 0
         .else
-        .ifc    \tp,ta
-        .equ    tpi, 64
+        .ifc    \mp,ma
+        .equ    mp, 1
         .else
-        .error  "Unknown tail policy"
+        .error  "Unknown mask policy \mp"
         .endif
         .endif
+        .endm
 
-        .ifc    \mp,mu
-        .equ    mpi, 0
-        .else
-        .ifc    \mp,ma
-        .equ    mpi, 128
+        /**
+         * Gets the vector type with the smallest suitable LMUL value.
+         * @param[out] rd vector type destination register
+         * @param avl vector length constant
+         * @param ew element width: e8, e16, e32 or e64
+         * @param tp tail policy: tu or ta
+         * @param mp mask policy: mu or ma
+         */
+        .macro  vtype_ivli rd, avl, ew, tp=tu, mp=mu
+        .if     \avl <= 1
+        .equ    log2vl, 0
+        .elseif \avl <= 2
+        .equ    log2vl, 1
+        .elseif \avl <= 4
+        .equ    log2vl, 2
+        .elseif \avl <= 8
+        .equ    log2vl, 3
+        .elseif \avl <= 16
+        .equ    log2vl, 4
+        .elseif \avl <= 32
+        .equ    log2vl, 5
+        .elseif \avl <= 64
+        .equ    log2vl, 6
+        .elseif \avl <= 128
+        .equ    log2vl, 7
         .else
-        .error  "Unknown mask policy"
+        .error  "Vector length \avl out of range"
         .endif
+        parse_vtype \ew, \tp, \mp
+        csrr    \rd, vlenb
+        clz     \rd, \rd
+        addi    \rd, \rd, log2vl + 1 + VSEW_MAX - __riscv_xlen
+        max     \rd, \rd, zero // VLMUL must be >= VSEW - VSEW_MAX
+        .if     vsew < VSEW_MAX
+        addi    \rd, \rd, vsew - VSEW_MAX
+        andi    \rd, \rd, 7
         .endif
+        ori     \rd, \rd, (vsew << 3) | (tp << 6) | (mp << 7)
+        .endm
+
+        /**
+         * Gets the vector type with the smallest suitable LMUL value.
+         * @param[out] rd vector type destination register
+         * @param rs vector length source register
+         * @param[out] tmp temporary register to be clobbered
+         * @param ew element width: e8, e16, e32 or e64
+         * @param tp tail policy: tu or ta
+         * @param mp mask policy: mu or ma
+         */
+        .macro  vtype_vli rd, rs, tmp, ew, tp=tu, mp=mu
+        parse_vtype \ew, \tp, \mp
+        /*
+         * The difference between the CLZ's notionally equals the VLMUL value
+         * for 4-bit elements. But we want the value for SEW_MAX-bit elements.
+         */
+        slli    \tmp, \rs, 1 + VSEW_MAX
+        csrr    \rd, vlenb
+        addi    \tmp, \tmp, -1
+        clz     \rd, \rd
+        clz     \tmp, \tmp
+        sub     \rd, \rd, \tmp
+        max     \rd, \rd, zero // VLMUL must be >= VSEW - VSEW_MAX
+        .if     vsew < VSEW_MAX
+        addi    \rd, \rd, vsew - VSEW_MAX
+        andi    \rd, \rd, 7
+        .endif
+        ori     \rd, \rd, (vsew << 3) | (tp << 6) | (mp << 7)
+        .endm
+
+        /**
+         * Widens a vector type.
+         * @param[out] rd widened vector type destination register
+         * @param rs vector type source register
+         * @param n number of times to widen (once by default)
+         */
+        .macro  vwtypei rd, rs, n=1
+        xori    \rd, \rs, 4
+        addi    \rd, \rd, (\n) * 011
+        xori    \rd, \rd, 4
+        .endm
 
-        li      \rd, (ei | mi | tpi | mpi)
+        /**
+         * Narrows a vector type.
+         * @param[out] rd narrowed vector type destination register
+         * @param rs vector type source register
+         * @param n number of times to narrow (once by default)
+         */
+        .macro  vntypei rd, rs, n=1
+        vwtypei \rd, \rs, -(\n)
         .endm

From patchwork Thu May 16 16:48:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 48932
From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 16 May 2024 19:48:38 +0300
Message-ID: <20240516164840.19025-2-remi@remlab.net>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240516164840.19025-1-remi@remlab.net>
References: <20240516164840.19025-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCHv4 2/4] lavc/flacdsp: optimise RVV vector type
 for lpc16

This calculates the optimal vector type value at run-time based on the
hardware vector length and the FLAC LPC prediction order.
In this particular case, the additional computation is easily amortised
over the loop iterations:

T-Head C908:
         C    V before  V after
 1     48.0    214.7     95.2
 2     64.7    214.2     94.7
 3     79.7    213.5     94.5
 4     96.2    196.5     94.2  #
 5    111.0    195.7    118.5
 6    127.0    211.2    102.0
 7    143.7    194.2    101.5
 8    175.7    193.2    101.2  #
 9    176.2    224.2    126.0
10    191.5    192.0    125.5
11    224.5    191.2    124.7
12    223.0    190.2    124.2
13    239.2    189.5    123.7
14    253.7    188.7    139.5
15    286.2    188.0    122.7
16    284.0    187.0    122.5  #
17    300.2    186.5    186.5
18    314.0    185.5    185.7
19    329.7    184.7    185.0
20    343.0    184.2    184.2
21    358.7    199.2    183.7
22    371.7    182.7    182.7
23    387.5    181.7    182.0
24    400.7    181.0    181.2
25    431.5    180.2    196.5
26    443.7    195.5    196.0
27    459.0    178.7    196.2
28    470.7    177.7    194.2
29    470.0    177.0    193.5
30    481.2    176.2    176.5
31    496.2    175.5    175.7
32    507.2    174.7    191.0  #

# Power of two boundary.

With 128-bit vectors, improvements are expected for the first two
test cases only. For the other two, there is overhead but below noise.
Improvements should be better observable with prediction order of 8
and less, or on hardware with larger vector sizes.

The same optimisation strategy should be applicable to LPC32 (and
work-in-progress LPC33), but is left as a future exercise.
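To check which LMUL the run-time computation ends up selecting for a given prediction order, here is a rough C model of what vtype_vli appears to compute. It is only a sketch under stated assumptions: the names bit_length and vtype_vli_model are made up for illustration, ELEN is taken to be 64 (VSEW_MAX = 3), and like the macro it does not clamp results above m8.

```c
#include <assert.h>
#include <stdint.h>

/* Number of significant bits in x, i.e. XLEN - clz(x). */
int bit_length(uint64_t x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/*
 * Hypothetical model of vtype_vli, assuming VSEW_MAX = 3 (ELEN = 64).
 * vlenb: vector register size in bytes (a power of two)
 * vl:    requested vector length in elements
 * vsew:  encoded element width (0 = e8, 1 = e16, 2 = e32, 3 = e64)
 * ta/ma: tail and mask agnostic bits (0 or 1)
 */
unsigned vtype_vli_model(uint64_t vlenb, uint64_t vl, int vsew, int ta, int ma)
{
    const int vsew_max = 3; /* assumption: ELEN = 64 */
    int vlmul;

    assert(vlenb > 0 && vl > 0);
    /* clz(vlenb) - clz((vl << 4) - 1) is log2(LMUL) for 64-bit elements. */
    vlmul = bit_length((vl << (1 + vsew_max)) - 1) - bit_length(vlenb);
    if (vlmul < 0)
        vlmul = 0; /* no fractional LMUL at the widest element size */
    if (vsew < vsew_max) /* narrower elements need a smaller LMUL */
        vlmul = (int)((unsigned)(vlmul + vsew - vsew_max) & 7u);
    return (unsigned)vlmul | ((unsigned)vsew << 3)
         | ((unsigned)ta << 6) | ((unsigned)ma << 7);
}
```

With vlenb = 16 (128-bit vectors) and e32, this model yields mf2 for orders 1 to 2, m1 for 3 to 4, m2 for 5 to 8, m4 for 9 to 16 and m8 for 17 to 32, which lines up with the rows marked # in the table above.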
flac lpc16
---
 libavcodec/riscv/flacdsp_init.c | 2 +-
 libavcodec/riscv/flacdsp_rvv.S  | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/libavcodec/riscv/flacdsp_init.c b/libavcodec/riscv/flacdsp_init.c
index 4f1652dbe7..735aec0691 100644
--- a/libavcodec/riscv/flacdsp_init.c
+++ b/libavcodec/riscv/flacdsp_init.c
@@ -71,7 +71,7 @@ av_cold void ff_flacdsp_init_riscv(FLACDSPContext *c, enum AVSampleFormat fmt,
     if ((flags & AV_CPU_FLAG_RVV_I32) && (flags & AV_CPU_FLAG_RVB_ADDR)) {
         int vlenb = ff_get_rv_vlenb();
 
-        if (vlenb >= 16)
+        if ((flags & AV_CPU_FLAG_RVB_BASIC) && vlenb >= 16)
             c->lpc16 = ff_flac_lpc16_rvv;
 
 # if (__riscv_xlen >= 64)
diff --git a/libavcodec/riscv/flacdsp_rvv.S b/libavcodec/riscv/flacdsp_rvv.S
index 6287faa260..e1a20ce8e1 100644
--- a/libavcodec/riscv/flacdsp_rvv.S
+++ b/libavcodec/riscv/flacdsp_rvv.S
@@ -20,8 +20,9 @@
 
 #include "libavutil/riscv/asm.S"
 
-func ff_flac_lpc16_rvv, zve32x
-        vsetvli zero, a2, e32, m8, ta, ma
+func ff_flac_lpc16_rvv, zve32x, zbb
+        vtype_vli t0, a2, t2, e32, ta, ma
+        vsetvl  zero, a2, t0
         vle32.v v8, (a1)
         sub     a4, a4, a2
         vle32.v v16, (a0)

From patchwork Thu May 16 16:48:39 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 48933
From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 16 May 2024 19:48:39 +0300
Message-ID: <20240516164840.19025-3-remi@remlab.net>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240516164840.19025-1-remi@remlab.net>
References: <20240516164840.19025-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCHv4 3/4] lavc/flacdsp: optimise RVV vector type
 for lpc32

This is pretty much the same as for lpc16, though it only improves
half as large prediction orders.
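The lpc32 code in this patch computes its vector type for 64-bit elements and derives the 32-bit one from it with vntypei. As a sketch of how that widening and narrowing arithmetic works, here is a small C model. The function names are hypothetical; the bit layout (vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7) is the standard vtype layout, and the constant 011 in the macro is read by the assembler as octal, i.e. 9.

```c
#include <assert.h>

/*
 * Hypothetical model of vwtypei: widen a vtype value n times
 * (a negative n narrows it instead).
 */
unsigned vwtypei_model(unsigned vtype, int n)
{
    assert((vtype & 7) != 4); /* vlmul encoding 4 is reserved */
    vtype ^= 4;                 /* bias vlmul so that mf8..m8 map to 1..7 */
    vtype += (unsigned)(n * 9); /* one step: vlmul + 1 and vsew + 1 */
    vtype ^= 4;                 /* remove the bias again */
    return vtype;
}

/* Hypothetical model of vntypei: narrowing is widening by -n. */
unsigned vntypei_model(unsigned vtype, int n)
{
    return vwtypei_model(vtype, -n);
}
```

For example, narrowing 0xDB (e64, m8, ta, ma) once gives 0xD2 (e32, m4, ta, ma), and narrowing 0xD8 (e64, m1) gives 0xD7 (e32, mf2). Both SEW and LMUL change by the same factor, so the SEW/LMUL ratio, and with it VLMAX, stays the same.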
With 128-bit vectors, this gives:

       C    V old   V new
1    69.2   181.5    95.5
2   107.7   180.7    95.2
3   145.5   180.0   103.5
4   183.0   179.2   102.7
5   220.7   178.5   128.0
6   257.7   194.0   127.5
7   294.5   193.7   126.7
8   331.0   193.0   126.5

Larger prediction orders see no significant changes at that size.
---
 libavcodec/riscv/flacdsp_init.c | 15 ++++++++-------
 libavcodec/riscv/flacdsp_rvv.S  | 12 +++++++-----
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/libavcodec/riscv/flacdsp_init.c b/libavcodec/riscv/flacdsp_init.c
index 735aec0691..830ae36534 100644
--- a/libavcodec/riscv/flacdsp_init.c
+++ b/libavcodec/riscv/flacdsp_init.c
@@ -71,17 +71,18 @@ av_cold void ff_flacdsp_init_riscv(FLACDSPContext *c, enum AVSampleFormat fmt,
     if ((flags & AV_CPU_FLAG_RVV_I32) && (flags & AV_CPU_FLAG_RVB_ADDR)) {
         int vlenb = ff_get_rv_vlenb();
 
-        if ((flags & AV_CPU_FLAG_RVB_BASIC) && vlenb >= 16)
+        if ((flags & AV_CPU_FLAG_RVB_BASIC) && vlenb >= 16) {
             c->lpc16 = ff_flac_lpc16_rvv;
 
 # if (__riscv_xlen >= 64)
-        if (flags & AV_CPU_FLAG_RVV_I64) {
-            if (vlenb > 16)
-                c->lpc32 = ff_flac_lpc32_rvv_simple;
-            else
-                c->lpc32 = ff_flac_lpc32_rvv;
-        }
+            if (flags & AV_CPU_FLAG_RVV_I64) {
+                if (vlenb > 16)
+                    c->lpc32 = ff_flac_lpc32_rvv_simple;
+                else
+                    c->lpc32 = ff_flac_lpc32_rvv;
+            }
 # endif
+        }
 
         c->wasted32 = ff_flac_wasted32_rvv;
 
diff --git a/libavcodec/riscv/flacdsp_rvv.S b/libavcodec/riscv/flacdsp_rvv.S
index e1a20ce8e1..2941928465 100644
--- a/libavcodec/riscv/flacdsp_rvv.S
+++ b/libavcodec/riscv/flacdsp_rvv.S
@@ -76,22 +76,24 @@ func ff_flac_lpc32_rvv, zve64x
         ret
 endfunc
 
-func ff_flac_lpc32_rvv_simple, zve64x
-        vsetivli zero, 1, e64, m1, ta, ma
+func ff_flac_lpc32_rvv_simple, zve64x, zbb
+        vtype_vli t3, a2, t1, e64, ta, ma
+        vntypei t2, t3
+        vsetvl  zero, a2, t3 // e64
         vmv.s.x v0, zero
-        vsetvli zero, a2, e32, m4, ta, ma
+        vsetvl  zero, zero, t2 // e32
         vle32.v v8, (a1)
         sub     a4, a4, a2
         vle32.v v16, (a0)
         sh2add  a0, a2, a0
 1:
         vwmul.vv v24, v8, v16
-        vsetvli zero, zero, e64, m8, ta, ma
+        vsetvl  zero, zero, t3 // e64
         vredsum.vs v24, v24, v0
         lw      t0, (a0)
         addi    a4, a4, -1
         vmv.x.s t1, v24
-        vsetvli zero, zero, e32, m4, ta, ma
+        vsetvl  zero, zero, t2 // e32
         sra     t1, t1, a3
         add     t0, t0, t1
         vslide1down.vx v16, v16, t0

From patchwork Thu May 16 16:48:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 48934
From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 16 May 2024 19:48:40 +0300
Message-ID: <20240516164840.19025-4-remi@remlab.net>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240516164840.19025-1-remi@remlab.net>
References: <20240516164840.19025-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCHv4 4/4] lavc/huffyuvdsp: optimise RVV vtype for
 add_hfyu_left_pred_bgr32

T-Head C908:
add_hfyu_left_pred_bgr32_c:       237.5
add_hfyu_left_pred_bgr32_rvv_i32: 173.5 (before)
add_hfyu_left_pred_bgr32_rvv_i32: 110.0 (after)
---
 libavcodec/riscv/huffyuvdsp_rvv.S | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libavcodec/riscv/huffyuvdsp_rvv.S b/libavcodec/riscv/huffyuvdsp_rvv.S
index 9c4434907d..d334f5c6d0 100644
--- a/libavcodec/riscv/huffyuvdsp_rvv.S
+++ b/libavcodec/riscv/huffyuvdsp_rvv.S
@@ -36,8 +36,10 @@ func ff_add_int16_rvv, zve32x
         ret
 endfunc
 
-func ff_add_hfyu_left_pred_bgr32_rvv, zve32x
-        vsetivli zero, 4, e8, m1, ta, ma
+func ff_add_hfyu_left_pred_bgr32_rvv, zve32x, zbb
+        vtype_ivli t1, 4, e8, ta, ma
+        li      t0, 4
+        vsetvl  zero, t0, t1
         vle8.v  v8, (a3)
         sh2add  a2, a2, a1
 1: