From patchwork Sat Jul 6 10:52:33 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 50376
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 6 Jul 2024 13:52:33 +0300
Message-ID: <20240706105234.145689-1-remi@remlab.net>
X-Mailer: git-send-email 2.45.2
Subject: [FFmpeg-devel] [PATCHv2 1/2] lavc/h264dsp: R-V V 8-bit h264_weight_pixels
List-Id: FFmpeg development discussions and patches

There are two implementations here:
- a generic scalable
  one processing two columns at a time,
- a specialised one processing one (fixed-size) row at a time.

Unsurprisingly, the generic one works out better with smaller widths.
With larger widths, the gains from filling vectors are outweighed by
the extra cost of strided loads and stores. In other words, memory
accesses become the bottleneck.

T-Head C908:
h264_weight2_8_c:        54.5
h264_weight2_8_rvv_i32:  13.7
h264_weight4_8_c:       101.7
h264_weight4_8_rvv_i32:  27.5
h264_weight8_8_c:       197.0
h264_weight8_8_rvv_i32:  75.5
h264_weight16_8_c:      385.0
h264_weight16_8_rvv_i32: 74.2

SpacemiT X60:
h264_weight2_8_c:        48.5
h264_weight2_8_rvv_i32:   8.2
h264_weight4_8_c:        90.7
h264_weight4_8_rvv_i32:  16.5
h264_weight8_8_c:       175.0
h264_weight8_8_rvv_i32:  37.7
h264_weight16_8_c:      342.2
h264_weight16_8_rvv_i32: 66.0
---
Changes since version 1:
- Fix arithmetic overflows in bi-weight.
- Process two columns rather than one per iteration to fill vectors.
---
 libavcodec/riscv/h264dsp_init.c |  7 +++
 libavcodec/riscv/h264dsp_rvv.S  | 83 +++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+)

diff --git a/libavcodec/riscv/h264dsp_init.c b/libavcodec/riscv/h264dsp_init.c
index bf9743eb6b..e1b725dcbb 100644
--- a/libavcodec/riscv/h264dsp_init.c
+++ b/libavcodec/riscv/h264dsp_init.c
@@ -21,12 +21,15 @@
 #include "config.h"
 
 #include <stdint.h>
+#include <string.h>
 
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
 #include "libavutil/riscv/cpu.h"
 #include "libavcodec/h264dsp.h"
 
+extern const h264_weight_func ff_h264_weight_funcs_8_rvv[];
+
 void ff_h264_v_loop_filter_luma_8_rvv(uint8_t *pix, ptrdiff_t stride,
                                       int alpha, int beta, int8_t *tc0);
 void ff_h264_h_loop_filter_luma_8_rvv(uint8_t *pix, ptrdiff_t stride,
@@ -60,6 +63,10 @@ av_cold void ff_h264dsp_init_riscv(H264DSPContext *dsp, const int bit_depth,
 # if HAVE_RVV
     if (flags & AV_CPU_FLAG_RVV_I32) {
         if (bit_depth == 8 && ff_rv_vlen_least(128)) {
+            memcpy(dsp->weight_h264_pixels_tab,
+                   ff_h264_weight_funcs_8_rvv,
+                   sizeof (dsp->weight_h264_pixels_tab));
+
             dsp->h264_v_loop_filter_luma = ff_h264_v_loop_filter_luma_8_rvv;
             dsp->h264_h_loop_filter_luma = ff_h264_h_loop_filter_luma_8_rvv;
             dsp->h264_h_loop_filter_luma_mbaff =
diff --git a/libavcodec/riscv/h264dsp_rvv.S b/libavcodec/riscv/h264dsp_rvv.S
index 96a8a0a8a3..bbcbf2e4de 100644
--- a/libavcodec/riscv/h264dsp_rvv.S
+++ b/libavcodec/riscv/h264dsp_rvv.S
@@ -26,6 +26,89 @@
 
 #include "libavutil/riscv/asm.S"
 
+func ff_h264_weight_pixels_simple_8_rvv, zve32x
+        csrwi       vxrm, 0
+        sll         a5, a5, a3
+1:
+        vsetvli     zero, a6, e32, m4, ta, ma
+        vle8.v      v8, (a0)
+        addi        a2, a2, -1
+        vmv.v.x     v16, a5
+        vsetvli     zero, zero, e16, m2, ta, ma
+        vzext.vf2   v24, v8
+        vwmaccsu.vx v16, a4, v24
+        vnclip.wx   v16, v16, a3
+        vmax.vx     v16, v16, zero
+        vsetvli     zero, zero, e8, m1, ta, ma
+        vnclipu.wi  v8, v16, 0
+        vse8.v      v8, (a0)
+        add         a0, a0, a1
+        bnez        a2, 1b
+
+        ret
+endfunc
+
+func ff_h264_weight_pixels_8_rvv, zve32x
+        csrwi       vxrm, 0
+        sll         a5, a5, a3
+1:
+        mv          t0, a0
+        mv          t6, a6
+2:
+        vsetvli     t2, a2, e32, m8, ta, ma
+        vlsseg2e8.v v0, (t0), a1
+        addi        t6, t6, -2
+        vmv.v.x     v16, a5
+        vmv.v.x     v24, a5
+        vsetvli     zero, zero, e16, m4, ta, ma
+        vzext.vf2   v8, v0
+        vzext.vf2   v12, v2
+        vwmaccsu.vx v16, a4, v8
+        vwmaccsu.vx v24, a4, v12
+        vnclip.wx   v8, v16, a3
+        vnclip.wx   v12, v24, a3
+        vmax.vx     v8, v8, zero
+        vmax.vx     v12, v12, zero
+        vsetvli     zero, zero, e8, m2, ta, ma
+        vnclipu.wi  v0, v8, 0
+        vnclipu.wi  v2, v12, 0
+        vssseg2e8.v v0, (t0), a1
+        addi        t0, t0, 2
+        bnez        t6, 2b
+
+        mul         t3, a1, t2
+        sub         a2, a2, t2
+        add         a0, a0, t3
+        bnez        a2, 1b
+
+        ret
+endfunc
+
+.irp    w, 16, 8, 4, 2
+func ff_h264_weight_pixels\w\()_8_rvv, zve32x
+        li      a6, \w
+        .if     \w == 16
+        j       ff_h264_weight_pixels_simple_8_rvv
+        .else
+        j       ff_h264_weight_pixels_8_rvv
+        .endif
+endfunc
+.endr
+
+        .global ff_h264_weight_funcs_8_rvv
+        .hidden ff_h264_weight_funcs_8_rvv
+const ff_h264_weight_funcs_8_rvv
+        .irp    w, 16, 8, 4, 2
+#if __riscv_xlen == 32
+        .word   ff_h264_weight_pixels\w\()_8_rvv
+#elif __riscv_xlen == 64
+        .dword  ff_h264_weight_pixels\w\()_8_rvv
+#else
+        .qword  ff_h264_weight_pixels\w\()_8_rvv
+#endif
+        .endr
+endconst
+
         .variant_cc ff_h264_loop_filter_luma_8_rvv
 func ff_h264_loop_filter_luma_8_rvv, zve32x
         # p2: v8, p1: v9, p0: v10, q0: v11, q1: v12, q2: v13

From patchwork Sat Jul 6 10:52:34 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 50377
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 6 Jul 2024 13:52:34 +0300
Message-ID: <20240706105234.145689-2-remi@remlab.net>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20240706105234.145689-1-remi@remlab.net>
References: <20240706105234.145689-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/2] lavc/h264dsp: R-V V 8-bit h264_biweight_pixels
List-Id: FFmpeg development discussions and patches

T-Head C908:
h264_biweight2_8_c:        58.0
h264_biweight2_8_rvv_i32:  11.2
h264_biweight4_8_c:       106.0
h264_biweight4_8_rvv_i32:  22.7
h264_biweight8_8_c:       205.7
h264_biweight8_8_rvv_i32:  50.0
h264_biweight16_8_c:      403.5
h264_biweight16_8_rvv_i32: 83.2

SpacemiT X60:
h264_weight2_8_c:        48.2
h264_weight2_8_rvv_i32:   8.2
h264_weight4_8_c:        90.5
h264_weight4_8_rvv_i32:  16.5
h264_weight8_8_c:       175.2
h264_weight8_8_rvv_i32:  38.0
h264_weight16_8_c:      342.2
h264_weight16_8_rvv_i32: 66.0
---
 libavcodec/riscv/h264dsp_init.c | 14 ++++--
 libavcodec/riscv/h264dsp_rvv.S  | 88 +++++++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/libavcodec/riscv/h264dsp_init.c b/libavcodec/riscv/h264dsp_init.c
index e1b725dcbb..88afec8df0 100644
--- a/libavcodec/riscv/h264dsp_init.c
+++ b/libavcodec/riscv/h264dsp_init.c
@@ -28,7 +28,10 @@
 #include "libavutil/riscv/cpu.h"
 #include "libavcodec/h264dsp.h"
 
-extern const h264_weight_func ff_h264_weight_funcs_8_rvv[];
+extern const struct {
+    const h264_weight_func weight;
+    const h264_biweight_func biweight;
+} ff_h264_weight_funcs_8_rvv[];
 
 void ff_h264_v_loop_filter_luma_8_rvv(uint8_t *pix, ptrdiff_t stride,
                                       int alpha, int beta, int8_t *tc0);
@@ -63,9 +66,12 @@ av_cold void ff_h264dsp_init_riscv(H264DSPContext *dsp, const int bit_depth,
 # if HAVE_RVV
     if (flags & AV_CPU_FLAG_RVV_I32) {
         if (bit_depth == 8 && ff_rv_vlen_least(128)) {
-            memcpy(dsp->weight_h264_pixels_tab,
-                   ff_h264_weight_funcs_8_rvv,
-                   sizeof (dsp->weight_h264_pixels_tab));
+            for (int i = 0; i < 4; i++) {
+                dsp->weight_h264_pixels_tab[i] =
+                    ff_h264_weight_funcs_8_rvv[i].weight;
+                dsp->biweight_h264_pixels_tab[i] =
+                    ff_h264_weight_funcs_8_rvv[i].biweight;
+            }
 
             dsp->h264_v_loop_filter_luma = ff_h264_v_loop_filter_luma_8_rvv;
             dsp->h264_h_loop_filter_luma = ff_h264_h_loop_filter_luma_8_rvv;
diff --git a/libavcodec/riscv/h264dsp_rvv.S b/libavcodec/riscv/h264dsp_rvv.S
index bbcbf2e4de..5c3931569b 100644
--- a/libavcodec/riscv/h264dsp_rvv.S
+++ b/libavcodec/riscv/h264dsp_rvv.S
@@ -48,6 +48,35 @@ func ff_h264_weight_pixels_simple_8_rvv, zve32x
         ret
 endfunc
 
+        .variant_cc ff_h264_biweight_pixels_simple_8_rvv
+func ff_h264_biweight_pixels_simple_8_rvv, zve32x
+        csrwi       vxrm, 2
+        addi        a7, a7, 1
+        ori         a7, a7, 1
+        sll         a7, a7, a4
+1:
+        vsetvli     zero, t6, e32, m4, ta, ma
+        vle8.v      v8, (a0)
+        addi        a3, a3, -1
+        vle8.v      v12, (a1)
+        add         a1, a1, a2
+        vmv.v.x     v16, a7
+        vsetvli     zero, zero, e16, m2, ta, ma
+        vzext.vf2   v24, v8
+        vzext.vf2   v28, v12
+        vwmaccsu.vx v16, a5, v24
+        vwmaccsu.vx v16, a6, v28
+        vnclip.wx   v16, v16, a4
+        vmax.vx     v16, v16, zero
+        vsetvli     zero, zero, e8, m1, ta, ma
+        vnclipu.wi  v8, v16, 1
+        vse8.v      v8, (a0)
+        add         a0, a0, a2
+        bnez        a3, 1b
+
+        ret
+endfunc
+
 func ff_h264_weight_pixels_8_rvv, zve32x
         csrwi       vxrm, 0
         sll         a5, a5, a3
@@ -84,6 +113,53 @@ func ff_h264_weight_pixels_8_rvv, zve32x
         ret
 endfunc
 
+        .variant_cc ff_h264_biweight_pixels_8_rvv
+func ff_h264_biweight_pixels_8_rvv, zve32x
+        csrwi       vxrm, 2
+        addi        a7, a7, 1
+        ori         a7, a7, 1
+        sll         a7, a7, a4
+1:
+        mv          t0, a0
+        mv          t1, a1
+        mv          t5, t6
+2:
+        vsetvli     t2, a3, e32, m8, ta, ma
+        vlsseg2e8.v v0, (t0), a2
+        vlsseg2e8.v v4, (t1), a2
+        addi        t5, t5, -2
+        vmv.v.x     v16, a7
+        vmv.v.x     v24, a7
+        vsetvli     zero, zero, e16, m4, ta, ma
+        vzext.vf2   v8, v0
+        vzext.vf2   v12, v2
+        vwmaccsu.vx v16, a5, v8
+        vwmaccsu.vx v24, a5, v12
+        vzext.vf2   v8, v4
+        vzext.vf2   v12, v6
+        vwmaccsu.vx v16, a6, v8
+        vwmaccsu.vx v24, a6, v12
+        vnclip.wx   v8, v16, a4
+        vnclip.wx   v12, v24, a4
+        vmax.vx     v8, v8, zero
+        vmax.vx     v12, v12, zero
+        vsetvli     zero, zero, e8, m2, ta, ma
+        vnclipu.wi  v0, v8, 1
+        vnclipu.wi  v2, v12, 1
+        vssseg2e8.v v0, (t0), a2
+        addi        t0, t0, 2
+        addi        t1, t1, 2
+        bnez        t5, 2b
+
+        mul         t3, a2, t2
+        sub         a3, a3, t2
+        add         a0, a0, t3
+        add         a1, a1, t3
+        bnez        a3, 1b
+
+        ret
+endfunc
+
 .irp    w, 16, 8, 4, 2
 func ff_h264_weight_pixels\w\()_8_rvv, zve32x
         li      a6, \w
@@ -93,6 +169,15 @@ func ff_h264_weight_pixels\w\()_8_rvv, zve32x
         j       ff_h264_weight_pixels_8_rvv
         .endif
 endfunc
+
+func ff_h264_biweight_pixels\w\()_8_rvv, zve32x
+        li      t6, \w
+        .if     \w == 16
+        j       ff_h264_biweight_pixels_simple_8_rvv
+        .else
+        j       ff_h264_biweight_pixels_8_rvv
+        .endif
+endfunc
 .endr
 
         .global ff_h264_weight_funcs_8_rvv
@@ -101,10 +186,13 @@ const ff_h264_weight_funcs_8_rvv
         .irp    w, 16, 8, 4, 2
 #if __riscv_xlen == 32
         .word   ff_h264_weight_pixels\w\()_8_rvv
+        .word   ff_h264_biweight_pixels\w\()_8_rvv
 #elif __riscv_xlen == 64
         .dword  ff_h264_weight_pixels\w\()_8_rvv
+        .dword  ff_h264_biweight_pixels\w\()_8_rvv
 #else
         .qword  ff_h264_weight_pixels\w\()_8_rvv
+        .qword  ff_h264_biweight_pixels\w\()_8_rvv
 #endif
         .endr
 endconst
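
Note for reviewers: the arithmetic in the two patches above can be read against a scalar model. The following is a minimal sketch in C, not the reference implementation from libavcodec. The names weight_pixels_ref, biweight_pixels_ref and clip_u8 are hypothetical, and the argument order is inferred from the register usage in the assembly (a0..a7, with the width passed separately in a6 or t6). It assumes 8-bit samples and an arithmetic right shift of negative values, which is what mainstream compilers provide.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp to the 8-bit pixel range (the vmax.vx + vnclipu.wi pair in the asm). */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar model of the uni-directional weighting.
 * The asm pre-shifts the offset ("sll a5, a5, a3") and obtains the rounding
 * term from vxrm = 0 (round-to-nearest-up) in vnclip.wx; here the rounding
 * term is added explicitly instead. */
static void weight_pixels_ref(uint8_t *block, ptrdiff_t stride,
                              int width, int height,
                              int log2_denom, int weight, int offset)
{
    offset = (int)((unsigned)offset << log2_denom);
    if (log2_denom)
        offset += 1 << (log2_denom - 1);

    for (int y = 0; y < height; y++, block += stride)
        for (int x = 0; x < width; x++)
            block[x] = clip_u8((block[x] * weight + offset) >> log2_denom);
}

/* Hypothetical scalar model of the bi-directional weighting.
 * The asm computes ((offset + 1) | 1) << log2_denom up front, then truncates
 * (vxrm = 2) while shifting by log2_denom and by one more bit. */
static void biweight_pixels_ref(uint8_t *dst, const uint8_t *src,
                                ptrdiff_t stride, int width, int height,
                                int log2_denom, int weightd, int weights,
                                int offset)
{
    offset = (int)((unsigned)((offset + 1) | 1) << log2_denom);

    for (int y = 0; y < height; y++, dst += stride, src += stride)
        for (int x = 0; x < width; x++)
            dst[x] = clip_u8((src[x] * weights + dst[x] * weightd + offset)
                             >> (log2_denom + 1));
}
```

With weight equal to 1 << log2_denom and a zero offset, weight_pixels_ref leaves the block unchanged; with both bi-weights equal to 1 << log2_denom and a zero offset, biweight_pixels_ref reduces to a rounded average of the two inputs.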