From patchwork Sat Sep 28 09:41:25 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 51898
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Sat, 28 Sep 2024 17:41:25 +0800
X-OQ-MSGID: <20240928094125.1091833-1-uk7b@foxmail.com>
X-Mailer: git-send-email 2.46.2
In-Reply-To: <2101666.xLvz2vlYTB@basile.remlab.net>
References: <2101666.xLvz2vlYTB@basile.remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/2] lavc/vvc_mc: R-V V dmvr
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

From: sunyuechi

                            k230                 banana_f3
dmvr_8_12x20_c:            626.5 ( 1.00x)     621.7 ( 1.00x)
dmvr_8_12x20_rvv_i32:      126.3 ( 4.96x)      79.9 ( 7.78x)
dmvr_8_20x12_c:            608.0 ( 1.00x)     652.9 ( 1.00x)
dmvr_8_20x12_rvv_i32:      135.5 ( 4.49x)      90.4 ( 7.22x)
dmvr_8_20x20_c:           1006.0 ( 1.00x)    1079.9 ( 1.00x)
dmvr_8_20x20_rvv_i32:      228.3 ( 4.41x)     142.4 ( 7.58x)
dmvr_h_8_12x20_c:         2005.8 ( 1.00x)    2007.2 ( 1.00x)
dmvr_h_8_12x20_rvv_i32:    274.5 ( 7.31x)     184.2 (10.90x)
dmvr_h_8_20x12_c:         1987.5 ( 1.00x)    2006.9 ( 1.00x)
dmvr_h_8_20x12_rvv_i32:    302.3 ( 6.58x)     173.7 (11.56x)
dmvr_h_8_20x20_c:         3302.3 ( 1.00x)    3340.4 ( 1.00x)
dmvr_h_8_20x20_rvv_i32:    487.5 ( 6.77x)     267.4 (12.49x)
dmvr_hv_8_12x20_c:        3607.8 ( 1.00x)    3600.7 ( 1.00x)
dmvr_hv_8_12x20_rvv_i32:   459.8 ( 7.85x)     371.7 ( 9.69x)
dmvr_hv_8_20x12_c:        3626.3 ( 1.00x)    3621.7 ( 1.00x)
dmvr_hv_8_20x12_rvv_i32:   422.8 ( 8.58x)     298.7 (12.13x)
dmvr_hv_8_20x20_c:        5931.8 ( 1.00x)    5934.4 ( 1.00x)
dmvr_hv_8_20x20_rvv_i32:   672.5 ( 8.82x)     475.9 (12.47x)
dmvr_v_8_12x20_c:         2154.0 ( 1.00x)    2152.9 ( 1.00x)
dmvr_v_8_12x20_rvv_i32:    274.5 ( 7.85x)     183.9 (11.71x)
dmvr_v_8_20x12_c:         2774.5 ( 1.00x)    2152.9 ( 1.00x)
dmvr_v_8_20x12_rvv_i32:    302.3 ( 9.18x)     173.7 (12.40x)
dmvr_v_8_20x20_c:         3552.0 ( 1.00x)    3590.4 ( 1.00x)
dmvr_v_8_20x20_rvv_i32:    487.5 ( 7.29x)     267.4 (13.43x)
---
 libavcodec/riscv/vvc/vvc_mc_rvv.S  | 139 +++++++++++++++++++++++++++++
 libavcodec/riscv/vvc/vvcdsp_init.c |  22 +++++
 2 files changed, 161 insertions(+)

diff --git a/libavcodec/riscv/vvc/vvc_mc_rvv.S b/libavcodec/riscv/vvc/vvc_mc_rvv.S
index 18532616d9..61fe840c4d 100644
--- a/libavcodec/riscv/vvc/vvc_mc_rvv.S
+++ b/libavcodec/riscv/vvc/vvc_mc_rvv.S
@@ -285,3 +285,142 @@ endfunc
 func_w_avg 128
 func_w_avg 256
 #endif
+
+func dmvr zve32x, zbb, zba
+        lpad    0
+        li              t0, 4
+1:
+        add             t1, a1, a2
+        addi            t4, a0, 128*2
+        add             t2, t1, a2
+        addi            t5, a0, 128*2*2
+        add             t3, t2, a2
+        addi            t6, a0, 128*2*3
+        vle8.v          v0, (a1)
+        vle8.v          v4, (t1)
+        vle8.v          v8, (t2)
+        vle8.v          v12, (t3)
+        addi            a3, a3, -4
+        vwmulu.vx       v16, v0, t0
+        vwmulu.vx       v20, v4, t0
+        vwmulu.vx       v24, v8, t0
+        vwmulu.vx       v28, v12, t0
+        vse16.v         v16, (a0)
+        vse16.v         v20, (t4)
+        vse16.v         v24, (t5)
+        vse16.v         v28, (t6)
+        sh2add          a1, a2, a1
+        add             a0, a0, 128*2*4
+        bnez            a3, 1b
+        ret
+endfunc
+
+.macro dmvr_h_v mn, type
+        lla             t4, ff_vvc_inter_luma_dmvr_filters
+        sh1add          t4, \mn, t4
+        lbu             t5, (t4)
+        lbu             t6, 1(t4)
+1:
+.ifc \type,h
+        addi            t0, a1, 1
+        addi            t1, a1, 2
+.else
+        add             t0, a1, a2
+        add             t1, t0, a2
+.endif
+        vle8.v          v0, (a1)
+        vle8.v          v4, (t0)
+        vle8.v          v8, (t1)
+        addi            a3, a3, -2
+        vzext.vf2       v12, v0
+        vzext.vf2       v16, v4
+        vzext.vf2       v20, v8
+        addi            t2, a0, 128*2
+        vmul.vx         v12, v12, t5
+        vmul.vx         v24, v16, t5
+        vmacc.vx        v12, t6, v16
+        vmacc.vx        v24, t6, v20
+        vssrl.vi        v12, v12, 2
+        vssrl.vi        v24, v24, 2
+        vse16.v         v12, (a0)
+        vse16.v         v24, (t2)
+        add             a0, a0, 128*4
+        sh1add          a1, a2, a1
+        bnez            a3, 1b
+        ret
+.endm
+
+func dmvr_h zve32x, zbb, zba
+        lpad    0
+        dmvr_h_v        a4, h
+endfunc
+
+func dmvr_v zve32x, zbb, zba
+        lpad    0
+        dmvr_h_v        a5, v
+endfunc
+
+.macro dmvr_load_h dst, filter0, filter1
+        addi            a6, a1, 1
+        vle8.v          \dst, (a1)
+        vle8.v          v2, (a6)
+        vzext.vf2       v4, \dst
+        vzext.vf2       v8, v2
+        vmul.vx         \dst, v4, \filter0
+        vmacc.vx        \dst, \filter1, v8
+        vssrl.vi        \dst, \dst, 2
+.endm
+
+func dmvr_hv zve32x, zbb, zba
+        lpad    0
+        lla             t0, ff_vvc_inter_luma_dmvr_filters
+        sh1add          t1, a4, t0
+        sh1add          t2, a5, t0
+        lbu             t3, (t1)         // filter[mx][0]
+        lbu             t4, 1(t1)        // filter[mx][1]
+        lbu             t5, (t2)         // filter[my][0]
+        lbu             t6, 1(t2)        // filter[my][1]
+        dmvr_load_h     v12, t3, t4
+        add             a1, a1, a2
+1:
+        vmul.vx         v28, v12, t5
+        addi            a3, a3, -1
+        dmvr_load_h     v12, t3, t4
+        vmacc.vx        v28, t6, v12
+        vssrl.vi        v28, v28, 4
+        vse16.v         v28, (a0)
+        add             a1, a1, a2
+        addi            a0, a0, 128*2
+        bnez            a3, 1b
+        ret
+endfunc
+
+.macro func_dmvr vlen, name
+func ff_vvc_\name\()_8_rvv_\vlen\(), zve32x, zbb, zba
+        lpad    0
+        li              t0, 20
+        beq             a6, t0, DMVR20\vlen\name
+        .ifc \name, dmvr
+        vsetvlstatic8   12, \vlen
+        .else
+        csrwi           vxrm, 0
+        vsetvlstatic16  12, \vlen
+        .endif
+        j               \name
+DMVR20\vlen\name:
+        .ifc \name, dmvr
+        vsetvlstatic8   20, \vlen
+        .else
+        csrwi           vxrm, 0
+        vsetvlstatic16  20, \vlen
+        .endif
+        j               \name
+endfunc
+.endm
+
+.irp vlen,256,128
+func_dmvr \vlen, dmvr
+func_dmvr \vlen, dmvr_h
+func_dmvr \vlen, dmvr_v
+func_dmvr \vlen, dmvr_hv
+.endr
diff --git a/libavcodec/riscv/vvc/vvcdsp_init.c b/libavcodec/riscv/vvc/vvcdsp_init.c
index ac1e7dda7d..7df3ce58db 100644
--- a/libavcodec/riscv/vvc/vvcdsp_init.c
+++ b/libavcodec/riscv/vvc/vvcdsp_init.c
@@ -37,6 +37,26 @@ void bf(ff_vvc_w_avg, bd, opt)(uint8_t *dst, ptrdiff_t dst_stride,
 AVG_PROTOTYPES(8, rvv_128)
 AVG_PROTOTYPES(8, rvv_256)
 
+#define DMVR_PROTOTYPES(bd, opt)                                                      \
+void ff_vvc_dmvr_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, \
+     int height, intptr_t mx, intptr_t my, int width);                                \
+void ff_vvc_dmvr_h_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, \
+     int height, intptr_t mx, intptr_t my, int width);                                \
+void ff_vvc_dmvr_v_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, \
+     int height, intptr_t mx, intptr_t my, int width);                                \
+void ff_vvc_dmvr_hv_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, \
+     int height, intptr_t mx, intptr_t my, int width);                                \
+
+DMVR_PROTOTYPES(8, rvv_128)
+DMVR_PROTOTYPES(8, rvv_256)
+
+#define DMVR_INIT(bd, opt) do {                            \
+    c->inter.dmvr[0][0] = ff_vvc_dmvr_##bd##_##opt;        \
+    c->inter.dmvr[0][1] = ff_vvc_dmvr_h_##bd##_##opt;      \
+    c->inter.dmvr[1][0] = ff_vvc_dmvr_v_##bd##_##opt;      \
+    c->inter.dmvr[1][1] = ff_vvc_dmvr_hv_##bd##_##opt;     \
+} while (0)
+
 void ff_vvc_dsp_init_riscv(VVCDSPContext *const c, const int bd)
 {
 #if HAVE_RVV
@@ -51,6 +71,7 @@ void ff_vvc_dsp_init_riscv(VVCDSPContext *const c, const int bd)
 # if (__riscv_xlen == 64)
                 c->inter.w_avg    = ff_vvc_w_avg_8_rvv_256;
 # endif
+                DMVR_INIT(8, rvv_256);
                 break;
             default:
                 break;
@@ -63,6 +84,7 @@ void ff_vvc_dsp_init_riscv(VVCDSPContext *const c, const int bd)
 # if (__riscv_xlen == 64)
                 c->inter.w_avg    = ff_vvc_w_avg_8_rvv_128;
 # endif
+                DMVR_INIT(8, rvv_128);
                 break;
             default:
                 break;