From patchwork Sat Sep 28 16:47:12 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 51916
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 29 Sep 2024 00:47:12 +0800
X-OQ-MSGID: <20240928164712.3420798-1-uk7b@foxmail.com>
X-Mailer: git-send-email 2.46.2
In-Reply-To: <4291FBEC-48A4-4FCF-9262-C18BFEB20D77@remlab.net>
References: <4291FBEC-48A4-4FCF-9262-C18BFEB20D77@remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/2] lavc/vvc_mc: R-V V dmvr
List-Id: FFmpeg development discussions and patches
Cc: sunyuechi

From: sunyuechi

                                   k230               banana_f3
dmvr_8_12x20_c:                  619.3 ( 1.00x)     624.1 ( 1.00x)
dmvr_8_12x20_rvv_i32:            128.6 ( 4.82x)     103.4 ( 6.04x)
dmvr_8_20x12_c:                  610.0 ( 1.00x)     665.6 ( 1.00x)
dmvr_8_20x12_rvv_i32:            137.6 ( 4.44x)      92.9 ( 7.17x)
dmvr_8_20x20_c:                 1008.0 ( 1.00x)    1082.7 ( 1.00x)
dmvr_8_20x20_rvv_i32:            221.1 ( 4.56x)     155.4 ( 6.97x)
dmvr_h_8_12x20_c:               2008.0 ( 1.00x)    2009.7 ( 1.00x)
dmvr_h_8_12x20_rvv_i32:          239.6 ( 8.38x)     186.7 (10.77x)
dmvr_h_8_20x12_c:               1989.5 ( 1.00x)    2009.4 ( 1.00x)
dmvr_h_8_20x12_rvv_i32:          230.3 ( 8.64x)     155.4 (12.93x)
dmvr_h_8_20x20_c:               3304.1 ( 1.00x)    3342.9 ( 1.00x)
dmvr_h_8_20x20_rvv_i32:          378.3 ( 8.73x)     248.9 (13.43x)
dmvr_hv_8_12x20_c:              3609.8 ( 1.00x)    3603.4 ( 1.00x)
dmvr_hv_8_12x20_rvv_i32:         369.1 ( 9.78x)     322.1 (11.19x)
dmvr_hv_8_20x12_c:              3628.3 ( 1.00x)    3624.2 ( 1.00x)
dmvr_hv_8_20x12_rvv_i32:         322.8 (11.24x)     238.7 (15.19x)
dmvr_hv_8_20x20_c:              5933.8 ( 1.00x)    5936.6 ( 1.00x)
dmvr_hv_8_20x20_rvv_i32:         526.5 (11.27x)     374.1 (15.87x)
dmvr_v_8_12x20_c:               2156.3 ( 1.00x)    2155.4 ( 1.00x)
dmvr_v_8_12x20_rvv_i32:          239.6 ( 9.00x)     176.2 (12.24x)
dmvr_v_8_20x12_c:               2137.6 ( 1.00x)    2165.9 ( 1.00x)
dmvr_v_8_20x12_rvv_i32:          230.3 ( 9.28x)     155.2 (13.96x)
dmvr_v_8_20x20_c:               4183.8 ( 1.00x)    3592.9 ( 1.00x)
dmvr_v_8_20x20_rvv_i32:          369.3 (11.33x)     249.2 (14.42x)
---
 libavcodec/riscv/vvc/vvc_mc_rvv.S  | 120 +++++++++++++++++++++++++++++
 libavcodec/riscv/vvc/vvcdsp_init.c |  22 ++++++
 2 files changed, 142 insertions(+)

diff --git a/libavcodec/riscv/vvc/vvc_mc_rvv.S b/libavcodec/riscv/vvc/vvc_mc_rvv.S
index 18532616d9..2c634af48f 100644
--- a/libavcodec/riscv/vvc/vvc_mc_rvv.S
+++ b/libavcodec/riscv/vvc/vvc_mc_rvv.S
@@ -285,3 +285,123 @@ endfunc
 func_w_avg 128
 func_w_avg 256
 #endif
+
+func dmvr zve32x, zbb, zba
+        lpad            0
+        li              t0, 4
+1:
+        add             t1, a1, a2
+        addi            t4, a0, 128*2
+        vle8.v          v0, (a1)
+        vle8.v          v4, (t1)
+        addi            a3, a3, -2
+        vwmulu.vx       v16, v0, t0
+        vwmulu.vx       v20, v4, t0
+        vse16.v         v16, (a0)
+        vse16.v         v20, (t4)
+        sh1add          a1, a2, a1
+        add             a0, a0, 128*2*2
+        bnez            a3, 1b
+        ret
+endfunc
+
+.macro dmvr_h_v mn, type, w, vlen
+dmvr_\type\vlen\w:
+        lla             t4, ff_vvc_inter_luma_dmvr_filters
+        sh1add          t4, \mn, t4
+        lbu             t5, (t4)
+        lbu             t6, 1(t4)
+1:
+        vsetvlstatic8   \w, \vlen
+.ifc \type,h
+        addi            t0, a1, 1
+        addi            t1, a1, 2
+.else
+        add             t0, a1, a2
+        add             t1, t0, a2
+.endif
+        vle8.v          v0, (a1)
+        vle8.v          v4, (t0)
+        vle8.v          v8, (t1)
+        addi            a3, a3, -2
+        addi            t2, a0, 128*2
+        vwmulu.vx       v12, v0, t5
+        vwmulu.vx       v24, v4, t5
+        vwmaccu.vx      v12, t6, v4
+        vwmaccu.vx      v24, t6, v8
+        vsetvlstatic16  \w, \vlen
+        vssrl.vi        v12, v12, 2
+        vssrl.vi        v24, v24, 2
+        vse16.v         v12, (a0)
+        vse16.v         v24, (t2)
+        add             a0, a0, 128*4
+        sh1add          a1, a2, a1
+        bnez            a3, 1b
+        ret
+.endm
+
+.macro dmvr_load_h dst, filter0, filter1, w, vlen
+        vsetvlstatic8   \w, \vlen
+        addi            a6, a1, 1
+        vle8.v          \dst, (a1)
+        vle8.v          v2, (a6)
+        vwmulu.vx       v4, \dst, \filter0
+        vwmaccu.vx      v4, \filter1, v2
+        vsetvlstatic16  \w, \vlen
+        vssrl.vi        \dst, v4, 2
+.endm
+
+.macro dmvr_hv w, vlen
+dmvr_hv\vlen\w:
+        lla             t0, ff_vvc_inter_luma_dmvr_filters
+        sh1add          t1, a4, t0
+        sh1add          t2, a5, t0
+        lbu             t3, (t1)                // filter[mx][0]
+        lbu             t4, 1(t1)               // filter[mx][1]
+        lbu             t5, (t2)                // filter[my][0]
+        lbu             t6, 1(t2)               // filter[my][1]
+        dmvr_load_h     v12, t3, t4, \w, \vlen
+        add             a1, a1, a2
+1:
+        vmul.vx         v28, v12, t5
+        addi            a3, a3, -1
+        dmvr_load_h     v12, t3, t4, \w, \vlen
+        vmacc.vx        v28, t6, v12
+        vssrl.vi        v28, v28, 4
+        vse16.v         v28, (a0)
+        add             a1, a1, a2
+        addi            a0, a0, 128*2
+        bnez            a3, 1b
+        ret
+.endm
+
+.macro func_dmvr vlen, name
+func ff_vvc_\name\()_8_rvv_\vlen\(), zve32x, zbb, zba
+        lpad            0
+        li              t0, 20
+        beq             a6, t0, DMVR\name\vlen\()20
+        .irp w,12,20
+DMVR\name\vlen\w:
+        .ifc \name, dmvr
+        vsetvlstatic8   \w, \vlen
+        j               \name
+        .else
+        csrwi           vxrm, 0
+        j               \name\()\vlen\w
+        .endif
+        .endr
+endfunc
+.endm
+
+
+.irp vlen,256,128
+.irp w,12,20
+dmvr_h_v a4, h, \w, \vlen
+dmvr_h_v a5, v, \w, \vlen
+dmvr_hv \w, \vlen
+.endr
+func_dmvr \vlen, dmvr
+func_dmvr \vlen, dmvr_h
+func_dmvr \vlen, dmvr_v
+func_dmvr \vlen, dmvr_hv
+.endr
diff --git a/libavcodec/riscv/vvc/vvcdsp_init.c b/libavcodec/riscv/vvc/vvcdsp_init.c
index ac1e7dda7d..7df3ce58db 100644
--- a/libavcodec/riscv/vvc/vvcdsp_init.c
+++ b/libavcodec/riscv/vvc/vvcdsp_init.c
@@ -37,6 +37,26 @@ void bf(ff_vvc_w_avg, bd, opt)(uint8_t *dst, ptrdiff_t dst_stride,
 AVG_PROTOTYPES(8, rvv_128)
 AVG_PROTOTYPES(8, rvv_256)
 
+#define DMVR_PROTOTYPES(bd, opt)                                                                    \
+void ff_vvc_dmvr_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride,               \
+     int height, intptr_t mx, intptr_t my, int width);                                              \
+void ff_vvc_dmvr_h_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride,             \
+     int height, intptr_t mx, intptr_t my, int width);                                              \
+void ff_vvc_dmvr_v_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride,             \
+     int height, intptr_t mx, intptr_t my, int width);                                              \
+void ff_vvc_dmvr_hv_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride,            \
+     int height, intptr_t mx, intptr_t my, int width);                                              \
+
+DMVR_PROTOTYPES(8, rvv_128)
+DMVR_PROTOTYPES(8, rvv_256)
+
+#define DMVR_INIT(bd, opt) do {                                    \
+    c->inter.dmvr[0][0]   = ff_vvc_dmvr_##bd##_##opt;              \
+    c->inter.dmvr[0][1]   = ff_vvc_dmvr_h_##bd##_##opt;            \
+    c->inter.dmvr[1][0]   = ff_vvc_dmvr_v_##bd##_##opt;            \
+    c->inter.dmvr[1][1]   = ff_vvc_dmvr_hv_##bd##_##opt;           \
+} while (0)
+
 void ff_vvc_dsp_init_riscv(VVCDSPContext *const c, const int bd)
 {
 #if HAVE_RVV
@@ -51,6 +71,7 @@ void ff_vvc_dsp_init_riscv(VVCDSPContext *const c, const int bd)
 # if (__riscv_xlen == 64)
                     c->inter.w_avg    = ff_vvc_w_avg_8_rvv_256;
 # endif
+                    DMVR_INIT(8, rvv_256);
                     break;
                 default:
                     break;
@@ -63,6 +84,7 @@ void ff_vvc_dsp_init_riscv(VVCDSPContext *const c, const int bd)
 # if (__riscv_xlen == 64)
                     c->inter.w_avg    = ff_vvc_w_avg_8_rvv_128;
 # endif
+                    DMVR_INIT(8, rvv_128);
                     break;
                 default:
                     break;
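
For readers checking the assembly above, here is a minimal scalar sketch of what the four 8-bit entry points appear to compute. It is inferred only from the instructions in this patch (the multiply by 4, the two-tap multiply-accumulate, the rounding shifts by 2 and 4 with vxrm = 0, and the 128*2 byte destination step); it is not copied from FFmpeg's C reference. The names dmvr_ref, dmvr_1d_ref, dmvr_hv_ref, round_shift and DMVR_DST_STRIDE are hypothetical, and the filter taps are passed in explicitly instead of being read from ff_vvc_inter_luma_dmvr_filters.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: dst rows are 128 int16_t apart, matching the 128*2 byte step. */
#define DMVR_DST_STRIDE 128

/* Rounding right shift; mirrors vssrl.vi with vxrm = 0 (round-to-nearest-up). */
static int round_shift(int v, int shift)
{
    return (v + (1 << (shift - 1))) >> shift;
}

/* Unfiltered case: widen each 8-bit sample and multiply by 4. */
static void dmvr_ref(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride,
                     int height, int width)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)(src[x] * 4);
        dst += DMVR_DST_STRIDE;
        src += src_stride;
    }
}

/* Two-tap case: step = 1 filters horizontally, step = src_stride vertically. */
static void dmvr_1d_ref(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride,
                        int height, int width, ptrdiff_t step, int f0, int f1)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)round_shift(src[x] * f0 + src[x + step] * f1, 2);
        dst += DMVR_DST_STRIDE;
        src += src_stride;
    }
}

/* Separable case: horizontal pass (shift 2), then vertical pass (shift 4). */
static void dmvr_hv_ref(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride,
                        int height, int width,
                        int fx0, int fx1, int fy0, int fy1)
{
    int16_t prev[DMVR_DST_STRIDE]; /* assumes width <= DMVR_DST_STRIDE */

    /* Prime the vertical filter with the horizontally filtered first row. */
    for (int x = 0; x < width; x++)
        prev[x] = (int16_t)round_shift(src[x] * fx0 + src[x + 1] * fx1, 2);

    for (int y = 0; y < height; y++) {
        src += src_stride;
        for (int x = 0; x < width; x++) {
            int cur = round_shift(src[x] * fx0 + src[x + 1] * fx1, 2);
            dst[x]  = (int16_t)round_shift(prev[x] * fy0 + cur * fy1, 4);
            prev[x] = (int16_t)cur;
        }
        dst += DMVR_DST_STRIDE;
    }
}
```

With taps (16, 0) the filtered variants reduce to the unfiltered result, which gives a quick sanity check when comparing against the vector code. Note that the filtered cases read one extra sample to the right (horizontal) or one extra row below (vertical) of the output block.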