From patchwork Sat Jun 22 15:58:04 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 50085
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 22 Jun 2024 23:58:04 +0800
X-OQ-MSGID: <20240622155806.3191984-2-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20240622155806.3191984-1-uk7b@foxmail.com>
References: <20240622155806.3191984-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH 2/4] lavc/vp8dsp: R-V V loop_filter_simple
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: sunyuechi

From: sunyuechi

                                      C908    X60
vp8_loop_filter_simple_h_c       :     7.0    6.0
vp8_loop_filter_simple_h_rvv_i32 :     3.2    2.7
vp8_loop_filter_simple_v_c       :     7.2    6.5
vp8_loop_filter_simple_v_rvv_i32 :     1.7    1.2
---
 libavcodec/riscv/vp8dsp_init.c | 18 ++++++-
 libavcodec/riscv/vp8dsp_rvv.S  | 87 ++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/libavcodec/riscv/vp8dsp_init.c b/libavcodec/riscv/vp8dsp_init.c
index dcb6307d5b..8c5b2c8b04 100644
--- a/libavcodec/riscv/vp8dsp_init.c
+++ b/libavcodec/riscv/vp8dsp_init.c
@@ -49,6 +49,9 @@ VP8_BILIN(16, rvv256);
 VP8_BILIN(8, rvv256);
 VP8_BILIN(4, rvv256);
 
+VP8_LF(rvv128);
+VP8_LF(rvv256);
+
 av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
 {
 #if HAVE_RV
@@ -147,9 +150,15 @@ av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
 av_cold void ff_vp8dsp_init_riscv(VP8DSPContext *c)
 {
 #if HAVE_RVV
+    int vlenb = ff_get_rv_vlenb();
+
+#define init_loop_filter(vlen) \
+    c->vp8_v_loop_filter_simple = ff_vp8_v_loop_filter16_simple_rvv##vlen; \
+    c->vp8_h_loop_filter_simple = ff_vp8_h_loop_filter16_simple_rvv##vlen;
+
     int flags = av_get_cpu_flags();
 
-    if (flags & AV_CPU_FLAG_RVV_I32 && ff_rv_vlen_least(128)) {
+    if (flags & AV_CPU_FLAG_RVV_I32 && vlenb >= 16) {
 #if __riscv_xlen >= 64
         if (flags & AV_CPU_FLAG_RVV_I64)
             c->vp8_luma_dc_wht = ff_vp8_luma_dc_wht_rvv;
@@ -159,6 +168,13 @@ av_cold void ff_vp8dsp_init_riscv(VP8DSPContext *c)
         c->vp8_idct_dc_add4y = ff_vp8_idct_dc_add4y_rvv;
         if (flags & AV_CPU_FLAG_RVV_I64)
             c->vp8_idct_dc_add4uv = ff_vp8_idct_dc_add4uv_rvv;
+
+        if (vlenb >= 32) {
+            init_loop_filter(256);
+        } else {
+            init_loop_filter(128);
+        }
     }
+#undef init_loop_filter
 #endif
 }
diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index 0cbf1672f7..b5f8bb31b4 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -275,6 +275,93 @@ func ff_vp78_idct_dc_add4uv_rvv, zve64x
         ret
 endfunc
 
+.macro filter_fmin len, vlen, a, f1, p0f2, q0f1
+        vsetvlstatic16  \len, \vlen
+        vsext.vf2       \q0f1, \a
+        vmin.vx         \p0f2, \q0f1, a7
+        vmin.vx         \q0f1, \q0f1, t3
+        vadd.vi         \p0f2, \p0f2, 3
+        vadd.vi         \q0f1, \q0f1, 4
+        vsra.vi         \p0f2, \p0f2, 3
+        vsra.vi         \f1, \q0f1, 3
+        vadd.vv         \p0f2, \p0f2, v8
+        vsub.vv         \q0f1, v16, \f1
+        vmax.vx         \p0f2, \p0f2, zero
+        vmax.vx         \q0f1, \q0f1, zero
+.endm
+
+.macro filter len, vlen, type, normal, inner, dst, stride, fE, fI, thresh
+.ifc \type,v
+        slli            a6, \stride, 1
+        sub             t2, \dst, a6
+        add             t4, \dst, \stride
+        sub             t1, \dst, \stride
+        vle8.v          v1, (t2)
+        vle8.v          v11, (t4)
+        vle8.v          v17, (t1)
+        vle8.v          v22, (\dst)
+.else
+        addi            t1, \dst, -1
+        addi            a6, \dst, -2
+        addi            t4, \dst, 1
+        vlse8.v         v1, (a6), \stride
+        vlse8.v         v11, (t4), \stride
+        vlse8.v         v17, (t1), \stride
+        vlse8.v         v22, (\dst), \stride
+.endif
+        vwsubu.vv       v12, v1, v11             // p1-q1
+        vwsubu.vv       v24, v22, v17            // q0-p0
+        vnclip.wi       v23, v12, 0
+        vsetvlstatic16  \len, \vlen
+        // vp8_simple_limit(dst + i, stride, flim)
+        li              a7, 2
+        vneg.v          v18, v12
+        vmax.vv         v18, v18, v12
+        vneg.v          v8, v24
+        vmax.vv         v8, v8, v24
+        vsrl.vi         v18, v18, 1
+        vmacc.vx        v18, a7, v8
+        vmsleu.vx       v0, v18, \fE
+
+        li              t5, 3
+        li              a7, 124
+        li              t3, 123
+        vmul.vx         v30, v24, t5
+        vsext.vf2       v4, v23
+        vzext.vf2       v8, v17                  // p0
+        vzext.vf2       v16, v22                 // q0
+        vadd.vv         v12, v30, v4
+        vsetvlstatic8   \len, \vlen
+        vnclip.wi       v11, v12, 0
+        filter_fmin     \len, \vlen, v11, v24, v4, v6
+        vsetvlstatic8   \len, \vlen
+        vnclipu.wi      v4, v4, 0
+        vnclipu.wi      v6, v6, 0
+
+.ifc \type,v
+        vse8.v          v4, (t1), v0.t
+        vse8.v          v6, (\dst), v0.t
+.else
+        vsse8.v         v4, (t1), \stride, v0.t
+        vsse8.v         v6, (\dst), \stride, v0.t
+.endif
+
+.endm
+
+.irp vlen,256,128
+func ff_vp8_v_loop_filter16_simple_rvv\vlen, zve32x
+        vsetvlstatic8   16, \vlen
+        filter 16, \vlen, v, 0, 0, a0, a1, a2, a3, a4
+        ret
+endfunc
+
+func ff_vp8_h_loop_filter16_simple_rvv\vlen, zve32x
+        vsetvlstatic8   16, \vlen
+        filter 16, \vlen, h, 0, 0, a0, a1, a2, a3, a4
+        ret
+endfunc
+.endr
+
 .macro bilin_load_h dst mn
         addi            t5, a2, 1
         vle8.v          \dst, (a2)
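
For readers checking the vector code against scalar behaviour, here is a minimal C sketch of the per-pixel arithmetic that the `filter` macro appears to vectorize. It is inferred from the `vp8_simple_limit` comment and from the constants 2, 3, 123 and 124 in the assembly. The names `simple_limit`, `simple_filter` and `loop_filter16_simple` are hypothetical and are not FFmpeg's API. The `across`/`along` parameters are an assumption chosen so that one routine covers both the `v` and `h` variants. Treat this as an illustration, not the reference implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Saturation helpers (illustrative names, not FFmpeg functions). */
static int clip_int8(int v)  { return v < -128 ? -128 : v > 127 ? 127 : v; }
static int clip_uint8(int v) { return v < 0 ? 0 : v > 255 ? 255 : v; }

/* Edge test, mirroring the vmacc/vmsleu sequence:
 * 2 * |p0 - q0| + (|p1 - q1| >> 1) <= flim */
static int simple_limit(const uint8_t *p, ptrdiff_t across, int flim)
{
    int p1 = p[-2 * across], p0 = p[-1 * across];
    int q0 = p[ 0 * across], q1 = p[ 1 * across];
    return 2 * abs(p0 - q0) + (abs(p1 - q1) >> 1) <= flim;
}

/* Adjust p0 and q0 across the edge. */
static void simple_filter(uint8_t *p, ptrdiff_t across)
{
    int p1 = p[-2 * across], p0 = p[-1 * across];
    int q0 = p[ 0 * across], q1 = p[ 1 * across];

    /* a = clamp(3 * (q0 - p0) + clamp(p1 - q1)), as in vmul/vadd/vnclip. */
    int a = clip_int8(3 * (q0 - p0) + clip_int8(p1 - q1));

    /* min(a, 123) + 4 and min(a, 124) + 3, matching t3 = 123 and a7 = 124.
     * The >> on a negative int is assumed to be arithmetic, like vsra. */
    int f1 = ((a < 123 ? a : 123) + 4) >> 3;
    int f2 = ((a < 124 ? a : 124) + 3) >> 3;

    p[-1 * across] = (uint8_t)clip_uint8(p0 + f2);
    p[ 0 * across] = (uint8_t)clip_uint8(q0 - f1);
}

/* Filter 16 positions along an edge.
 * v variant: across = stride, along = 1.
 * h variant: across = 1,      along = stride. */
static void loop_filter16_simple(uint8_t *dst, ptrdiff_t across,
                                 ptrdiff_t along, int flim)
{
    for (int i = 0; i < 16; i++)
        if (simple_limit(dst + i * along, across, flim))
            simple_filter(dst + i * along, across);
}
```

With p1 = p0 = 100, q0 = q1 = 110 and flim = 40, the edge test passes (2 * 10 + 0 = 20), a = 20, f1 = 3 and f2 = 2, so p0 becomes 102 and q0 becomes 107. A comparison of this kind against the RVV output is one way to sanity-check the masked stores.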