From patchwork Sun Jul 14 16:28:22 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 50536
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 15 Jul 2024 00:28:22 +0800
X-OQ-MSGID: <20240714162824.2728146-2-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20240714162824.2728146-1-uk7b@foxmail.com>
References: <20240714162824.2728146-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 2/4] lavc/vp8dsp: R-V V loop_filter_simple
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: sunyuechi

From: sunyuechi

                                   C908   X60
vp8_loop_filter_simple_h_c       :  6.2   5.7
vp8_loop_filter_simple_h_rvv_i32 :  3.0   2.5
vp8_loop_filter_simple_v_c       :  6.5   6.2
vp8_loop_filter_simple_v_rvv_i32 :  2.0   1.5
---
 libavcodec/riscv/vp8dsp_init.c | 18 +++++++-
 libavcodec/riscv/vp8dsp_rvv.S  | 77 ++++++++++++++++++++++++++++++++++
 2 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/libavcodec/riscv/vp8dsp_init.c b/libavcodec/riscv/vp8dsp_init.c
index dcb6307d5b..8c5b2c8b04 100644
--- a/libavcodec/riscv/vp8dsp_init.c
+++ b/libavcodec/riscv/vp8dsp_init.c
@@ -49,6 +49,9 @@ VP8_BILIN(16, rvv256);
 VP8_BILIN(8, rvv256);
 VP8_BILIN(4, rvv256);
 
+VP8_LF(rvv128);
+VP8_LF(rvv256);
+
 av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
 {
 #if HAVE_RV
@@ -147,9 +150,15 @@ av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
 av_cold void ff_vp8dsp_init_riscv(VP8DSPContext *c)
 {
 #if HAVE_RVV
+    int vlenb = ff_get_rv_vlenb();
+
+#define init_loop_filter(vlen) \
+    c->vp8_v_loop_filter_simple = ff_vp8_v_loop_filter16_simple_rvv##vlen; \
+    c->vp8_h_loop_filter_simple = ff_vp8_h_loop_filter16_simple_rvv##vlen;
+
     int flags = av_get_cpu_flags();
 
-    if (flags & AV_CPU_FLAG_RVV_I32 && ff_rv_vlen_least(128)) {
+    if (flags & AV_CPU_FLAG_RVV_I32 && vlenb >= 16) {
 #if __riscv_xlen >= 64
         if (flags & AV_CPU_FLAG_RVV_I64)
             c->vp8_luma_dc_wht = ff_vp8_luma_dc_wht_rvv;
@@ -159,6 +168,13 @@ av_cold void ff_vp8dsp_init_riscv(VP8DSPContext *c)
         c->vp8_idct_dc_add4y = ff_vp8_idct_dc_add4y_rvv;
         if (flags & AV_CPU_FLAG_RVV_I64)
             c->vp8_idct_dc_add4uv = ff_vp8_idct_dc_add4uv_rvv;
+
+        if (vlenb >= 32) {
+            init_loop_filter(256);
+        } else {
+            init_loop_filter(128);
+        }
     }
+#undef init_loop_filter
 #endif
 }
diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index 0cbf1672f7..3cec4dd135 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -275,6 +275,83 @@ func ff_vp78_idct_dc_add4uv_rvv, zve64x
         ret
 endfunc
 
+.macro filter_fmin len, vlen, a, f1, p0f2, q0f1, p0, q0
+        vsetvlstatic16  \len, \vlen
+        vsext.vf2       \q0f1, \a
+        vmin.vx         \p0f2, \q0f1, a6
+        vmin.vx         \q0f1, \q0f1, t6
+        vadd.vi         \p0f2, \p0f2, 3
+        vadd.vi         \q0f1, \q0f1, 4
+        vsra.vi         \p0f2, \p0f2, 3
+        vsra.vi         \f1, \q0f1, 3
+        vadd.vv         \p0f2, \p0f2, \p0
+        vsub.vv         \q0f1, \q0, \f1
+        vmax.vx         \p0f2, \p0f2, zero
+        vmax.vx         \q0f1, \q0f1, zero
+.endm
+
+.macro filter len, vlen, type, normal, inner, dst, stride, fE, fI, thresh
+.ifc \type,v
+        sub             t3, \dst, \stride       // -1
+        sub             t2, t3, \stride         // -2
+        add             t4, \dst, \stride       // 1
+        vle8.v          v3, (t2)                // p1
+        vle8.v          v4, (t3)                // p0
+        vle8.v          v5, (\dst)              // q0
+        vle8.v          v6, (t4)                // q1
+.else
+        addi            t2, \dst, -2
+        addi            t3, \dst, -1
+        vlsseg4e8.v     v3, (t2), \stride
+.endif
+        vwsubu.vv       v10, v3, v6             // p1-q1
+        vwsubu.vv       v12, v5, v4             // q0-p0
+
+        vnclip.wi       v16, v10, 0             // clip_int8(p1 - q1)
+        vsetvlstatic16  \len, \vlen
+        // vp8_simple_limit(dst + i, stride, flim)
+        li              a6, 2
+        vneg.v          v22, v10
+        vneg.v          v24, v12
+        vmax.vv         v22, v22, v10
+        vmax.vv         v24, v24, v12
+        vsrl.vi         v22, v22, 1
+        vmacc.vx        v22, a6, v24
+        vmsleu.vx       v0, v22, \fE
+
+        li              a7, 3
+        li              a6, 124
+        li              t6, 123
+        vmul.vx         v22, v12, a7            // 3 * (q0 - p0)
+        vzext.vf2       v24, v4                 // p0
+        vzext.vf2       v20, v5                 // q0
+        vsetvlstatic8   \len, \vlen
+        vwadd.wv        v10, v22, v16
+        vnclip.wi       v28, v10, 0
+        filter_fmin     \len, \vlen, v28, v12, v26, v10, v24, v20
+        vsetvlstatic8   \len, \vlen
+        vnclipu.wi      v30, v26, 0
+        vnclipu.wi      v31, v10, 0
+.ifc \type,v
+        vse8.v          v30, (t3), v0.t
+        vse8.v          v31, (\dst), v0.t
+.else
+        vssseg2e8.v     v30, (t3), \stride, v0.t
+.endif
+
+.endm
+
+.irp type,v,h
+.irp vlen,256,128
+func ff_vp8_\type\()_loop_filter16_simple_rvv\vlen, zve32x
+        csrwi           vxrm, 0
+        vsetvlstatic8   16, \vlen
+        filter          16, \vlen, \type, 0, 0, a0, a1, a2, a3, a4
+        ret
+endfunc
+.endr
+.endr
+
 .macro bilin_load_h dst mn
         addi            t5, a2, 1
         vle8.v          \dst, (a2)
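
Reviewer note: to make the vector code above easier to cross-check, here is
a minimal scalar sketch of what the filter macro appears to compute, pieced
together from its inline comments (vp8_simple_limit, clip_int8(p1 - q1),
3 * (q0 - p0)) and from the 124/123 constants used by filter_fmin. The names
simple_filter_px and loop_filter16_simple_ref are hypothetical and are not
FFmpeg symbols. The sketch is meant to mirror the generic C path but has not
been verified bit-for-bit against it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Saturate to the signed 8-bit range, as vnclip.wi does when narrowing. */
static int clip_int8(int v)
{
    return v < -128 ? -128 : (v > 127 ? 127 : v);
}

/* Saturate to the unsigned 8-bit range (vmax with zero, then vnclipu.wi). */
static uint8_t clip_uint8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical helper: filter one edge position. `p` points at q0 and
 * `step` is the distance between p1, p0, q0, q1 (the stride for a
 * vertical filter, 1 for a horizontal one). */
static void simple_filter_px(uint8_t *p, ptrdiff_t step, int flim)
{
    int p1 = p[-2 * step], p0 = p[-step], q0 = p[0], q1 = p[step];

    /* Edge limit: 2 * |p0 - q0| + (|p1 - q1| >> 1) <= flim */
    if (2 * abs(p0 - q0) + (abs(p1 - q1) >> 1) > flim)
        return;

    int a  = clip_int8(3 * (q0 - p0) + clip_int8(p1 - q1));
    /* min(a, 123) + 4 == min(a + 4, 127); likewise for 124 and + 3.
     * Right-shifting a negative int is assumed to be arithmetic here. */
    int f1 = (a + 4 > 127 ? 127 : a + 4) >> 3;
    int f2 = (a + 3 > 127 ? 127 : a + 3) >> 3;

    p[-step] = clip_uint8(p0 + f2);
    p[0]     = clip_uint8(q0 - f1);
}

/* Hypothetical reference for the 16-wide entry points: horizontal != 0
 * corresponds to the "h" variant, 0 to the "v" variant. */
static void loop_filter16_simple_ref(uint8_t *dst, ptrdiff_t stride,
                                     int flim, int horizontal)
{
    for (int i = 0; i < 16; i++) {
        if (horizontal)
            simple_filter_px(dst + i * stride, 1, flim);
        else
            simple_filter_px(dst + i, stride, flim);
    }
}
```

For example, with p1 = p0 = 100, q0 = q1 = 108 and flim = 20, the limit
evaluates to exactly 20, so the edge is filtered and p0/q0 become 102/106.
With flim = 19 the pixels are left untouched, which corresponds to the
v0.t mask on the stores.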