From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Mon, 6 May 2024 00:45:31 +0800
Subject: [FFmpeg-devel] [PATCH 05/10] lavc/vp8dsp: R-V V put_epel h
In-Reply-To: <20240505164536.872683-1-uk7b@foxmail.com>
References: <20240505164536.872683-1-uk7b@foxmail.com>

From: sunyuechi

C908:
vp8_put_epel4_h4_c: 10.7
vp8_put_epel4_h4_rvv_i32: 5.0
vp8_put_epel4_h6_c: 15.0
vp8_put_epel4_h6_rvv_i32: 6.2
vp8_put_epel8_h4_c: 43.2
vp8_put_epel8_h4_rvv_i32: 11.2
vp8_put_epel8_h6_c: 57.5
vp8_put_epel8_h6_rvv_i32: 13.5
vp8_put_epel16_h4_c: 92.5
vp8_put_epel16_h4_rvv_i32: 13.7
vp8_put_epel16_h6_c: 139.0
vp8_put_epel16_h6_rvv_i32: 16.5
---
 libavcodec/riscv/vp8dsp_init.c | 10 ++++
 libavcodec/riscv/vp8dsp_rvv.S  | 87 ++++++++++++++++++++++++++++++++++
 2 files changed, 97 insertions(+)

diff --git a/libavcodec/riscv/vp8dsp_init.c b/libavcodec/riscv/vp8dsp_init.c
index 9627105fc8..a4b7d49932 100644
--- a/libavcodec/riscv/vp8dsp_init.c
+++ b/libavcodec/riscv/vp8dsp_init.c
@@ -33,6 +33,9 @@ void ff_vp8_idct_dc_add4uv_rvv(uint8_t *dst, int16_t block[4][16], ptrdiff_t str
 VP8_EPEL(16, rvi);
 VP8_EPEL(8, rvi);
 VP8_EPEL(4, rvi);
+VP8_EPEL(16, rvv);
+VP8_EPEL(8, rvv);
+VP8_EPEL(4, rvv);
 
 VP8_BILIN(16, rvv);
 VP8_BILIN(8, rvv);
@@ -80,6 +83,13 @@ av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
         c->put_vp8_bilinear_pixels_tab[2][1][2] = ff_put_vp8_bilin4_hv_rvv;
         c->put_vp8_bilinear_pixels_tab[2][2][1] = ff_put_vp8_bilin4_hv_rvv;
         c->put_vp8_bilinear_pixels_tab[2][2][2] = ff_put_vp8_bilin4_hv_rvv;
+
+        c->put_vp8_epel_pixels_tab[0][0][2] = ff_put_vp8_epel16_h6_rvv;
+        c->put_vp8_epel_pixels_tab[1][0][2] = ff_put_vp8_epel8_h6_rvv;
+        c->put_vp8_epel_pixels_tab[2][0][2] = ff_put_vp8_epel4_h6_rvv;
+        c->put_vp8_epel_pixels_tab[0][0][1] = ff_put_vp8_epel16_h4_rvv;
+        c->put_vp8_epel_pixels_tab[1][0][1] = ff_put_vp8_epel8_h4_rvv;
+        c->put_vp8_epel_pixels_tab[2][0][1] = ff_put_vp8_epel4_h4_rvv;
     }
 #endif
 #endif
diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index f8105010c9..f5c4c1d85d 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -32,6 +32,16 @@
 .endif
 .endm
 
+.macro vsetvlstatic16 len
+.if \len <= 4
+        vsetivli        zero, \len, e16, mf2, ta, ma
+.elseif \len <= 8
+        vsetivli        zero, \len, e16, m1, ta, ma
+.elseif \len <= 16
+        vsetivli        zero, \len, e16, m2, ta, ma
+.endif
+.endm
+
 .macro vp8_idct_dc_add
         vlse32.v        v0, (a0), a2
         lh              a5, 0(a1)
@@ -162,8 +172,85 @@ func ff_put_vp8_bilin\len\()_hv_rvv, zve32x
 endfunc
 .endm
 
+const subpel_filters
+        .byte 0,  -6, 123,  12,  -1, 0
+        .byte 2, -11, 108,  36,  -8, 1
+        .byte 0,  -9,  93,  50,  -6, 0
+        .byte 3, -16,  77,  77, -16, 3
+        .byte 0,  -6,  50,  93,  -9, 0
+        .byte 1,  -8,  36, 108, -11, 2
+        .byte 0,  -1,  12, 123,  -6, 0
+endconst
+
+.macro epel_filter size
+        lla             t2, subpel_filters
+        addi            t0, a5, -1
+        li              t1, 6
+        mul             t0, t0, t1
+        add             t0, t0, t2
+        .irp n 1,2,3,4
+        lb              t\n, \n(t0)
+        .endr
+.ifc \size,6
+        lb              t5, 5(t0)
+        lb              t0, (t0)
+.endif
+.endm
+
+.macro epel_load dst len size
+        addi            t6, a2, -1
+        addi            a7, a2, 1
+        vle8.v          v24, (a2)
+        vle8.v          v22, (t6)
+        vle8.v          v26, (a7)
+        addi            a7, a7, 1
+        vle8.v          v28, (a7)
+        vwmulu.vx       v16, v24, t2
+        vwmulu.vx       v20, v26, t3
+.ifc \size,6
+        addi            t6, t6, -1
+        addi            a7, a7, 1
+        vle8.v          v24, (t6)
+        vle8.v          v26, (a7)
+        vwmaccu.vx      v16, t0, v24
+        vwmaccu.vx      v16, t5, v26
+.endif
+        li              t6, 64
+        vwmaccsu.vx     v16, t1, v22
+        vwmaccsu.vx     v16, t4, v28
+        vwadd.wx        v16, v16, t6
+        vsetvlstatic16  \len
+        vwadd.vv        v24, v16, v20
+        vnsra.wi        v24, v24, 7
+        vmax.vx         v24, v24, zero
+        vsetvlstatic8   \len
+        vnclipu.wi      \dst, v24, 0
+.endm
+
+.macro epel_load_inc dst len size
+        epel_load       \dst \len \size
+        add             a2, a2, a3
+.endm
+
+.macro epel len size type
+func ff_put_vp8_epel\len\()_\type\()\size\()_rvv, zve32x
+        epel_filter     \size
+        vsetvlstatic8   \len
+1:
+        addi            a4, a4, -1
+        epel_load_inc   v30 \len \size
+        vse8.v          v30, (a0)
+        add             a0, a0, a1
+        bnez            a4, 1b
+
+        ret
+endfunc
+.endm
+
 .irp len 16,8,4
 put_vp8_bilin_h \len
 put_vp8_bilin_v \len
 put_vp8_bilin_hv \len
+epel \len 6 h
+epel \len 4 h
 .endr
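
For readers checking the arithmetic, here is a rough scalar sketch in C of what the epel_filter / epel_load macros above appear to compute per output pixel. It is not part of the patch and not FFmpeg's C reference: the function name ref_put_epel_h, its parameter list, and the clip_u8 helper are hypothetical, and the tap table is copied from subpel_filters with the signs as stored there. It assumes mx is the horizontal subpel position in 1..7, which is what the a5 - 1 row index in the assembly suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signed taps, copied from the subpel_filters table in the patch. */
static const int8_t subpel_filters[7][6] = {
    { 0,  -6, 123,  12,  -1, 0 },
    { 2, -11, 108,  36,  -8, 1 },
    { 0,  -9,  93,  50,  -6, 0 },
    { 3, -16,  77,  77, -16, 3 },
    { 0,  -6,  50,  93,  -9, 0 },
    { 1,  -8,  36, 108, -11, 2 },
    { 0,  -1,  12, 123,  -6, 0 },
};

/* Hypothetical helper: saturate to 0..255 (the vmax + vnclipu step). */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar model of the horizontal 4-tap / 6-tap filter.
 * taps must be 4 or 6; src needs 2 readable bytes to the left and
 * 3 to the right of each row when taps == 6 (1 and 2 when taps == 4). */
static void ref_put_epel_h(uint8_t *dst, ptrdiff_t dst_stride,
                           const uint8_t *src, ptrdiff_t src_stride,
                           int w, int h, int mx, int taps)
{
    assert(mx >= 1 && mx <= 7);
    assert(taps == 4 || taps == 6);
    const int8_t *f = subpel_filters[mx - 1];

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            /* Inner four taps plus the rounding constant 64. */
            int sum = f[1] * src[x - 1] + f[2] * src[x] +
                      f[3] * src[x + 1] + f[4] * src[x + 2] + 64;
            /* The two outer taps only exist in the 6-tap variant. */
            if (taps == 6)
                sum += f[0] * src[x - 2] + f[5] * src[x + 3];
            /* Clamp negatives before shifting, then scale by 1/128. */
            dst[x] = sum < 0 ? 0 : clip_u8(sum >> 7);
        }
        dst += dst_stride;
        src += src_stride;
    }
}
```

Each table row sums to 128, so a flat input should come out unchanged after the >> 7, which makes a quick sanity check when comparing against the vector code.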