From patchwork Sat May 4 15:03:13 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48505
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 4 May 2024 23:03:13 +0800
X-OQ-MSGID: <20240504150313.2472910-10-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240504150313.2472910-1-uk7b@foxmail.com>
References: <20240504150313.2472910-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH 10/10] lavc/vp9dsp: R-V V mc tap hv
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: sunyuechi
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: 9eT80VP1nRcZ

From: sunyuechi

C908:
vp9_avg_8tap_smooth_4hv_8bpp_c: 32.2
vp9_avg_8tap_smooth_4hv_8bpp_rvv_i64: 15.2
vp9_avg_8tap_smooth_8hv_8bpp_c: 98.5
vp9_avg_8tap_smooth_8hv_8bpp_rvv_i64: 23.5
vp9_avg_8tap_smooth_16hv_8bpp_c: 355.5
vp9_avg_8tap_smooth_16hv_8bpp_rvv_i64: 46.2
vp9_avg_8tap_smooth_32hv_8bpp_c: 1270.7
vp9_avg_8tap_smooth_32hv_8bpp_rvv_i64: 133.2
vp9_avg_8tap_smooth_64hv_8bpp_c: 4936.5
vp9_avg_8tap_smooth_64hv_8bpp_rvv_i64: 521.7
vp9_put_8tap_smooth_4hv_8bpp_c: 30.2
vp9_put_8tap_smooth_4hv_8bpp_rvv_i64: 14.2
vp9_put_8tap_smooth_8hv_8bpp_c: 91.5
vp9_put_8tap_smooth_8hv_8bpp_rvv_i64: 22.7
vp9_put_8tap_smooth_16hv_8bpp_c: 330.0
vp9_put_8tap_smooth_16hv_8bpp_rvv_i64: 45.0
vp9_put_8tap_smooth_32hv_8bpp_c: 1296.5
vp9_put_8tap_smooth_32hv_8bpp_rvv_i64: 131.0
vp9_put_8tap_smooth_64hv_8bpp_c: 4497.7
vp9_put_8tap_smooth_64hv_8bpp_rvv_i64: 513.2
---
 libavcodec/riscv/vp9_mc_rvv.S  | 79 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c |  3 +-
 2 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index c8a42c7159..6ad7ea2433 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -446,12 +446,90 @@ endconst
         ret
 .endm
 
+.macro epel_hv_once len name do
+        sub             a2, a2, a3
+        sub             a2, a2, a3
+        sub             a2, a2, a3
+    .irp n 0 2 4 6 8 10 12 14
+        epel_load_inc   v\n \len put \name h 1 t
+    .endr
+        addi            a4, a4, -1
+1:
+        addi            a4, a4, -1
+        epel_load       v30 \len \do \name v 0 s
+        vse8.v          v30, (a0)
+        vmv.v.v         v0, v2
+        vmv.v.v         v2, v4
+        vmv.v.v         v4, v6
+        vmv.v.v         v6, v8
+        vmv.v.v         v8, v10
+        vmv.v.v         v10, v12
+        vmv.v.v         v12, v14
+        epel_load       v14 \len put \name h 1 t
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+        epel_load       v30 \len \do \name v 0 s
+        vse8.v          v30, (a0)
+.endm
+
+.macro epel_hv do name len
+        addi            sp, sp, -64
+    .irp n 0,1,2,3,4,5,6,7
+        sd              s\n, \n\()<<3(sp)
+    .endr
+.ifc \len,64
+        addi            sp, sp, -48
+    .irp n 0,1,2,3,4,5
+        sd              a\n, \n\()<<3(sp)
+    .endr
+.endif
+.ifc \do,avg
+        csrwi           vxrm, 0
+.endif
+        epel_filter     \name h t
+        epel_filter     \name v s
+.ifc \len,4
+        vsetivli        zero, 4, e8, mf4, ta, ma
+.elseif \len == 8
+        vsetivli        zero, 8, e8, mf2, ta, ma
+.elseif \len == 16
+        vsetivli        zero, 16, e8, m1, ta, ma
+.else
+        li              a6, 32
+        vsetvli         zero, a6, e8, m2, ta, ma
+.endif
+        epel_hv_once    \len \name \do
+.ifc \len,64
+    .irp n 0,1,2,3,4,5
+        ld              a\n, \n\()<<3(sp)
+    .endr
+        addi            sp, sp, 48
+        addi            a0, a0, 32
+        addi            a2, a2, 32
+        epel_filter     \name h t
+        epel_hv_once    \len \name \do
+.endif
+    .irp n 0,1,2,3,4,5,6,7
+        ld              s\n, \n\()<<3(sp)
+    .endr
+        addi            sp, sp, 64
+
+        ret
+.endm
+
 .macro gen_epel len do name type
 func ff_\do\()_8tap_\name\()_\len\()\type\()_rvv, zve32x
         epel \len \do \name \type
 endfunc
 .endm
 
+.macro gen_epelhv len name do
+func ff_\do\()_8tap_\name\()_\len\()hv_rvv, zve32x
+        epel_hv \do \name \len
+endfunc
+.endm
+
 .irp len 64, 32, 16, 8, 4
 func ff_avg\len\()_rvv, zve32x
         copy_avg \len avg
@@ -481,6 +559,7 @@ endfunc
     .irp type h v
         gen_epel \len \do \name \type
     .endr
+        gen_epelhv \len \name \do
     .endr
     .endr
 .endr
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index ff7d445f6a..0c75ef38dc 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -126,7 +126,8 @@ static av_cold void vp9dsp_mc_init_rvv(VP9DSPContext *dsp, int bpp)
 
 #define init_subpel3(idx, type) \
     init_subpel2(idx, 1, 0, h, type); \
-    init_subpel2(idx, 0, 1, v, type)
+    init_subpel2(idx, 0, 1, v, type); \
+    init_subpel2(idx, 1, 1, hv, type)
 
     init_subpel3(0, put);
     init_subpel3(1, avg);
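
For readers following the hv macros, here is a minimal scalar sketch of a
two-pass 8-tap filter of the kind epel_hv_once appears to vectorize: step the
source back three rows, filter h + 7 rows horizontally, then filter the
intermediate rows vertically and either store (put) or average with the
destination (avg). It is not taken from FFmpeg. The names clip_u8, filter8 and
put_or_avg_8tap_hv are hypothetical, and it assumes taps that sum to 128,
rounding with +64 followed by a shift by 7, and intermediate rows clipped to
8 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* One 8-tap output centred on p[0]; step is 1 for horizontal filtering
 * or the row stride for vertical filtering. Assumes the taps sum to 128
 * and an arithmetic right shift for negative sums. */
static uint8_t filter8(const uint8_t *p, ptrdiff_t step, const int16_t f[8])
{
    int sum = 0;
    for (int k = 0; k < 8; k++)
        sum += f[k] * p[(k - 3) * step];
    return clip_u8((sum + 64) >> 7);
}

/* Hypothetical scalar model of a put/avg 8-tap hv block filter.
 * The caller must provide 3 readable pixels before and 4 after every
 * row and column of the w x h source block. */
static void put_or_avg_8tap_hv(uint8_t *dst, ptrdiff_t dst_stride,
                               const uint8_t *src, ptrdiff_t src_stride,
                               int w, int h,
                               const int16_t fh[8], const int16_t fv[8],
                               int avg)
{
    uint8_t tmp[(64 + 7) * 64];

    assert(w > 0 && w <= 64 && h > 0 && h <= 64);

    /* Pass 1: horizontal filter over h + 7 rows, starting 3 rows above. */
    src -= 3 * src_stride;
    for (int y = 0; y < h + 7; y++)
        for (int x = 0; x < w; x++)
            tmp[y * 64 + x] = filter8(src + y * src_stride + x, 1, fh);

    /* Pass 2: vertical filter over the intermediate rows. */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            uint8_t v = filter8(&tmp[(y + 3) * 64 + x], 64, fv);
            uint8_t *d = &dst[y * dst_stride + x];
            /* avg rounds up on ties, put stores the filtered pixel. */
            *d = avg ? (uint8_t)((*d + v + 1) >> 1) : v;
        }
}
```

With the identity taps {0, 0, 0, 128, 0, 0, 0, 0} in both directions the put
variant copies the source block unchanged, which makes a convenient sanity
check for the indexing.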