From patchwork Tue May 21 17:13:19 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 49108
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Wed, 22 May 2024 01:13:19 +0800
X-OQ-MSGID: <20240521171319.2629938-5-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240521171319.2629938-1-uk7b@foxmail.com>
References: <20240521171319.2629938-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 5/5] lavc/vp9dsp: R-V V mc tap hv

From: sunyuechi

                                          C908     X60
vp9_avg_8tap_smooth_4hv_8bpp_c        :   32.0    28.2
vp9_avg_8tap_smooth_4hv_8bpp_rvv_i32  :   15.0    13.2
vp9_avg_8tap_smooth_8hv_8bpp_c        :   98.0    86.2
vp9_avg_8tap_smooth_8hv_8bpp_rvv_i32  :   23.7    21.0
vp9_avg_8tap_smooth_16hv_8bpp_c       :  355.5   297.0
vp9_avg_8tap_smooth_16hv_8bpp_rvv_i32 :   62.7    41.2
vp9_avg_8tap_smooth_32hv_8bpp_c       : 1273.0  1099.7
vp9_avg_8tap_smooth_32hv_8bpp_rvv_i32 :  133.7   119.2
vp9_avg_8tap_smooth_64hv_8bpp_c       : 4933.0  4240.5
vp9_avg_8tap_smooth_64hv_8bpp_rvv_i32 :  506.7   227.0
vp9_put_8tap_smooth_4hv_8bpp_c        :   30.2    27.0
vp9_put_8tap_smooth_4hv_8bpp_rvv_i32  :   14.5    12.7
vp9_put_8tap_smooth_8hv_8bpp_c        :   91.2    81.2
vp9_put_8tap_smooth_8hv_8bpp_rvv_i32  :   22.7    20.2
vp9_put_8tap_smooth_16hv_8bpp_c       :  329.2   277.7
vp9_put_8tap_smooth_16hv_8bpp_rvv_i32 :   44.7    40.0
vp9_put_8tap_smooth_32hv_8bpp_c       : 1183.7  1022.7
vp9_put_8tap_smooth_32hv_8bpp_rvv_i32 :  130.7   116.5
vp9_put_8tap_smooth_64hv_8bpp_c       : 4502.7  3954.5
vp9_put_8tap_smooth_64hv_8bpp_rvv_i32 :  496.0   224.7
---
 libavcodec/riscv/vp9_mc_rvv.S  | 75 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c |  8 ++++
 2 files changed, 83 insertions(+)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index d7db775df7..06c79b16f7 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -362,6 +362,77 @@ func ff_\op\()_vp9_8tap_\name\()_\len\()\type\()_rvv\vlen\(), zve32x
 endfunc
 .endm
 
+#if __riscv_xlen == 64
+.macro epel_hv_once len name op
+        sub             a2, a2, a3
+        sub             a2, a2, a3
+        sub             a2, a2, a3
+    .irp n,0,2,4,6,8,10,12,14
+        epel_load_inc   v\n, \len, put, \name, h, 1, t
+    .endr
+        addi            a4, a4, -1
+1:
+        addi            a4, a4, -1
+        epel_load       v30, \len, \op, \name, v, 0, s
+        vse8.v          v30, (a0)
+        vmv.v.v         v0, v2
+        vmv.v.v         v2, v4
+        vmv.v.v         v4, v6
+        vmv.v.v         v6, v8
+        vmv.v.v         v8, v10
+        vmv.v.v         v10, v12
+        vmv.v.v         v12, v14
+        epel_load       v14, \len, put, \name, h, 1, t
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+        epel_load       v30, \len, \op, \name, v, 0, s
+        vse8.v          v30, (a0)
+.endm
+
+.macro epel_hv op name len vlen
+func ff_\op\()_vp9_8tap_\name\()_\len\()hv_rvv\vlen\(), zve32x
+        addi            sp, sp, -64
+    .irp n,0,1,2,3,4,5,6,7
+        sd              s\n, \n\()<<3(sp)
+    .endr
+.if \len == 64 && \vlen < 256
+        addi            sp, sp, -48
+    .irp n,0,1,2,3,4,5
+        sd              a\n, \n\()<<3(sp)
+    .endr
+.endif
+.ifc \op,avg
+        csrwi           vxrm, 0
+.endif
+        epel_filter     \name, h, t
+        epel_filter     \name, v, s
+.if \vlen < 256
+        vsetvlstatic8   \len, a6, 32, m2
+.else
+        vsetvlstatic8   \len, a6, 64, m2
+.endif
+        epel_hv_once    \len, \name, \op
+.if \len == 64 && \vlen < 256
+    .irp n,0,1,2,3,4,5
+        ld              a\n, \n\()<<3(sp)
+    .endr
+        addi            sp, sp, 48
+        addi            a0, a0, 32
+        addi            a2, a2, 32
+        epel_filter     \name, h, t
+        epel_hv_once    \len, \name, \op
+.endif
+    .irp n,0,1,2,3,4,5,6,7
+        ld              s\n, \n\()<<3(sp)
+    .endr
+        addi            sp, sp, 64
+
+        ret
+endfunc
+.endm
+#endif
+
 .irp len, 64, 32, 16, 8, 4
         copy_avg \len
         .irp op, put, avg
@@ -373,6 +444,10 @@ endfunc
                         epel \len, \op, \name, \type, 128
                         epel \len, \op, \name, \type, 256
                 .endr
+                #if __riscv_xlen == 64
+                epel_hv \op, \name, \len, 128
+                epel_hv \op, \name, \len, 256
+                #endif
                 .endr
         .endr
 .endr
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index be5369d506..887dba461f 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -118,6 +118,10 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
         if (flags & AV_CPU_FLAG_RVB_ADDR) {
             init_subpel2(0, 0, 1, v, put, 128);
             init_subpel2(1, 0, 1, v, avg, 128);
+# if __riscv_xlen == 64
+            init_subpel2(0, 1, 1, hv, put, 128);
+            init_subpel2(1, 1, 1, hv, avg, 128);
+# endif
         }
     }
 
@@ -128,6 +132,10 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
         if (flags & AV_CPU_FLAG_RVB_ADDR) {
             init_subpel2(0, 0, 1, v, put, 256);
             init_subpel2(1, 0, 1, v, avg, 256);
+# if __riscv_xlen == 64
+            init_subpel2(0, 1, 1, hv, put, 256);
+            init_subpel2(1, 1, 1, hv, avg, 256);
+# endif
         }
     }
 }