From patchwork Sat May 18 18:15:29 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48993
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 19 May 2024 02:15:29 +0800
X-OQ-MSGID: <20240518181533.3124314-1-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.1
Subject: [FFmpeg-devel] [PATCH v4 1/5] lavc/vp9dsp: R-V V mc avg
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: sunyuechi
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

From: sunyuechi

C908:
vp9_avg4_8bpp_c: 1.2
vp9_avg4_8bpp_rvv_i64: 1.0
vp9_avg8_8bpp_c: 3.7
vp9_avg8_8bpp_rvv_i64: 1.5
vp9_avg16_8bpp_c: 14.7
vp9_avg16_8bpp_rvv_i64: 3.5
vp9_avg32_8bpp_c: 57.7
vp9_avg32_8bpp_rvv_i64: 10.0
vp9_avg64_8bpp_c: 229.0
vp9_avg64_8bpp_rvv_i64: 31.7
---
 libavcodec/riscv/Makefile      |  3 +-
 libavcodec/riscv/vp9_mc_rvv.S  | 58 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c | 18 +++++++++++
 3 files changed, 78 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/riscv/vp9_mc_rvv.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 27b268ae39..4739d83522 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -65,6 +65,7 @@ RVV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvv.o
 OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9dsp_init.o
 RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o \
                                  riscv/vp9_mc_rvi.o
-RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o
+RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o \
+                                  riscv/vp9_mc_rvv.o
 OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o
 RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o
diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
new file mode 100644
index 0000000000..7811cd9928
--- /dev/null
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -0,0 +1,58 @@
+/*
+ * Copyright (c) 2024 Institue of Software Chinese Academy of Sciences (ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/riscv/asm.S"
+
+.macro vsetvlstatic8 len an maxlen mn=m4
+.if \len == 4
+        vsetivli        zero, \len, e8, mf4, ta, ma
+.elseif \len == 8
+        vsetivli        zero, \len, e8, mf2, ta, ma
+.elseif \len == 16
+        vsetivli        zero, \len, e8, m1, ta, ma
+.elseif \len == 32
+        li              \an, \len
+        vsetvli         zero, \an, e8, m2, ta, ma
+.elseif \len == 64
+        li              \an, \maxlen
+        vsetvli         zero, \an, e8, \mn, ta, ma
+.endif
+.endm
+
+.macro copy_avg len
+func ff_avg\len\()_rvv, zve32x
+        csrwi           vxrm, 0
+        vsetvlstatic8   \len t0 64
+1:
+        vle8.v          v8, (a2)
+        vle8.v          v16, (a0)
+        vaaddu.vv       v8, v8, v16
+        addi            a4, a4, -1
+        vse8.v          v8, (a0)
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+        ret
+endfunc
+.endm
+
+.irp len, 64, 32, 16, 8, 4
+        copy_avg \len
+.endr
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index ab99294d44..6bfe23563a 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -48,6 +48,24 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
     }
 # endif
 
+#if HAVE_RVV
+    if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I32) && ff_rv_vlen_least(128)) {
+
+#define init_fpel(idx1, sz)                                          \
+    dsp->mc[idx1][FILTER_8TAP_SMOOTH ][1][0][0] = ff_avg##sz##_rvv;  \
+    dsp->mc[idx1][FILTER_8TAP_REGULAR][1][0][0] = ff_avg##sz##_rvv;  \
+    dsp->mc[idx1][FILTER_8TAP_SHARP  ][1][0][0] = ff_avg##sz##_rvv;  \
+    dsp->mc[idx1][FILTER_BILINEAR    ][1][0][0] = ff_avg##sz##_rvv
+
+        init_fpel(0, 64);
+        init_fpel(1, 32);
+        init_fpel(2, 16);
+        init_fpel(3, 8);
+        init_fpel(4, 4);
+
+#undef init_fpel
+    }
+#endif
 #endif
 }
 

From patchwork Sat May 18 18:15:30 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48992
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 19 May 2024 02:15:30 +0800
X-OQ-MSGID: <20240518181533.3124314-2-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240518181533.3124314-1-uk7b@foxmail.com>
References: <20240518181533.3124314-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v4 2/5] lavc/vp9dsp: R-V V mc bilin h v
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: sunyuechi
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

From: sunyuechi

C908:
vp9_avg_bilin_4h_8bpp_c: 5.2
vp9_avg_bilin_4h_8bpp_rvv_i64: 2.2
vp9_avg_bilin_4v_8bpp_c: 5.5
vp9_avg_bilin_4v_8bpp_rvv_i64: 2.2
vp9_avg_bilin_8h_8bpp_c: 20.0
vp9_avg_bilin_8h_8bpp_rvv_i64: 4.5
vp9_avg_bilin_8v_8bpp_c: 21.0
vp9_avg_bilin_8v_8bpp_rvv_i64: 4.2
vp9_avg_bilin_16h_8bpp_c: 78.2
vp9_avg_bilin_16h_8bpp_rvv_i64: 9.0
vp9_avg_bilin_16v_8bpp_c: 82.0
vp9_avg_bilin_16v_8bpp_rvv_i64: 9.0
vp9_avg_bilin_32h_8bpp_c: 325.5
vp9_avg_bilin_32h_8bpp_rvv_i64: 26.2
vp9_avg_bilin_32v_8bpp_c: 326.2
vp9_avg_bilin_32v_8bpp_rvv_i64: 26.2
vp9_avg_bilin_64h_8bpp_c: 1265.7
vp9_avg_bilin_64h_8bpp_rvv_i64: 91.5
vp9_avg_bilin_64v_8bpp_c: 1317.0
vp9_avg_bilin_64v_8bpp_rvv_i64: 91.2
vp9_put_bilin_4h_8bpp_c: 4.5
vp9_put_bilin_4h_8bpp_rvv_i64: 1.7
vp9_put_bilin_4v_8bpp_c: 4.7
vp9_put_bilin_4v_8bpp_rvv_i64: 1.7
vp9_put_bilin_8h_8bpp_c: 17.0
vp9_put_bilin_8h_8bpp_rvv_i64: 3.5
vp9_put_bilin_8v_8bpp_c: 18.0
vp9_put_bilin_8v_8bpp_rvv_i64: 3.5
vp9_put_bilin_16h_8bpp_c: 65.2
vp9_put_bilin_16h_8bpp_rvv_i64: 7.5
vp9_put_bilin_16v_8bpp_c: 85.7
vp9_put_bilin_16v_8bpp_rvv_i64: 7.5
vp9_put_bilin_32h_8bpp_c: 257.5
vp9_put_bilin_32h_8bpp_rvv_i64: 23.5
vp9_put_bilin_32v_8bpp_c: 274.5
vp9_put_bilin_32v_8bpp_rvv_i64: 23.5
vp9_put_bilin_64h_8bpp_c: 1040.5
vp9_put_bilin_64h_8bpp_rvv_i64: 82.5
vp9_put_bilin_64v_8bpp_c: 1108.7
vp9_put_bilin_64v_8bpp_rvv_i64: 82.2
---
 libavcodec/riscv/vp9_mc_rvv.S  | 43 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c | 21 +++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index 7811cd9928..b0052c0ece 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -53,6 +53,49 @@ func ff_avg\len\()_rvv, zve32x
 endfunc
 .endm
 
+.macro bilin_load dst len op type mn
+.ifc \type,v
+        add             t5, a2, a3
+.else
+        addi            t5, a2, 1
+.endif
+        vle8.v          v8, (a2)
+        vle8.v          v0, (t5)
+        vwmulu.vx       v16, v0, \mn
+        vwmaccsu.vx     v16, t1, v8
+        vwadd.wx        v16, v16, t4
+        vnsra.wi        v16, v16, 4
+        vadd.vv         \dst, v16, v8
+.ifc \op,avg
+        vle8.v          v16, (a0)
+        vaaddu.vv       \dst, \dst, v16
+.endif
+.endm
+
+.macro bilin_h_v len op type mn
+func ff_\op\()_bilin_\len\()\type\()_rvv, zve32x
+.ifc \op,avg
+        csrwi           vxrm, 0
+.endif
+        vsetvlstatic8   \len t0 64
+        li              t4, 8
+        neg             t1, \mn
+1:
+        addi            a4, a4, -1
+        bilin_load      v0, \len, \op, \type, \mn
+        vse8.v          v0, (a0)
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+
+        ret
+endfunc
+.endm
+
 .irp len, 64, 32, 16, 8, 4
         copy_avg \len
+        .irp op, put, avg
+                bilin_h_v \len \op h a5
+                bilin_h_v \len \op v a6
+        .endr
 .endr
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 6bfe23563a..565b68959f 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -63,6 +63,27 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
         init_fpel(3, 8);
         init_fpel(4, 4);
 
+        dsp->mc[0][FILTER_BILINEAR    ][0][0][1] = ff_put_bilin_64v_rvv;
+        dsp->mc[0][FILTER_BILINEAR    ][0][1][0] = ff_put_bilin_64h_rvv;
+        dsp->mc[0][FILTER_BILINEAR    ][1][0][1] = ff_avg_bilin_64v_rvv;
+        dsp->mc[0][FILTER_BILINEAR    ][1][1][0] = ff_avg_bilin_64h_rvv;
+        dsp->mc[1][FILTER_BILINEAR    ][0][0][1] = ff_put_bilin_32v_rvv;
+        dsp->mc[1][FILTER_BILINEAR    ][0][1][0] = ff_put_bilin_32h_rvv;
+        dsp->mc[1][FILTER_BILINEAR    ][1][0][1] = ff_avg_bilin_32v_rvv;
+        dsp->mc[1][FILTER_BILINEAR    ][1][1][0] = ff_avg_bilin_32h_rvv;
+        dsp->mc[2][FILTER_BILINEAR    ][0][0][1] = ff_put_bilin_16v_rvv;
+        dsp->mc[2][FILTER_BILINEAR    ][0][1][0] = ff_put_bilin_16h_rvv;
+        dsp->mc[2][FILTER_BILINEAR    ][1][0][1] = ff_avg_bilin_16v_rvv;
+        dsp->mc[2][FILTER_BILINEAR    ][1][1][0] = ff_avg_bilin_16h_rvv;
+        dsp->mc[3][FILTER_BILINEAR    ][0][0][1] = ff_put_bilin_8v_rvv;
+        dsp->mc[3][FILTER_BILINEAR    ][0][1][0] = ff_put_bilin_8h_rvv;
+        dsp->mc[3][FILTER_BILINEAR    ][1][0][1] = ff_avg_bilin_8v_rvv;
+        dsp->mc[3][FILTER_BILINEAR    ][1][1][0] = ff_avg_bilin_8h_rvv;
+        dsp->mc[4][FILTER_BILINEAR    ][0][0][1] = ff_put_bilin_4v_rvv;
+        dsp->mc[4][FILTER_BILINEAR    ][0][1][0] = ff_put_bilin_4h_rvv;
+        dsp->mc[4][FILTER_BILINEAR    ][1][0][1] = ff_avg_bilin_4v_rvv;
+        dsp->mc[4][FILTER_BILINEAR    ][1][1][0] = ff_avg_bilin_4h_rvv;
+
 #undef init_fpel
     }
 #endif

From patchwork Sat May 18 18:15:31 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48990
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 19 May 2024 02:15:31 +0800
X-OQ-MSGID: <20240518181533.3124314-3-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240518181533.3124314-1-uk7b@foxmail.com>
References: <20240518181533.3124314-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v4 3/5] lavc/vp9dsp: R-V V mc tap h v
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: sunyuechi
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

From: sunyuechi

                                          C908     X60
vp9_avg_8tap_smooth_4h_8bpp_c        :    13.0    11.2
vp9_avg_8tap_smooth_4h_8bpp_rvv_i32  :     5.0     4.2
vp9_avg_8tap_smooth_4v_8bpp_c        :    13.7    12.5
vp9_avg_8tap_smooth_4v_8bpp_rvv_i32  :     5.0     4.2
vp9_avg_8tap_smooth_8h_8bpp_c        :    49.5    42.2
vp9_avg_8tap_smooth_8h_8bpp_rvv_i32  :     9.2     8.5
vp9_avg_8tap_smooth_8v_8bpp_c        :    66.5    45.0
vp9_avg_8tap_smooth_8v_8bpp_rvv_i32  :     9.5     8.5
vp9_avg_8tap_smooth_16h_8bpp_c       :   192.7   166.5
vp9_avg_8tap_smooth_16h_8bpp_rvv_i32 :    21.2    18.7
vp9_avg_8tap_smooth_16v_8bpp_c       :   192.2   175.7
vp9_avg_8tap_smooth_16v_8bpp_rvv_i32 :    21.5    19.0
vp9_avg_8tap_smooth_32h_8bpp_c       :   780.2   663.7
vp9_avg_8tap_smooth_32h_8bpp_rvv_i32 :    83.5    60.0
vp9_avg_8tap_smooth_32v_8bpp_c       :   770.5   689.2
vp9_avg_8tap_smooth_32v_8bpp_rvv_i32 :    67.2    60.0
vp9_avg_8tap_smooth_64h_8bpp_c       :  3115.5  2647.2
vp9_avg_8tap_smooth_64h_8bpp_rvv_i32 :   283.5   119.2
vp9_avg_8tap_smooth_64v_8bpp_c       :  3082.2  2729.0
vp9_avg_8tap_smooth_64v_8bpp_rvv_i32 :   305.2   119.0
vp9_put_8tap_smooth_4h_8bpp_c        :    11.2     9.7
vp9_put_8tap_smooth_4h_8bpp_rvv_i32  :     4.2     4.0
vp9_put_8tap_smooth_4v_8bpp_c        :    11.7    10.7
vp9_put_8tap_smooth_4v_8bpp_rvv_i32  :     4.2     4.0
vp9_put_8tap_smooth_8h_8bpp_c        :    42.0    37.5
vp9_put_8tap_smooth_8h_8bpp_rvv_i32  :     8.5     7.7
vp9_put_8tap_smooth_8v_8bpp_c        :    44.2    38.7
vp9_put_8tap_smooth_8v_8bpp_rvv_i32  :     8.5     7.7
vp9_put_8tap_smooth_16h_8bpp_c       :   165.7   147.2
vp9_put_8tap_smooth_16h_8bpp_rvv_i32 :    19.5    17.5
vp9_put_8tap_smooth_16v_8bpp_c       :   169.0   149.7
vp9_put_8tap_smooth_16v_8bpp_rvv_i32 :    19.7    17.5
vp9_put_8tap_smooth_32h_8bpp_c       :   659.7   586.7
vp9_put_8tap_smooth_32h_8bpp_rvv_i32 :    64.2    57.2
vp9_put_8tap_smooth_32v_8bpp_c       :   680.5   591.2
vp9_put_8tap_smooth_32v_8bpp_rvv_i32 :    64.2    57.2
vp9_put_8tap_smooth_64h_8bpp_c       :  2681.5  2339.0
vp9_put_8tap_smooth_64h_8bpp_rvv_i32 :   255.5   114.2
vp9_put_8tap_smooth_64v_8bpp_c       :  2709.7  2348.7
vp9_put_8tap_smooth_64v_8bpp_rvv_i32 :   255.5   114.0
---
 libavcodec/riscv/vp9_mc_rvv.S  | 243 +++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h      |  72 ++++++----
 libavcodec/riscv/vp9dsp_init.c |  40 +++++-
 3 files changed, 329 insertions(+), 26 deletions(-)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index b0052c0ece..9d7caeb005 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -36,6 +36,18 @@
 .endif
 .endm
 
+.macro vsetvlstatic16 len
+.ifc \len,4
+        vsetvli         zero, zero, e16, mf2, ta, ma
+.elseif \len == 8
+        vsetvli         zero, zero, e16, m1, ta, ma
+.elseif \len == 16
+        vsetvli         zero, zero, e16, m2, ta, ma
+.else
+        vsetvli         zero, zero, e16, m4, ta, ma
+.endif
+.endm
+
 .macro copy_avg len
 func ff_avg\len\()_rvv, zve32x
         csrwi           vxrm, 0
@@ -92,10 +104,241 @@ func ff_\op\()_bilin_\len\()\type\()_rvv, zve32x
 endfunc
 .endm
 
+const subpel_filters_regular
+        .byte  0,  0,   0, 128,   0,   0,  0,  0
+        .byte  0,  1,  -5, 126,   8,  -3,  1,  0
+        .byte -1,  3, -10, 122,  18,  -6,  2,  0
+        .byte -1,  4, -13, 118,  27,  -9,  3, -1
+        .byte -1,  4, -16, 112,  37, -11,  4, -1
+        .byte -1,  5, -18, 105,  48, -14,  4, -1
+        .byte -1,  5, -19,  97,  58, -16,  5, -1
+        .byte -1,  6, -19,  88,  68, -18,  5, -1
+        .byte -1,  6, -19,  78,  78, -19,  6, -1
+        .byte -1,  5, -18,  68,  88, -19,  6, -1
+        .byte -1,  5, -16,  58,  97, -19,  5, -1
+        .byte -1,  4, -14,  48, 105, -18,  5, -1
+        .byte -1,  4, -11,  37, 112, -16,  4, -1
+        .byte -1,  3,  -9,  27, 118, -13,  4, -1
+        .byte  0,  2,  -6,  18, 122, -10,  3, -1
+        .byte  0,  1,  -3,   8, 126,  -5,  1,  0
+subpel_filters_sharp:
+        .byte  0,  0,   0, 128,   0,   0,  0,  0
+        .byte -1,  3,  -7, 127,   8,  -3,  1,  0
+        .byte -2,  5, -13, 125,  17,  -6,  3, -1
+        .byte -3,  7, -17, 121,  27, -10,  5, -2
+        .byte -4,  9, -20, 115,  37, -13,  6, -2
+        .byte -4, 10, -23, 108,  48, -16,  8, -3
+        .byte -4, 10, -24, 100,  59, -19,  9, -3
+        .byte -4, 11, -24,  90,  70, -21, 10, -4
+        .byte -4, 11, -23,  80,  80, -23, 11, -4
+        .byte -4, 10, -21,  70,  90, -24, 11, -4
+        .byte -3,  9, -19,  59, 100, -24, 10, -4
+        .byte -3,  8, -16,  48, 108, -23, 10, -4
+        .byte -2,  6, -13,  37, 115, -20,  9, -4
+        .byte -2,  5, -10,  27, 121, -17,  7, -3
+        .byte -1,  3,  -6,  17, 125, -13,  5, -2
+        .byte  0,  1,  -3,   8, 127,  -7,  3, -1
+subpel_filters_smooth:
+        .byte  0,  0,   0, 128,   0,   0,  0,  0
+        .byte -3, -1,  32,  64,  38,   1, -3,  0
+        .byte -2, -2,  29,  63,  41,   2, -3,  0
+        .byte -2, -2,  26,  63,  43,   4, -4,  0
+        .byte -2, -3,  24,  62,  46,   5, -4,  0
+        .byte -2, -3,  21,  60,  49,   7, -4,  0
+        .byte -1, -4,  18,  59,  51,   9, -4,  0
+        .byte -1, -4,  16,  57,  53,  12, -4, -1
+        .byte -1, -4,  14,  55,  55,  14, -4, -1
+        .byte -1, -4,  12,  53,  57,  16, -4, -1
+        .byte  0, -4,   9,  51,  59,  18, -4, -1
+        .byte  0, -4,   7,  49,  60,  21, -3, -2
+        .byte  0, -4,   5,  46,  62,  24, -3, -2
+        .byte  0, -4,   4,  43,  63,  26, -2, -2
+        .byte  0, -3,   2,  41,  63,  29, -2, -2
+        .byte  0, -3,   1,  38,  64,  32, -1, -3
+endconst
+
+.macro epel_filter name type regtype
+        lla             \regtype\()2, subpel_filters_\name
+        li              \regtype\()1, 8
+.ifc \type,v
+        mul             \regtype\()0, a6, \regtype\()1
+.else
+        mul             \regtype\()0, a5, \regtype\()1
+.endif
+        add             \regtype\()0, \regtype\()0, \regtype\()2
+        .irp n,1,2,3,4,5,6
+        lb              \regtype\n, \n(\regtype\()0)
+        .endr
+.ifc \regtype,t
+        lb              a7, 7(\regtype\()0)
+.else
+        lb              s7, 7(\regtype\()0)
+.endif
+        lb              \regtype\()0, 0(\regtype\()0)
+.endm
+
+.macro epel_load dst len op name type from_mem regtype
+        li              a5, 64
+.ifc \from_mem, 1
+        vle8.v          v22, (a2)
+.ifc \type,v
+        sub             a2, a2, a3
+        vle8.v          v20, (a2)
+        sh1add          a2, a3, a2
+        vle8.v          v24, (a2)
+        add             a2, a2, a3
+        vle8.v          v26, (a2)
+        add             a2, a2, a3
+        vle8.v          v28, (a2)
+        add             a2, a2, a3
+        vle8.v          v30, (a2)
+.else
+        addi            a2, a2, -1
+        vle8.v          v20, (a2)
+        addi            a2, a2, 2
+        vle8.v          v24, (a2)
+        addi            a2, a2, 1
+        vle8.v          v26, (a2)
+        addi            a2, a2, 1
+        vle8.v          v28, (a2)
+        addi            a2, a2, 1
+        vle8.v          v30, (a2)
+.endif
+
+.ifc \name,smooth
+        vwmulu.vx       v16, v24, \regtype\()4
+        vwmaccu.vx      v16, \regtype\()2, v20
+        vwmaccu.vx      v16, \regtype\()5, v26
+        vwmaccsu.vx     v16, \regtype\()6, v28
+.else
+        vwmulu.vx       v16, v28, \regtype\()6
+        vwmaccsu.vx     v16, \regtype\()2, v20
+        vwmaccsu.vx     v16, \regtype\()5, v26
+.endif
+
+.ifc \regtype,t
+        vwmaccsu.vx     v16, a7, v30
+.else
+        vwmaccsu.vx     v16, s7, v30
+.endif
+
+.ifc \type,v
+        .rept 6
+        sub             a2, a2, a3
+        .endr
+        vle8.v          v28, (a2)
+        sub             a2, a2, a3
+        vle8.v          v26, (a2)
+        sh1add          a2, a3, a2
+        add             a2, a2, a3
+.else
+        addi            a2, a2, -6
+        vle8.v          v28, (a2)
+        addi            a2, a2, -1
+        vle8.v          v26, (a2)
+        addi            a2, a2, 3
+.endif
+
+.ifc \name,smooth
+        vwmaccsu.vx     v16, \regtype\()1, v28
+.else
+        vwmaccu.vx      v16, \regtype\()1, v28
+        vwmulu.vx       v28, v24, \regtype\()4
+.endif
+        vwmaccsu.vx     v16, \regtype\()0, v26
+        vwmulu.vx       v20, v22, \regtype\()3
+.else
+.ifc \name,smooth
+        vwmulu.vx       v16, v8, \regtype\()4
+        vwmaccu.vx      v16, \regtype\()2, v4
+        vwmaccu.vx      v16, \regtype\()5, v10
+        vwmaccsu.vx     v16, \regtype\()6, v12
+        vwmaccsu.vx     v16, \regtype\()1, v2
+.else
+        vwmulu.vx       v16, v2, \regtype\()1
+        vwmaccu.vx      v16, \regtype\()6, v12
+        vwmaccsu.vx     v16, \regtype\()5, v10
+        vwmaccsu.vx     v16, \regtype\()2, v4
+        vwmulu.vx       v28, v8, \regtype\()4
+.endif
+        vwmaccsu.vx     v16, \regtype\()0, v0
+        vwmulu.vx       v20, v6, \regtype\()3
+
+.ifc \regtype,t
+        vwmaccsu.vx     v16, a7, v14
+.else
+        vwmaccsu.vx     v16, s7, v14
+.endif
+
+.endif
+        vwadd.wx        v16, v16, a5
+        vsetvlstatic16  \len
+
+.ifc \name,smooth
+        vwadd.vv        v24, v16, v20
+.else
+        vwadd.vv        v24, v16, v28
+        vwadd.wv        v24, v24, v20
+.endif
+        vnsra.wi        v24, v24, 7
+        vmax.vx         v24, v24, zero
+        vsetvlstatic8   \len, zero, 32, m2
+
+        vnclipu.wi      \dst, v24, 0
+.ifc \op,avg
+        vle8.v          v24, (a0)
+        vaaddu.vv       \dst, \dst, v24
+.endif
+
+.endm
+
+.macro epel_load_inc dst len op name type from_mem regtype
+        epel_load       \dst \len \op \name \type \from_mem \regtype
+        add             a2, a2, a3
+.endm
+
+.macro epel len op name type vlen
+func ff_\op\()_8tap_\name\()_\len\()\type\()_rvv\vlen\(), zve32x
+        epel_filter     \name \type t
+.if \vlen < 256
+        vsetvlstatic8   \len a5 32 m2
+.else
+        vsetvlstatic8   \len a5 64 m2
+.endif
+.ifc \op,avg
+        csrwi           vxrm, 0
+.endif
+
+1:
+        addi            a4, a4, -1
+        epel_load       v30 \len \op \name \type 1 t
+        vse8.v          v30, (a0)
+.if \len == 64 && \vlen < 256
+        addi            a0, a0, 32
+        addi            a2, a2, 32
+        epel_load       v30 \len \op \name \type 1 t
+        vse8.v          v30, (a0)
+        addi            a0, a0, -32
+        addi            a2, a2, -32
+.endif
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+
+        ret
+endfunc
+.endm
+
 .irp len, 64, 32, 16, 8, 4
         copy_avg \len
         .irp op, put, avg
                 bilin_h_v \len \op h a5
                 bilin_h_v \len \op v a6
+                .irp name, regular, sharp, smooth
+                        .irp type, h, v
+                                epel \len \op \name \type 128
+                                epel \len \op \name \type 256
+                        .endr
+                .endr
         .endr
 .endr
diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h
index 79330b4968..1638daaae3 100644
--- a/libavcodec/riscv/vp9dsp.h
+++ b/libavcodec/riscv/vp9dsp.h
@@ -81,33 +81,39 @@ void ff_tm_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
 void ff_tm_4x4_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
                    const uint8_t *a);
 
-#define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx) \
-void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \
+#define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx, min_vlen) \
+void ff_put_8tap_##type##_##SIZE##h_rvv##min_vlen(uint8_t *dst, \
+                                        ptrdiff_t dststride, \
                                         const uint8_t *src, \
                                         ptrdiff_t srcstride, \
                                         int h, int mx, int my); \
 \
-void ff_put_8tap_##type##_##SIZE##v_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_put_8tap_##type##_##SIZE##v_rvv##min_vlen(uint8_t *dst, \
+                                        ptrdiff_t dststride, \
                                         const uint8_t *src, \
                                         ptrdiff_t srcstride, \
                                         int h, int mx, int my); \
 \
-void ff_put_8tap_##type##_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_put_8tap_##type##_##SIZE##hv_rvv##min_vlen(uint8_t *dst, \
+                                         ptrdiff_t dststride, \
                                          const uint8_t *src, \
                                          ptrdiff_t srcstride, \
                                          int h, int mx, int my); \
 \
-void ff_avg_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_avg_8tap_##type##_##SIZE##h_rvv##min_vlen(uint8_t *dst, \
+                                        ptrdiff_t dststride, \
                                         const uint8_t *src, \
                                         ptrdiff_t srcstride, \
                                         int h, int mx, int my); \
 \
-void ff_avg_8tap_##type##_##SIZE##v_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_avg_8tap_##type##_##SIZE##v_rvv##min_vlen(uint8_t *dst, \
+                                        ptrdiff_t dststride, \
                                         const uint8_t *src, \
                                         ptrdiff_t srcstride, \
                                         int h, int mx, int my); \
 \
-void ff_avg_8tap_##type##_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_avg_8tap_##type##_##SIZE##hv_rvv##min_vlen(uint8_t *dst, \
+                                         ptrdiff_t dststride, \
                                          const uint8_t *src, \
                                          ptrdiff_t srcstride, \
                                          int h, int mx, int my);
@@ -146,23 +152,41 @@ void ff_avg##SIZE##_rvv(uint8_t *dst, ptrdiff_t dststride, \
                         const uint8_t *src, ptrdiff_t srcstride, \
                         int h, int mx, int my);
 
-VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR);
-VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR);
-VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR);
-VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR);
-VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR);
-
-VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP);
-VP9_8TAP_RISCV_RVV_FUNC(32, sharp, FILTER_8TAP_SHARP);
-VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP);
-VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP);
-VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP);
-
-VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH);
-VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH);
-VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH);
-VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH);
-VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH);
+VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR, 128);
+VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR, 128);
+VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR, 128);
+VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR, 128);
+VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR, 128);
+
+VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP, 128);
+VP9_8TAP_RISCV_RVV_FUNC(32, sharp, FILTER_8TAP_SHARP, 128);
+VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP, 128);
+VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP, 128);
+VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP, 128);
+
+VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH, 128);
+VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH, 128);
+VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH, 128);
+VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH, 128);
+VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH, 128);
+
+VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR, 256); +VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR, 256); +VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR, 256); +VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR, 256); +VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR, 256); + +VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP, 256); +VP9_8TAP_RISCV_RVV_FUNC(32, sharp, FILTER_8TAP_SHARP, 256); +VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP, 256); +VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP, 256); +VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP, 256); + +VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH, 256); +VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH, 256); +VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH, 256); +VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH, 256); +VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH, 256); VP9_BILINEAR_RISCV_RVV_FUNC(64); VP9_BILINEAR_RISCV_RVV_FUNC(32); diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c index 565b68959f..931b92928a 100644 --- a/libavcodec/riscv/vp9dsp_init.c +++ b/libavcodec/riscv/vp9dsp_init.c @@ -49,7 +49,8 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) # endif #if HAVE_RVV - if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I32) && ff_rv_vlen_least(128)) { + if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I32)) { + if (ff_rv_vlen_least(128)) { #define init_fpel(idx1, sz) \ dsp->mc[idx1][FILTER_8TAP_SMOOTH ][1][0][0] = ff_avg##sz##_rvv; \ @@ -63,6 +64,26 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) init_fpel(3, 8); init_fpel(4, 4); +#undef init_fpel + +#define init_subpel1(idx1, idx2, idxh, idxv, sz, dir, type, vlen) \ + dsp->mc[idx1][FILTER_8TAP_SMOOTH ][idx2][idxh][idxv] = \ + ff_##type##_8tap_smooth_##sz##dir##_rvv##vlen; \ + dsp->mc[idx1][FILTER_8TAP_REGULAR][idx2][idxh][idxv] = \ + ff_##type##_8tap_regular_##sz##dir##_rvv##vlen; \ + 
dsp->mc[idx1][FILTER_8TAP_SHARP ][idx2][idxh][idxv] = \ + ff_##type##_8tap_sharp_##sz##dir##_rvv##vlen; + +#define init_subpel2(idx, idxh, idxv, dir, type, vlen) \ + init_subpel1(0, idx, idxh, idxv, 64, dir, type, vlen); \ + init_subpel1(1, idx, idxh, idxv, 32, dir, type, vlen); \ + init_subpel1(2, idx, idxh, idxv, 16, dir, type, vlen); \ + init_subpel1(3, idx, idxh, idxv, 8, dir, type, vlen); \ + init_subpel1(4, idx, idxh, idxv, 4, dir, type, vlen) + + init_subpel2(0, 1, 0, h, put, 128); + init_subpel2(1, 1, 0, h, avg, 128); + dsp->mc[0][FILTER_BILINEAR ][0][0][1] = ff_put_bilin_64v_rvv; dsp->mc[0][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_64h_rvv; dsp->mc[0][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_64v_rvv; @@ -84,8 +105,23 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) dsp->mc[4][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_4v_rvv; dsp->mc[4][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_4h_rvv; -#undef init_fpel + if (flags & AV_CPU_FLAG_RVB_ADDR) { + init_subpel2(0, 0, 1, v, put, 128); + init_subpel2(1, 0, 1, v, avg, 128); } + + } + if (ff_rv_vlen_least(256)) { + init_subpel2(0, 1, 0, h, put, 256); + init_subpel2(1, 1, 0, h, avg, 256); + + if (flags & AV_CPU_FLAG_RVB_ADDR) { + init_subpel2(0, 0, 1, v, put, 256); + init_subpel2(1, 0, 1, v, avg, 256); + } + } + } + #endif #endif } From patchwork Sat May 18 18:15:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: uk7b@foxmail.com X-Patchwork-Id: 48994 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a21:3a48:b0:1af:fc2d:ff5a with SMTP id zu8csp3572107pzb; Sat, 18 May 2024 11:16:43 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUTS/aAjQBkc7z/Y6gO8W4/4dKTxiKrr7SmQG3ZXeQQmCYWMi1Zwrk0+aCh8QaZbCQRPt0KKHqUc4q6Eo9u5gfnvJmzrqH8dyYLRg== X-Google-Smtp-Source: AGHT+IEYswDj5JMtQJE2SAPc3ymM0PiwWHT9j8c4ct5pj+DRKrIr0Oo1EluqJbl1fRUFvrNNoTtr X-Received: by 2002:a50:c30e:0:b0:575:a7e:4f82 with SMTP id 
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 19 May 2024 02:15:32 +0800
X-OQ-MSGID: <20240518181533.3124314-4-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240518181533.3124314-1-uk7b@foxmail.com>
References: <20240518181533.3124314-1-uk7b@foxmail.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v4 4/5] lavc/vp9dsp: R-V V mc bilin hv
Cc: sunyuechi

From: sunyuechi

C908:
vp9_avg_bilin_4hv_8bpp_c: 11.0
vp9_avg_bilin_4hv_8bpp_rvv_i64: 3.7
vp9_avg_bilin_8hv_8bpp_c: 38.7
vp9_avg_bilin_8hv_8bpp_rvv_i64: 7.2
vp9_avg_bilin_16hv_8bpp_c: 147.0
vp9_avg_bilin_16hv_8bpp_rvv_i64: 14.2
vp9_avg_bilin_32hv_8bpp_c: 574.5
vp9_avg_bilin_32hv_8bpp_rvv_i64: 42.7
vp9_avg_bilin_64hv_8bpp_c: 2311.5
vp9_avg_bilin_64hv_8bpp_rvv_i64: 201.7
vp9_put_bilin_4hv_8bpp_c: 10.0
vp9_put_bilin_4hv_8bpp_rvv_i64: 3.2
vp9_put_bilin_8hv_8bpp_c: 35.2
vp9_put_bilin_8hv_8bpp_rvv_i64: 6.5
vp9_put_bilin_16hv_8bpp_c: 133.7
vp9_put_bilin_16hv_8bpp_rvv_i64: 13.0
vp9_put_bilin_32hv_8bpp_c: 538.2
vp9_put_bilin_32hv_8bpp_rvv_i64: 39.7
vp9_put_bilin_64hv_8bpp_c: 2114.0
vp9_put_bilin_64hv_8bpp_rvv_i64: 153.7
---
 libavcodec/riscv/vp9_mc_rvv.S  | 34 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c | 10 ++++++++++
 2 files changed, 44 insertions(+)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index 9d7caeb005..d2c9393fc9 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -104,6 +104,39 @@ func ff_\op\()_bilin_\len\()\type\()_rvv, zve32x
 endfunc
 .endm
 
+.macro bilin_hv len op
+func ff_\op\()_bilin_\len\()hv_rvv, zve32x
+.ifc \op,avg
+        csrwi           vxrm, 0
+.endif
+        vsetvlstatic8   \len t0 64
+        neg             t1, a5
+        neg             t2, a6
+        li              t4, 8
+        bilin_load      v24, \len, put, h, a5
+        add             a2, a2, a3
+1:
+        addi            a4, a4, -1
+        bilin_load      v4, \len, put, h, a5
+        vwmulu.vx       v16, v4, a6
+        vwmaccsu.vx     v16, t2, v24
+        vwadd.wx        v16, v16, t4
+        vnsra.wi        v16, v16, 4
+        vadd.vv         v0, v16, v24
+.ifc \op,avg
+        vle8.v          v16, (a0)
+        vaaddu.vv       v0, v0, v16
+.endif
+        vse8.v          v0, (a0)
+        vmv.v.v         v24, v4
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+
+        ret
+endfunc
+.endm
+
 const subpel_filters_regular
         .byte  0,  0,   0, 128,   0,   0,  0,  0
         .byte  0,  1,  -5, 126,   8,  -3,  1,  0
@@ -334,6 +367,7 @@ endfunc
         .irp op, put, avg
                 bilin_h_v  \len \op h a5
                 bilin_h_v  \len \op v a6
+                bilin_hv   \len \op
                 .irp name, regular, sharp, smooth
                 .irp type, h, v
                 epel \len \op \name \type 128
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 931b92928a..09519f4005 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -104,6 +104,16 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
     dsp->mc[4][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_4h_rvv;
     dsp->mc[4][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_4v_rvv;
     dsp->mc[4][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_4h_rvv;
+    dsp->mc[0][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_64hv_rvv;
+    dsp->mc[0][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_64hv_rvv;
+    dsp->mc[1][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_32hv_rvv;
+    dsp->mc[1][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_32hv_rvv;
+    dsp->mc[2][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_16hv_rvv;
+    dsp->mc[2][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_16hv_rvv;
+    dsp->mc[3][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_8hv_rvv;
+    dsp->mc[3][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_8hv_rvv;
+    dsp->mc[4][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_4hv_rvv;
+    dsp->mc[4][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_4hv_rvv;
 
     if (flags & AV_CPU_FLAG_RVB_ADDR) {
         init_subpel2(0, 0, 1, v, put, 128);

From patchwork Sat May 18 18:15:33 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48991
Delivered-To: ffmpegpatchwork2@gmail.com
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 19 May 2024 02:15:33 +0800
X-OQ-MSGID: <20240518181533.3124314-5-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240518181533.3124314-1-uk7b@foxmail.com>
References: <20240518181533.3124314-1-uk7b@foxmail.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v4 5/5] lavc/vp9dsp: R-V V mc tap hv
Cc: sunyuechi

From: sunyuechi

                                         C908     X60
vp9_avg_8tap_smooth_4hv_8bpp_c       :   32.0    28.2
vp9_avg_8tap_smooth_4hv_8bpp_rvv_i32 :   15.0    13.2
vp9_avg_8tap_smooth_8hv_8bpp_c       :   98.0    86.2
vp9_avg_8tap_smooth_8hv_8bpp_rvv_i32 :   23.7    21.0
vp9_avg_8tap_smooth_16hv_8bpp_c      :  355.5   297.0
vp9_avg_8tap_smooth_16hv_8bpp_rvv_i32:   62.7    41.2
vp9_avg_8tap_smooth_32hv_8bpp_c      : 1273.0  1099.7
vp9_avg_8tap_smooth_32hv_8bpp_rvv_i32:  133.7   119.2
vp9_avg_8tap_smooth_64hv_8bpp_c      : 4933.0  4240.5
vp9_avg_8tap_smooth_64hv_8bpp_rvv_i32:  506.7   227.0
vp9_put_8tap_smooth_4hv_8bpp_c       :   30.2    27.0
vp9_put_8tap_smooth_4hv_8bpp_rvv_i32 :   14.5    12.7
vp9_put_8tap_smooth_8hv_8bpp_c       :   91.2    81.2
vp9_put_8tap_smooth_8hv_8bpp_rvv_i32 :   22.7    20.2
vp9_put_8tap_smooth_16hv_8bpp_c      :  329.2   277.7
vp9_put_8tap_smooth_16hv_8bpp_rvv_i32:   44.7    40.0
vp9_put_8tap_smooth_32hv_8bpp_c      : 1183.7  1022.7
vp9_put_8tap_smooth_32hv_8bpp_rvv_i32:  130.7   116.5
vp9_put_8tap_smooth_64hv_8bpp_c      : 4502.7  3954.5
vp9_put_8tap_smooth_64hv_8bpp_rvv_i32:  496.0   224.7
---
 libavcodec/riscv/vp9_mc_rvv.S  | 75 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c |  8 ++++
 2 files changed, 83 insertions(+)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index d2c9393fc9..8dba76297b 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -362,6 +362,77 @@ func ff_\op\()_8tap_\name\()_\len\()\type\()_rvv\vlen\(), zve32x
 endfunc
 .endm
 
+#if __riscv_xlen == 64
+.macro epel_hv_once len name op
+        sub             a2, a2, a3
+        sub             a2, a2, a3
+        sub             a2, a2, a3
+        .irp n,0,2,4,6,8,10,12,14
+        epel_load_inc   v\n \len put \name h 1 t
+        .endr
+        addi            a4, a4, -1
+1:
+        addi            a4, a4, -1
+        epel_load       v30 \len \op \name v 0 s
+        vse8.v          v30, (a0)
+        vmv.v.v         v0, v2
+        vmv.v.v         v2, v4
+        vmv.v.v         v4, v6
+        vmv.v.v         v6, v8
+        vmv.v.v         v8, v10
+        vmv.v.v         v10, v12
+        vmv.v.v         v12, v14
+        epel_load       v14 \len put \name h 1 t
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+        epel_load       v30 \len \op \name v 0 s
+        vse8.v          v30, (a0)
+.endm
+
+.macro epel_hv op name len vlen
+func ff_\op\()_8tap_\name\()_\len\()hv_rvv\vlen\(), zve32x
+        addi            sp, sp, -64
+        .irp n,0,1,2,3,4,5,6,7
+        sd              s\n, \n\()<<3(sp)
+        .endr
+.if \len == 64 && \vlen < 256
+        addi            sp, sp, -48
+        .irp n,0,1,2,3,4,5
+        sd              a\n, \n\()<<3(sp)
+        .endr
+.endif
+.ifc \op,avg
+        csrwi           vxrm, 0
+.endif
+        epel_filter     \name h t
+        epel_filter     \name v s
+.if \vlen < 256
+        vsetvlstatic8   \len a6 32 m2
+.else
+        vsetvlstatic8   \len a6 64 m2
+.endif
+        epel_hv_once    \len \name \op
+.if \len == 64 && \vlen < 256
+        .irp n,0,1,2,3,4,5
+        ld              a\n, \n\()<<3(sp)
+        .endr
+        addi            sp, sp, 48
+        addi            a0, a0, 32
+        addi            a2, a2, 32
+        epel_filter     \name h t
+        epel_hv_once    \len \name \op
+.endif
+        .irp n,0,1,2,3,4,5,6,7
+        ld              s\n, \n\()<<3(sp)
+        .endr
+        addi            sp, sp, 64
+
+        ret
+endfunc
+.endm
+#endif
+
 .irp len, 64, 32, 16, 8, 4
         copy_avg \len
         .irp op, put, avg
@@ -373,6 +444,10 @@ endfunc
                 epel \len \op \name \type 128
                 epel \len \op \name \type 256
                 .endr
+                #if __riscv_xlen == 64
+                epel_hv \op \name \len 128
+                epel_hv \op \name \len 256
+                #endif
                 .endr
         .endr
 .endr
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 09519f4005..fe05d118de 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -118,6 +118,10 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
     if (flags & AV_CPU_FLAG_RVB_ADDR) {
         init_subpel2(0, 0, 1, v, put, 128);
         init_subpel2(1, 0, 1, v, avg, 128);
+# if __riscv_xlen == 64
+        init_subpel2(0, 1, 1, hv, put, 128);
+        init_subpel2(1, 1, 1, hv, avg, 128);
+# endif
     }
 
     }
@@ -128,6 +132,10 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
         if (flags & AV_CPU_FLAG_RVB_ADDR) {
             init_subpel2(0, 0, 1, v, put, 256);
             init_subpel2(1, 0, 1, v, avg, 256);
+# if __riscv_xlen == 64
+            init_subpel2(0, 1, 1, hv, put, 256);
+            init_subpel2(1, 1, 1, hv, avg, 256);
+# endif
         }
     }
     }
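
Reviewer note on the bilin_hv macro from patch 4/5: the vector loop keeps the previously h-filtered row in v24, h-filters the next row into v4, and blends the two with my (a6) before the optional averaging with dst. A scalar model of that loop may help when checking results by hand. This is only a sketch under stated assumptions, not the reference implementation: the names bilin_tap and bilin_hv_ref are hypothetical, and the horizontal step is assumed to be the usual VP9 bilinear formula, since bilin_load is defined outside this excerpt. The vertical step and the averaging follow the instructions shown above (vwmulu/vwmaccsu/vwadd/vnsra/vadd, then vaaddu with vxrm = 0).

```c
#include <stddef.h>
#include <stdint.h>

/* One bilinear step: a + ((frac * (b - a) + 8) >> 4), frac in 0..15.
 * The shift is assumed to be arithmetic for negative values, matching
 * what vnsra.wi does in the vector code. */
static inline int bilin_tap(int a, int b, int frac)
{
    return a + ((frac * (b - a) + 8) >> 4);
}

/* Hypothetical scalar model of the put/avg bilinear hv loop (w <= 64).
 * Reads w + 1 columns and h + 1 rows of src. */
static void bilin_hv_ref(uint8_t *dst, ptrdiff_t dst_stride,
                         const uint8_t *src, ptrdiff_t src_stride,
                         int w, int h, int mx, int my, int avg)
{
    uint8_t prev[64]; /* h-filtered previous row (v24 in the macro) */

    /* Prime the pipeline with the first h-filtered row. */
    for (int x = 0; x < w; x++)
        prev[x] = (uint8_t)bilin_tap(src[x], src[x + 1], mx);
    src += src_stride;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            /* h-filter the current row (v4 in the macro) */
            int cur = bilin_tap(src[x], src[x + 1], mx);
            /* blend previous and current rows vertically */
            int val = bilin_tap(prev[x], cur, my);

            if (avg) /* rounding average, as vaaddu.vv with vxrm = 0 */
                val = (dst[x] + val + 1) >> 1;
            dst[x]  = (uint8_t)val;
            prev[x] = (uint8_t)cur; /* current row becomes previous */
        }
        src += src_stride;
        dst += dst_stride;
    }
}
```

Carrying one filtered row between iterations means each source row is h-filtered only once, so no intermediate buffer covering the whole block is needed.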