From patchwork Tue May 21 17:13:15 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 49104
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Wed, 22 May 2024 01:13:15 +0800
X-OQ-MSGID: <20240521171319.2629938-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg

From: sunyuechi

C908:
vp9_avg4_8bpp_c: 1.2
vp9_avg4_8bpp_rvv_i64: 1.0
vp9_avg8_8bpp_c: 3.7
vp9_avg8_8bpp_rvv_i64: 1.5
vp9_avg16_8bpp_c: 14.7
vp9_avg16_8bpp_rvv_i64: 3.5
vp9_avg32_8bpp_c: 57.7
vp9_avg32_8bpp_rvv_i64: 10.0
vp9_avg64_8bpp_c: 229.0
vp9_avg64_8bpp_rvv_i64: 31.7
---
 libavcodec/riscv/Makefile      |  3 +-
 libavcodec/riscv/vp9_mc_rvv.S  | 58 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h      |  4 +--
 libavcodec/riscv/vp9dsp_init.c | 18 +++++++++++
 4 files changed, 80 insertions(+), 3 deletions(-)
 create mode 100644 libavcodec/riscv/vp9_mc_rvv.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 07d5c2915d..67e198d754 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -69,6 +69,7 @@ RVV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvv.o
 OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9dsp_init.o
 RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o \
                                  riscv/vp9_mc_rvi.o
-RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o
+RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o \
+                                  riscv/vp9_mc_rvv.o
 OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o
 RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o
diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
new file mode 100644
index 0000000000..7cb38ec94a
--- /dev/null
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -0,0 +1,58 @@
+/*
+ * Copyright (c) 2024 Institute of Software Chinese Academy of Sciences (ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/riscv/asm.S"
+
+.macro vsetvlstatic8 len an maxlen mn=m4
+.if \len == 4
+        vsetivli        zero, \len, e8, mf4, ta, ma
+.elseif \len == 8
+        vsetivli        zero, \len, e8, mf2, ta, ma
+.elseif \len == 16
+        vsetivli        zero, \len, e8, m1, ta, ma
+.elseif \len == 32
+        li              \an, \len
+        vsetvli         zero, \an, e8, m2, ta, ma
+.elseif \len == 64
+        li              \an, \maxlen
+        vsetvli         zero, \an, e8, \mn, ta, ma
+.endif
+.endm
+
+.macro copy_avg len
+func ff_vp9_avg\len\()_rvv, zve32x
+        csrwi           vxrm, 0
+        vsetvlstatic8   \len, t0, 64
+1:
+        vle8.v          v8, (a2)
+        vle8.v          v16, (a0)
+        vaaddu.vv       v8, v8, v16
+        addi            a4, a4, -1
+        vse8.v          v8, (a0)
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+        ret
+endfunc
+.endm
+
+.irp len, 64, 32, 16, 8, 4
+        copy_avg \len
+.endr
diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h
index 79330b4968..ff8431591c 100644
--- a/libavcodec/riscv/vp9dsp.h
+++ b/libavcodec/riscv/vp9dsp.h
@@ -138,11 +138,11 @@ void ff_avg_bilin_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \
                                  int h, int mx, int my);
 
 #define VP9_COPY_AVG_RISCV_RVV_FUNC(SIZE) \
-void ff_copy##SIZE##_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_vp9_copy##SIZE##_rvv(uint8_t *dst, ptrdiff_t dststride, \
                          const uint8_t *src, ptrdiff_t srcstride, \
                          int h, int mx, int my); \
                                                  \
-void ff_avg##SIZE##_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_vp9_avg##SIZE##_rvv(uint8_t *dst, ptrdiff_t dststride, \
                         const uint8_t *src, ptrdiff_t srcstride, \
                         int h, int mx, int my);
 
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index ab99294d44..454dcd963f 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -48,6 +48,24 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
     }
 # endif
 
+#if HAVE_RVV
+    if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I32) && ff_rv_vlen_least(128)) {
+
+#define init_fpel(idx1, sz) \
+    dsp->mc[idx1][FILTER_8TAP_SMOOTH ][1][0][0] = ff_vp9_avg##sz##_rvv; \
+    dsp->mc[idx1][FILTER_8TAP_REGULAR][1][0][0] = ff_vp9_avg##sz##_rvv; \
+    dsp->mc[idx1][FILTER_8TAP_SHARP  ][1][0][0] = ff_vp9_avg##sz##_rvv; \
+    dsp->mc[idx1][FILTER_BILINEAR    ][1][0][0] = ff_vp9_avg##sz##_rvv
+
+        init_fpel(0, 64);
+        init_fpel(1, 32);
+        init_fpel(2, 16);
+        init_fpel(3, 8);
+        init_fpel(4, 4);
+
+#undef init_fpel
+    }
+#endif
 #endif
 }
 

From patchwork Tue May 21 17:13:16 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 49105
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Wed, 22 May 2024 01:13:16 +0800
X-OQ-MSGID: <20240521171319.2629938-2-uk7b@foxmail.com>
In-Reply-To: <20240521171319.2629938-1-uk7b@foxmail.com>
References: <20240521171319.2629938-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 2/5] lavc/vp9dsp: R-V V mc bilin h v

From: sunyuechi

C908:
vp9_avg_bilin_4h_8bpp_c: 5.2
vp9_avg_bilin_4h_8bpp_rvv_i64: 2.2
vp9_avg_bilin_4v_8bpp_c: 5.5
vp9_avg_bilin_4v_8bpp_rvv_i64: 2.2
vp9_avg_bilin_8h_8bpp_c: 20.0
vp9_avg_bilin_8h_8bpp_rvv_i64: 4.5
vp9_avg_bilin_8v_8bpp_c: 21.0
vp9_avg_bilin_8v_8bpp_rvv_i64: 4.2
vp9_avg_bilin_16h_8bpp_c: 78.2
vp9_avg_bilin_16h_8bpp_rvv_i64: 9.0
vp9_avg_bilin_16v_8bpp_c: 82.0
vp9_avg_bilin_16v_8bpp_rvv_i64: 9.0
vp9_avg_bilin_32h_8bpp_c: 325.5
vp9_avg_bilin_32h_8bpp_rvv_i64: 26.2
vp9_avg_bilin_32v_8bpp_c: 326.2
vp9_avg_bilin_32v_8bpp_rvv_i64: 26.2
vp9_avg_bilin_64h_8bpp_c: 1265.7
vp9_avg_bilin_64h_8bpp_rvv_i64: 91.5
vp9_avg_bilin_64v_8bpp_c: 1317.0
vp9_avg_bilin_64v_8bpp_rvv_i64: 91.2
vp9_put_bilin_4h_8bpp_c: 4.5
vp9_put_bilin_4h_8bpp_rvv_i64: 1.7
vp9_put_bilin_4v_8bpp_c: 4.7
vp9_put_bilin_4v_8bpp_rvv_i64: 1.7
vp9_put_bilin_8h_8bpp_c: 17.0
vp9_put_bilin_8h_8bpp_rvv_i64: 3.5
vp9_put_bilin_8v_8bpp_c: 18.0
vp9_put_bilin_8v_8bpp_rvv_i64: 3.5
vp9_put_bilin_16h_8bpp_c: 65.2
vp9_put_bilin_16h_8bpp_rvv_i64: 7.5
vp9_put_bilin_16v_8bpp_c: 85.7
vp9_put_bilin_16v_8bpp_rvv_i64: 7.5
vp9_put_bilin_32h_8bpp_c: 257.5
vp9_put_bilin_32h_8bpp_rvv_i64: 23.5
vp9_put_bilin_32v_8bpp_c: 274.5
vp9_put_bilin_32v_8bpp_rvv_i64: 23.5
vp9_put_bilin_64h_8bpp_c: 1040.5
vp9_put_bilin_64h_8bpp_rvv_i64: 82.5
vp9_put_bilin_64v_8bpp_c: 1108.7
vp9_put_bilin_64v_8bpp_rvv_i64: 82.2
---
 libavcodec/riscv/vp9_mc_rvv.S  | 43 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h      | 12 ++++++------
 libavcodec/riscv/vp9dsp_init.c | 21 +++++++++++++++++
 3 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index 7cb38ec94a..739380d9a9 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -53,6 +53,49 @@ func ff_vp9_avg\len\()_rvv, zve32x
 endfunc
 .endm
 
+.macro bilin_load dst len op type mn
+.ifc \type,v
+        add             t5, a2, a3
+.else
+        addi            t5, a2, 1
+.endif
+        vle8.v          v8, (a2)
+        vle8.v          v0, (t5)
+        vwmulu.vx       v16, v0, \mn
+        vwmaccsu.vx     v16, t1, v8
+        vwadd.wx        v16, v16, t4
+        vnsra.wi        v16, v16, 4
+        vadd.vv         \dst, v16, v8
+.ifc \op,avg
+        vle8.v          v16, (a0)
+        vaaddu.vv       \dst, \dst, v16
+.endif
+.endm
+
+.macro bilin_h_v len op type mn
+func ff_\op\()_vp9_bilin_\len\()\type\()_rvv, zve32x
+.ifc \op,avg
+        csrwi           vxrm, 0
+.endif
+        vsetvlstatic8   \len, t0, 64
+        li              t4, 8
+        neg             t1, \mn
+1:
+        addi            a4, a4, -1
+        bilin_load      v0, \len, \op, \type, \mn
+        vse8.v          v0, (a0)
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+
+        ret
+endfunc
+.endm
+
 .irp len, 64, 32, 16, 8, 4
         copy_avg \len
+        .irp op, put, avg
+                bilin_h_v \len, \op, h, a5
+                bilin_h_v \len, \op, v, a6
+        .endr
 .endr
diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h
index ff8431591c..8fb326dae0 100644
--- a/libavcodec/riscv/vp9dsp.h
+++ b/libavcodec/riscv/vp9dsp.h
@@ -113,27 +113,27 @@ void ff_avg_8tap_##type##_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \
                                          int h, int mx, int my);
 
 #define VP9_BILINEAR_RISCV_RVV_FUNC(SIZE) \
-void ff_put_bilin_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_put_vp9_bilin_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \
                                 const uint8_t *src, ptrdiff_t srcstride, \
                                 int h, int mx, int my); \
                                                         \
-void ff_put_bilin_##SIZE##v_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_put_vp9_bilin_##SIZE##v_rvv(uint8_t *dst, ptrdiff_t dststride, \
                                 const uint8_t *src, ptrdiff_t srcstride, \
                                 int h, int mx, int my); \
                                                         \
-void ff_put_bilin_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_put_vp9_bilin_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \
                                  const uint8_t *src, ptrdiff_t srcstride, \
                                  int h, int mx, int my); \
                                                          \
-void ff_avg_bilin_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_avg_vp9_bilin_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \
                                 const uint8_t *src, ptrdiff_t srcstride, \
                                 int h, int mx, int my); \
                                                         \
-void ff_avg_bilin_##SIZE##v_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_avg_vp9_bilin_##SIZE##v_rvv(uint8_t *dst, ptrdiff_t dststride, \
                                 const uint8_t *src, ptrdiff_t srcstride, \
                                 int h, int mx, int my); \
                                                         \
-void ff_avg_bilin_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_avg_vp9_bilin_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \
                                  const uint8_t *src, ptrdiff_t srcstride, \
                                  int h, int mx, int my);
 
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 454dcd963f..9606d8545f 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -63,6 +63,27 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
         init_fpel(3, 8);
         init_fpel(4, 4);
 
+        dsp->mc[0][FILTER_BILINEAR    ][0][0][1] = ff_put_vp9_bilin_64v_rvv;
+        dsp->mc[0][FILTER_BILINEAR    ][0][1][0] = ff_put_vp9_bilin_64h_rvv;
+        dsp->mc[0][FILTER_BILINEAR    ][1][0][1] = ff_avg_vp9_bilin_64v_rvv;
+        dsp->mc[0][FILTER_BILINEAR    ][1][1][0] = ff_avg_vp9_bilin_64h_rvv;
+        dsp->mc[1][FILTER_BILINEAR    ][0][0][1] = ff_put_vp9_bilin_32v_rvv;
+        dsp->mc[1][FILTER_BILINEAR    ][0][1][0] = ff_put_vp9_bilin_32h_rvv;
+        dsp->mc[1][FILTER_BILINEAR    ][1][0][1] = ff_avg_vp9_bilin_32v_rvv;
+        dsp->mc[1][FILTER_BILINEAR    ][1][1][0] = ff_avg_vp9_bilin_32h_rvv;
+        dsp->mc[2][FILTER_BILINEAR    ][0][0][1] = ff_put_vp9_bilin_16v_rvv;
+        dsp->mc[2][FILTER_BILINEAR    ][0][1][0] = ff_put_vp9_bilin_16h_rvv;
+        dsp->mc[2][FILTER_BILINEAR    ][1][0][1] = ff_avg_vp9_bilin_16v_rvv;
+        dsp->mc[2][FILTER_BILINEAR    ][1][1][0] = ff_avg_vp9_bilin_16h_rvv;
+        dsp->mc[3][FILTER_BILINEAR    ][0][0][1] = ff_put_vp9_bilin_8v_rvv;
+        dsp->mc[3][FILTER_BILINEAR    ][0][1][0] = ff_put_vp9_bilin_8h_rvv;
+        dsp->mc[3][FILTER_BILINEAR    ][1][0][1] = ff_avg_vp9_bilin_8v_rvv;
+        dsp->mc[3][FILTER_BILINEAR    ][1][1][0] = ff_avg_vp9_bilin_8h_rvv;
+        dsp->mc[4][FILTER_BILINEAR    ][0][0][1] = ff_put_vp9_bilin_4v_rvv;
+        dsp->mc[4][FILTER_BILINEAR    ][0][1][0] = ff_put_vp9_bilin_4h_rvv;
+        dsp->mc[4][FILTER_BILINEAR    ][1][0][1] = ff_avg_vp9_bilin_4v_rvv;
+        dsp->mc[4][FILTER_BILINEAR    ][1][1][0] = ff_avg_vp9_bilin_4h_rvv;
+
 #undef init_fpel
     }
 #endif

From patchwork Tue May 21 17:13:17 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 49106
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Wed, 22 May 2024 01:13:17 +0800
X-OQ-MSGID: <20240521171319.2629938-3-uk7b@foxmail.com>
In-Reply-To: <20240521171319.2629938-1-uk7b@foxmail.com>
References: <20240521171319.2629938-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 3/5] lavc/vp9dsp: R-V V mc tap h v

From: sunyuechi

                                          C908     X60
vp9_avg_8tap_smooth_4h_8bpp_c        :    13.0    11.2
vp9_avg_8tap_smooth_4h_8bpp_rvv_i32  :     5.0     4.2
vp9_avg_8tap_smooth_4v_8bpp_c        :    13.7    12.5
vp9_avg_8tap_smooth_4v_8bpp_rvv_i32  :     5.0     4.2
vp9_avg_8tap_smooth_8h_8bpp_c        :    49.5    42.2
vp9_avg_8tap_smooth_8h_8bpp_rvv_i32  :     9.2     8.5
vp9_avg_8tap_smooth_8v_8bpp_c        :    66.5    45.0
vp9_avg_8tap_smooth_8v_8bpp_rvv_i32  :     9.5     8.5
vp9_avg_8tap_smooth_16h_8bpp_c       :   192.7   166.5
vp9_avg_8tap_smooth_16h_8bpp_rvv_i32 :    21.2    18.7
vp9_avg_8tap_smooth_16v_8bpp_c       :   192.2   175.7
vp9_avg_8tap_smooth_16v_8bpp_rvv_i32 :    21.5    19.0
vp9_avg_8tap_smooth_32h_8bpp_c       :   780.2   663.7
vp9_avg_8tap_smooth_32h_8bpp_rvv_i32 :    83.5    60.0
vp9_avg_8tap_smooth_32v_8bpp_c       :   770.5   689.2
vp9_avg_8tap_smooth_32v_8bpp_rvv_i32 :    67.2    60.0
vp9_avg_8tap_smooth_64h_8bpp_c       :  3115.5  2647.2
vp9_avg_8tap_smooth_64h_8bpp_rvv_i32 :   283.5   119.2
vp9_avg_8tap_smooth_64v_8bpp_c       :  3082.2  2729.0
vp9_avg_8tap_smooth_64v_8bpp_rvv_i32 :   305.2   119.0
vp9_put_8tap_smooth_4h_8bpp_c        :    11.2     9.7
vp9_put_8tap_smooth_4h_8bpp_rvv_i32  :     4.2     4.0
vp9_put_8tap_smooth_4v_8bpp_c        :    11.7    10.7
vp9_put_8tap_smooth_4v_8bpp_rvv_i32  :     4.2     4.0
vp9_put_8tap_smooth_8h_8bpp_c        :    42.0    37.5
vp9_put_8tap_smooth_8h_8bpp_rvv_i32  :     8.5     7.7
vp9_put_8tap_smooth_8v_8bpp_c        :    44.2    38.7
vp9_put_8tap_smooth_8v_8bpp_rvv_i32  :     8.5     7.7
vp9_put_8tap_smooth_16h_8bpp_c       :   165.7   147.2
vp9_put_8tap_smooth_16h_8bpp_rvv_i32 :    19.5    17.5
vp9_put_8tap_smooth_16v_8bpp_c       :   169.0   149.7
vp9_put_8tap_smooth_16v_8bpp_rvv_i32 :    19.7    17.5
vp9_put_8tap_smooth_32h_8bpp_c       :   659.7   586.7
vp9_put_8tap_smooth_32h_8bpp_rvv_i32 :    64.2    57.2
vp9_put_8tap_smooth_32v_8bpp_c       :   680.5   591.2
vp9_put_8tap_smooth_32v_8bpp_rvv_i32 :    64.2    57.2
vp9_put_8tap_smooth_64h_8bpp_c       :  2681.5  2339.0
vp9_put_8tap_smooth_64h_8bpp_rvv_i32 :   255.5   114.2
vp9_put_8tap_smooth_64v_8bpp_c       :  2709.7  2348.7
vp9_put_8tap_smooth_64v_8bpp_rvv_i32 :   255.5   114.0
---
 libavcodec/riscv/vp9_mc_rvv.S  | 243 +++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h      |  72 ++++++++++++++--------
 libavcodec/riscv/vp9dsp_init.c |  38 +++++++++++++++++++++++++++++-
 3 files changed, 328 insertions(+), 25 deletions(-)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index 739380d9a9..adba4afb90 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -36,6 +36,18 @@
 .endif
 .endm
 
+.macro vsetvlstatic16 len
+.ifc \len,4
+        vsetvli         zero, zero, e16, mf2, ta, ma
+.elseif \len == 8
+        vsetvli         zero, zero, e16, m1, ta, ma
+.elseif \len == 16
+        vsetvli         zero, zero, e16, m2, ta, ma
+.else
+        vsetvli         zero, zero, e16, m4, ta, ma
+.endif
+.endm
+
 .macro copy_avg len
 func ff_vp9_avg\len\()_rvv, zve32x
         csrwi           vxrm, 0
@@ -92,10 +104,241 @@ func ff_\op\()_vp9_bilin_\len\()\type\()_rvv, zve32x
 endfunc
 .endm
 
+const subpel_filters_regular
+        .byte           0, 0, 0, 128, 0, 0, 0, 0
+        .byte           0, 1, -5, 126, 8, -3, 1, 0
+        .byte           -1, 3, -10, 122, 18, -6, 2, 0
+        .byte           -1, 4, -13, 118, 27, -9, 3, -1
+        .byte           -1, 4, -16, 112, 37, -11, 4, -1
+        .byte           -1, 5, -18, 105, 48, -14, 4, -1
+        .byte           -1, 5, -19, 97, 58, -16, 5, -1
+        .byte           -1, 6, -19, 88, 68, -18, 5, -1
+        .byte           -1, 6, -19, 78, 78, -19, 6, -1
+        .byte           -1, 5, -18, 68, 88, -19, 6, -1
+        .byte           -1, 5, -16, 58, 97, -19, 5, -1
+        .byte           -1, 4, -14, 48, 105, -18, 5, -1
+        .byte           -1, 4, -11, 37, 112, -16, 4, -1
+        .byte           -1, 3, -9, 27, 118, -13, 4, -1
+        .byte           0, 2, -6, 18, 122, -10, 3, -1
+        .byte           0, 1, -3, 8, 126, -5, 1, 0
+subpel_filters_sharp:
+        .byte           0, 0, 0, 128, 0, 0, 0, 0
+        .byte           -1, 3, -7, 127, 8, -3, 1, 0
+        .byte           -2, 5, -13, 125, 17, -6, 3, -1
+        .byte           -3, 7, -17, 121, 27, -10, 5, -2
+        .byte           -4, 9, -20, 115, 37, -13, 6, -2
+        .byte           -4, 10, -23, 108, 48, -16, 8, -3
+        .byte           -4, 10, -24, 100, 59, -19, 9, -3
+        .byte           -4, 11, -24, 90, 70, -21, 10, -4
+        .byte           -4, 11, -23, 80, 80, -23, 11, -4
+        .byte           -4, 10, -21, 70, 90, -24, 11, -4
+        .byte           -3, 9, -19, 59, 100, -24, 10, -4
+        .byte           -3, 8, -16, 48, 108, -23, 10, -4
+        .byte           -2, 6, -13, 37, 115, -20, 9, -4
+        .byte           -2, 5, -10, 27, 121, -17, 7, -3
+        .byte           -1, 3, -6, 17, 125, -13, 5, -2
+        .byte           0, 1, -3, 8, 127, -7, 3, -1
+subpel_filters_smooth:
+        .byte           0, 0, 0, 128, 0, 0, 0, 0
+        .byte           -3, -1, 32, 64, 38, 1, -3, 0
+        .byte           -2, -2, 29, 63, 41, 2, -3, 0
+        .byte           -2, -2, 26, 63, 43, 4, -4, 0
+        .byte           -2, -3, 24, 62, 46, 5, -4, 0
+        .byte           -2, -3, 21, 60, 49, 7, -4, 0
+        .byte           -1, -4, 18, 59, 51, 9, -4, 0
+        .byte           -1, -4, 16, 57, 53, 12, -4, -1
+        .byte           -1, -4, 14, 55, 55, 14, -4, -1
+        .byte           -1, -4, 12, 53, 57, 16, -4, -1
+        .byte           0, -4, 9, 51, 59, 18, -4, -1
+        .byte           0, -4, 7, 49, 60, 21, -3, -2
+        .byte           0, -4, 5, 46, 62, 24, -3, -2
+        .byte           0, -4, 4, 43, 63, 26, -2, -2
+        .byte           0, -3, 2, 41, 63, 29, -2, -2
+        .byte           0, -3, 1, 38, 64, 32, -1, -3
+endconst
+
+.macro epel_filter name type regtype
+        lla             \regtype\()2, subpel_filters_\name
+        li              \regtype\()1, 8
+.ifc \type,v
+        mul             \regtype\()0, a6, \regtype\()1
+.else
+        mul             \regtype\()0, a5, \regtype\()1
+.endif
+        add             \regtype\()0, \regtype\()0, \regtype\()2
+        .irp n,1,2,3,4,5,6
+        lb              \regtype\n, \n(\regtype\()0)
+        .endr
+.ifc \regtype,t
+        lb              a7, 7(\regtype\()0)
+.else
+        lb              s7, 7(\regtype\()0)
+.endif
+        lb              \regtype\()0, 0(\regtype\()0)
+.endm
+
+.macro epel_load dst len op name type from_mem regtype
+        li              a5, 64
+.ifc \from_mem, 1
+        vle8.v          v22, (a2)
+.ifc \type,v
+        sub             a2, a2, a3
+        vle8.v          v20, (a2)
+        sh1add          a2, a3, a2
+        vle8.v          v24, (a2)
+        add             a2, a2, a3
+        vle8.v          v26, (a2)
+        add             a2, a2, a3
+        vle8.v          v28, (a2)
+        add             a2, a2, a3
+        vle8.v          v30, (a2)
+.else
+        addi            a2, a2, -1
+        vle8.v          v20, (a2)
+        addi            a2, a2, 2
+        vle8.v          v24, (a2)
+        addi            a2, a2, 1
+        vle8.v          v26, (a2)
+        addi            a2, a2, 1
+        vle8.v          v28, (a2)
+        addi            a2, a2, 1
+        vle8.v          v30, (a2)
+.endif
+
+.ifc \name,smooth
+        vwmulu.vx       v16, v24, \regtype\()4
+        vwmaccu.vx      v16, \regtype\()2, v20
+        vwmaccu.vx      v16, \regtype\()5, v26
+        vwmaccsu.vx     v16, \regtype\()6, v28
+.else
+        vwmulu.vx       v16, v28, \regtype\()6
+        vwmaccsu.vx     v16, \regtype\()2, v20
+        vwmaccsu.vx     v16, \regtype\()5, v26
+.endif
+
+.ifc \regtype,t
+        vwmaccsu.vx     v16, a7, v30
+.else
+        vwmaccsu.vx     v16, s7, v30
+.endif
+
+.ifc \type,v
+        .rept 6
+        sub             a2, a2, a3
+        .endr
+        vle8.v          v28, (a2)
+        sub             a2, a2, a3
+        vle8.v          v26, (a2)
+        sh1add          a2, a3, a2
+        add             a2, a2, a3
+.else
+        addi            a2, a2, -6
+        vle8.v          v28, (a2)
+        addi            a2, a2, -1
+        vle8.v          v26, (a2)
+        addi            a2, a2, 3
+.endif
+
+.ifc \name,smooth
+        vwmaccsu.vx     v16, \regtype\()1, v28
+.else
+        vwmaccu.vx      v16, \regtype\()1, v28
+        vwmulu.vx       v28, v24, \regtype\()4
+.endif
+        vwmaccsu.vx     v16, \regtype\()0, v26
+        vwmulu.vx       v20, v22, \regtype\()3
+.else
+.ifc \name,smooth
+        vwmulu.vx       v16, v8, \regtype\()4
+        vwmaccu.vx      v16, \regtype\()2, v4
+        vwmaccu.vx      v16, \regtype\()5, v10
+        vwmaccsu.vx     v16, \regtype\()6, v12
+        vwmaccsu.vx     v16, \regtype\()1, v2
+.else
+        vwmulu.vx       v16, v2, \regtype\()1
+        vwmaccu.vx      v16, \regtype\()6, v12
+        vwmaccsu.vx     v16, \regtype\()5, v10
+        vwmaccsu.vx     v16, \regtype\()2, v4
+        vwmulu.vx       v28, v8, \regtype\()4
+.endif
+        vwmaccsu.vx     v16, \regtype\()0, v0
+        vwmulu.vx       v20, v6, \regtype\()3
+
+.ifc \regtype,t
+        vwmaccsu.vx     v16, a7, v14
+.else
+        vwmaccsu.vx     v16, s7, v14
+.endif
+
+.endif
+        vwadd.wx        v16, v16, a5
+        vsetvlstatic16  \len
+
+.ifc \name,smooth
+        vwadd.vv        v24, v16, v20
+.else
+        vwadd.vv        v24, v16, v28
+        vwadd.wv        v24, v24, v20
+.endif
+        vnsra.wi        v24, v24, 7
+        vmax.vx         v24, v24, zero
+        vsetvlstatic8   \len, zero, 32, m2
+
+        vnclipu.wi      \dst, v24, 0
+.ifc \op,avg
+        vle8.v          v24, (a0)
+        vaaddu.vv       \dst, \dst, v24
+.endif
+
+.endm
+
+.macro epel_load_inc dst len op name type from_mem regtype
+        epel_load       \dst, \len, \op, \name, \type, \from_mem, \regtype
+        add             a2, a2, a3
+.endm
+
+.macro epel len op name type vlen
+func ff_\op\()_vp9_8tap_\name\()_\len\()\type\()_rvv\vlen\(), zve32x
+        epel_filter     \name, \type, t
+.if \vlen < 256
+        vsetvlstatic8   \len, a5, 32, m2
+.else
+        vsetvlstatic8   \len, a5, 64, m2
+.endif
+.ifc \op,avg
+        csrwi           vxrm, 0
+.endif
+
+1:
+        addi            a4, a4, -1
+        epel_load       v30, \len, \op, \name, \type, 1, t
+        vse8.v          v30, (a0)
+.if \len == 64 && \vlen < 256
+        addi            a0, a0, 32
+        addi            a2, a2, 32
+        epel_load       v30, \len, \op, \name, \type, 1, t
+        vse8.v          v30, (a0)
+        addi            a0, a0, -32
+        addi            a2, a2, -32
+.endif
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+
+        ret
+endfunc
+.endm
+
 .irp len, 64, 32, 16, 8, 4
         copy_avg \len
         .irp op, put, avg
                 bilin_h_v \len, \op, h, a5
                 bilin_h_v \len, \op, v, a6
+                .irp name, regular, sharp, smooth
+                        .irp type, h, v
+                                epel \len, \op, \name, \type, 128
+                                epel \len, \op, \name, \type, 256
+                        .endr
+                .endr
         .endr
 .endr
diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h
index 8fb326dae0..5fd64a1b8c 100644
--- a/libavcodec/riscv/vp9dsp.h
+++ b/libavcodec/riscv/vp9dsp.h
@@ -81,33 +81,39 @@ void ff_tm_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
 void ff_tm_4x4_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
                    const uint8_t *a);
 
-#define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx) \
-void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \
+#define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx, min_vlen) \
+void ff_put_vp9_8tap_##type##_##SIZE##h_rvv##min_vlen(uint8_t *dst, \
+                                        ptrdiff_t dststride, \
                                         const uint8_t *src, \
                                         ptrdiff_t srcstride, \
                                         int h, int mx, int my); \
                                                                 \
-void ff_put_8tap_##type##_##SIZE##v_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_put_vp9_8tap_##type##_##SIZE##v_rvv##min_vlen(uint8_t *dst, \
+                                        ptrdiff_t dststride, \
                                         const uint8_t *src, \
                                         ptrdiff_t srcstride, \
                                         int h, int mx, int my); \
                                                                 \
-void ff_put_8tap_##type##_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_put_vp9_8tap_##type##_##SIZE##hv_rvv##min_vlen(uint8_t *dst, \
+                                         ptrdiff_t dststride, \
                                          const uint8_t *src, \
                                          ptrdiff_t srcstride, \
                                          int h, int mx, int my); \
                                                                  \
-void ff_avg_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_avg_vp9_8tap_##type##_##SIZE##h_rvv##min_vlen(uint8_t *dst, \
+                                        ptrdiff_t dststride, \
                                         const uint8_t *src, \
ptrdiff_t srcstride, \ int h, int mx, int my); \ \ -void ff_avg_8tap_##type##_##SIZE##v_rvv(uint8_t *dst, ptrdiff_t dststride, \ +void ff_avg_vp9_8tap_##type##_##SIZE##v_rvv##min_vlen(uint8_t *dst, \ + ptrdiff_t dststride, \ const uint8_t *src, \ ptrdiff_t srcstride, \ int h, int mx, int my); \ \ -void ff_avg_8tap_##type##_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \ +void ff_avg_vp9_8tap_##type##_##SIZE##hv_rvv##min_vlen(uint8_t *dst, \ + ptrdiff_t dststride, \ const uint8_t *src, \ ptrdiff_t srcstride, \ int h, int mx, int my); @@ -146,23 +152,41 @@ void ff_vp9_avg##SIZE##_rvv(uint8_t *dst, ptrdiff_t dststride, \ const uint8_t *src, ptrdiff_t srcstride, \ int h, int mx, int my); -VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR); -VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR); -VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR); -VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR); -VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR); - -VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP); -VP9_8TAP_RISCV_RVV_FUNC(32, sharp, FILTER_8TAP_SHARP); -VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP); -VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP); -VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP); - -VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH); -VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH); -VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH); -VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH); -VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH); +VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR, 128); +VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR, 128); +VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR, 128); +VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR, 128); +VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR, 128); + +VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP, 128); +VP9_8TAP_RISCV_RVV_FUNC(32, sharp, 
FILTER_8TAP_SHARP, 128); +VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP, 128); +VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP, 128); +VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP, 128); + +VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH, 128); +VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH, 128); +VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH, 128); +VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH, 128); +VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH, 128); + +VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR, 256); +VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR, 256); +VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR, 256); +VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR, 256); +VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR, 256); + +VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP, 256); +VP9_8TAP_RISCV_RVV_FUNC(32, sharp, FILTER_8TAP_SHARP, 256); +VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP, 256); +VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP, 256); +VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP, 256); + +VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH, 256); +VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH, 256); +VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH, 256); +VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH, 256); +VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH, 256); VP9_BILINEAR_RISCV_RVV_FUNC(64); VP9_BILINEAR_RISCV_RVV_FUNC(32); diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c index 9606d8545f..314a1e5808 100644 --- a/libavcodec/riscv/vp9dsp_init.c +++ b/libavcodec/riscv/vp9dsp_init.c @@ -49,7 +49,8 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) # endif #if HAVE_RVV - if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I32) && ff_rv_vlen_least(128)) { + if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I32)) { + if (ff_rv_vlen_least(128)) { #define 
init_fpel(idx1, sz) \ dsp->mc[idx1][FILTER_8TAP_SMOOTH ][1][0][0] = ff_vp9_avg##sz##_rvv; \ @@ -85,7 +86,42 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) dsp->mc[4][FILTER_BILINEAR ][1][1][0] = ff_avg_vp9_bilin_4h_rvv; #undef init_fpel + +#define init_subpel1(idx1, idx2, idxh, idxv, sz, dir, type, vlen) \ + dsp->mc[idx1][FILTER_8TAP_SMOOTH ][idx2][idxh][idxv] = \ + ff_##type##_vp9_8tap_smooth_##sz##dir##_rvv##vlen; \ + dsp->mc[idx1][FILTER_8TAP_REGULAR][idx2][idxh][idxv] = \ + ff_##type##_vp9_8tap_regular_##sz##dir##_rvv##vlen; \ + dsp->mc[idx1][FILTER_8TAP_SHARP ][idx2][idxh][idxv] = \ + ff_##type##_vp9_8tap_sharp_##sz##dir##_rvv##vlen; + +#define init_subpel2(idx, idxh, idxv, dir, type, vlen) \ + init_subpel1(0, idx, idxh, idxv, 64, dir, type, vlen); \ + init_subpel1(1, idx, idxh, idxv, 32, dir, type, vlen); \ + init_subpel1(2, idx, idxh, idxv, 16, dir, type, vlen); \ + init_subpel1(3, idx, idxh, idxv, 8, dir, type, vlen); \ + init_subpel1(4, idx, idxh, idxv, 4, dir, type, vlen) + + init_subpel2(0, 1, 0, h, put, 128); + init_subpel2(1, 1, 0, h, avg, 128); + + if (flags & AV_CPU_FLAG_RVB_ADDR) { + init_subpel2(0, 0, 1, v, put, 128); + init_subpel2(1, 0, 1, v, avg, 128); + } + + } + if (ff_rv_vlen_least(256)) { + init_subpel2(0, 1, 0, h, put, 256); + init_subpel2(1, 1, 0, h, avg, 256); + + if (flags & AV_CPU_FLAG_RVB_ADDR) { + init_subpel2(0, 0, 1, v, put, 256); + init_subpel2(1, 0, 1, v, avg, 256); + } } + } + #endif #endif } From patchwork Tue May 21 17:13:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: uk7b@foxmail.com X-Patchwork-Id: 49107 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a21:9214:b0:1af:fc2d:ff5a with SMTP id tl20csp127759pzb; Tue, 21 May 2024 10:14:50 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCXx5I/Rr+0z6VZ/FuHvgmfZt6AJuG+PcDBvhBx/Ea9BL7AHX7j2HdLnv6zigKHUhvU/uHL912j1dJMkp3VnNXZHU8LIb8S7/B2UaQ== X-Google-Smtp-Source: 
AGHT+IF0p0GkziWzHQonJ5oMmMb+b8KMrM8AfWcN8n2hiPtu4NOBJm4ddnL3g0XvE5phWO1LHwuL X-Received: by 2002:a2e:9ac2:0:b0:2e0:298d:65ec with SMTP id 38308e7fff4ca-2e51fd45214mr282540511fa.17.1716311690061; Tue, 21 May 2024 10:14:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1716311690; cv=none; d=google.com; s=arc-20160816; b=oFX+XTT5ETnBodYaccmbnv9SBzw61PO5haB9vepj3ElOCtWtC6DwULfUaDx57N7rW5 bE0SkzfLKwks48L5s+w9pVGQuTVfqqQtxS+sET08aJ8lx8ttyZogtzq4s0+JUA3cLJlV DtVnstZt5O6iQySkktaPYKBeQnalZH4MogpbCzgXCql4Qur0ZTheUt1FuXQ/AHVRPkfl RsJOpCN5pcUt584eJkklAQlxpqeMs6Cfl2TNUJlEvI1l17Zy9Gaaxz5Hgfg7eH+I1a1E xdZHrc9gvbOKZm2Qq7HSeMmpS83ytl9dqTUviTSeTLz8mnKcbnRlFuov7Zu+VN4VCeVC nV6g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to:date :to:from:message-id:dkim-signature:delivered-to; bh=byZ29F5/X7s2NosuE63EZ/Jp5wL5NXYBWKlUBdpRiTw=; fh=D0bFwGkf4X22/D/bfeDVrXKIx7S6kcXsNzy10j8ORbQ=; b=jn6EpxfXgdhi+ntT9c1MBYuEwojGE8npIZGFrmfSpnyORahXsM9O0BrKMtmLVtRx6b nNY77JlaWeEkGzcxFpNR8be6qtYA/ySPe6PZH5OzETqz1RAPIh4X/Zmy1u5H3p075klb +t+qX9B5l/IRlgeIabwxIl40qKXJeCtPc5uvh6ilx4oh0ZEPWQmtgGdDSTcKl9RRy00e mZ7PiqKkmaoqr9YDmBIz+NZAKTTFX821BlXOZzwEdk6EbdTHnPaQVZW0II7uKur4QXp1 JkvimTx99pV3jyCfJIWFpSiemLVutUKLExNu78EtbIMly/E2MKMx8MaiMMilfVEuF5Rr 4GIw==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=aor1drxb; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id a640c23a62f3a-a5a17c2c712si1458017166b.938.2024.05.21.10.14.49; Tue, 21 May 2024 10:14:50 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=aor1drxb; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id A289768D3C0; Tue, 21 May 2024 20:14:15 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from out203-205-221-155.mail.qq.com (out203-205-221-155.mail.qq.com [203.205.221.155]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 5D5A368D0A4 for ; Tue, 21 May 2024 20:14:05 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=foxmail.com; s=s201512; t=1716311636; bh=kwKeqirTVSpBgQ3f7DY/FukohOvZtEnXrykfz8mocds=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=aor1drxbKH+gssX4YQrmF5MGtQ0m4BdPsuZ2wjepZiIRf0Ezu4QeMK2fyNbQNcEHR 5r1ju43Gb22x7mUSO1yWwSWwa7PyasKwl2lhaeSVtv/qKFrE2IEulpGD7prANW/ziG s38etmDEC837r4qjiels1iFPjjrrDYU7ddl+fTaI= Received: from localhost.localdomain ([42.177.176.187]) by newxmesmtplogicsvrszb9-1.qq.com (NewEsmtp) with SMTP id 374AC612; Wed, 22 May 2024 01:13:52 +0800 X-QQ-mid: xmsmtpt1716311635t04agjpuu Message-ID: X-QQ-XMAILINFO: NDz66ktblfzJKu42naTa998anm/YXGdOcuQkPaCkpo4uXvuzxi90xURHXQqN53 5/iRTuAZBK7Wl3Xk+EVqpxV0b3xoOGBYNa6vmTTDT3hbNDtN4t4BsUJUwSM4MaUrQ5ewuCASerd4 8iwd8JlRf26Ntwd56Rkfh2qSmQ/4FQNReNIw7Eg+SBZLhYRIq7ldJ8KD6K+8WM5w5OaSE1za/J31 chSbT8EevFVbV6gxGyb3n33RXYGG2qTpWAlIZvnNxVK4wSo83GWY4czx1FgTohMWa/7Kx5yaBPzq 
t0kyzIB5SKZer9B/mLczYXK59bbMfk01wOolCIYMO1BYpAzpnyspSoFn7+I2p4qHDkqr9LG5SYrY yXsKMTEeBoy+B6+NSXIeCkgLTpiqKtnXHMw6Q8YpczkYMWxAaHfc8k3hxXkHYdLzgYDPk0lBNTBU SlKiRL2rYEXO5E82T39A/2YkklT0+Q8QjPqDT7Ii7aUGbBosnSMRdQDqKnVrxFedQIyTeZaGbVLS a7MCfH+AGmcVnoCHbQNygvIxY7p1E9ovLdE1meyRewHgyGEPSbA2vRxhPyZ78K341VXe0ht3Cgp2 deH5LJ8izwvVkODhyanGfgz2fNfQksK5tmFp9/LEi986D5CLoc1f0aP92T4ZvysvcuutQo8B7l54 jBkcUOudI6RIzKg7WvJ9oVLbTUqy4V6L7VVLv8OmhWaATnxr1Za0MwnC2WJWDHUUScmLKXUPtkx9 oUn8RWZ/oMu91Ygox/iIG9QM6euz5wjulShhy+pmm8/9wG5+gu6kAhjsZykMRdhn5lm7ud6qz30+ UgLPNb0c8dyfV8j2m5yEFkembUBqyZMIRpugRzqqEkAeb9UH6CDe2Gq2N8pKo+G4KgLhBnD1yZx6 0gJUCirPsv2Nlujp8/tuzvbkbqZ9uZqHu20Eyv39jwbMJazZCd7x1HA6IOJPYBjg== X-QQ-XMRINFO: MPJ6Tf5t3I/ycC2BItcBVIA= From: uk7b@foxmail.com To: ffmpeg-devel@ffmpeg.org Date: Wed, 22 May 2024 01:13:18 +0800 X-OQ-MSGID: <20240521171319.2629938-4-uk7b@foxmail.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: <20240521171319.2629938-1-uk7b@foxmail.com> References: <20240521171319.2629938-1-uk7b@foxmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 4/5] lavc/vp9dsp: R-V V mc bilin hv X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: sunyuechi Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: GcGmFtvzCnv+ From: sunyuechi C908: vp9_avg_bilin_4hv_8bpp_c: 11.0 vp9_avg_bilin_4hv_8bpp_rvv_i64: 3.7 vp9_avg_bilin_8hv_8bpp_c: 38.7 vp9_avg_bilin_8hv_8bpp_rvv_i64: 7.2 vp9_avg_bilin_16hv_8bpp_c: 147.0 vp9_avg_bilin_16hv_8bpp_rvv_i64: 14.2 vp9_avg_bilin_32hv_8bpp_c: 574.5 vp9_avg_bilin_32hv_8bpp_rvv_i64: 42.7 vp9_avg_bilin_64hv_8bpp_c: 2311.5 vp9_avg_bilin_64hv_8bpp_rvv_i64: 201.7 vp9_put_bilin_4hv_8bpp_c: 10.0 vp9_put_bilin_4hv_8bpp_rvv_i64: 3.2 vp9_put_bilin_8hv_8bpp_c: 35.2 vp9_put_bilin_8hv_8bpp_rvv_i64: 6.5 vp9_put_bilin_16hv_8bpp_c: 133.7 
vp9_put_bilin_16hv_8bpp_rvv_i64: 13.0
vp9_put_bilin_32hv_8bpp_c: 538.2
vp9_put_bilin_32hv_8bpp_rvv_i64: 39.7
vp9_put_bilin_64hv_8bpp_c: 2114.0
vp9_put_bilin_64hv_8bpp_rvv_i64: 153.7
---
 libavcodec/riscv/vp9_mc_rvv.S  | 34 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c | 10 ++++++++++
 2 files changed, 44 insertions(+)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index adba4afb90..d7db775df7 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -104,6 +104,39 @@ func ff_\op\()_vp9_bilin_\len\()\type\()_rvv, zve32x
 endfunc
 .endm
 
+.macro bilin_hv len op
+func ff_\op\()_vp9_bilin_\len\()hv_rvv, zve32x
+.ifc \op,avg
+        csrwi           vxrm, 0
+.endif
+        vsetvlstatic8   \len, t0, 64
+        neg             t1, a5
+        neg             t2, a6
+        li              t4, 8
+        bilin_load      v24, \len, put, h, a5
+        add             a2, a2, a3
+1:
+        addi            a4, a4, -1
+        bilin_load      v4, \len, put, h, a5
+        vwmulu.vx       v16, v4, a6
+        vwmaccsu.vx     v16, t2, v24
+        vwadd.wx        v16, v16, t4
+        vnsra.wi        v16, v16, 4
+        vadd.vv         v0, v16, v24
+.ifc \op,avg
+        vle8.v          v16, (a0)
+        vaaddu.vv       v0, v0, v16
+.endif
+        vse8.v          v0, (a0)
+        vmv.v.v         v24, v4
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+
+        ret
+endfunc
+.endm
+
 const subpel_filters_regular
         .byte           0, 0, 0, 128, 0, 0, 0, 0
         .byte           0, 1, -5, 126, 8, -3, 1, 0
@@ -334,6 +367,7 @@ endfunc
         .irp op, put, avg
         bilin_h_v \len, \op, h, a5
         bilin_h_v \len, \op, v, a6
+        bilin_hv \len, \op
         .irp name, regular, sharp, smooth
         .irp type, h, v
         epel \len, \op, \name, \type, 128
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 314a1e5808..be5369d506 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -84,6 +84,16 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
     dsp->mc[4][FILTER_BILINEAR ][0][1][0] = ff_put_vp9_bilin_4h_rvv;
     dsp->mc[4][FILTER_BILINEAR ][1][0][1] = ff_avg_vp9_bilin_4v_rvv;
     dsp->mc[4][FILTER_BILINEAR ][1][1][0] = ff_avg_vp9_bilin_4h_rvv;
+    dsp->mc[0][FILTER_BILINEAR ][0][1][1] = ff_put_vp9_bilin_64hv_rvv;
+    dsp->mc[0][FILTER_BILINEAR ][1][1][1] = ff_avg_vp9_bilin_64hv_rvv;
+    dsp->mc[1][FILTER_BILINEAR ][0][1][1] = ff_put_vp9_bilin_32hv_rvv;
+    dsp->mc[1][FILTER_BILINEAR ][1][1][1] = ff_avg_vp9_bilin_32hv_rvv;
+    dsp->mc[2][FILTER_BILINEAR ][0][1][1] = ff_put_vp9_bilin_16hv_rvv;
+    dsp->mc[2][FILTER_BILINEAR ][1][1][1] = ff_avg_vp9_bilin_16hv_rvv;
+    dsp->mc[3][FILTER_BILINEAR ][0][1][1] = ff_put_vp9_bilin_8hv_rvv;
+    dsp->mc[3][FILTER_BILINEAR ][1][1][1] = ff_avg_vp9_bilin_8hv_rvv;
+    dsp->mc[4][FILTER_BILINEAR ][0][1][1] = ff_put_vp9_bilin_4hv_rvv;
+    dsp->mc[4][FILTER_BILINEAR ][1][1][1] = ff_avg_vp9_bilin_4hv_rvv;
 
 #undef init_fpel
 

From patchwork Tue May 21 17:13:19 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 49108
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 22 May 2024 01:13:19 +0800
X-OQ-MSGID: <20240521171319.2629938-5-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240521171319.2629938-1-uk7b@foxmail.com>
References: <20240521171319.2629938-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 5/5] lavc/vp9dsp: R-V V mc tap hv
Cc: sunyuechi

From: sunyuechi

                                          C908    X60
vp9_avg_8tap_smooth_4hv_8bpp_c        :   32.0   28.2
vp9_avg_8tap_smooth_4hv_8bpp_rvv_i32  :   15.0   13.2
vp9_avg_8tap_smooth_8hv_8bpp_c        :   98.0   86.2
vp9_avg_8tap_smooth_8hv_8bpp_rvv_i32  :   23.7   21.0
vp9_avg_8tap_smooth_16hv_8bpp_c       :  355.5  297.0
vp9_avg_8tap_smooth_16hv_8bpp_rvv_i32 :   62.7   41.2
vp9_avg_8tap_smooth_32hv_8bpp_c       : 1273.0 1099.7
vp9_avg_8tap_smooth_32hv_8bpp_rvv_i32 :  133.7  119.2
vp9_avg_8tap_smooth_64hv_8bpp_c       : 4933.0 4240.5
vp9_avg_8tap_smooth_64hv_8bpp_rvv_i32 :  506.7  227.0
vp9_put_8tap_smooth_4hv_8bpp_c        :   30.2   27.0
vp9_put_8tap_smooth_4hv_8bpp_rvv_i32  :   14.5   12.7
vp9_put_8tap_smooth_8hv_8bpp_c        :   91.2   81.2
vp9_put_8tap_smooth_8hv_8bpp_rvv_i32  :   22.7   20.2
vp9_put_8tap_smooth_16hv_8bpp_c       :  329.2  277.7
vp9_put_8tap_smooth_16hv_8bpp_rvv_i32 :   44.7   40.0
vp9_put_8tap_smooth_32hv_8bpp_c       : 1183.7 1022.7
vp9_put_8tap_smooth_32hv_8bpp_rvv_i32 :  130.7  116.5
vp9_put_8tap_smooth_64hv_8bpp_c       : 4502.7 3954.5
vp9_put_8tap_smooth_64hv_8bpp_rvv_i32 :  496.0  224.7
---
 libavcodec/riscv/vp9_mc_rvv.S  | 75 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c |  8 ++++
 2 files changed, 83 insertions(+)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index d7db775df7..06c79b16f7 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -362,6 +362,77 @@ func ff_\op\()_vp9_8tap_\name\()_\len\()\type\()_rvv\vlen\(), zve32x
 endfunc
 .endm
 
+#if __riscv_xlen == 64
+.macro epel_hv_once len name op
+        sub             a2, a2, a3
+        sub             a2, a2, a3
+        sub             a2, a2, a3
+        .irp n,0,2,4,6,8,10,12,14
+        epel_load_inc   v\n, \len, put, \name, h, 1, t
+        .endr
+        addi            a4, a4, -1
+1:
+        addi            a4, a4, -1
+        epel_load       v30, \len, \op, \name, v, 0, s
+        vse8.v          v30, (a0)
+        vmv.v.v         v0, v2
+        vmv.v.v         v2, v4
+        vmv.v.v         v4, v6
+        vmv.v.v         v6, v8
+        vmv.v.v         v8, v10
+        vmv.v.v         v10, v12
+        vmv.v.v         v12, v14
+        epel_load       v14, \len, put, \name, h, 1, t
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+        epel_load       v30, \len, \op, \name, v, 0, s
+        vse8.v          v30, (a0)
+.endm
+
+.macro epel_hv op name len vlen
+func ff_\op\()_vp9_8tap_\name\()_\len\()hv_rvv\vlen\(), zve32x
+        addi            sp, sp, -64
+        .irp n,0,1,2,3,4,5,6,7
+        sd              s\n, \n\()<<3(sp)
+        .endr
+.if \len == 64 && \vlen < 256
+        addi            sp, sp, -48
+        .irp n,0,1,2,3,4,5
+        sd              a\n, \n\()<<3(sp)
+        .endr
+.endif
+.ifc \op,avg
+        csrwi           vxrm, 0
+.endif
+        epel_filter     \name, h, t
+        epel_filter     \name, v, s
+.if \vlen < 256
+        vsetvlstatic8   \len, a6, 32, m2
+.else
+        vsetvlstatic8   \len, a6, 64, m2
+.endif
+        epel_hv_once    \len, \name, \op
+.if \len == 64 && \vlen < 256
+        .irp n,0,1,2,3,4,5
+        ld              a\n, \n\()<<3(sp)
+        .endr
+        addi            sp, sp, 48
+        addi            a0, a0, 32
+        addi            a2, a2, 32
+        epel_filter     \name, h, t
+        epel_hv_once    \len, \name, \op
+.endif
+        .irp n,0,1,2,3,4,5,6,7
+        ld              s\n, \n\()<<3(sp)
+        .endr
+        addi            sp, sp, 64
+
+        ret
+endfunc
+.endm
+#endif
+
 .irp len, 64, 32, 16, 8, 4
         copy_avg \len
         .irp op, put, avg
@@ -373,6 +444,10 @@ endfunc
         epel \len, \op, \name, \type, 128
         epel \len, \op, \name, \type, 256
         .endr
+        #if __riscv_xlen == 64
+        epel_hv \op, \name, \len, 128
+        epel_hv \op, \name, \len, 256
+        #endif
         .endr
         .endr
         .endr
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index be5369d506..887dba461f 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -118,6 +118,10 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
     if (flags & AV_CPU_FLAG_RVB_ADDR) {
         init_subpel2(0, 0, 1, v, put, 128);
         init_subpel2(1, 0, 1, v, avg, 128);
+# if __riscv_xlen == 64
+        init_subpel2(0, 1, 1, hv, put, 128);
+        init_subpel2(1, 1, 1, hv, avg, 128);
+# endif
     }
 
     }
@@ -128,6 +132,10 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
         if (flags & AV_CPU_FLAG_RVB_ADDR) {
             init_subpel2(0, 0, 1, v, put, 256);
             init_subpel2(1, 0, 1, v, avg, 256);
+# if __riscv_xlen == 64
+            init_subpel2(0, 1, 1, hv, put, 256);
+            init_subpel2(1, 1, 1, hv, avg, 256);
+# endif
         }
     }
     }
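
For review purposes, here is a minimal scalar sketch in C of what the new hv kernels appear to compute, as read from the assembly above. It is not FFmpeg's C template. The names tap8, vp9_8tap_hv_ref and bilin are hypothetical. The sketch assumes 8-bit pixels and uses only the regular bank, copied from subpel_filters_regular in vp9_mc_rvv.S. It mirrors the choices the RVV code seems to make:
- a rounding constant of 64 and a shift by 7;
- horizontally filtered rows clipped to 8 bits before the vertical pass (vnclipu);
- (a + b + 1) >> 1 for avg (vaaddu with vxrm = 0).

The caller has to provide 3 rows/columns of context above and to the left, and 4 below and to the right.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from subpel_filters_regular; int16_t because 128 does not fit int8_t. */
static const int16_t regular_filters[16][8] = {
    {  0, 0,   0, 128,   0,   0, 0,  0 }, {  0, 1,  -5, 126,   8,  -3, 1,  0 },
    { -1, 3, -10, 122,  18,  -6, 2,  0 }, { -1, 4, -13, 118,  27,  -9, 3, -1 },
    { -1, 4, -16, 112,  37, -11, 4, -1 }, { -1, 5, -18, 105,  48, -14, 4, -1 },
    { -1, 5, -19,  97,  58, -16, 5, -1 }, { -1, 6, -19,  88,  68, -18, 5, -1 },
    { -1, 6, -19,  78,  78, -19, 6, -1 }, { -1, 5, -18,  68,  88, -19, 6, -1 },
    { -1, 5, -16,  58,  97, -19, 5, -1 }, { -1, 4, -14,  48, 105, -18, 5, -1 },
    { -1, 4, -11,  37, 112, -16, 4, -1 }, { -1, 3,  -9,  27, 118, -13, 4, -1 },
    {  0, 2,  -6,  18, 122, -10, 3, -1 }, {  0, 1,  -3,   8, 126,  -5, 1,  0 },
};

/* One 8-tap output from p[-3*step] .. p[4*step]: (sum + 64) >> 7, clipped to
 * 0..255 (the RVV code uses vnsra/vmax/vnclipu for the same saturation). */
static uint8_t tap8(const uint8_t *p, ptrdiff_t step, const int16_t f[8])
{
    int sum = 64;
    for (int k = 0; k < 8; k++)
        sum += f[k] * p[(k - 3) * step];
    if (sum < 0)
        return 0; /* avoids right-shifting a negative int */
    sum >>= 7;
    return sum > 255 ? 255 : (uint8_t)sum;
}

/* Hypothetical reference for ff_{put,avg}_vp9_8tap_regular_<w>hv: filter
 * rows -3 .. h+3 horizontally into an 8-bit buffer, then filter vertically. */
static void vp9_8tap_hv_ref(uint8_t *dst, ptrdiff_t dst_stride,
                            const uint8_t *src, ptrdiff_t src_stride,
                            int w, int h, int mx, int my, int avg)
{
    uint8_t tmp[(64 + 7) * 64];

    assert(w > 0 && w <= 64 && h > 0 && h <= 64);
    assert(mx >= 0 && mx < 16 && my >= 0 && my < 16);

    src -= 3 * src_stride;
    for (int y = 0; y < h + 7; y++)
        for (int x = 0; x < w; x++)
            tmp[y * 64 + x] = tap8(&src[y * src_stride + x], 1,
                                   regular_filters[mx]);

    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            uint8_t v = tap8(&tmp[(y + 3) * 64 + x], 64, regular_filters[my]);
            uint8_t *d = &dst[y * dst_stride + x];
            /* avg: rounding average, as vaaddu with vxrm = 0 */
            *d = avg ? (uint8_t)((*d + v + 1) >> 1) : v;
        }
}

/* Bilinear step of bilin_hv: a + ((m * (b - a) + 8) >> 4), m in 0..15. */
static uint8_t bilin(uint8_t a, uint8_t b, int m)
{
    return (uint8_t)((a * (16 - m) + b * m + 8) >> 4);
}
```

The bilinear helper is written as (a * (16 - m) + b * m + 8) >> 4 rather than a + ((m * (b - a) + 8) >> 4). The two are equal when the shift rounds toward minus infinity, as vnsra does. The first form never right-shifts a negative value in C, so it avoids implementation-defined behaviour.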