From patchwork Tue May 7 07:36:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48608
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 7 May 2024 15:36:05 +0800
X-OQ-MSGID: <20240507073613.2871668-1-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
Subject: [FFmpeg-devel] [PATCH v2 1/9] lavc/vp9dsp: R-V ipred vert
List-Id: FFmpeg development discussions and patches
Cc: sunyuechi

From: sunyuechi

C908:
vp9_vert_8x8_8bpp_c: 22.0
vp9_vert_8x8_8bpp_rvi: 15.7
vp9_vert_16x16_8bpp_c: 71.2
vp9_vert_16x16_8bpp_rvi: 39.0
vp9_vert_32x32_8bpp_c: 300.2
vp9_vert_32x32_8bpp_rvi: 135.2
---
 libavcodec/riscv/Makefile | 1 +
 libavcodec/riscv/vp9_intra_rvi.S | 61 ++++++++++++++++++++++++++++++++
libavcodec/riscv/vp9dsp.h | 6 ++++ libavcodec/riscv/vp9dsp_init.c | 15 ++++++-- 4 files changed, 80 insertions(+), 3 deletions(-) create mode 100644 libavcodec/riscv/vp9_intra_rvi.S diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile index 050c08ee61..65dd0d656a 100644 --- a/libavcodec/riscv/Makefile +++ b/libavcodec/riscv/Makefile @@ -63,6 +63,7 @@ RVV-OBJS-$(CONFIG_VC1DSP) += riscv/vc1dsp_rvv.o OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_init.o RVV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvv.o OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9dsp_init.o +RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o diff --git a/libavcodec/riscv/vp9_intra_rvi.S b/libavcodec/riscv/vp9_intra_rvi.S new file mode 100644 index 0000000000..617f9f55a2 --- /dev/null +++ b/libavcodec/riscv/vp9_intra_rvi.S @@ -0,0 +1,61 @@ +/* + * Copyright (c) 2024 Institue of Software Chinese Academy of Sciences (ISCAS). + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/riscv/asm.S" + +#if __riscv_xlen >= 64 +func ff_v_32x32_rvi + ld t0, (a3) + ld t1, 8(a3) + ld t2, 16(a3) + ld t3, 24(a3) + .rept 32 + sd t0, (a0) + sd t1, 8(a0) + sd t2, 16(a0) + sd t3, 24(a0) + add a0, a0, a1 + .endr + + ret +endfunc + +func ff_v_16x16_rvi + ld t0, (a3) + ld t1, 8(a3) + .rept 16 + sd t0, (a0) + sd t1, 8(a0) + add a0, a0, a1 + .endr + + ret +endfunc + +func ff_v_8x8_rvi + ld t0, (a3) + .rept 8 + sd t0, (a0) + add a0, a0, a1 + .endr + + ret +endfunc +#endif diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h index 25047ed507..f8bc6563a5 100644 --- a/libavcodec/riscv/vp9dsp.h +++ b/libavcodec/riscv/vp9dsp.h @@ -60,6 +60,12 @@ void ff_dc_129_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a); void ff_dc_129_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a); +void ff_v_32x32_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); +void ff_v_16x16_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); +void ff_v_8x8_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); #define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx) \ void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \ diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c index 69ab39004c..d249dd71b2 100644 --- a/libavcodec/riscv/vp9dsp_init.c +++ b/libavcodec/riscv/vp9dsp_init.c @@ -24,11 +24,19 @@ #include "libavcodec/vp9dsp.h" #include "vp9dsp.h" -static av_cold void vp9dsp_intrapred_init_rvv(VP9DSPContext *dsp, int bpp) +static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp) { - #if HAVE_RVV + #if HAVE_RV int flags = av_get_cpu_flags(); + if (bpp == 8 && 
flags & AV_CPU_FLAG_RVI) { +# if __riscv_xlen >= 64 + dsp->intra_pred[TX_32X32][VERT_PRED] = ff_v_32x32_rvi; + dsp->intra_pred[TX_16X16][VERT_PRED] = ff_v_16x16_rvi; + dsp->intra_pred[TX_8X8][VERT_PRED] = ff_v_8x8_rvi; +# endif + } + #if HAVE_RVV if (bpp == 8 && flags & AV_CPU_FLAG_RVV_I64 && ff_get_rv_vlenb() >= 16) { dsp->intra_pred[TX_8X8][DC_PRED] = ff_dc_8x8_rvv; dsp->intra_pred[TX_8X8][LEFT_DC_PRED] = ff_dc_left_8x8_rvv; @@ -53,9 +61,10 @@ static av_cold void vp9dsp_intrapred_init_rvv(VP9DSPContext *dsp, int bpp) dsp->intra_pred[TX_16X16][TOP_DC_PRED] = ff_dc_top_16x16_rvv; } #endif + #endif } av_cold void ff_vp9dsp_init_riscv(VP9DSPContext *dsp, int bpp, int bitexact) { - vp9dsp_intrapred_init_rvv(dsp, bpp); + vp9dsp_intrapred_init_riscv(dsp, bpp); }

From patchwork Tue May 7 07:36:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48609
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 7 May 2024 15:36:06 +0800
X-OQ-MSGID: <20240507073613.2871668-2-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240507073613.2871668-1-uk7b@foxmail.com>
References: <20240507073613.2871668-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 2/9] lavc/vp9dsp: R-V mc copy
List-Id: FFmpeg development discussions and patches
Cc: sunyuechi

From: sunyuechi

C908:
vp9_put4_8bpp_c: 0.7
vp9_put4_8bpp_rvi: 0.5
vp9_put8_8bpp_c: 2.5
vp9_put8_8bpp_rvi: 0.5
vp9_put16_8bpp_c: 16.7
vp9_put16_8bpp_rvi: 1.5
vp9_put32_8bpp_c: 37.2
vp9_put32_8bpp_rvi: 5.7
vp9_put64_8bpp_c: 107.5
vp9_put64_8bpp_rvi: 21.7
---
 libavcodec/riscv/Makefile | 3 +-
 libavcodec/riscv/vp9_mc_rvi.S | 105 +++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h | 3 +
 libavcodec/riscv/vp9dsp_init.c | 25 ++++++++
 4 files changed, 135 insertions(+), 1 deletion(-)
 create mode 100644
libavcodec/riscv/vp9_mc_rvi.S diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile index 65dd0d656a..5846861bac 100644 --- a/libavcodec/riscv/Makefile +++ b/libavcodec/riscv/Makefile @@ -63,7 +63,8 @@ RVV-OBJS-$(CONFIG_VC1DSP) += riscv/vc1dsp_rvv.o OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_init.o RVV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvv.o OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9dsp_init.o -RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o +RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o \ + riscv/vp9_mc_rvi.o RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o diff --git a/libavcodec/riscv/vp9_mc_rvi.S b/libavcodec/riscv/vp9_mc_rvi.S new file mode 100644 index 0000000000..0db14e83c7 --- /dev/null +++ b/libavcodec/riscv/vp9_mc_rvi.S @@ -0,0 +1,105 @@ +/* + * Copyright (c) 2024 Institue of Software Chinese Academy of Sciences (ISCAS). + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/riscv/asm.S" + +#if __riscv_xlen >= 64 +func ff_copy64_rvi +1: + addi a4, a4, -1 + ld t0, (a2) + ld t1, 8(a2) + ld t2, 16(a2) + ld t3, 24(a2) + ld t4, 32(a2) + ld t5, 40(a2) + ld t6, 48(a2) + ld a7, 56(a2) + sd t0, (a0) + sd t1, 8(a0) + sd t2, 16(a0) + sd t3, 24(a0) + sd t4, 32(a0) + sd t5, 40(a0) + sd t6, 48(a0) + sd a7, 56(a0) + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + + ret +endfunc + +func ff_copy32_rvi +1: + addi a4, a4, -1 + ld t0, (a2) + ld t1, 8(a2) + ld t2, 16(a2) + ld t3, 24(a2) + sd t0, (a0) + sd t1, 8(a0) + sd t2, 16(a0) + sd t3, 24(a0) + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + + ret +endfunc + +func ff_copy16_rvi +1: + addi a4, a4, -1 + ld t0, (a2) + ld t1, 8(a2) + sd t0, (a0) + sd t1, 8(a0) + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + + ret +endfunc + +func ff_copy8_rvi +1: + addi a4, a4, -1 + ld t0, (a2) + sd t0, (a0) + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + + ret +endfunc +#endif + +func ff_copy4_rvi +1: + addi a4, a4, -1 + lw t0, (a2) + sw t0, (a0) + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + + ret +endfunc diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h index f8bc6563a5..b8ff282f8a 100644 --- a/libavcodec/riscv/vp9dsp.h +++ b/libavcodec/riscv/vp9dsp.h @@ -167,6 +167,9 @@ void ff_copy##SIZE##_rvi(uint8_t *dst, ptrdiff_t dststride, \ const uint8_t *src, ptrdiff_t srcstride, \ int h, int mx, int my); +VP9_COPY_RISCV_RVI_FUNC(64); +VP9_COPY_RISCV_RVI_FUNC(32); +VP9_COPY_RISCV_RVI_FUNC(16); VP9_COPY_RISCV_RVI_FUNC(8); VP9_COPY_RISCV_RVI_FUNC(4); diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c index d249dd71b2..c10f8bbe41 100644 --- a/libavcodec/riscv/vp9dsp_init.c +++ b/libavcodec/riscv/vp9dsp_init.c @@ -64,7 +64,32 @@ static 
av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp) #endif } +static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) +{ +#if HAVE_RV + int flags = av_get_cpu_flags(); + + if (bpp == 8 && flags & AV_CPU_FLAG_RVI) { + +#define init_fpel(idx1, sz) \ + dsp->mc[idx1][FILTER_8TAP_SMOOTH ][0][0][0] = ff_copy##sz##_rvi; \ + dsp->mc[idx1][FILTER_8TAP_REGULAR][0][0][0] = ff_copy##sz##_rvi; \ + dsp->mc[idx1][FILTER_8TAP_SHARP ][0][0][0] = ff_copy##sz##_rvi; \ + dsp->mc[idx1][FILTER_BILINEAR ][0][0][0] = ff_copy##sz##_rvi + + init_fpel(0, 64); + init_fpel(1, 32); + init_fpel(2, 16); + init_fpel(3, 8); + init_fpel(4, 4); + +#undef init_fpel + } +#endif +} + av_cold void ff_vp9dsp_init_riscv(VP9DSPContext *dsp, int bpp, int bitexact) { vp9dsp_intrapred_init_riscv(dsp, bpp); + vp9dsp_mc_init_riscv(dsp, bpp); }

From patchwork Tue May 7 07:36:07 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48610
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 7 May 2024 15:36:07 +0800
X-OQ-MSGID: <20240507073613.2871668-3-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240507073613.2871668-1-uk7b@foxmail.com>
References: <20240507073613.2871668-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 3/9] lavc/vp9dsp: R-V V ipred hor
List-Id: FFmpeg development discussions and patches
Cc: sunyuechi

From: sunyuechi

C908:
vp9_hor_8x8_8bpp_c: 74.7
vp9_hor_8x8_8bpp_rvv_i32: 35.7
vp9_hor_16x16_8bpp_c: 175.5
vp9_hor_16x16_8bpp_rvv_i32: 80.2
vp9_hor_32x32_8bpp_c: 510.2
vp9_hor_32x32_8bpp_rvv_i32: 264.0
---
 libavcodec/riscv/vp9_intra_rvv.S | 56 ++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h | 6 ++++
 libavcodec/riscv/vp9dsp_init.c | 3 ++
 3 files changed, 65 insertions(+)

diff --git a/libavcodec/riscv/vp9_intra_rvv.S b/libavcodec/riscv/vp9_intra_rvv.S
index db9774c263..dd9bc036e7 100644
--- a/libavcodec/riscv/vp9_intra_rvv.S
+++
b/libavcodec/riscv/vp9_intra_rvv.S @@ -113,3 +113,59 @@ func_dc dc_left 8 left 3 0 zve64x func_dc dc_top 32 top 5 1 zve32x func_dc dc_top 16 top 4 1 zve32x func_dc dc_top 8 top 3 0 zve64x + +func ff_h_32x32_rvv, zve32x + li t0, 32 + addi a2, a2, 31 + vsetvli zero, t0, e8, m2, ta, ma + + .rept 2 + .irp n 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 + lbu t1, (a2) + addi a2, a2, -1 + vmv.v.x v\n, t1 + .endr + .irp n 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 + vse8.v v\n, (a0) + add a0, a0, a1 + .endr + .endr + + ret +endfunc + +func ff_h_16x16_rvv, zve32x + addi a2, a2, 15 + vsetivli zero, 16, e8, m1, ta, ma + + .irp n 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23 + lbu t1, (a2) + addi a2, a2, -1 + vmv.v.x v\n, t1 + .endr + .irp n 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 + vse8.v v\n, (a0) + add a0, a0, a1 + .endr + vse8.v v23, (a0) + + ret +endfunc + +func ff_h_8x8_rvv, zve32x + addi a2, a2, 7 + vsetivli zero, 8, e8, mf2, ta, ma + + .irp n 8, 9, 10, 11, 12, 13, 14, 15 + lbu t1, (a2) + addi a2, a2, -1 + vmv.v.x v\n, t1 + .endr + .irp n 8, 9, 10, 11, 12, 13, 14 + vse8.v v\n, (a0) + add a0, a0, a1 + .endr + vse8.v v15, (a0) + + ret +endfunc diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h index b8ff282f8a..0ad961c7e0 100644 --- a/libavcodec/riscv/vp9dsp.h +++ b/libavcodec/riscv/vp9dsp.h @@ -66,6 +66,12 @@ void ff_v_16x16_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a); void ff_v_8x8_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a); +void ff_h_32x32_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); +void ff_h_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); +void ff_h_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); #define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx) \ void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \ diff --git 
a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index c10f8bbe41..7816b13fe0 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -59,6 +59,9 @@ static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp)
         dsp->intra_pred[TX_16X16][DC_129_PRED] = ff_dc_129_16x16_rvv;
         dsp->intra_pred[TX_32X32][TOP_DC_PRED] = ff_dc_top_32x32_rvv;
         dsp->intra_pred[TX_16X16][TOP_DC_PRED] = ff_dc_top_16x16_rvv;
+        dsp->intra_pred[TX_32X32][HOR_PRED] = ff_h_32x32_rvv;
+        dsp->intra_pred[TX_16X16][HOR_PRED] = ff_h_16x16_rvv;
+        dsp->intra_pred[TX_8X8][HOR_PRED] = ff_h_8x8_rvv;
     }
 #endif
 #endif

From patchwork Tue May 7 07:36:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48611
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 7 May 2024 15:36:08 +0800
X-OQ-MSGID: <20240507073613.2871668-4-uk7b@foxmail.com>
In-Reply-To: <20240507073613.2871668-1-uk7b@foxmail.com>
References: <20240507073613.2871668-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 4/9] lavc/vp9dsp: R-V V ipred tm
Cc: sunyuechi

From: sunyuechi

C908:
vp9_tm_4x4_8bpp_c: 116.5
vp9_tm_4x4_8bpp_rvv_i32: 43.5
vp9_tm_8x8_8bpp_c: 416.2
vp9_tm_8x8_8bpp_rvv_i32: 86.0
vp9_tm_16x16_8bpp_c: 1665.5
vp9_tm_16x16_8bpp_rvv_i32: 187.2
vp9_tm_32x32_8bpp_c: 6974.2
vp9_tm_32x32_8bpp_rvv_i32: 625.7
---
 libavcodec/riscv/vp9_intra_rvv.S | 141 +++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h        |   8 ++
 libavcodec/riscv/vp9dsp_init.c   |   4 +
 3 files changed, 153 insertions(+)

diff --git a/libavcodec/riscv/vp9_intra_rvv.S b/libavcodec/riscv/vp9_intra_rvv.S
index dd9bc036e7..7a51aa2bf1 100644
--- a/libavcodec/riscv/vp9_intra_rvv.S
+++ b/libavcodec/riscv/vp9_intra_rvv.S
@@ -169,3 +169,144 @@ func ff_h_8x8_rvv, zve32x
 
         ret
 endfunc
+
+.macro tm_sum dst, top, offset
+        lbu             t3, \offset(a2)
+        sub             t3, t3, a4
+        vadd.vx         \dst, \top, t3
+.endm
+
+func ff_tm_32x32_rvv, zve32x
+        lbu             a4, -1(a3)
+        li              t5, 32
+
+        .macro tm_sum32 n1,n2,n3,n4,n5,n6,n7,n8
+        vsetvli         zero, t5, e16, m4, ta, ma
+        vle8.v          v8, (a3)
+        vzext.vf2       v28, v8
+
+        tm_sum          v0, v28, \n1
+        tm_sum          v4, v28, \n2
+        tm_sum          v8, v28, \n3
+        tm_sum          v12, v28, \n4
+        tm_sum          v16, v28, \n5
+        tm_sum          v20, v28, \n6
+        tm_sum          v24, v28, \n7
+        tm_sum          v28, v28, \n8
+
+        .irp n 0, 4, 8, 12, 16, 20, 24, 28
+        vmax.vx         v\n, v\n, zero
+        .endr
+
+        vsetvli         zero, zero, e8, m2, ta, ma
+        .irp n 0, 4, 8, 12, 16, 20, 24, 28
+        vnclipu.wi      v\n, v\n, 0
+        vse8.v          v\n, (a0)
+        add             a0, a0, a1
+        .endr
+        .endm
+
+        tm_sum32        31, 30, 29, 28, 27, 26, 25, 24
+        tm_sum32        23, 22, 21, 20, 19, 18, 17, 16
+        tm_sum32        15, 14, 13, 12, 11, 10, 9, 8
+        tm_sum32        7, 6, 5, 4, 3, 2, 1, 0
+
+        ret
+endfunc
+
+func ff_tm_16x16_rvv, zve32x
+        vsetivli        zero, 16, e16, m2, ta, ma
+        vle8.v          v8, (a3)
+        vzext.vf2       v30, v8
+        lbu             a4, -1(a3)
+
+        tm_sum          v0, v30, 15
+        tm_sum          v2, v30, 14
+        tm_sum          v4, v30, 13
+        tm_sum          v6, v30, 12
+        tm_sum          v8, v30, 11
+        tm_sum          v10, v30, 10
+        tm_sum          v12, v30, 9
+        tm_sum          v14, v30, 8
+        tm_sum          v16, v30, 7
+        tm_sum          v18, v30, 6
+        tm_sum          v20, v30, 5
+        tm_sum          v22, v30, 4
+        tm_sum          v24, v30, 3
+        tm_sum          v26, v30, 2
+        tm_sum          v28, v30, 1
+        tm_sum          v30, v30, 0
+
+        .irp n 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30
+        vmax.vx         v\n, v\n, zero
+        .endr
+
+        vsetvli         zero, zero, e8, m1, ta, ma
+        .irp n 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28
+        vnclipu.wi      v\n, v\n, 0
+        vse8.v          v\n, (a0)
+        add             a0, a0, a1
+        .endr
+        vnclipu.wi      v30, v30, 0
+        vse8.v          v30, (a0)
+
+        ret
+endfunc
+
+func ff_tm_8x8_rvv, zve32x
+        vsetivli        zero, 8, e16, m1, ta, ma
+        vle8.v          v8, (a3)
+        vzext.vf2       v28, v8
+        lbu             a4, -1(a3)
+
+        tm_sum          v16, v28, 7
+        tm_sum          v17, v28, 6
+        tm_sum          v18, v28, 5
+        tm_sum          v19, v28, 4
+        tm_sum          v20, v28, 3
+        tm_sum          v21, v28, 2
+        tm_sum          v22, v28, 1
+        tm_sum          v23, v28, 0
+
+        .irp n 16, 17, 18, 19, 20, 21, 22, 23
+        vmax.vx         v\n, v\n, zero
+        .endr
+
+        vsetvli         zero, zero, e8, mf2, ta, ma
+        .irp n 16, 17, 18, 19, 20, 21, 22
+        vnclipu.wi      v\n, v\n, 0
+        vse8.v          v\n, (a0)
+        add             a0, a0, a1
+        .endr
+        vnclipu.wi      v24, v23, 0
+        vse8.v          v24, (a0)
+
+        ret
+endfunc
+
+func ff_tm_4x4_rvv, zve32x
+        vsetivli        zero, 4, e16, mf2, ta, ma
+        vle8.v          v8, (a3)
+        vzext.vf2       v28, v8
+        lbu             a4, -1(a3)
+
+        tm_sum          v16, v28, 3
+        tm_sum          v17, v28, 2
+        tm_sum          v18, v28, 1
+        tm_sum          v19, v28, 0
+
+        .irp n 16, 17, 18, 19
+        vmax.vx         v\n, v\n, zero
+        .endr
+
+        vsetvli         zero, zero, e8, mf4, ta, ma
+        .irp n 16, 17, 18
+        vnclipu.wi      v\n, v\n, 0
+        vse8.v          v\n, (a0)
+        add             a0, a0, a1
+        .endr
+        vnclipu.wi      v24, v19, 0
+        vse8.v          v24, (a0)
+
+        ret
+endfunc
diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h
index 0ad961c7e0..79330b4968 100644
--- a/libavcodec/riscv/vp9dsp.h
+++ b/libavcodec/riscv/vp9dsp.h
@@ -72,6 +72,14 @@ void ff_h_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
                     const uint8_t *a);
 void ff_h_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
                   const uint8_t *a);
+void ff_tm_32x32_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                     const uint8_t *a);
+void ff_tm_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                     const uint8_t *a);
+void ff_tm_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                   const uint8_t *a);
+void ff_tm_4x4_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                   const uint8_t *a);
 
 #define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx)                         \
 void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride,   \
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 7816b13fe0..8023c333db 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -62,6 +62,10 @@ static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp)
         dsp->intra_pred[TX_32X32][HOR_PRED] = ff_h_32x32_rvv;
         dsp->intra_pred[TX_16X16][HOR_PRED] = ff_h_16x16_rvv;
         dsp->intra_pred[TX_8X8][HOR_PRED] = ff_h_8x8_rvv;
+        dsp->intra_pred[TX_32X32][TM_VP8_PRED] = ff_tm_32x32_rvv;
+        dsp->intra_pred[TX_16X16][TM_VP8_PRED] = ff_tm_16x16_rvv;
+        dsp->intra_pred[TX_8X8][TM_VP8_PRED] = ff_tm_8x8_rvv;
+        dsp->intra_pred[TX_4X4][TM_VP8_PRED] = ff_tm_4x4_rvv;
     }
 #endif
 #endif

From patchwork Tue May 7 07:36:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48616
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 7 May 2024 15:36:09 +0800
X-OQ-MSGID: <20240507073613.2871668-5-uk7b@foxmail.com>
In-Reply-To: <20240507073613.2871668-1-uk7b@foxmail.com>
References: <20240507073613.2871668-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 5/9] lavc/vp9dsp: R-V V mc avg
Cc: sunyuechi

From: sunyuechi

C908:
vp9_avg4_8bpp_c: 1.2
vp9_avg4_8bpp_rvv_i64: 1.0
vp9_avg8_8bpp_c: 3.7
vp9_avg8_8bpp_rvv_i64: 1.5
vp9_avg16_8bpp_c: 14.7
vp9_avg16_8bpp_rvv_i64: 3.5
vp9_avg32_8bpp_c: 57.7
vp9_avg32_8bpp_rvv_i64: 10.0
vp9_avg64_8bpp_c: 229.0
vp9_avg64_8bpp_rvv_i64: 31.7
---
 libavcodec/riscv/Makefile      |  3 +-
 libavcodec/riscv/vp9_mc_rvv.S  | 58 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c | 19 +++++++++++
 3 files changed, 79 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/riscv/vp9_mc_rvv.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 5846861bac..73c9f24d97 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -65,6 +65,7 @@ RVV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvv.o
 OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9dsp_init.o
 RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o \
                                  riscv/vp9_mc_rvi.o
-RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o
+RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o \
+                                  riscv/vp9_mc_rvv.o
 OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o
 RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o
diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
new file mode 100644
index 0000000000..81ecb49435
--- /dev/null
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -0,0 +1,58 @@
+/*
+ * Copyright (c) 2024 Institute of Software Chinese Academy of Sciences (ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/riscv/asm.S"
+
+.macro vsetvlstatic8 len an mn=m4
+.if \len <= 4
+        vsetivli        zero, \len, e8, mf4, ta, ma
+.elseif \len <= 8
+        vsetivli        zero, \len, e8, mf2, ta, ma
+.elseif \len <= 16
+        vsetivli        zero, \len, e8, m1, ta, ma
+.elseif \len <= 32
+        li              \an, \len
+        vsetvli         zero, \an, e8, m2, ta, ma
+.elseif \len <= 64
+        li              \an, \len
+        vsetvli         zero, \an, e8, \mn, ta, ma
+.endif
+.endm
+
+.macro copy_avg len
+func ff_avg\len\()_rvv, zve32x
+        csrwi           vxrm, 0
+        vsetvlstatic8   \len t0
+1:
+        addi            a4, a4, -1
+        vle8.v          v8, (a2)
+        vle8.v          v16, (a0)
+        vaaddu.vv       v8, v8, v16
+        vse8.v          v8, (a0)
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+        ret
+endfunc
+.endm
+
+.irp len 64, 32, 16, 8, 4
+        copy_avg \len
+.endr
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 8023c333db..2caaf732db 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -92,6 +92,25 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
 
 #undef init_fpel
     }
+
+#if HAVE_RVV
+    if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I64) && ff_get_rv_vlenb() >= 16) {
+
+#define init_fpel(idx1, sz)                                           \
+    dsp->mc[idx1][FILTER_8TAP_SMOOTH ][1][0][0] = ff_avg##sz##_rvv;  \
+    dsp->mc[idx1][FILTER_8TAP_REGULAR][1][0][0] = ff_avg##sz##_rvv;  \
+    dsp->mc[idx1][FILTER_8TAP_SHARP  ][1][0][0] = ff_avg##sz##_rvv;  \
+    dsp->mc[idx1][FILTER_BILINEAR    ][1][0][0] = ff_avg##sz##_rvv
+
+    init_fpel(0, 64);
+    init_fpel(1, 32);
+    init_fpel(2, 16);
+    init_fpel(3, 8);
+    init_fpel(4, 4);
+
+#undef init_fpel
+    }
+#endif
 #endif
 }

From patchwork Tue May 7 07:36:10 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48612
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 7 May 2024 15:36:10 +0800
X-OQ-MSGID: <20240507073613.2871668-6-uk7b@foxmail.com>
In-Reply-To: <20240507073613.2871668-1-uk7b@foxmail.com>
References: <20240507073613.2871668-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 6/9] lavc/vp9dsp: R-V V mc bilin h v
Cc: sunyuechi

From: sunyuechi

C908:
vp9_avg_bilin_4h_8bpp_c: 5.2
vp9_avg_bilin_4h_8bpp_rvv_i64: 2.2
vp9_avg_bilin_4v_8bpp_c: 5.5
vp9_avg_bilin_4v_8bpp_rvv_i64: 2.2
vp9_avg_bilin_8h_8bpp_c: 20.0
vp9_avg_bilin_8h_8bpp_rvv_i64: 4.5
vp9_avg_bilin_8v_8bpp_c: 21.0
vp9_avg_bilin_8v_8bpp_rvv_i64: 4.2
vp9_avg_bilin_16h_8bpp_c: 78.2
vp9_avg_bilin_16h_8bpp_rvv_i64: 9.0
vp9_avg_bilin_16v_8bpp_c: 82.0
vp9_avg_bilin_16v_8bpp_rvv_i64: 9.0
vp9_avg_bilin_32h_8bpp_c: 325.5
vp9_avg_bilin_32h_8bpp_rvv_i64: 26.2
vp9_avg_bilin_32v_8bpp_c: 326.2
vp9_avg_bilin_32v_8bpp_rvv_i64: 26.2
vp9_avg_bilin_64h_8bpp_c: 1265.7
vp9_avg_bilin_64h_8bpp_rvv_i64: 91.5
vp9_avg_bilin_64v_8bpp_c: 1317.0
vp9_avg_bilin_64v_8bpp_rvv_i64: 91.2
vp9_put_bilin_4h_8bpp_c: 4.5
vp9_put_bilin_4h_8bpp_rvv_i64: 1.7
vp9_put_bilin_4v_8bpp_c: 4.7
vp9_put_bilin_4v_8bpp_rvv_i64: 1.7
vp9_put_bilin_8h_8bpp_c: 17.0
vp9_put_bilin_8h_8bpp_rvv_i64: 3.5
vp9_put_bilin_8v_8bpp_c: 18.0
vp9_put_bilin_8v_8bpp_rvv_i64: 3.5
vp9_put_bilin_16h_8bpp_c: 65.2
vp9_put_bilin_16h_8bpp_rvv_i64: 7.5
vp9_put_bilin_16v_8bpp_c: 85.7
vp9_put_bilin_16v_8bpp_rvv_i64: 7.5
vp9_put_bilin_32h_8bpp_c: 257.5
vp9_put_bilin_32h_8bpp_rvv_i64: 23.5
vp9_put_bilin_32v_8bpp_c: 274.5
vp9_put_bilin_32v_8bpp_rvv_i64: 23.5
vp9_put_bilin_64h_8bpp_c: 1040.5
vp9_put_bilin_64h_8bpp_rvv_i64: 82.5
vp9_put_bilin_64v_8bpp_c: 1108.7
vp9_put_bilin_64v_8bpp_rvv_i64: 82.2
---
 libavcodec/riscv/vp9_mc_rvv.S  | 43 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c | 22 +++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index 81ecb49435..598a67fc94 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -53,6 +53,49 @@ func ff_avg\len\()_rvv, zve32x
 endfunc
 .endm
 
+.macro bilin_load dst len op type mn
+.ifc \type,v
+        add             t5, a2, a3
+.elseif \type == h
+        addi            t5, a2, 1
+.endif
+        vle8.v          v8, (a2)
+        vle8.v          v0, (t5)
+        vwmulu.vx       v16, v0, \mn
+        vwmaccsu.vx     v16, t1, v8
+        vwadd.wx        v16, v16, t4
+        vnsra.wi        v16, v16, 4
+        vadd.vv         \dst, v16, v8
+.ifc \op,avg
+        vle8.v          v16, (a0)
+        vaaddu.vv       \dst, \dst, v16
+.endif
+.endm
+
+.macro bilin_h_v len op type mn
+func ff_\op\()_bilin_\len\()\type\()_rvv, zve32x
+.ifc \op,avg
+        csrwi           vxrm, 0
+.endif
+        vsetvlstatic8   \len t0
+        li              t4, 8
+        neg             t1, \mn
+1:
+        addi            a4, a4, -1
+        bilin_load      v0, \len, \op, \type, \mn
+        vse8.v          v0, (a0)
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+
+        ret
+endfunc
+.endm
+
 .irp len 64, 32, 16, 8, 4
         copy_avg \len
+        .irp op put avg
+                bilin_h_v \len \op h a5
+                bilin_h_v \len \op v a6
+        .endr
 .endr
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 2caaf732db..cfeaa06c0a 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -109,6 +109,28 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
     init_fpel(4, 4);
 
 #undef init_fpel
+
+#define init_subpel1(idx1, idx2, idxh, idxv, sz, dir, type)  \
+    dsp->mc[idx1][FILTER_BILINEAR    ][idx2][idxh][idxv] =   \
+        ff_##type##_bilin_##sz##dir##_rvv;                   \
+
+#define init_subpel2(idx, idxh, idxv, dir, type)      \
+    init_subpel1(0, idx, idxh, idxv, 64, dir, type);  \
+    init_subpel1(1, idx, idxh, idxv, 32, dir, type);  \
+    init_subpel1(2, idx, idxh, idxv, 16, dir, type);  \
+    init_subpel1(3, idx, idxh, idxv, 8, dir, type);   \
+    init_subpel1(4, idx, idxh, idxv, 4, dir, type)
+
+#define init_subpel3(idx, type)        \
+    init_subpel2(idx, 1, 0, h, type);  \
+    init_subpel2(idx, 0, 1, v, type);  \
+
+    init_subpel3(0, put);
+    init_subpel3(1, avg);
+
+#undef init_subpel1
+#undef init_subpel2
+#undef init_subpel3
     }
 #endif
 #endif

From patchwork Tue May 7 07:36:11 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48613
M/zLVQ9OcpvRrfr4/nN26N97YQk9XcGUzqWoqV6orNkEZQLbkQi8E4ObRPj72z rrnqPFUp4uGpnAhi1FtQkt3ljFzfJpcI0d3mjyHx7wNOl6ZTSP9a8oOmvryXCD9TL8Rd7Yp4r8lx ASHAIKISWF/v5aVnMg8PepQoZCBgE0mYaSU0sLntShVyjJzbNAwxtu5NASOhW3r87E5HH0Pxh+60 oD+6+x+w+HXv6i3DIL3ziwwGUAmr7JD6klHiiSqLL5qtNpWVQsud+J+HFXSqN8Bgdqfjs/21vlAZ 5DLk3jNA2uIXGNXlAuoQAp3kjIHvEqW0eFfK7gXnY70G9vX4I3Owb18rTXhfX37lt1Yzsll0A6Vr Jui0jt2xpfKjGFxd93WX6d/BjwEKCLxvxfG0GD9+zfhVYzS9OR/py05AIv96eZWKhauGHqdHcqgH h+4D5fHMQF+8taaewk2dc2vgKC6+U/JagyESH2djXw+hv48B6jImzBzBDr3dcJth7RdKPsIDwWqV XQ+5PMban4QSJLqiicL9JHqo03lSt08IKeJ2qYWqrn/RlftGMjxmTzknCOKjmGzd5dy6PiRCvvks OUhZYuLLgtRMtgO+grElJ+egU+AyJK30UDcsEB9jH7kNQQJBtaWtpJzE1I8fI1ztiKGd7iwpyqoq zu/bht7hVJ8g2plM1udGRAdu71iB26tG+k4A9vqDlwuisp/KYTIIufehQZZmqkZYhapVUTqre5wk 9HvxaEz7mFCfBvVgSC2hdOMtU9V6q0Li+ACTazJKY2Ft9wIsCEl12v+SWgYY0FVR0TxiqShf68/U ZPCgMllvn4fr91KwLjCzUjpzwXjnkVX4ux8p57v9G5X0QFVuyx68kuhSvbw2sme+0LYuuZB+oT21 fULElfgvOX+AvaB7fUokWPAfDl8cmy19JbQEbZ8B8sjBaFUW7cTqkgxh9ing4ZoSVEb9CfFkw1 X-QQ-XMRINFO: NyFYKkN4Ny6FSmKK/uo/jdU= From: uk7b@foxmail.com To: ffmpeg-devel@ffmpeg.org Date: Tue, 7 May 2024 15:36:11 +0800 X-OQ-MSGID: <20240507073613.2871668-7-uk7b@foxmail.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240507073613.2871668-1-uk7b@foxmail.com> References: <20240507073613.2871668-1-uk7b@foxmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 7/9] lavc/vp9dsp: R-V V mc tap h v X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: sunyuechi Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: Ev0er72n5Ecd From: sunyuechi C908: vp9_avg_8tap_smooth_4h_8bpp_c: 13.0 vp9_avg_8tap_smooth_4h_8bpp_rvv_i64: 5.0 vp9_avg_8tap_smooth_4v_8bpp_c: 13.7 vp9_avg_8tap_smooth_4v_8bpp_rvv_i64: 5.0 vp9_avg_8tap_smooth_8h_8bpp_c: 48.7 
vp9_avg_8tap_smooth_8h_8bpp_rvv_i64: 9.5
vp9_avg_8tap_smooth_8v_8bpp_c: 50.0
vp9_avg_8tap_smooth_8v_8bpp_rvv_i64: 9.5
vp9_avg_8tap_smooth_16h_8bpp_c: 192.5
vp9_avg_8tap_smooth_16h_8bpp_rvv_i64: 21.2
vp9_avg_8tap_smooth_16v_8bpp_c: 191.5
vp9_avg_8tap_smooth_16v_8bpp_rvv_i64: 21.2
vp9_avg_8tap_smooth_32h_8bpp_c: 763.7
vp9_avg_8tap_smooth_32h_8bpp_rvv_i64: 67.2
vp9_avg_8tap_smooth_32v_8bpp_c: 770.7
vp9_avg_8tap_smooth_32v_8bpp_rvv_i64: 67.2
vp9_avg_8tap_smooth_64h_8bpp_c: 3098.7
vp9_avg_8tap_smooth_64h_8bpp_rvv_i64: 283.2
vp9_avg_8tap_smooth_64v_8bpp_c: 3045.2
vp9_avg_8tap_smooth_64v_8bpp_rvv_i64: 266.7
vp9_put_8tap_smooth_4h_8bpp_c: 11.0
vp9_put_8tap_smooth_4h_8bpp_rvv_i64: 4.2
vp9_put_8tap_smooth_4v_8bpp_c: 28.5
vp9_put_8tap_smooth_4v_8bpp_rvv_i64: 4.2
vp9_put_8tap_smooth_8h_8bpp_c: 42.2
vp9_put_8tap_smooth_8h_8bpp_rvv_i64: 8.5
vp9_put_8tap_smooth_8v_8bpp_c: 43.7
vp9_put_8tap_smooth_8v_8bpp_rvv_i64: 8.5
vp9_put_8tap_smooth_16h_8bpp_c: 165.7
vp9_put_8tap_smooth_16h_8bpp_rvv_i64: 19.7
vp9_put_8tap_smooth_16v_8bpp_c: 168.5
vp9_put_8tap_smooth_16v_8bpp_rvv_i64: 19.5
vp9_put_8tap_smooth_32h_8bpp_c: 675.5
vp9_put_8tap_smooth_32h_8bpp_rvv_i64: 64.2
vp9_put_8tap_smooth_32v_8bpp_c: 664.7
vp9_put_8tap_smooth_32v_8bpp_rvv_i64: 64.2
vp9_put_8tap_smooth_64h_8bpp_c: 2680.5
vp9_put_8tap_smooth_64h_8bpp_rvv_i64: 272.0
vp9_put_8tap_smooth_64v_8bpp_c: 2692.5
vp9_put_8tap_smooth_64v_8bpp_rvv_i64: 272.0
---
 libavcodec/riscv/vp9_mc_rvv.S  | 238 +++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c |   8 ++
 2 files changed, 246 insertions(+)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index 598a67fc94..99605dfbb5 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -36,6 +36,18 @@
 .endif
 .endm
 
+.macro vsetvlstatic16 len
+.ifc \len,4
+        vsetvli         zero, zero, e16, mf2, ta, ma
+.elseif \len == 8
+        vsetvli         zero, zero, e16, m1, ta, ma
+.elseif \len == 16
+        vsetvli         zero, zero, e16, m2, ta, ma
+.else
+        vsetvli         zero, zero, e16, m4, ta, ma
+.endif
+.endm
+
 .macro copy_avg len
 func ff_avg\len\()_rvv, zve32x
         csrwi           vxrm, 0
@@ -92,10 +104,236 @@ func ff_\op\()_bilin_\len\()\type\()_rvv, zve32x
 endfunc
 .endm
 
+const subpel_filters_regular
+        .byte 0, 0, 0, 128, 0, 0, 0, 0
+        .byte 0, 1, -5, 126, 8, -3, 1, 0
+        .byte -1, 3, -10, 122, 18, -6, 2, 0
+        .byte -1, 4, -13, 118, 27, -9, 3, -1
+        .byte -1, 4, -16, 112, 37, -11, 4, -1
+        .byte -1, 5, -18, 105, 48, -14, 4, -1
+        .byte -1, 5, -19, 97, 58, -16, 5, -1
+        .byte -1, 6, -19, 88, 68, -18, 5, -1
+        .byte -1, 6, -19, 78, 78, -19, 6, -1
+        .byte -1, 5, -18, 68, 88, -19, 6, -1
+        .byte -1, 5, -16, 58, 97, -19, 5, -1
+        .byte -1, 4, -14, 48, 105, -18, 5, -1
+        .byte -1, 4, -11, 37, 112, -16, 4, -1
+        .byte -1, 3, -9, 27, 118, -13, 4, -1
+        .byte 0, 2, -6, 18, 122, -10, 3, -1
+        .byte 0, 1, -3, 8, 126, -5, 1, 0
+subpel_filters_sharp:
+        .byte 0, 0, 0, 128, 0, 0, 0, 0
+        .byte -1, 3, -7, 127, 8, -3, 1, 0
+        .byte -2, 5, -13, 125, 17, -6, 3, -1
+        .byte -3, 7, -17, 121, 27, -10, 5, -2
+        .byte -4, 9, -20, 115, 37, -13, 6, -2
+        .byte -4, 10, -23, 108, 48, -16, 8, -3
+        .byte -4, 10, -24, 100, 59, -19, 9, -3
+        .byte -4, 11, -24, 90, 70, -21, 10, -4
+        .byte -4, 11, -23, 80, 80, -23, 11, -4
+        .byte -4, 10, -21, 70, 90, -24, 11, -4
+        .byte -3, 9, -19, 59, 100, -24, 10, -4
+        .byte -3, 8, -16, 48, 108, -23, 10, -4
+        .byte -2, 6, -13, 37, 115, -20, 9, -4
+        .byte -2, 5, -10, 27, 121, -17, 7, -3
+        .byte -1, 3, -6, 17, 125, -13, 5, -2
+        .byte 0, 1, -3, 8, 127, -7, 3, -1
+subpel_filters_smooth:
+        .byte 0, 0, 0, 128, 0, 0, 0, 0
+        .byte -3, -1, 32, 64, 38, 1, -3, 0
+        .byte -2, -2, 29, 63, 41, 2, -3, 0
+        .byte -2, -2, 26, 63, 43, 4, -4, 0
+        .byte -2, -3, 24, 62, 46, 5, -4, 0
+        .byte -2, -3, 21, 60, 49, 7, -4, 0
+        .byte -1, -4, 18, 59, 51, 9, -4, 0
+        .byte -1, -4, 16, 57, 53, 12, -4, -1
+        .byte -1, -4, 14, 55, 55, 14, -4, -1
+        .byte -1, -4, 12, 53, 57, 16, -4, -1
+        .byte 0, -4, 9, 51, 59, 18, -4, -1
+        .byte 0, -4, 7, 49, 60, 21, -3, -2
+        .byte 0, -4, 5, 46, 62, 24, -3, -2
+        .byte 0, -4, 4, 43, 63, 26, -2, -2
+        .byte 0, -3, 2, 41, 63, 29, -2, -2
+        .byte 0, -3, 1, 38, 64, 32, -1, -3
+endconst
+
+.macro epel_filter name type regtype
+        lla             \regtype\()2, subpel_filters_\name
+        li              \regtype\()1, 8
+.ifc \type,v
+        mul             \regtype\()0, a6, \regtype\()1
+.elseif \type == h
+        mul             \regtype\()0, a5, \regtype\()1
+.endif
+        add             \regtype\()0, \regtype\()0, \regtype\()2
+        .irp n 1,2,3,4,5,6
+        lb              \regtype\n, \n(\regtype\()0)
+        .endr
+.ifc \regtype,t
+        lb              a7, 7(\regtype\()0)
+.elseif \regtype == s
+        lb              s7, 7(\regtype\()0)
+.endif
+        lb              \regtype\()0, 0(\regtype\()0)
+.endm
+
+.macro epel_load dst len op name type from_mem regtype
+        li              a5, 64
+.ifc \from_mem, 1
+        vle8.v          v22, (a2)
+.ifc \type,v
+        sub             a2, a2, a3
+        vle8.v          v20, (a2)
+        sh1add          a2, a3, a2
+        vle8.v          v24, (a2)
+        add             a2, a2, a3
+        vle8.v          v26, (a2)
+        add             a2, a2, a3
+        vle8.v          v28, (a2)
+        add             a2, a2, a3
+        vle8.v          v30, (a2)
+.elseif \type == h
+        addi            a2, a2, -1
+        vle8.v          v20, (a2)
+        addi            a2, a2, 2
+        vle8.v          v24, (a2)
+        addi            a2, a2, 1
+        vle8.v          v26, (a2)
+        addi            a2, a2, 1
+        vle8.v          v28, (a2)
+        addi            a2, a2, 1
+        vle8.v          v30, (a2)
+.endif
+
+.ifc \name,smooth
+        vwmulu.vx       v16, v24, \regtype\()4
+        vwmaccu.vx      v16, \regtype\()2, v20
+        vwmaccu.vx      v16, \regtype\()5, v26
+        vwmaccsu.vx     v16, \regtype\()6, v28
+.else
+        vwmulu.vx       v16, v28, \regtype\()6
+        vwmaccsu.vx     v16, \regtype\()2, v20
+        vwmaccsu.vx     v16, \regtype\()5, v26
+.endif
+
+.ifc \regtype,t
+        vwmaccsu.vx     v16, a7, v30
+.elseif \regtype == s
+        vwmaccsu.vx     v16, s7, v30
+.endif
+
+.ifc \type,v
+        .rept 6
+        sub             a2, a2, a3
+        .endr
+        vle8.v          v28, (a2)
+        sub             a2, a2, a3
+        vle8.v          v26, (a2)
+        sh1add          a2, a3, a2
+        add             a2, a2, a3
+.elseif \type == h
+        addi            a2, a2, -6
+        vle8.v          v28, (a2)
+        addi            a2, a2, -1
+        vle8.v          v26, (a2)
+        addi            a2, a2, 3
+.endif
+
+.ifc \name,smooth
+        vwmaccsu.vx     v16, \regtype\()1, v28
+.else
+        vwmaccu.vx      v16, \regtype\()1, v28
+        vwmulu.vx       v28, v24, \regtype\()4
+.endif
+        vwmaccsu.vx     v16, \regtype\()0, v26
+        vwmulu.vx       v20, v22, \regtype\()3
+.else
+.ifc \name,smooth
+        vwmulu.vx       v16, v8, \regtype\()4
+        vwmaccu.vx      v16, \regtype\()2, v4
+        vwmaccu.vx      v16, \regtype\()5, v10
+        vwmaccsu.vx     v16, \regtype\()6, v12
+        vwmaccsu.vx     v16, \regtype\()1, v2
+.else
+        vwmulu.vx       v16, v2, \regtype\()1
+        vwmaccu.vx      v16, \regtype\()6, v12
+        vwmaccsu.vx     v16, \regtype\()5, v10
+        vwmaccsu.vx     v16, \regtype\()2, v4
+        vwmulu.vx       v28, v8, \regtype\()4
+.endif
+        vwmaccsu.vx     v16, \regtype\()0, v0
+        vwmulu.vx       v20, v6, \regtype\()3
+
+.ifc \regtype,t
+        vwmaccsu.vx     v16, a7, v14
+.elseif \regtype == s
+        vwmaccsu.vx     v16, s7, v14
+.endif
+
+.endif
+        vwadd.wx        v16, v16, a5
+        vsetvlstatic16  \len
+
+.ifc \name,smooth
+        vwadd.vv        v24, v16, v20
+.else
+        vwadd.vv        v24, v16, v28
+        vwadd.wv        v24, v24, v20
+.endif
+        vnsra.wi        v24, v24, 7
+        vmax.vx         v24, v24, zero
+        vsetvlstatic8   \len, zero, m2
+
+        vnclipu.wi      \dst, v24, 0
+.ifc \op,avg
+        vle8.v          v24, (a0)
+        vaaddu.vv       \dst, \dst, v24
+.endif
+
+.endm
+
+.macro epel_load_inc dst len op name type from_mem regtype
+        epel_load       \dst \len \op \name \type \from_mem \regtype
+        add             a2, a2, a3
+.endm
+
+.macro epel len op name type
+func ff_\op\()_8tap_\name\()_\len\()\type\()_rvv, zve32x
+        epel_filter     \name \type t
+        vsetvlstatic8   \len a5 m2
+.ifc \op,avg
+        csrwi           vxrm, 0
+.endif
+
+1:
+        addi            a4, a4, -1
+        epel_load       v30 \len \op \name \type 1 t
+        vse8.v          v30, (a0)
+.ifc \len,64
+        addi            a0, a0, 32
+        addi            a2, a2, 32
+        epel_load       v30 \len \op \name \type 1 t
+        vse8.v          v30, (a0)
+        addi            a0, a0, -32
+        addi            a2, a2, -32
+.endif
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+
+        ret
+endfunc
+.endm
+
 .irp len 64, 32, 16, 8, 4
     copy_avg \len
     .irp op put avg
         bilin_h_v \len \op h a5
         bilin_h_v \len \op v a6
+        .irp name regular sharp smooth
+        .irp type h v
+            epel \len \op \name \type
+        .endr
+        .endr
     .endr
 .endr
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index cfeaa06c0a..a45aea530d 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -113,6 +113,12 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
 #define init_subpel1(idx1, idx2, idxh, idxv, sz, dir, type) \
     dsp->mc[idx1][FILTER_BILINEAR ][idx2][idxh][idxv] = \
         ff_##type##_bilin_##sz##dir##_rvv; \
+    dsp->mc[idx1][FILTER_8TAP_SMOOTH ][idx2][idxh][idxv] = \
+        ff_##type##_8tap_smooth_##sz##dir##_rvv; \
+    dsp->mc[idx1][FILTER_8TAP_REGULAR][idx2][idxh][idxv] = \
+        ff_##type##_8tap_regular_##sz##dir##_rvv; \
+    dsp->mc[idx1][FILTER_8TAP_SHARP ][idx2][idxh][idxv] = \
+        ff_##type##_8tap_sharp_##sz##dir##_rvv;
 
 #define init_subpel2(idx, idxh, idxv, dir, type) \
     init_subpel1(0, idx, idxh, idxv, 64, dir, type); \
@@ -123,7 +129,9 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
 
 #define init_subpel3(idx, type) \
     init_subpel2(idx, 1, 0, h, type); \
+    if (flags & AV_CPU_FLAG_RVB_ADDR) { \
     init_subpel2(idx, 0, 1, v, type); \
+    }
 
     init_subpel3(0, put);
     init_subpel3(1, avg);

From patchwork Tue May 7 07:36:12 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48614
Delivered-To: ffmpegpatchwork2@gmail.com
Received: by 2002:a05:6a21:6816:b0:1af:836d:81b3 with SMTP id wr22csp208956pzb; Tue, 7 May 2024 00:37:48 -0700 (PDT)
X-Forwarded-Encrypted: i=2; AJvYcCX0ojpogOOxFcWCYOD0QbU3wYvn4p796cj5HfytASgbqHZJFNRCXxNNLmhBnpMnc9agltMDozx5Ru+NkEMBoYWLaWPwl4/CL7aYSw==
X-Google-Smtp-Source: AGHT+IFzPyre+YMg4dUgD2pw/z3iYlRpfL8kQdyXg6OLlGMNgoW9u9/TJOljdla9PWi8icHIszCk
X-Received: by 2002:a05:6402:4495:b0:572:65f7:eed0 with SMTP id er21-20020a056402449500b0057265f7eed0mr9048747edb.0.1715067468440; Tue, 07 May 2024 00:37:48 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1715067468; cv=none; d=google.com; s=arc-20160816; b=k7xI8svvc7xxhx/lsJCtqlr0ouuPTx1PYJGc5gupIyyobdRktTxsBoMnYPJo9DedoJ /E9zPNJOTplr48Rtqb0STvUXIL+vw7G7r5xsoIVcQrArd/Pz2lnNbTP/bwncGI9wcAqV nJJGy6UvoqQFIY8BTb98t2J+uTBpJKhAlW9sIutsNqqY5AJFdU7VoF6n8X8K+FqbBF5Q XsyrEauT7KVSfzPeJDSPyhJvxsEJDZK1BWoFtRsjNyeenhOOauuxnAS8gq79oFW1th+C
jeYxls2FAjJQOiO41AOtkx7TY/t6YzxsjlpmYJ8SeCf8URKYl7VzXl1jMkwUNpJsuE2w Ia6A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to:date :to:from:message-id:dkim-signature:delivered-to; bh=0QunvbHwXfo7lwjTR+vAPibWuXprC0WfWsJq1W5eVd4=; fh=D0bFwGkf4X22/D/bfeDVrXKIx7S6kcXsNzy10j8ORbQ=; b=MyjRPzwMYXPa4toKyqsVM2Zn0p0F/AOJtA/+m8wr8W3DjSwfKo6Bu8kzXahIAKwPx+ EjaF908boGfQ2nqSECDXj3y7GoDF/5Pyx+TLFNi3hURiC+J5qcG85buVM3clcaYoZvS4 ehyR3Fb9u33itQga9idAmU5GP4fhSTJ3dc7Xoolap1xdz0RO2ZMPb1iF1HFB33X10E4C HcDs7GM0InzmvYOgvp0z73cUyX5T0cuQoBLXkzdpx2odnBLVUhNLJ/2VIwEls58xiKsA 2v9r7EQ25Bm1r0WOz6WuL2Fz1KaJ8FJ4JT5AokTBlQcR48BnkpPpDrVePuSBoqRJBqpZ tQSQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=wM7dNTzQ; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id b24-20020a50ccd8000000b0056e5c7059f7si6062229edj.590.2024.05.07.00.37.47; Tue, 07 May 2024 00:37:48 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=wM7dNTzQ; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id A110268D74D; Tue, 7 May 2024 10:36:50 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from out203-205-221-245.mail.qq.com (out203-205-221-245.mail.qq.com [203.205.221.245]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 4319F68D722 for ; Tue, 7 May 2024 10:36:39 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=foxmail.com; s=s201512; t=1715067392; bh=JpBKHz93BXnIYuX/M5nEHXt/s689OvRbw13ryW6Rjuk=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=wM7dNTzQMk9+lW7Rzb5Dr+tUbX4+41/qHuUWQRliKs6VnZdYk0eIYSz3LyCDIlfeE oMBacpxvD/1V2l06FMnmc/e2Een5e6pe6TdrZSRV2eabYRL4n7v6QVTFo0Xm2WV787 wraJV020tw0U/jNw1MZQVOtoxngETa3QsHOQZpfw= Received: from localhost.localdomain ([42.56.223.122]) by newxmesmtplogicsvrsza29-0.qq.com (NewEsmtp) with SMTP id 9188D283; Tue, 07 May 2024 15:36:24 +0800 X-QQ-mid: xmsmtpt1715067391tciw5zi7l Message-ID: X-QQ-XMAILINFO: N8SbykO4AvKiBjwgzkiZRWVRjmb0UlUQf3tmEZNtl1PN/JghZxBAOEk75+aO1C 64Wl7SgQEwWkhg4Lqydgauj2yYkGlACMOdi2jnTcFMNKad2dG/OfdqjlcoMrCDfO+FvCTlh5orPo U203yJVaNUVokdJmPJAwXG2cbBssiWH/heAkB+uH/xGeFeCwKANiv4k6jyASMrlLIuAJFyPPH7W0 9LM0fhVIoaVHt8DX5u7JEAZ62Yv7dBTcfE02lFzdIZ2wVXztYQ/2VKgprEtzl0UCsG5Q8mcOuJs6 
4L24xYyaUWRYUhHXiC4EfPPPHjKAQCjJNR45r7jZszB8gXE6pxliPQ4JZhGJ0pQjHYZvMsR7yp73 p+Low69XLrOdH6zjgpmFbhM5k+Mxc32dpgFU1Tcnh3ie3PxBal0UqeDIXui4PacVYAcxan7CnwLf LJXPvomOCZ0/eLM7t49Gml7NRK7qgN79tjIzhk9OmHeG0rDvJuYxWBSUK+aA6CHH2/3PCu6Tc1iQ UrhhILhsZceQA3eg7LYH9V+h+Izo2MMP6Y1CspDXzCe6cCUFlt7/poufMNzGv1vkQJ63DBWSNnMg ODxfUgIGy2am4zkq37BJqLRVuay+dp5c3PNVV7bvjVVYo0cXPERTPJGjMsylX4AeX8ILqkGyBWtx Fn5kbI+8Sxn7TbHy91RFcNGtPlwu8MG+NHmq/Eo9TM2NQhR9jily90fcYkCwEZEbigb9L8wbz+e/ Apa5iYU0OcY79qdjn2KO2Y9dCR/1Y/L/n0VWqyLggo4JZ8ei8qvMwuf1dfwJBEI1UOxsX4vntFCM HJevQ999v+PqNfkixt0VhGv6FXnOMYvVHimcRz89Pixz+p0HBF6PnJNsusuVFwFXRhwUY4tcnX9q 0VubaZbTuplRzaoymiVkZXw+Kt6A3vRw== X-QQ-XMRINFO: OWPUhxQsoeAVDbp3OJHYyFg= From: uk7b@foxmail.com To: ffmpeg-devel@ffmpeg.org Date: Tue, 7 May 2024 15:36:12 +0800 X-OQ-MSGID: <20240507073613.2871668-8-uk7b@foxmail.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240507073613.2871668-1-uk7b@foxmail.com> References: <20240507073613.2871668-1-uk7b@foxmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 8/9] lavc/vp9dsp: R-V V mc bilin hv X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: sunyuechi Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: kBP8fO+dbIXK From: sunyuechi C908: vp9_avg_bilin_4hv_8bpp_c: 11.0 vp9_avg_bilin_4hv_8bpp_rvv_i64: 3.7 vp9_avg_bilin_8hv_8bpp_c: 38.7 vp9_avg_bilin_8hv_8bpp_rvv_i64: 7.2 vp9_avg_bilin_16hv_8bpp_c: 147.0 vp9_avg_bilin_16hv_8bpp_rvv_i64: 14.2 vp9_avg_bilin_32hv_8bpp_c: 574.5 vp9_avg_bilin_32hv_8bpp_rvv_i64: 42.7 vp9_avg_bilin_64hv_8bpp_c: 2311.5 vp9_avg_bilin_64hv_8bpp_rvv_i64: 201.7 vp9_put_bilin_4hv_8bpp_c: 10.0 vp9_put_bilin_4hv_8bpp_rvv_i64: 3.2 vp9_put_bilin_8hv_8bpp_c: 35.2 vp9_put_bilin_8hv_8bpp_rvv_i64: 6.5 vp9_put_bilin_16hv_8bpp_c: 133.7 vp9_put_bilin_16hv_8bpp_rvv_i64: 
13.0
vp9_put_bilin_32hv_8bpp_c: 538.2
vp9_put_bilin_32hv_8bpp_rvv_i64: 39.7
vp9_put_bilin_64hv_8bpp_c: 2114.0
vp9_put_bilin_64hv_8bpp_rvv_i64: 153.7
---
 libavcodec/riscv/vp9_mc_rvv.S | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index 99605dfbb5..01404bbde5 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -104,6 +104,39 @@ func ff_\op\()_bilin_\len\()\type\()_rvv, zve32x
 endfunc
 .endm
 
+.macro bilin_hv len op
+func ff_\op\()_bilin_\len\()hv_rvv, zve32x
+.ifc \op,avg
+        csrwi           vxrm, 0
+.endif
+        vsetvlstatic8   \len t0
+        neg             t1, a5
+        neg             t2, a6
+        li              t4, 8
+        bilin_load      v24, \len, put, h, a5
+        add             a2, a2, a3
+1:
+        addi            a4, a4, -1
+        bilin_load      v4, \len, put, h, a5
+        vwmulu.vx       v16, v4, a6
+        vwmaccsu.vx     v16, t2, v24
+        vwadd.wx        v16, v16, t4
+        vnsra.wi        v16, v16, 4
+        vadd.vv         v0, v16, v24
+.ifc \op,avg
+        vle8.v          v16, (a0)
+        vaaddu.vv       v0, v0, v16
+.endif
+        vse8.v          v0, (a0)
+        vmv.v.v         v24, v4
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+
+        ret
+endfunc
+.endm
+
 const subpel_filters_regular
         .byte 0, 0, 0, 128, 0, 0, 0, 0
         .byte 0, 1, -5, 126, 8, -3, 1, 0
@@ -330,6 +363,7 @@ endfunc
     .irp op put avg
         bilin_h_v \len \op h a5
         bilin_h_v \len \op v a6
+        bilin_hv \len \op
         .irp name regular sharp smooth
         .irp type h v
             epel \len \op \name \type

From patchwork Tue May 7 07:36:13 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48615
Delivered-To: ffmpegpatchwork2@gmail.com
Received: by 2002:a05:6a21:6816:b0:1af:836d:81b3 with SMTP id wr22csp209018pzb; Tue, 7 May 2024 00:37:58 -0700 (PDT)
X-Forwarded-Encrypted: i=2; AJvYcCV5dgI5h5i+RTN33K2fK09nGYuGLFxbRBGxhsA14ziDSfLnGwsvl8onXDqu5X76azzji7MYV1WkXQYk1ZOgtO8tIZBS2i5zup7HBA==
X-Google-Smtp-Source: AGHT+IGN8PXjtsiPdPxXI7IZKaee4eIUC8/fOgR1hmE64kwq6zwAj/QGHdzYUQXN2bcHKdVpyxXm
X-Received: by
2002:a50:d691:0:b0:56e:3293:3777 with SMTP id r17-20020a50d691000000b0056e32933777mr10383734edi.17.1715067477911; Tue, 07 May 2024 00:37:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1715067477; cv=none; d=google.com; s=arc-20160816; b=njphgdgesM9NjxnEB/DqPbmcKtz6VpUgw1Ne70OqQIRy2a4TZv0w19t5SkyFlznZiQ y6Ak9j53kMZXQHEnDigEYST6P2zWwLnGACJ3r2kiC+i11aLSg/PS75TCndP8HRQcvgwd sJlJg+3nqUKv/H1qVUHpiQ2uAaoEjJ/m0+EnwHjJpAfY4te1s55JAqtQ1a2Byvvl5ADo 3U+nBSxYdgSGvCK4tE5InVNDuDQqK0kz2evhhkxRdNH1y5qssV4w9pYkAuHweBzFTfJC GlkoY1llsSPSO0hkFiCyDABa/Ab6pNnGfHdj9jAisyTv75nVLdPZEyGhnFDkOJSehazv 7flQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to:date :to:from:message-id:dkim-signature:delivered-to; bh=RoDujgpy6zXN9Cn6vMuazk1w89vsod9nuqoXUVycqPg=; fh=D0bFwGkf4X22/D/bfeDVrXKIx7S6kcXsNzy10j8ORbQ=; b=c3Bu50RNVUReQ0J8k5LHlqiHKUz1O8BtPgFyOZznGc+DYEno/JwJxa21M1Hwr2iyRf 6rbPOSGQwQzXS1rzRYm7kXkQjOa5/OqFigHbcac9rR4FbyCVdm2dWuoKl2r8Z8N6tSJw KskcC+9EX1x+jrT5LkwOIqTYbMqzB3mZcnBjCv08hSXf/GJxFlDdj8WimVZA39ekloTU UGKfNc7/xmaT+YMhHIirBQDxfGw/167nVVj3fs/6qnfZ1k3+3owcH8z1gxXHHoetw8Eu OAYUotYbc26PzkL/1tGxUr8JDujPpJCBlZwc/wCW6EwgiBjTJASWx03hJrBAR+1aPAgR 5++g==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=EB+pEX0S; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id r3-20020aa7d583000000b005729ebea898si5549034edq.416.2024.05.07.00.37.57; Tue, 07 May 2024 00:37:57 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=EB+pEX0S; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 2F9C168D752; Tue, 7 May 2024 10:36:52 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from out162-62-57-49.mail.qq.com (out162-62-57-49.mail.qq.com [162.62.57.49]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 88DAF68D722 for ; Tue, 7 May 2024 10:36:41 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=foxmail.com; s=s201512; t=1715067393; bh=p20tqNdhnlPdo1GAcR5Z0Kj2CW7SjAVvoqbF7rxR5h4=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=EB+pEX0SZGUtJnptslQvw8tUxA/nz2cvk9bl2DiiJ0Lxckbd0zfuigVgIo8UeMOkL 4t/o+7otLvTWMdMrXMjZyX12I5sXFce4XkoX6HH4D1cTrdWDlb6Hts4x7LHu8Gzkud WjW/R01Puz45OyjD2MVaOughCHmp6LtAaBvH9YYY= Received: from localhost.localdomain ([42.56.223.122]) by newxmesmtplogicsvrsza29-0.qq.com (NewEsmtp) with SMTP id 9188D283; Tue, 07 May 2024 15:36:24 +0800 X-QQ-mid: xmsmtpt1715067392tqjshk559 Message-ID: X-QQ-XMAILINFO: M0PjjqbLT90wL0yBJmkH6q4JmjXXYITvfwHrzdZ3qBKhNLdqCIdzZT9A0aJfYO V0NSHKlXOpt0PqGYKqOyvNN2jGzSPuhI3v7wvlqdNEYUeBFyibKM1cjmeAiRtMBcEwvQPa/qM1MD G2r0UiY4GWXEfe7oBLmBcgID5S6BjbOcXhEVLf07bxORyVxmGiQcOGGyNTx/OAKgX7pzz2FiEd+Q /Y+MkRQNBww+VVE5UBGFCjX0UFTAJtQlu7tWUPX3AWXf0FzlYlIeRMv921TENR7RkOPdS4/xztMh 
99Xk+rG150uUPAn6IJ3LLkxKQqpAski0xEyj4ox+jGksxB30/oXJKX7vpIH7veRNjhvVYye3/UJs fU3uakfddsUHLjesIRshrNbWb0INdGe2G65a05FEy8vpIUYtFQy45US15BehaTLYVJX7ar0BE5qF DlukOYBsnvsht+MW49u3/0vESPSXLNJrOxl5E8h5mpZcfJRUTUkVzYxDh5ZPEJMH+WAvjVikx18H fklLYQ0CblQjQ+I78iZKj0ovM+EoBnQTS8cJvQNa6dnyjc0AsovcEoHhxY/L8874JdoY6lsa47pi FIa9mGY0H0R1Sx5MeiJMGqlqJ/t5KeJ/UDZqGDMHgJsNZh5zp2pGDFqQyB0RuThMEOoIrK7ksC1d WTtSE8353MKBaAZoKmAGa+k0JQYsGqH5n9vblwe6DpTdaMhoHriy6xpMs32V5h//GlUgfKyC2ACj LTcSvWWfwEFM6c7DnV4shAybOxE+8nXB+RqY2zLveQ5TZ2gnufQeh4bmsIQhAzt4C4KpTZN6RVAh 1CUewLKONZ0yEs5T/dd3w1LxeQfk+ZWtDqwIUgDjXJR58CX+Uk0jDNG3wKnRG+cFc1UbH4z1zZDW Ic3HKh2WBPpdqi3gA4+0QakjmSsQPCasah3uRAy+Wjq82+e5ICy7XLZSpcbOyEXkD8qEPDvQ9u X-QQ-XMRINFO: MSVp+SPm3vtS1Vd6Y4Mggwc= From: uk7b@foxmail.com To: ffmpeg-devel@ffmpeg.org Date: Tue, 7 May 2024 15:36:13 +0800 X-OQ-MSGID: <20240507073613.2871668-9-uk7b@foxmail.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240507073613.2871668-1-uk7b@foxmail.com> References: <20240507073613.2871668-1-uk7b@foxmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 9/9] lavc/vp9dsp: R-V V mc tap hv X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: sunyuechi Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: ZpWP6NyMOP7G From: sunyuechi C908: vp9_avg_8tap_smooth_4hv_8bpp_c: 32.0 vp9_avg_8tap_smooth_4hv_8bpp_rvv_i64: 15.0 vp9_avg_8tap_smooth_8hv_8bpp_c: 114.5 vp9_avg_8tap_smooth_8hv_8bpp_rvv_i64: 40.5 vp9_avg_8tap_smooth_16hv_8bpp_c: 338.7 vp9_avg_8tap_smooth_16hv_8bpp_rvv_i64: 46.5 vp9_avg_8tap_smooth_32hv_8bpp_c: 1270.7 vp9_avg_8tap_smooth_32hv_8bpp_rvv_i64: 134.0 vp9_avg_8tap_smooth_64hv_8bpp_c: 4923.5 vp9_avg_8tap_smooth_64hv_8bpp_rvv_i64: 523.5 vp9_put_8tap_smooth_4hv_8bpp_c: 30.5 vp9_put_8tap_smooth_4hv_8bpp_rvv_i64: 14.2 
vp9_put_8tap_smooth_8hv_8bpp_c: 91.7
vp9_put_8tap_smooth_8hv_8bpp_rvv_i64: 22.7
vp9_put_8tap_smooth_16hv_8bpp_c: 328.7
vp9_put_8tap_smooth_16hv_8bpp_rvv_i64: 45.0
vp9_put_8tap_smooth_32hv_8bpp_c: 1166.7
vp9_put_8tap_smooth_32hv_8bpp_rvv_i64: 131.0
vp9_put_8tap_smooth_64hv_8bpp_c: 4532.5
vp9_put_8tap_smooth_64hv_8bpp_rvv_i64: 512.5
---
 libavcodec/riscv/vp9_mc_rvv.S  | 94 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c |  3 +-
 2 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index 01404bbde5..0dcec94bbf 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -358,6 +358,99 @@ func ff_\op\()_8tap_\name\()_\len\()\type\()_rvv, zve32x
 endfunc
 .endm
 
+.macro epel_hv_once len name op
+        sub             a2, a2, a3
+        sub             a2, a2, a3
+        sub             a2, a2, a3
+        .irp n 0 2 4 6 8 10 12 14
+        epel_load_inc   v\n \len put \name h 1 t
+        .endr
+        addi            a4, a4, -1
+1:
+        addi            a4, a4, -1
+        epel_load       v30 \len \op \name v 0 s
+        vse8.v          v30, (a0)
+        vmv.v.v         v0, v2
+        vmv.v.v         v2, v4
+        vmv.v.v         v4, v6
+        vmv.v.v         v6, v8
+        vmv.v.v         v8, v10
+        vmv.v.v         v10, v12
+        vmv.v.v         v12, v14
+        epel_load       v14 \len put \name h 1 t
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+        epel_load       v30 \len \op \name v 0 s
+        vse8.v          v30, (a0)
+.endm
+
+.macro epel_hv op name len
+func ff_\op\()_8tap_\name\()_\len\()hv_rvv, zve32x
+#if __riscv_xlen == 64
+        addi            sp, sp, -64
+        .irp n 0,1,2,3,4,5,6,7
+        sd              s\n, \n\()<<3(sp)
+        .endr
+#else
+        addi            sp, sp, -32
+        .irp n 0,1,2,3,4,5,6,7
+        sw              s\n, \n\()<<2(sp)
+        .endr
+#endif
+.ifc \len,64
+#if __riscv_xlen == 64
+        addi            sp, sp, -48
+        .irp n 0,1,2,3,4,5
+        sd              a\n, \n\()<<3(sp)
+        .endr
+#else
+        addi            sp, sp, -24
+        .irp n 0,1,2,3,4,5
+        sw              a\n, \n\()<<2(sp)
+        .endr
+#endif
+.endif
+.ifc \op,avg
+        csrwi           vxrm, 0
+.endif
+        epel_filter     \name h t
+        epel_filter     \name v s
+        vsetvlstatic8   \len a6 m2
+        epel_hv_once    \len \name \op
+.ifc \len,64
+#if __riscv_xlen == 64
+        .irp n 0,1,2,3,4,5
+        ld              a\n, \n\()<<3(sp)
+        .endr
+        addi            sp, sp, 48
+#else
+        .irp n 0,1,2,3,4,5
+        lw              a\n, \n\()<<2(sp)
+        .endr
+        addi            sp, sp, 24
+#endif
+        addi            a0, a0, 32
+        addi            a2, a2, 32
+        epel_filter     \name h t
+        epel_hv_once    \len \name \op
+.endif
+#if __riscv_xlen == 64
+        .irp n 0,1,2,3,4,5,6,7
+        ld              s\n, \n\()<<3(sp)
+        .endr
+        addi            sp, sp, 64
+#else
+        .irp n 0,1,2,3,4,5,6,7
+        lw              s\n, \n\()<<2(sp)
+        .endr
+        addi            sp, sp, 32
+#endif
+
+        ret
+endfunc
+.endm
+
 .irp len 64, 32, 16, 8, 4
     copy_avg \len
     .irp op put avg
@@ -368,6 +461,7 @@ endfunc
         .irp type h v
             epel \len \op \name \type
         .endr
+        epel_hv \op \name \len
         .endr
     .endr
 .endr
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index a45aea530d..554fcefa6e 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -131,7 +131,8 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
     init_subpel2(idx, 1, 0, h, type); \
     if (flags & AV_CPU_FLAG_RVB_ADDR) { \
     init_subpel2(idx, 0, 1, v, type); \
-    }
+    } \
+    init_subpel2(idx, 1, 1, hv, type)
 
     init_subpel3(0, put);
     init_subpel3(1, avg);