From patchwork Mon May 13 16:59:18 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48859
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Tue, 14 May 2024 00:59:18 +0800
X-OQ-MSGID: <20240513165926.1467967-1-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
Subject: [FFmpeg-devel] [PATCH v3 1/9] lavc/vp9dsp: R-V ipred vert

From: sunyuechi

C908:
vp9_vert_8x8_8bpp_c: 22.0
vp9_vert_8x8_8bpp_rvi: 15.7
vp9_vert_16x16_8bpp_c: 71.2
vp9_vert_16x16_8bpp_rvi: 39.0
vp9_vert_32x32_8bpp_c: 300.2
vp9_vert_32x32_8bpp_rvi: 135.2
---
 libavcodec/riscv/Makefile        |  1 +
 libavcodec/riscv/vp9_intra_rvi.S | 71 ++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h        |  6 +++
 libavcodec/riscv/vp9dsp_init.c   | 15 +++++--
 4 files changed, 90 insertions(+), 3 deletions(-)
 create mode 100644 libavcodec/riscv/vp9_intra_rvi.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 89273b1cad..ccd060c666 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -62,6 +62,7 @@ OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_init.o
 RV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvi.o
 RVV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvv.o
 OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9dsp_init.o
+RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o
 RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o
 OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o
 RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o
diff --git a/libavcodec/riscv/vp9_intra_rvi.S b/libavcodec/riscv/vp9_intra_rvi.S
new file mode 100644
index 0000000000..16b6bdb25a
--- /dev/null
+++ b/libavcodec/riscv/vp9_intra_rvi.S
@@ -0,0 +1,71 @@
+/*
+ * Copyright (c) 2024 Institue of Software Chinese Academy of Sciences (ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/riscv/asm.S"
+
+#if __riscv_xlen >= 64
+func ff_v_32x32_rvi
+        ld      t0, (a3)
+        ld      t1, 8(a3)
+        ld      t2, 16(a3)
+        ld      t3, 24(a3)
+        .rept 16
+        add     a7, a0, a1
+        sd      t0, (a0)
+        sd      t1, 8(a0)
+        sd      t2, 16(a0)
+        sd      t3, 24(a0)
+        sh1add  a0, a1, a0
+        sd      t0, (a7)
+        sd      t1, 8(a7)
+        sd      t2, 16(a7)
+        sd      t3, 24(a7)
+        .endr
+
+        ret
+endfunc
+
+func ff_v_16x16_rvi
+        ld      t0, (a3)
+        ld      t1, 8(a3)
+        .rept 8
+        add     a7, a0, a1
+        sd      t0, (a0)
+        sd      t1, 8(a0)
+        sh1add  a0, a1, a0
+        sd      t0, (a7)
+        sd      t1, 8(a7)
+        .endr
+
+        ret
+endfunc
+
+func ff_v_8x8_rvi
+        ld      t0, (a3)
+        .rept 4
+        add     a7, a0, a1
+        sd      t0, (a0)
+        sh1add  a0, a1, a0
+        sd      t0, (a7)
+        .endr
+
+        ret
+endfunc
+#endif
diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h
index 25047ed507..f8bc6563a5 100644
--- a/libavcodec/riscv/vp9dsp.h
+++ b/libavcodec/riscv/vp9dsp.h
@@ -60,6 +60,12 @@ void ff_dc_129_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
                          const uint8_t *a);
 void ff_dc_129_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
                        const uint8_t *a);
+void ff_v_32x32_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                    const uint8_t *a);
+void ff_v_16x16_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                    const uint8_t *a);
+void ff_v_8x8_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                  const uint8_t *a);

 #define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx) \
 void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index dd418bd5bf..0f64afc6d2 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -24,11 +24,19 @@
 #include "libavcodec/vp9dsp.h"
 #include "vp9dsp.h"

-static av_cold void vp9dsp_intrapred_init_rvv(VP9DSPContext *dsp, int bpp)
+static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp)
 {
-#if HAVE_RVV
+#if HAVE_RV
     int flags = av_get_cpu_flags();

+    if (bpp == 8 && (flags & AV_CPU_FLAG_RV_MISALIGNED) && (flags & AV_CPU_FLAG_RVB_ADDR)) {
+# if __riscv_xlen >= 64
+        dsp->intra_pred[TX_32X32][VERT_PRED] = ff_v_32x32_rvi;
+        dsp->intra_pred[TX_16X16][VERT_PRED] = ff_v_16x16_rvi;
+        dsp->intra_pred[TX_8X8][VERT_PRED] = ff_v_8x8_rvi;
+# endif
+    }
+#if HAVE_RVV
     if (bpp == 8 && flags & AV_CPU_FLAG_RVV_I64 && ff_rv_vlen_least(128)) {
         dsp->intra_pred[TX_8X8][DC_PRED] = ff_dc_8x8_rvv;
         dsp->intra_pred[TX_8X8][LEFT_DC_PRED] = ff_dc_left_8x8_rvv;
@@ -53,9 +61,10 @@ static av_cold void vp9dsp_intrapred_init_rvv(VP9DSPContext *dsp, int bpp)
         dsp->intra_pred[TX_16X16][TOP_DC_PRED] = ff_dc_top_16x16_rvv;
     }
 #endif
+#endif
 }

 av_cold void ff_vp9dsp_init_riscv(VP9DSPContext *dsp, int bpp, int bitexact)
 {
-    vp9dsp_intrapred_init_rvv(dsp, bpp);
+    vp9dsp_intrapred_init_riscv(dsp, bpp);
 }

From patchwork Mon May 13 16:59:19 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48857
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Tue, 14 May 2024 00:59:19 +0800
X-OQ-MSGID: <20240513165926.1467967-2-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240513165926.1467967-1-uk7b@foxmail.com>
References: <20240513165926.1467967-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v3 2/9] lavc/vp9dsp: R-V mc copy

From: sunyuechi

C908:
vp9_put4_8bpp_c: 0.7
vp9_put4_8bpp_rvi: 0.5
vp9_put8_8bpp_c: 2.5
vp9_put8_8bpp_rvi: 0.5
vp9_put16_8bpp_c: 16.7
vp9_put16_8bpp_rvi: 1.5
vp9_put32_8bpp_c: 37.2
vp9_put32_8bpp_rvi: 5.7
vp9_put64_8bpp_c: 107.5
vp9_put64_8bpp_rvi: 21.7
---
 libavcodec/riscv/Makefile      |   3 +-
 libavcodec/riscv/vp9_mc_rvi.S  | 105 +++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h      |   3 +
 libavcodec/riscv/vp9dsp_init.c |  28 +++++++++
 4 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/riscv/vp9_mc_rvi.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index ccd060c666..0cd900104f 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -62,7 +62,8 @@ OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_init.o
 RV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvi.o
 RVV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvv.o
 OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9dsp_init.o
-RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o
+RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o \
+                                 riscv/vp9_mc_rvi.o
 RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o
 OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o
 RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o
diff --git a/libavcodec/riscv/vp9_mc_rvi.S b/libavcodec/riscv/vp9_mc_rvi.S
new file mode 100644
index 0000000000..0db14e83c7
--- /dev/null
+++ b/libavcodec/riscv/vp9_mc_rvi.S
@@ -0,0 +1,105 @@
+/*
+ * Copyright (c) 2024 Institue of Software Chinese Academy of Sciences (ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/riscv/asm.S"
+
+#if __riscv_xlen >= 64
+func ff_copy64_rvi
+1:
+        addi    a4, a4, -1
+        ld      t0, (a2)
+        ld      t1, 8(a2)
+        ld      t2, 16(a2)
+        ld      t3, 24(a2)
+        ld      t4, 32(a2)
+        ld      t5, 40(a2)
+        ld      t6, 48(a2)
+        ld      a7, 56(a2)
+        sd      t0, (a0)
+        sd      t1, 8(a0)
+        sd      t2, 16(a0)
+        sd      t3, 24(a0)
+        sd      t4, 32(a0)
+        sd      t5, 40(a0)
+        sd      t6, 48(a0)
+        sd      a7, 56(a0)
+        add     a2, a2, a3
+        add     a0, a0, a1
+        bnez    a4, 1b
+
+        ret
+endfunc
+
+func ff_copy32_rvi
+1:
+        addi    a4, a4, -1
+        ld      t0, (a2)
+        ld      t1, 8(a2)
+        ld      t2, 16(a2)
+        ld      t3, 24(a2)
+        sd      t0, (a0)
+        sd      t1, 8(a0)
+        sd      t2, 16(a0)
+        sd      t3, 24(a0)
+        add     a2, a2, a3
+        add     a0, a0, a1
+        bnez    a4, 1b
+
+        ret
+endfunc
+
+func ff_copy16_rvi
+1:
+        addi    a4, a4, -1
+        ld      t0, (a2)
+        ld      t1, 8(a2)
+        sd      t0, (a0)
+        sd      t1, 8(a0)
+        add     a2, a2, a3
+        add     a0, a0, a1
+        bnez    a4, 1b
+
+        ret
+endfunc
+
+func ff_copy8_rvi
+1:
+        addi    a4, a4, -1
+        ld      t0, (a2)
+        sd      t0, (a0)
+        add     a2, a2, a3
+        add     a0, a0, a1
+        bnez    a4, 1b
+
+        ret
+endfunc
+#endif
+
+func ff_copy4_rvi
+1:
+        addi    a4, a4, -1
+        lw      t0, (a2)
+        sw      t0, (a0)
+        add     a2, a2, a3
+        add     a0, a0, a1
+        bnez    a4, 1b
+
+        ret
+endfunc
diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h
index f8bc6563a5..b8ff282f8a 100644
--- a/libavcodec/riscv/vp9dsp.h
+++ b/libavcodec/riscv/vp9dsp.h
@@ -167,6 +167,9 @@ void ff_copy##SIZE##_rvi(uint8_t *dst, ptrdiff_t dststride, \
                          const uint8_t *src, ptrdiff_t srcstride, \
                          int h, int mx, int my);

+VP9_COPY_RISCV_RVI_FUNC(64);
+VP9_COPY_RISCV_RVI_FUNC(32);
+VP9_COPY_RISCV_RVI_FUNC(16);
 VP9_COPY_RISCV_RVI_FUNC(8);
 VP9_COPY_RISCV_RVI_FUNC(4);

diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 0f64afc6d2..dace51cf06 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -24,6 +24,33 @@
 #include "libavcodec/vp9dsp.h"
 #include "vp9dsp.h"

+static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
+{
+#if HAVE_RV
+    int flags = av_get_cpu_flags();
+
+# if __riscv_xlen >= 64
+    if (bpp == 8 && (flags & AV_CPU_FLAG_RV_MISALIGNED)) {
+
+#define init_fpel(idx1, sz) \
+    dsp->mc[idx1][FILTER_8TAP_SMOOTH ][0][0][0] = ff_copy##sz##_rvi; \
+    dsp->mc[idx1][FILTER_8TAP_REGULAR][0][0][0] = ff_copy##sz##_rvi; \
+    dsp->mc[idx1][FILTER_8TAP_SHARP  ][0][0][0] = ff_copy##sz##_rvi; \
+    dsp->mc[idx1][FILTER_BILINEAR    ][0][0][0] = ff_copy##sz##_rvi
+
+    init_fpel(0, 64);
+    init_fpel(1, 32);
+    init_fpel(2, 16);
+    init_fpel(3, 8);
+    init_fpel(4, 4);
+
+#undef init_fpel
+    }
+# endif
+
+#endif
+}
+
 static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp)
 {
 #if HAVE_RV
@@ -67,4 +94,5 @@ static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp)
 av_cold void ff_vp9dsp_init_riscv(VP9DSPContext *dsp, int bpp, int bitexact)
 {
     vp9dsp_intrapred_init_riscv(dsp, bpp);
+    vp9dsp_mc_init_riscv(dsp, bpp);
 }

From patchwork Mon May 13 16:59:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48858
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Tue, 14 May 2024 00:59:20 +0800
X-OQ-MSGID: <20240513165926.1467967-3-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240513165926.1467967-1-uk7b@foxmail.com>
References: <20240513165926.1467967-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v3 3/9] lavc/vp9dsp: R-V V ipred hor

From: sunyuechi

C908:
vp9_hor_8x8_8bpp_c: 74.7
vp9_hor_8x8_8bpp_rvv_i32: 35.7
vp9_hor_16x16_8bpp_c: 175.5
vp9_hor_16x16_8bpp_rvv_i32: 80.2
vp9_hor_32x32_8bpp_c: 510.2
vp9_hor_32x32_8bpp_rvv_i32: 264.0
---
 libavcodec/riscv/vp9_intra_rvv.S | 56 ++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h        |  6 ++++
 libavcodec/riscv/vp9dsp_init.c   |  3 ++
 3 files changed, 65 insertions(+)

diff --git a/libavcodec/riscv/vp9_intra_rvv.S b/libavcodec/riscv/vp9_intra_rvv.S
index 40e38ba83e..ca156d65cd 100644
--- a/libavcodec/riscv/vp9_intra_rvv.S
+++ b/libavcodec/riscv/vp9_intra_rvv.S @@ -117,3 +117,59 @@ func_dc dc_left 8 left 3 0 zve64x func_dc dc_top 32 top 5 1 zve32x func_dc dc_top 16 top 4 1 zve32x func_dc dc_top 8 top 3 0 zve64x + +func ff_h_32x32_rvv, zve32x + li t0, 32 + addi a2, a2, 31 + vsetvli zero, t0, e8, m2, ta, ma + + .rept 2 + .irp n 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 + lbu t1, (a2) + addi a2, a2, -1 + vmv.v.x v\n, t1 + .endr + .irp n 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 + vse8.v v\n, (a0) + add a0, a0, a1 + .endr + .endr + + ret +endfunc + +func ff_h_16x16_rvv, zve32x + addi a2, a2, 15 + vsetivli zero, 16, e8, m1, ta, ma + + .irp n 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23 + lbu t1, (a2) + addi a2, a2, -1 + vmv.v.x v\n, t1 + .endr + .irp n 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 + vse8.v v\n, (a0) + add a0, a0, a1 + .endr + vse8.v v23, (a0) + + ret +endfunc + +func ff_h_8x8_rvv, zve32x + addi a2, a2, 7 + vsetivli zero, 8, e8, mf2, ta, ma + + .irp n 8, 9, 10, 11, 12, 13, 14, 15 + lbu t1, (a2) + addi a2, a2, -1 + vmv.v.x v\n, t1 + .endr + .irp n 8, 9, 10, 11, 12, 13, 14 + vse8.v v\n, (a0) + add a0, a0, a1 + .endr + vse8.v v15, (a0) + + ret +endfunc diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h index b8ff282f8a..0ad961c7e0 100644 --- a/libavcodec/riscv/vp9dsp.h +++ b/libavcodec/riscv/vp9dsp.h @@ -66,6 +66,12 @@ void ff_v_16x16_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a); void ff_v_8x8_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a); +void ff_h_32x32_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); +void ff_h_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); +void ff_h_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); #define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx) \ void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \ diff --git 
a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index dace51cf06..eab3e9cb0a 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -86,6 +86,9 @@ static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp)
         dsp->intra_pred[TX_16X16][DC_129_PRED] = ff_dc_129_16x16_rvv;
         dsp->intra_pred[TX_32X32][TOP_DC_PRED] = ff_dc_top_32x32_rvv;
         dsp->intra_pred[TX_16X16][TOP_DC_PRED] = ff_dc_top_16x16_rvv;
+        dsp->intra_pred[TX_32X32][HOR_PRED] = ff_h_32x32_rvv;
+        dsp->intra_pred[TX_16X16][HOR_PRED] = ff_h_16x16_rvv;
+        dsp->intra_pred[TX_8X8][HOR_PRED] = ff_h_8x8_rvv;
     }
 #endif
 #endif

From patchwork Mon May 13 16:59:21 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48860
Delivered-To: ffmpegpatchwork2@gmail.com
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 14 May 2024 00:59:21 +0800
X-OQ-MSGID: <20240513165926.1467967-4-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240513165926.1467967-1-uk7b@foxmail.com>
References: <20240513165926.1467967-1-uk7b@foxmail.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v3 4/9] lavc/vp9dsp: R-V V ipred tm
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: sunyuechi
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: Zj7ykUnbNjjZ

From: sunyuechi

C908:
vp9_tm_4x4_8bpp_c: 116.5
vp9_tm_4x4_8bpp_rvv_i32: 43.5
vp9_tm_8x8_8bpp_c: 416.2
vp9_tm_8x8_8bpp_rvv_i32: 86.0
vp9_tm_16x16_8bpp_c: 1665.5
vp9_tm_16x16_8bpp_rvv_i32: 187.2
vp9_tm_32x32_8bpp_c: 6974.2
vp9_tm_32x32_8bpp_rvv_i32: 625.7
---
 libavcodec/riscv/vp9_intra_rvv.S | 141 +++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h        |   8 ++
 libavcodec/riscv/vp9dsp_init.c   |   4 +
 3 files changed, 153 insertions(+)

diff --git a/libavcodec/riscv/vp9_intra_rvv.S b/libavcodec/riscv/vp9_intra_rvv.S
index ca156d65cd..7e1046bc13 100644
--- a/libavcodec/riscv/vp9_intra_rvv.S
+++ b/libavcodec/riscv/vp9_intra_rvv.S
@@ -173,3 +173,144 @@ func ff_h_8x8_rvv, zve32x
 
         ret
 endfunc
+
+.macro tm_sum dst, top, offset
+        lbu t3, \offset(a2)
+        sub t3, t3, a4
+        vadd.vx \dst, \top, t3
+.endm
+
+func ff_tm_32x32_rvv, zve32x
+        lbu a4, -1(a3)
+        li t5, 32
+
+        .macro tm_sum32 n1,n2,n3,n4,n5,n6,n7,n8
+        vsetvli zero, t5, e16, m4, ta, ma
+        vle8.v v8, (a3)
+        vzext.vf2 v28, v8
+
+        tm_sum v0, v28, \n1
+        tm_sum v4, v28, \n2
+        tm_sum v8, v28, \n3
+        tm_sum v12, v28, \n4
+        tm_sum v16, v28, \n5
+        tm_sum v20, v28, \n6
+        tm_sum v24, v28, \n7
+        tm_sum v28, v28, \n8
+
+        .irp n 0, 4, 8, 12, 16, 20, 24, 28
+        vmax.vx v\n, v\n, zero
+        .endr
+
+        vsetvli zero, zero, e8, m2, ta, ma
+        .irp n 0, 4, 8, 12, 16, 20, 24, 28
+        vnclipu.wi v\n, v\n, 0
+        vse8.v v\n, (a0)
+        add a0, a0, a1
+        .endr
+        .endm
+
+        tm_sum32 31, 30, 29, 28, 27, 26, 25, 24
+        tm_sum32 23, 22, 21, 20, 19, 18, 17, 16
+        tm_sum32 15, 14, 13, 12, 11, 10, 9, 8
+        tm_sum32 7, 6, 5, 4, 3, 2, 1, 0
+
+        ret
+endfunc
+
+func ff_tm_16x16_rvv, zve32x
+        vsetivli zero, 16, e16, m2, ta, ma
+        vle8.v v8, (a3)
+        vzext.vf2 v30, v8
+        lbu a4, -1(a3)
+
+        tm_sum v0, v30, 15
+        tm_sum v2, v30, 14
+        tm_sum v4, v30, 13
+        tm_sum v6, v30, 12
+        tm_sum v8, v30, 11
+        tm_sum v10, v30, 10
+        tm_sum v12, v30, 9
+        tm_sum v14, v30, 8
+        tm_sum v16, v30, 7
+        tm_sum v18, v30, 6
+        tm_sum v20, v30, 5
+        tm_sum v22, v30, 4
+        tm_sum v24, v30, 3
+        tm_sum v26, v30, 2
+        tm_sum v28, v30, 1
+        tm_sum v30, v30, 0
+
+        .irp n 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30
+        vmax.vx v\n, v\n, zero
+        .endr
+
+        vsetvli zero, zero, e8, m1, ta, ma
+        .irp n 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28
+        vnclipu.wi v\n, v\n, 0
+        vse8.v v\n, (a0)
+        add a0, a0, a1
+        .endr
+        vnclipu.wi v30, v30, 0
+        vse8.v v30, (a0)
+
+        ret
+endfunc
+
+func ff_tm_8x8_rvv, zve32x
+        vsetivli zero, 8, e16, m1, ta, ma
+        vle8.v v8, (a3)
+        vzext.vf2 v28, v8
+        lbu a4, -1(a3)
+
+        tm_sum v16, v28, 7
+        tm_sum v17, v28, 6
+        tm_sum v18, v28, 5
+        tm_sum v19, v28, 4
+        tm_sum v20, v28, 3
+        tm_sum v21, v28, 2
+        tm_sum v22, v28, 1
+        tm_sum v23, v28, 0
+
+        .irp n 16, 17, 18, 19, 20, 21, 22, 23
+        vmax.vx v\n, v\n, zero
+        .endr
+
+        vsetvli zero, zero, e8, mf2, ta, ma
+        .irp n 16, 17, 18, 19, 20, 21, 22
+        vnclipu.wi v\n, v\n, 0
+        vse8.v v\n, (a0)
+        add a0, a0, a1
+        .endr
+        vnclipu.wi v24, v23, 0
+        vse8.v v24, (a0)
+
+        ret
+endfunc
+
+func ff_tm_4x4_rvv, zve32x
+        vsetivli zero, 4, e16, mf2, ta, ma
+        vle8.v v8, (a3)
+        vzext.vf2 v28, v8
+        lbu a4, -1(a3)
+
+        tm_sum v16, v28, 3
+        tm_sum v17, v28, 2
+        tm_sum v18, v28, 1
+        tm_sum v19, v28, 0
+
+        .irp n 16, 17, 18, 19
+        vmax.vx v\n, v\n, zero
+        .endr
+
+        vsetvli zero, zero, e8, mf4, ta, ma
+        .irp n 16, 17, 18
+        vnclipu.wi v\n, v\n, 0
+        vse8.v v\n, (a0)
+        add a0, a0, a1
+        .endr
+        vnclipu.wi v24, v19, 0
+        vse8.v v24, (a0)
+
+        ret
+endfunc
diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h
index 0ad961c7e0..79330b4968 100644
--- a/libavcodec/riscv/vp9dsp.h
+++ b/libavcodec/riscv/vp9dsp.h
@@ -72,6 +72,14 @@ void ff_h_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
                     const uint8_t *a);
 void ff_h_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
                   const uint8_t *a);
+void ff_tm_32x32_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                     const uint8_t *a);
+void ff_tm_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                     const uint8_t *a);
+void ff_tm_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                   const uint8_t *a);
+void ff_tm_4x4_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                   const uint8_t *a);
 
 #define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx) \
 void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index eab3e9cb0a..184fadbaf7 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -89,6 +89,10 @@ static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp)
         dsp->intra_pred[TX_32X32][HOR_PRED] = ff_h_32x32_rvv;
         dsp->intra_pred[TX_16X16][HOR_PRED] = ff_h_16x16_rvv;
         dsp->intra_pred[TX_8X8][HOR_PRED] = ff_h_8x8_rvv;
+        dsp->intra_pred[TX_32X32][TM_VP8_PRED] = ff_tm_32x32_rvv;
+        dsp->intra_pred[TX_16X16][TM_VP8_PRED] = ff_tm_16x16_rvv;
+        dsp->intra_pred[TX_8X8][TM_VP8_PRED] = ff_tm_8x8_rvv;
+        dsp->intra_pred[TX_4X4][TM_VP8_PRED] = ff_tm_4x4_rvv;
     }
 #endif
 #endif

From patchwork Mon May 13 16:59:22 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48861
Delivered-To: ffmpegpatchwork2@gmail.com
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 14 May 2024 00:59:22 +0800
X-OQ-MSGID: <20240513165926.1467967-5-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240513165926.1467967-1-uk7b@foxmail.com>
References: <20240513165926.1467967-1-uk7b@foxmail.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v3 5/9] lavc/vp9dsp: R-V V mc avg
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: sunyuechi
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: YXOQ93E/fCZ3

From: sunyuechi

C908:
vp9_avg4_8bpp_c: 1.2
vp9_avg4_8bpp_rvv_i64: 1.0
vp9_avg8_8bpp_c: 3.7
vp9_avg8_8bpp_rvv_i64: 1.5
vp9_avg16_8bpp_c: 14.7
vp9_avg16_8bpp_rvv_i64: 3.5
vp9_avg32_8bpp_c: 57.7
vp9_avg32_8bpp_rvv_i64: 10.0
vp9_avg64_8bpp_c: 229.0
vp9_avg64_8bpp_rvv_i64: 31.7
---
 libavcodec/riscv/Makefile      |  3 +-
 libavcodec/riscv/vp9_mc_rvv.S  | 58 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c | 18 +++++++++++
 3 files changed, 78 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/riscv/vp9_mc_rvv.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 0cd900104f..1183357b37 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -64,6 +64,7 @@ RVV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvv.o
 OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9dsp_init.o
 RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o \
                                  riscv/vp9_mc_rvi.o
-RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o
+RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o \
+                                  riscv/vp9_mc_rvv.o
 OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o
 RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o
diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
new file mode 100644
index 0000000000..5d917e7b98
--- /dev/null
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -0,0 +1,58 @@
+/*
+ * Copyright (c) 2024 Institue of Software Chinese Academy of Sciences (ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/riscv/asm.S"
+
+.macro vsetvlstatic8 len an maxlen mn=m4
+.if \len == 4
+        vsetivli zero, \len, e8, mf4, ta, ma
+.elseif \len == 8
+        vsetivli zero, \len, e8, mf2, ta, ma
+.elseif \len == 16
+        vsetivli zero, \len, e8, m1, ta, ma
+.elseif \len == 32
+        li \an, \len
+        vsetvli zero, \an, e8, m2, ta, ma
+.elseif \len == 64
+        li \an, \maxlen
+        vsetvli zero, \an, e8, \mn, ta, ma
+.endif
+.endm
+
+.macro copy_avg len
+func ff_avg\len\()_rvv, zve32x
+        csrwi vxrm, 0
+        vsetvlstatic8 \len t0 64
+1:
+        addi a4, a4, -1
+        vle8.v v8, (a2)
+        vle8.v v16, (a0)
+        vaaddu.vv v8, v8, v16
+        vse8.v v8, (a0)
+        add a2, a2, a3
+        add a0, a0, a1
+        bnez a4, 1b
+        ret
+endfunc
+.endm
+
+.irp len 64, 32, 16, 8, 4
+        copy_avg \len
+.endr
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 184fadbaf7..1922484a1d 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -48,6 +48,24 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
     }
 # endif
 
+#if HAVE_RVV
+    if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I32) && ff_rv_vlen_least(128)) {
+
+#define init_fpel(idx1, sz) \
+    dsp->mc[idx1][FILTER_8TAP_SMOOTH ][1][0][0] = ff_avg##sz##_rvv; \
+    dsp->mc[idx1][FILTER_8TAP_REGULAR][1][0][0] = ff_avg##sz##_rvv; \
+    dsp->mc[idx1][FILTER_8TAP_SHARP ][1][0][0] = ff_avg##sz##_rvv; \
+    dsp->mc[idx1][FILTER_BILINEAR ][1][0][0] = ff_avg##sz##_rvv
+
+    init_fpel(0, 64);
+    init_fpel(1, 32);
+    init_fpel(2, 16);
+    init_fpel(3, 8);
+    init_fpel(4, 4);
+
+#undef init_fpel
+    }
+#endif
 #endif
 }

From patchwork Mon May 13 16:59:23 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48863
Delivered-To: ffmpegpatchwork2@gmail.com
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 14 May 2024 00:59:23 +0800
X-OQ-MSGID: <20240513165926.1467967-6-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240513165926.1467967-1-uk7b@foxmail.com>
References: <20240513165926.1467967-1-uk7b@foxmail.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v3 6/9] lavc/vp9dsp: R-V V mc bilin h v
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: sunyuechi
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: PZ5AeHqOx/R+

From: sunyuechi

C908:
vp9_avg_bilin_4h_8bpp_c: 5.2
vp9_avg_bilin_4h_8bpp_rvv_i64: 2.2
vp9_avg_bilin_4v_8bpp_c: 5.5
vp9_avg_bilin_4v_8bpp_rvv_i64: 2.2
vp9_avg_bilin_8h_8bpp_c: 20.0
vp9_avg_bilin_8h_8bpp_rvv_i64: 4.5
vp9_avg_bilin_8v_8bpp_c: 21.0
vp9_avg_bilin_8v_8bpp_rvv_i64: 4.2
vp9_avg_bilin_16h_8bpp_c: 78.2
vp9_avg_bilin_16h_8bpp_rvv_i64: 9.0
vp9_avg_bilin_16v_8bpp_c: 82.0
vp9_avg_bilin_16v_8bpp_rvv_i64: 9.0
vp9_avg_bilin_32h_8bpp_c: 325.5
vp9_avg_bilin_32h_8bpp_rvv_i64: 26.2
vp9_avg_bilin_32v_8bpp_c: 326.2
vp9_avg_bilin_32v_8bpp_rvv_i64: 26.2
vp9_avg_bilin_64h_8bpp_c: 1265.7
vp9_avg_bilin_64h_8bpp_rvv_i64: 91.5
vp9_avg_bilin_64v_8bpp_c: 1317.0
vp9_avg_bilin_64v_8bpp_rvv_i64: 91.2
vp9_put_bilin_4h_8bpp_c: 4.5
vp9_put_bilin_4h_8bpp_rvv_i64: 1.7
vp9_put_bilin_4v_8bpp_c: 4.7
vp9_put_bilin_4v_8bpp_rvv_i64: 1.7
vp9_put_bilin_8h_8bpp_c: 17.0
vp9_put_bilin_8h_8bpp_rvv_i64: 3.5
vp9_put_bilin_8v_8bpp_c: 18.0
vp9_put_bilin_8v_8bpp_rvv_i64: 3.5
vp9_put_bilin_16h_8bpp_c: 65.2
vp9_put_bilin_16h_8bpp_rvv_i64: 7.5
vp9_put_bilin_16v_8bpp_c: 85.7
vp9_put_bilin_16v_8bpp_rvv_i64: 7.5
vp9_put_bilin_32h_8bpp_c: 257.5
vp9_put_bilin_32h_8bpp_rvv_i64: 23.5
vp9_put_bilin_32v_8bpp_c: 274.5
vp9_put_bilin_32v_8bpp_rvv_i64: 23.5
vp9_put_bilin_64h_8bpp_c: 1040.5
vp9_put_bilin_64h_8bpp_rvv_i64: 82.5
vp9_put_bilin_64v_8bpp_c: 1108.7
vp9_put_bilin_64v_8bpp_rvv_i64: 82.2
---
 libavcodec/riscv/vp9_mc_rvv.S  | 43 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c | 21 +++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index 5d917e7b98..986cc3760d 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -53,6 +53,49 @@ func ff_avg\len\()_rvv, zve32x
 endfunc
 .endm
 
+.macro bilin_load dst len op type mn
+.ifc \type,v
+        add t5, a2, a3
+.elseif \type == h
+        addi t5, a2, 1
+.endif
+        vle8.v v8, (a2)
+        vle8.v v0, (t5)
+        vwmulu.vx v16, v0, \mn
+        vwmaccsu.vx v16, t1, v8
+        vwadd.wx v16, v16, t4
+        vnsra.wi v16, v16, 4
+        vadd.vv \dst, v16, v8
+.ifc \op,avg
+        vle8.v v16, (a0)
+        vaaddu.vv \dst, \dst, v16
+.endif
+.endm
+
+.macro bilin_h_v len op type mn
+func ff_\op\()_bilin_\len\()\type\()_rvv, zve32x
+.ifc \op,avg
+        csrwi vxrm, 0
+.endif
+        vsetvlstatic8 \len t0 64
+        li t4, 8
+        neg t1, \mn
+1:
+        addi a4, a4, -1
+        bilin_load v0, \len, \op, \type, \mn
+        vse8.v v0, (a0)
+        add a2, a2, a3
+        add a0, a0, a1
+        bnez a4, 1b
+
+        ret
+endfunc
+.endm
+
 .irp len 64, 32, 16, 8, 4
         copy_avg \len
+        .irp op put avg
+        bilin_h_v \len \op h a5
+        bilin_h_v \len \op v a6
+        .endr
 .endr
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 1922484a1d..ec6db51774 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -63,6 +63,27 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
     init_fpel(3, 8);
     init_fpel(4, 4);
 
+    dsp->mc[0][FILTER_BILINEAR ][0][0][1] = ff_put_bilin_64v_rvv;
+    dsp->mc[0][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_64h_rvv;
+    dsp->mc[0][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_64v_rvv;
+    dsp->mc[0][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_64h_rvv;
+    dsp->mc[1][FILTER_BILINEAR ][0][0][1] = ff_put_bilin_32v_rvv;
+    dsp->mc[1][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_32h_rvv;
+    dsp->mc[1][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_32v_rvv;
+    dsp->mc[1][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_32h_rvv;
+    dsp->mc[2][FILTER_BILINEAR ][0][0][1] = ff_put_bilin_16v_rvv;
+    dsp->mc[2][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_16h_rvv;
+    dsp->mc[2][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_16v_rvv;
+    dsp->mc[2][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_16h_rvv;
+    dsp->mc[3][FILTER_BILINEAR ][0][0][1] = ff_put_bilin_8v_rvv;
+    dsp->mc[3][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_8h_rvv;
+    dsp->mc[3][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_8v_rvv;
+    dsp->mc[3][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_8h_rvv;
+    dsp->mc[4][FILTER_BILINEAR ][0][0][1] = ff_put_bilin_4v_rvv;
+    dsp->mc[4][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_4h_rvv;
+    dsp->mc[4][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_4v_rvv;
+    dsp->mc[4][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_4h_rvv;
+
 #undef init_fpel
     }
 #endif

From patchwork Mon May 13 16:59:24 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48865
Delivered-To: ffmpegpatchwork2@gmail.com
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 14 May 2024 00:59:24 +0800
X-OQ-MSGID: <20240513165926.1467967-7-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240513165926.1467967-1-uk7b@foxmail.com>
References: <20240513165926.1467967-1-uk7b@foxmail.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v3 7/9] lavc/vp9dsp: R-V V mc tap h v
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: sunyuechi
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: 6hXnFKPRxok3

From: sunyuechi

                                         C908     X60
vp9_avg_8tap_smooth_4h_8bpp_c        :   13.0    11.2
vp9_avg_8tap_smooth_4h_8bpp_rvv_i32  :    5.0     4.2
vp9_avg_8tap_smooth_4v_8bpp_c        :   13.7    12.5
vp9_avg_8tap_smooth_4v_8bpp_rvv_i32  :    5.0     4.2
vp9_avg_8tap_smooth_8h_8bpp_c        :   49.5    42.2
vp9_avg_8tap_smooth_8h_8bpp_rvv_i32  :    9.2     8.5
vp9_avg_8tap_smooth_8v_8bpp_c        :   66.5    45.0
vp9_avg_8tap_smooth_8v_8bpp_rvv_i32  :    9.5     8.5
vp9_avg_8tap_smooth_16h_8bpp_c       :  192.7   166.5
vp9_avg_8tap_smooth_16h_8bpp_rvv_i32 :   21.2    18.7
vp9_avg_8tap_smooth_16v_8bpp_c       :  192.2   175.7
vp9_avg_8tap_smooth_16v_8bpp_rvv_i32 :   21.5    19.0
vp9_avg_8tap_smooth_32h_8bpp_c       :  780.2   663.7
vp9_avg_8tap_smooth_32h_8bpp_rvv_i32 :   83.5    60.0
vp9_avg_8tap_smooth_32v_8bpp_c       :  770.5   689.2
vp9_avg_8tap_smooth_32v_8bpp_rvv_i32 :   67.2    60.0
vp9_avg_8tap_smooth_64h_8bpp_c       : 3115.5  2647.2
vp9_avg_8tap_smooth_64h_8bpp_rvv_i32 :  283.5   119.2
vp9_avg_8tap_smooth_64v_8bpp_c       : 3082.2  2729.0
vp9_avg_8tap_smooth_64v_8bpp_rvv_i32 :  305.2   119.0
vp9_put_8tap_smooth_4h_8bpp_c        :   11.2     9.7
vp9_put_8tap_smooth_4h_8bpp_rvv_i32  :    4.2     4.0
vp9_put_8tap_smooth_4v_8bpp_c        :   11.7    10.7
vp9_put_8tap_smooth_4v_8bpp_rvv_i32  :    4.2     4.0
vp9_put_8tap_smooth_8h_8bpp_c        :   42.0    37.5
vp9_put_8tap_smooth_8h_8bpp_rvv_i32  :    8.5     7.7
vp9_put_8tap_smooth_8v_8bpp_c        :   44.2    38.7
vp9_put_8tap_smooth_8v_8bpp_rvv_i32  :    8.5     7.7
vp9_put_8tap_smooth_16h_8bpp_c       :  165.7   147.2
vp9_put_8tap_smooth_16h_8bpp_rvv_i32 :   19.5    17.5
vp9_put_8tap_smooth_16v_8bpp_c       :  169.0   149.7
vp9_put_8tap_smooth_16v_8bpp_rvv_i32 :   19.7    17.5
vp9_put_8tap_smooth_32h_8bpp_c       :  659.7   586.7
vp9_put_8tap_smooth_32h_8bpp_rvv_i32 :   64.2    57.2
vp9_put_8tap_smooth_32v_8bpp_c       :  680.5   591.2
vp9_put_8tap_smooth_32v_8bpp_rvv_i32 :   64.2    57.2
vp9_put_8tap_smooth_64h_8bpp_c       : 2681.5  2339.0
vp9_put_8tap_smooth_64h_8bpp_rvv_i32 :  255.5   114.2
vp9_put_8tap_smooth_64v_8bpp_c       : 2709.7  2348.7
vp9_put_8tap_smooth_64v_8bpp_rvv_i32 :  255.5   114.0
---
 libavcodec/riscv/vp9_mc_rvv.S  | 243 +++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h      |  72 ++++++----
 libavcodec/riscv/vp9dsp_init.c |  40 +++++-
 3 files changed, 329 insertions(+), 26 deletions(-)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index 986cc3760d..c633809675 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -36,6 +36,18 @@
 .endif
 .endm
 
+.macro vsetvlstatic16 len
+.ifc \len,4
+        vsetvli zero, zero, e16, mf2, ta, ma
+.elseif \len == 8
+        vsetvli zero, zero, e16, m1, ta, ma
+.elseif \len == 16
+        vsetvli zero, zero, e16, m2, ta, ma
+.else
+        vsetvli zero, zero, e16, m4, ta, ma
+.endif
+.endm
+
 .macro copy_avg len
 func ff_avg\len\()_rvv, zve32x
         csrwi vxrm, 0
@@ -92,10 +104,241 @@ func ff_\op\()_bilin_\len\()\type\()_rvv, zve32x
 endfunc
 .endm
 
+const subpel_filters_regular
+        .byte 0, 0, 0, 128, 0, 0, 0, 0
+        .byte 0, 1, -5, 126, 8, -3, 1, 0
+        .byte -1, 3, -10, 122, 18, -6, 2, 0
+        .byte -1, 4, -13, 118, 27, -9, 3, -1
+        .byte -1, 4, -16, 112, 37, -11, 4, -1
+        .byte -1, 5, -18, 105, 48, -14, 4, -1
+        .byte -1, 5, -19, 97, 58, -16, 5, -1
+        .byte -1, 6, -19, 88, 68, -18, 5, -1
+        .byte -1, 6, -19, 78, 78, -19, 6, -1
+        .byte -1, 5, -18, 68, 88, -19, 6, -1
+        .byte -1, 5, -16, 58, 97, -19, 5, -1
+        .byte -1, 4, -14, 48, 105, -18, 5, -1
+        .byte -1, 4, -11, 37, 112, -16, 4, -1
+        .byte -1, 3, -9, 27, 118, -13, 4, -1
+        .byte 0, 2, -6, 18, 122, -10, 3, -1
+        .byte 0, 1, -3, 8, 126, -5, 1, 0
+subpel_filters_sharp:
+        .byte 0, 0, 0, 128, 0, 0, 0, 0
+        .byte -1, 3, -7, 127, 8, -3, 1, 0
+        .byte -2, 5, -13, 125, 17, -6, 3, -1
+        .byte -3, 7, -17, 121, 27, -10, 5, -2
+        .byte -4, 9, -20, 115, 37, -13, 6, -2
+        .byte -4, 10, -23, 108, 48, -16, 8, -3
+        .byte -4, 10, -24, 100, 59, -19, 9, -3
+        .byte -4, 11, -24, 90, 70, -21, 10, -4
+        .byte -4, 11, -23, 80, 80,
-23, 11, -4 + .byte -4, 10, -21, 70, 90, -24, 11, -4 + .byte -3, 9, -19, 59, 100, -24, 10, -4 + .byte -3, 8, -16, 48, 108, -23, 10, -4 + .byte -2, 6, -13, 37, 115, -20, 9, -4 + .byte -2, 5, -10, 27, 121, -17, 7, -3 + .byte -1, 3, -6, 17, 125, -13, 5, -2 + .byte 0, 1, -3, 8, 127, -7, 3, -1 +subpel_filters_smooth: + .byte 0, 0, 0, 128, 0, 0, 0, 0 + .byte -3, -1, 32, 64, 38, 1, -3, 0 + .byte -2, -2, 29, 63, 41, 2, -3, 0 + .byte -2, -2, 26, 63, 43, 4, -4, 0 + .byte -2, -3, 24, 62, 46, 5, -4, 0 + .byte -2, -3, 21, 60, 49, 7, -4, 0 + .byte -1, -4, 18, 59, 51, 9, -4, 0 + .byte -1, -4, 16, 57, 53, 12, -4, -1 + .byte -1, -4, 14, 55, 55, 14, -4, -1 + .byte -1, -4, 12, 53, 57, 16, -4, -1 + .byte 0, -4, 9, 51, 59, 18, -4, -1 + .byte 0, -4, 7, 49, 60, 21, -3, -2 + .byte 0, -4, 5, 46, 62, 24, -3, -2 + .byte 0, -4, 4, 43, 63, 26, -2, -2 + .byte 0, -3, 2, 41, 63, 29, -2, -2 + .byte 0, -3, 1, 38, 64, 32, -1, -3 +endconst + +.macro epel_filter name type regtype + lla \regtype\()2, subpel_filters_\name + li \regtype\()1, 8 +.ifc \type,v + mul \regtype\()0, a6, \regtype\()1 +.elseif \type == h + mul \regtype\()0, a5, \regtype\()1 +.endif + add \regtype\()0, \regtype\()0, \regtype\()2 + .irp n 1,2,3,4,5,6 + lb \regtype\n, \n(\regtype\()0) + .endr +.ifc \regtype,t + lb a7, 7(\regtype\()0) +.elseif \regtype == s + lb s7, 7(\regtype\()0) +.endif + lb \regtype\()0, 0(\regtype\()0) +.endm + +.macro epel_load dst len op name type from_mem regtype + li a5, 64 +.ifc \from_mem, 1 + vle8.v v22, (a2) +.ifc \type,v + sub a2, a2, a3 + vle8.v v20, (a2) + sh1add a2, a3, a2 + vle8.v v24, (a2) + add a2, a2, a3 + vle8.v v26, (a2) + add a2, a2, a3 + vle8.v v28, (a2) + add a2, a2, a3 + vle8.v v30, (a2) +.elseif \type == h + addi a2, a2, -1 + vle8.v v20, (a2) + addi a2, a2, 2 + vle8.v v24, (a2) + addi a2, a2, 1 + vle8.v v26, (a2) + addi a2, a2, 1 + vle8.v v28, (a2) + addi a2, a2, 1 + vle8.v v30, (a2) +.endif + +.ifc \name,smooth + vwmulu.vx v16, v24, \regtype\()4 + vwmaccu.vx v16, \regtype\()2, v20 + 
vwmaccu.vx v16, \regtype\()5, v26 + vwmaccsu.vx v16, \regtype\()6, v28 +.else + vwmulu.vx v16, v28, \regtype\()6 + vwmaccsu.vx v16, \regtype\()2, v20 + vwmaccsu.vx v16, \regtype\()5, v26 +.endif + +.ifc \regtype,t + vwmaccsu.vx v16, a7, v30 +.elseif \regtype == s + vwmaccsu.vx v16, s7, v30 +.endif + +.ifc \type,v + .rept 6 + sub a2, a2, a3 + .endr + vle8.v v28, (a2) + sub a2, a2, a3 + vle8.v v26, (a2) + sh1add a2, a3, a2 + add a2, a2, a3 +.elseif \type == h + addi a2, a2, -6 + vle8.v v28, (a2) + addi a2, a2, -1 + vle8.v v26, (a2) + addi a2, a2, 3 +.endif + +.ifc \name,smooth + vwmaccsu.vx v16, \regtype\()1, v28 +.else + vwmaccu.vx v16, \regtype\()1, v28 + vwmulu.vx v28, v24, \regtype\()4 +.endif + vwmaccsu.vx v16, \regtype\()0, v26 + vwmulu.vx v20, v22, \regtype\()3 +.else +.ifc \name,smooth + vwmulu.vx v16, v8, \regtype\()4 + vwmaccu.vx v16, \regtype\()2, v4 + vwmaccu.vx v16, \regtype\()5, v10 + vwmaccsu.vx v16, \regtype\()6, v12 + vwmaccsu.vx v16, \regtype\()1, v2 +.else + vwmulu.vx v16, v2, \regtype\()1 + vwmaccu.vx v16, \regtype\()6, v12 + vwmaccsu.vx v16, \regtype\()5, v10 + vwmaccsu.vx v16, \regtype\()2, v4 + vwmulu.vx v28, v8, \regtype\()4 +.endif + vwmaccsu.vx v16, \regtype\()0, v0 + vwmulu.vx v20, v6, \regtype\()3 + +.ifc \regtype,t + vwmaccsu.vx v16, a7, v14 +.elseif \regtype == s + vwmaccsu.vx v16, s7, v14 +.endif + +.endif + vwadd.wx v16, v16, a5 + vsetvlstatic16 \len + +.ifc \name,smooth + vwadd.vv v24, v16, v20 +.else + vwadd.vv v24, v16, v28 + vwadd.wv v24, v24, v20 +.endif + vnsra.wi v24, v24, 7 + vmax.vx v24, v24, zero + vsetvlstatic8 \len, zero, 32, m2 + + vnclipu.wi \dst, v24, 0 +.ifc \op,avg + vle8.v v24, (a0) + vaaddu.vv \dst, \dst, v24 +.endif + +.endm + +.macro epel_load_inc dst len op name type from_mem regtype + epel_load \dst \len \op \name \type \from_mem \regtype + add a2, a2, a3 +.endm + +.macro epel len op name type vlen +func ff_\op\()_8tap_\name\()_\len\()\type\()_rvv\vlen\(), zve32x + epel_filter \name \type t +.if \vlen < 256 + 
vsetvlstatic8 \len a5 32 m2 +.else + vsetvlstatic8 \len a5 64 m2 +.endif +.ifc \op,avg + csrwi vxrm, 0 +.endif + +1: + addi a4, a4, -1 + epel_load v30 \len \op \name \type 1 t + vse8.v v30, (a0) +.if \len == 64 && \vlen < 256 + addi a0, a0, 32 + addi a2, a2, 32 + epel_load v30 \len \op \name \type 1 t + vse8.v v30, (a0) + addi a0, a0, -32 + addi a2, a2, -32 +.endif + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + + ret +endfunc +.endm + .irp len 64, 32, 16, 8, 4 copy_avg \len .irp op put avg bilin_h_v \len \op h a5 bilin_h_v \len \op v a6 + .irp name regular sharp smooth + .irp type h v + epel \len \op \name \type 128 + epel \len \op \name \type 256 + .endr + .endr .endr .endr diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h index 79330b4968..1638daaae3 100644 --- a/libavcodec/riscv/vp9dsp.h +++ b/libavcodec/riscv/vp9dsp.h @@ -81,33 +81,39 @@ void ff_tm_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, void ff_tm_4x4_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a); -#define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx) \ -void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \ +#define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx, min_vlen) \ +void ff_put_8tap_##type##_##SIZE##h_rvv##min_vlen(uint8_t *dst, \ + ptrdiff_t dststride, \ const uint8_t *src, \ ptrdiff_t srcstride, \ int h, int mx, int my); \ \ -void ff_put_8tap_##type##_##SIZE##v_rvv(uint8_t *dst, ptrdiff_t dststride, \ +void ff_put_8tap_##type##_##SIZE##v_rvv##min_vlen(uint8_t *dst, \ + ptrdiff_t dststride, \ const uint8_t *src, \ ptrdiff_t srcstride, \ int h, int mx, int my); \ \ -void ff_put_8tap_##type##_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \ +void ff_put_8tap_##type##_##SIZE##hv_rvv##min_vlen(uint8_t *dst, \ + ptrdiff_t dststride, \ const uint8_t *src, \ ptrdiff_t srcstride, \ int h, int mx, int my); \ \ -void ff_avg_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \ +void 
ff_avg_8tap_##type##_##SIZE##h_rvv##min_vlen(uint8_t *dst, \ + ptrdiff_t dststride, \ const uint8_t *src, \ ptrdiff_t srcstride, \ int h, int mx, int my); \ \ -void ff_avg_8tap_##type##_##SIZE##v_rvv(uint8_t *dst, ptrdiff_t dststride, \ +void ff_avg_8tap_##type##_##SIZE##v_rvv##min_vlen(uint8_t *dst, \ + ptrdiff_t dststride, \ const uint8_t *src, \ ptrdiff_t srcstride, \ int h, int mx, int my); \ \ -void ff_avg_8tap_##type##_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \ +void ff_avg_8tap_##type##_##SIZE##hv_rvv##min_vlen(uint8_t *dst, \ + ptrdiff_t dststride, \ const uint8_t *src, \ ptrdiff_t srcstride, \ int h, int mx, int my); @@ -146,23 +152,41 @@ void ff_avg##SIZE##_rvv(uint8_t *dst, ptrdiff_t dststride, \ const uint8_t *src, ptrdiff_t srcstride, \ int h, int mx, int my); -VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR); -VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR); -VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR); -VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR); -VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR); - -VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP); -VP9_8TAP_RISCV_RVV_FUNC(32, sharp, FILTER_8TAP_SHARP); -VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP); -VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP); -VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP); - -VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH); -VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH); -VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH); -VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH); -VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH); +VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR, 128); +VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR, 128); +VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR, 128); +VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR, 128); +VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR, 128); + 
+VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP, 128); +VP9_8TAP_RISCV_RVV_FUNC(32, sharp, FILTER_8TAP_SHARP, 128); +VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP, 128); +VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP, 128); +VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP, 128); + +VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH, 128); +VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH, 128); +VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH, 128); +VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH, 128); +VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH, 128); + +VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR, 256); +VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR, 256); +VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR, 256); +VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR, 256); +VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR, 256); + +VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP, 256); +VP9_8TAP_RISCV_RVV_FUNC(32, sharp, FILTER_8TAP_SHARP, 256); +VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP, 256); +VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP, 256); +VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP, 256); + +VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH, 256); +VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH, 256); +VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH, 256); +VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH, 256); +VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH, 256); VP9_BILINEAR_RISCV_RVV_FUNC(64); VP9_BILINEAR_RISCV_RVV_FUNC(32); diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c index ec6db51774..c78d22a7f3 100644 --- a/libavcodec/riscv/vp9dsp_init.c +++ b/libavcodec/riscv/vp9dsp_init.c @@ -49,7 +49,8 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) # endif #if HAVE_RVV - if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I32) && ff_rv_vlen_least(128)) 
{ + if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I32)) { + if (ff_rv_vlen_least(128)) { #define init_fpel(idx1, sz) \ dsp->mc[idx1][FILTER_8TAP_SMOOTH ][1][0][0] = ff_avg##sz##_rvv; \ @@ -63,6 +64,26 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) init_fpel(3, 8); init_fpel(4, 4); +#undef init_fpel + +#define init_subpel1(idx1, idx2, idxh, idxv, sz, dir, type, vlen) \ + dsp->mc[idx1][FILTER_8TAP_SMOOTH ][idx2][idxh][idxv] = \ + ff_##type##_8tap_smooth_##sz##dir##_rvv##vlen; \ + dsp->mc[idx1][FILTER_8TAP_REGULAR][idx2][idxh][idxv] = \ + ff_##type##_8tap_regular_##sz##dir##_rvv##vlen; \ + dsp->mc[idx1][FILTER_8TAP_SHARP ][idx2][idxh][idxv] = \ + ff_##type##_8tap_sharp_##sz##dir##_rvv##vlen; + +#define init_subpel2(idx, idxh, idxv, dir, type, vlen) \ + init_subpel1(0, idx, idxh, idxv, 64, dir, type, vlen); \ + init_subpel1(1, idx, idxh, idxv, 32, dir, type, vlen); \ + init_subpel1(2, idx, idxh, idxv, 16, dir, type, vlen); \ + init_subpel1(3, idx, idxh, idxv, 8, dir, type, vlen); \ + init_subpel1(4, idx, idxh, idxv, 4, dir, type, vlen) + + init_subpel2(0, 1, 0, h, put, 128); + init_subpel2(1, 1, 0, h, avg, 128); + dsp->mc[0][FILTER_BILINEAR ][0][0][1] = ff_put_bilin_64v_rvv; dsp->mc[0][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_64h_rvv; dsp->mc[0][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_64v_rvv; @@ -84,8 +105,23 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) dsp->mc[4][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_4v_rvv; dsp->mc[4][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_4h_rvv; -#undef init_fpel + if (flags & AV_CPU_FLAG_RVB_ADDR) { + init_subpel2(0, 0, 1, v, put, 128); + init_subpel2(1, 0, 1, v, avg, 128); } + + } + if (ff_rv_vlen_least(256)) { + init_subpel2(0, 1, 0, h, put, 256); + init_subpel2(1, 1, 0, h, avg, 256); + + if (flags & AV_CPU_FLAG_RVB_ADDR) { + init_subpel2(0, 0, 1, v, put, 256); + init_subpel2(1, 0, 1, v, avg, 256); + } + } + } + #endif #endif } From patchwork Mon May 13 16:59:25 2024 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: uk7b@foxmail.com X-Patchwork-Id: 48864 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a21:3a48:b0:1af:fc2d:ff5a with SMTP id zu8csp466273pzb; Mon, 13 May 2024 10:01:15 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCU6/M6DadMdQuubOuFHvLpsbXXPVQ4qCryCFo8l2GcoH/DWSnbktHLqi3tbkkPVrmOWys7PwF6F8Cxzc5EO4RflDZywiwIHaafnbA== X-Google-Smtp-Source: AGHT+IFCJIXzUZ322W6PdDRWueQHi3VdjdnjsGpZiUhjsFmLVnORVhu1hWuFQxyF5jcoFSS/a4zG X-Received: by 2002:a17:906:1406:b0:a5a:7a4e:7e80 with SMTP id a640c23a62f3a-a5a7a4e7effmr82807766b.72.1715619675108; Mon, 13 May 2024 10:01:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1715619675; cv=none; d=google.com; s=arc-20160816; b=Vo1eg05xIrtGDGebs+azqQg3iW8iiyZbeXyYlzgGchie6LtnQ1YnlPSXd28dwUKCT8 jMrOcJlMRY/ZedXBWuBaDRiAEUs29G5x0UN1cz4iPWkxNjtuFp5bq86dPnucUMrVAQ/s 2G++IPk0xgErGlSchBxERCA95d964JZCZB5HQr3rf4rBKlBwL9gZdTVCEL13I8WCcDF9 qhtqcmBAXhxd8/fBZcxBong4LXeXpdlz3kVZgEMdIYikKw48qooyoyM/DiwsFg005OJO Nu7/PjiXAnDccU5XGaaqtYRPRFkJuL24H+QivM7LW9cJHqTKX2mu8ZFPwGJV/RH7Uv7Y +26Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to:date :to:from:message-id:dkim-signature:delivered-to; bh=P5o6BQ02ZnB4f8yDn3tENV4D6pakcsbRTxJgmDOSqJI=; fh=D0bFwGkf4X22/D/bfeDVrXKIx7S6kcXsNzy10j8ORbQ=; b=il+yF/QclkXimdNrPltyKnu3r4TVrKqcjLd2CSyMG5uafI2FDho3PXpG/3PuApQOw/ 8WIgr6srE7ZMxTcZSpDD9yu+7PXcQFKt4hh9Obc8BqpAOKQlqZSSosYP/FI6DeFaXl4X dcqy2/S+WkwtzJ0SoF68SyBrBSMZlg72fGLAd8GKGzSASNjELsgbygJ1zId3zMHPloN9 +Hj4F20dz2xqJjmyH1NMqlMka5s8UKwCg1J+Pjmgi7luQiOgIe+ZtufvaB9xNBUcFJ5v YI7ECBKss51nE+xMJA6NHAue0KmCDvzWZU813tepb+BgkM/RJbgZ+zoq52fM9jcCMqmV K8Qg==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did 
not verify) header.i=@foxmail.com header.s=s201512 header.b=L3qycfVi; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id a640c23a62f3a-a5a17c2d660si530524466b.1040.2024.05.13.10.01.14; Mon, 13 May 2024 10:01:15 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=L3qycfVi; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 2013A68D6F6; Mon, 13 May 2024 20:00:04 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from out203-205-251-27.mail.qq.com (out203-205-251-27.mail.qq.com [203.205.251.27]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 0997668D68E for ; Mon, 13 May 2024 19:59:52 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=foxmail.com; s=s201512; t=1715619583; bh=L1gMSaWNd94/n3lU2iNGhyMXkQy2/Tomn529g9H6IMU=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=L3qycfViyhf2jSYEZzKi6dKBqqQZv/+RBw5hH3lSd+lIXNm6d35AtoiixhOWkQRU6 7RLPeAynKSA2XNHfS45SFOnxnWbEqBCAPHREUzU+0aNkVzK+LlokL6zMQk0fCGVo/m JUYNwAvUlPFTzjKgTH6RGPMmIWqfvwy09gT+V8Mc= Received: from localhost.localdomain ([42.56.223.122]) by newxmesmtplogicsvrszb9-0.qq.com (NewEsmtp) with SMTP id EE3A4485; Tue, 14 May 2024 00:59:35 +0800 X-QQ-mid: 
xmsmtpt1715619582tfw2lbvkr Message-ID: X-QQ-XMAILINFO: Nz4vUwhpFJ9NMj0nGmf7IM5ZmcvIE/TLufuvsMQi5+miHA+xAHCT0PUbWj1X8o WySugUkl8hZzi/11Z91qGudl052Cp9LAJzGjxOJ8piInDLiIWa0VPOUnGmMg0zPf/JBAAoXircQB 0ICo01A4ecN60RT1MnOAfaGcAWtsicGXA2Ok9uzPZXF+WBG7y1P+1z7uIHAAH9Z6NtCGC07l8c/M 0v3/XaQ3Rnha+IdLW/EHKrL1YuKMExiYbdlwhj289kSZ8puTFhuieVw32Kw/lgLaEvG4bfeG0zf6 5PUuXFLUAu+xVCBgDSlP0zMld95H+es+g5Jz0K+dLjmrmWEEI4wQZtnEOkvmHcaP1PlUoI2mGZZ0 ssTRm8IKSjEGMgYPnA0jFYkbX3673+c8h3tzVT/moYEqnURDX1ZTQmgiSuuq6P8JJ7o3f+99lLBM budRx3+BuK4XQkr37C3X4aXhjNqwyXmjqiZA/Pr4RZYjBxy1hXgVBLRq2a8pF56NQzcUkgbmtS/W 4ZwlQJoyGi/mez1dOhQQZJF/8PdBy3xlytZ8WVkT8sUqj+re9B6TRYIyiv54aw07a65HTgvBwLdJ bCUadmFi6vUuEbxi1ItbneGQbWwfBhaukmltYqGbHh95zf0pNvVcxLB/Ylg7wn94MMb87h1Ek+zT eSK+hm85UyJ/KuxFLFgBjuTh4LUrhYOMYk+Hk1ggg1MCK2iXWG0Qh/Pia2yVD0wPDjrmk1Er8pGh yBmomjtkJMXVieTpbHD33l4zcnt9dSTPUk3WwrbdSSOKHcH0HLJC6FrF18SUYfE02hkwTeXByapU k0Lr3mFfQdC1H7Jy6nVv/vLylrd+f4pWGid384SPWY8uBmeeFqCfMcBj2JVbmsXHr3TSZsCqdrKG oSmsfxHC08XKwYo4ZToZruoHbzct8fvFazqAhu+b7SAqBzIjn6Jeo= X-QQ-XMRINFO: MSVp+SPm3vtS1Vd6Y4Mggwc= From: uk7b@foxmail.com To: ffmpeg-devel@ffmpeg.org Date: Tue, 14 May 2024 00:59:25 +0800 X-OQ-MSGID: <20240513165926.1467967-8-uk7b@foxmail.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240513165926.1467967-1-uk7b@foxmail.com> References: <20240513165926.1467967-1-uk7b@foxmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v3 8/9] lavc/vp9dsp: R-V V mc bilin hv X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: sunyuechi Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: ubo/3QgBpEOE From: sunyuechi C908: vp9_avg_bilin_4hv_8bpp_c: 11.0 vp9_avg_bilin_4hv_8bpp_rvv_i64: 3.7 vp9_avg_bilin_8hv_8bpp_c: 38.7 vp9_avg_bilin_8hv_8bpp_rvv_i64: 7.2 vp9_avg_bilin_16hv_8bpp_c: 147.0 
vp9_avg_bilin_16hv_8bpp_rvv_i64: 14.2 vp9_avg_bilin_32hv_8bpp_c: 574.5 vp9_avg_bilin_32hv_8bpp_rvv_i64: 42.7 vp9_avg_bilin_64hv_8bpp_c: 2311.5 vp9_avg_bilin_64hv_8bpp_rvv_i64: 201.7 vp9_put_bilin_4hv_8bpp_c: 10.0 vp9_put_bilin_4hv_8bpp_rvv_i64: 3.2 vp9_put_bilin_8hv_8bpp_c: 35.2 vp9_put_bilin_8hv_8bpp_rvv_i64: 6.5 vp9_put_bilin_16hv_8bpp_c: 133.7 vp9_put_bilin_16hv_8bpp_rvv_i64: 13.0 vp9_put_bilin_32hv_8bpp_c: 538.2 vp9_put_bilin_32hv_8bpp_rvv_i64: 39.7 vp9_put_bilin_64hv_8bpp_c: 2114.0 vp9_put_bilin_64hv_8bpp_rvv_i64: 153.7 --- libavcodec/riscv/vp9_mc_rvv.S | 34 ++++++++++++++++++++++++++++++++++ libavcodec/riscv/vp9dsp_init.c | 10 ++++++++++ 2 files changed, 44 insertions(+) diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S index c633809675..22ae194367 100644 --- a/libavcodec/riscv/vp9_mc_rvv.S +++ b/libavcodec/riscv/vp9_mc_rvv.S @@ -104,6 +104,39 @@ func ff_\op\()_bilin_\len\()\type\()_rvv, zve32x endfunc .endm +.macro bilin_hv len op +func ff_\op\()_bilin_\len\()hv_rvv, zve32x +.ifc \op,avg + csrwi vxrm, 0 +.endif + vsetvlstatic8 \len t0 64 + neg t1, a5 + neg t2, a6 + li t4, 8 + bilin_load v24, \len, put, h, a5 + add a2, a2, a3 +1: + addi a4, a4, -1 + bilin_load v4, \len, put, h, a5 + vwmulu.vx v16, v4, a6 + vwmaccsu.vx v16, t2, v24 + vwadd.wx v16, v16, t4 + vnsra.wi v16, v16, 4 + vadd.vv v0, v16, v24 +.ifc \op,avg + vle8.v v16, (a0) + vaaddu.vv v0, v0, v16 +.endif + vse8.v v0, (a0) + vmv.v.v v24, v4 + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + + ret +endfunc +.endm + const subpel_filters_regular .byte 0, 0, 0, 128, 0, 0, 0, 0 .byte 0, 1, -5, 126, 8, -3, 1, 0 @@ -334,6 +367,7 @@ endfunc .irp op put avg bilin_h_v \len \op h a5 bilin_h_v \len \op v a6 + bilin_hv \len \op .irp name regular sharp smooth .irp type h v epel \len \op \name \type 128 diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c index c78d22a7f3..f3e9302a73 100644 --- a/libavcodec/riscv/vp9dsp_init.c +++ b/libavcodec/riscv/vp9dsp_init.c @@ 
-104,6 +104,16 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) dsp->mc[4][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_4h_rvv; dsp->mc[4][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_4v_rvv; dsp->mc[4][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_4h_rvv; + dsp->mc[0][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_64hv_rvv; + dsp->mc[0][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_64hv_rvv; + dsp->mc[1][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_32hv_rvv; + dsp->mc[1][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_32hv_rvv; + dsp->mc[2][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_16hv_rvv; + dsp->mc[2][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_16hv_rvv; + dsp->mc[3][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_8hv_rvv; + dsp->mc[3][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_8hv_rvv; + dsp->mc[4][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_4hv_rvv; + dsp->mc[4][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_4hv_rvv; if (flags & AV_CPU_FLAG_RVB_ADDR) { init_subpel2(0, 0, 1, v, put, 128); From patchwork Mon May 13 16:59:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: uk7b@foxmail.com X-Patchwork-Id: 48862 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a21:3a48:b0:1af:fc2d:ff5a with SMTP id zu8csp466037pzb; Mon, 13 May 2024 10:00:59 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCXZJZncnnJ0xa0ZtU365BIqRdQn4SHFe0ZY58CtgyyiNM2H19XxXn2WhoRXSsS7lyP9Xl74zSkNIcta97DjLePEklUQ1evfP2q3BQ== X-Google-Smtp-Source: AGHT+IE3zxxIY3PKxteJp7DJGLwA1kHzVZ2cxT5HeuCx9F66g5ITUvGmc1sQLdwBluqOwtR0Pczh X-Received: by 2002:a05:6402:1a49:b0:572:cfa4:57ea with SMTP id 4fb4d7f45d1cf-5734d70991cmr6067620a12.4.1715619659085; Mon, 13 May 2024 10:00:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1715619659; cv=none; d=google.com; s=arc-20160816; b=K35Ws9vHFl0pHhKOgrGurNQ6YOCMcz3sZMlUWtqNcTxbf45pHFUvm29SpaJzSt0WH5 KqCosaTFPEDcJWTrv5GRqvRpWlcpyCCK86YHeO5HBzuaffNXLqwzdpAY0OfdrAvPCQt9 
g9XYP/okW3iZhXsOGBp0TajKulmn620mMHVdZnXWexdXhTb1nKymLBzyD0VoVzrxcwae 7kmOKK9DuepPnY6D0DbFFdYqw2SYZ2shdtwVWSrDXPjHxtP2S2rblQTCXsPly1A28GDQ jx34hDEFsXOKxqnAWPx2fA+SpSi2uU3G3rgSpZPp4NYBh3+BeCMNvtc6DXQWWtYQpLpA 68LQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to:date :to:from:message-id:dkim-signature:delivered-to; bh=JBWnpFLu4Y36WF8sGfuByROOGhQ7SHoBa6vUfZ5hlDQ=; fh=D0bFwGkf4X22/D/bfeDVrXKIx7S6kcXsNzy10j8ORbQ=; b=FUKnDDY5wQv0ufMYFEolC7tJVmvSa7RYdzsQDLUtCdm/3RePeWXsDmXAM9OA/bgV70 3Tu6IQS20IcnZt54Z3bJbT98itglWNFa0vpWSJKQHdmOrn+iF0JYg+q7MJZ+RFC8piUv lPjkmbCkj5xuCIndJWElkfPBgVLJhy4NRIyjn1sydbt7pZtm0/p4M3cB2IU5z+tpXrYa eBBjDHU+7DaFWqk7J2rYRmAf+CUJcNuV81+U0CJrvPYm6cWvkllYtiaYngRV4ugZPRmA zVKTwtcTEmuvinBhRvRmQuLE7pszVgxng2ZgHiG3rFfLqtPq4fTHfZIrRl7Ojz0WLVTK ONZA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=CFk9u+mr; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id 4fb4d7f45d1cf-573456fb06dsi4511789a12.179.2024.05.13.10.00.53; Mon, 13 May 2024 10:00:59 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=CFk9u+mr; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 9878A68D119; Mon, 13 May 2024 20:00:01 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from out203-205-221-155.mail.qq.com (out203-205-221-155.mail.qq.com [203.205.221.155]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id C674568D56F for ; Mon, 13 May 2024 19:59:54 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=foxmail.com; s=s201512; t=1715619583; bh=aAqD/Njsm3EuOH9OY/T1Ch6/OjLHj51O8EBcr2iNecY=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=CFk9u+mr25OBR2hHHQel4fUfPanq6j5pMm93KZIG95MizvDoZfN0PHUNheRGPFJY2 NVYQmlYLi5o5REjgjDVg35MNB30iKf7ayGhRwGhaUFyoLmgeozxEOUPcioD6QSmNkD 72genVu8qQaFzTsF7v8bN0y+sk8YGqCzblkmgJVk= Received: from localhost.localdomain ([42.56.223.122]) by newxmesmtplogicsvrszb9-0.qq.com (NewEsmtp) with SMTP id EE3A4485; Tue, 14 May 2024 00:59:35 +0800 X-QQ-mid: xmsmtpt1715619583tzv25akxl Message-ID: X-QQ-XMAILINFO: OKKHiI6c9SH3v3POLo0vp158HJ9nMXsZukbM7Kzaknza0emuHfDInwzVUHhhzv NKaveyNKO02WaoDKfkikKNJa2b69gMXLHYKddbUXzlMLQFbFmP2+3I4LQMAYi0MYPwlt5bTtRtBB ZZ2QACs92zPsqnlM5YZOCwy44A5QjHPlFSjELfsOMX9D/vnReUOwVK/pMU1DWdxcP5saGPM8FaSd SzhfYu+W4W/LPtwIBlEjNCt8cJjWF9dweWJ+/od4fTZTFNfaYLb+As9miDXa4+h5Wn+r2WfJGlmL 
ujLINoj2HYTwnltc6RzdocmTcn5J1Jz14svAyXL5UjlbkH3vNvMyXlmfV8Jfwyrhq6hdKo6NIcPz 4FdsdNcUqS7yyR7tjUauIsckZ5znn2JP7flu2B2nbwIbMZU3RwPkYkGt7kIeJi3NQoaZkTnh1ykm NNS6GiRtfyl+Cy3CUwfmuenM6S6Ku/+/Gcd3+E4RqXM43Sp/m7LMsD/cEuce6dWQoGwmFVIwWuS+ q4A/1hbVYAQQ2/mCP/nDTOkUR5q22Sqlja4kRNNOGwb+7v8k0T5Tq1ErLnZeCiZC6/2HlMGMe5Xc fIV/9gjIpZPUoASz49PXG4Tv6FCUI+GwwdwCtWUV/lY6VHjs6GqtZYdm2rTg68zoxvfGM6z1M8jM YmPPuBykvtdN0EGVXAfqf6gDM7Iddy1gQ+Nz/Rr/Gkvnrx52rurg+wpQaDBmgqHWnhRI2Z+KGvfb SzAX0kcuQpjsnfw2cnbp3owUj6+dD4PP825h0f9kAR28R7tRop7Dfb/wPoTCkWNUGIIyoESCpV6T rmEclLq1VvmxmfnbS2q/Tl/ArNjZxV6t2AU34HeUloMpnKp1r3ZlOVXOKDTm+03ngbeO0NtcjUmg y+7R4Qkve1ocxsVT68XTMZ64+W/cRBn+eotKd6cczbIMzMrrd+Mc145S8TJDODNwVpt3iqymmgAT cOGNJXJFSgv+sVP+uo4g== X-QQ-XMRINFO: OD9hHCdaPRBwq3WW+NvGbIU= From: uk7b@foxmail.com To: ffmpeg-devel@ffmpeg.org Date: Tue, 14 May 2024 00:59:26 +0800 X-OQ-MSGID: <20240513165926.1467967-9-uk7b@foxmail.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240513165926.1467967-1-uk7b@foxmail.com> References: <20240513165926.1467967-1-uk7b@foxmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v3 9/9] lavc/vp9dsp: R-V V mc tap hv X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: sunyuechi Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: uA1giY+xJ0ef From: sunyuechi C908 X60 vp9_avg_8tap_smooth_4hv_8bpp_c : 32.0 28.2 vp9_avg_8tap_smooth_4hv_8bpp_rvv_i32 : 15.0 13.2 vp9_avg_8tap_smooth_8hv_8bpp_c : 98.0 86.2 vp9_avg_8tap_smooth_8hv_8bpp_rvv_i32 : 23.7 21.0 vp9_avg_8tap_smooth_16hv_8bpp_c : 355.5 297.0 vp9_avg_8tap_smooth_16hv_8bpp_rvv_i32 : 62.7 41.2 vp9_avg_8tap_smooth_32hv_8bpp_c : 1273.0 1099.7 vp9_avg_8tap_smooth_32hv_8bpp_rvv_i32 : 133.7 119.2 vp9_avg_8tap_smooth_64hv_8bpp_c : 4933.0 4240.5 vp9_avg_8tap_smooth_64hv_8bpp_rvv_i32 : 506.7 227.0 
vp9_put_8tap_smooth_4hv_8bpp_c : 30.2 27.0 vp9_put_8tap_smooth_4hv_8bpp_rvv_i32 : 14.5 12.7 vp9_put_8tap_smooth_8hv_8bpp_c : 91.2 81.2 vp9_put_8tap_smooth_8hv_8bpp_rvv_i32 : 22.7 20.2 vp9_put_8tap_smooth_16hv_8bpp_c : 329.2 277.7 vp9_put_8tap_smooth_16hv_8bpp_rvv_i32 : 44.7 40.0 vp9_put_8tap_smooth_32hv_8bpp_c : 1183.7 1022.7 vp9_put_8tap_smooth_32hv_8bpp_rvv_i32 : 130.7 116.5 vp9_put_8tap_smooth_64hv_8bpp_c : 4502.7 3954.5 vp9_put_8tap_smooth_64hv_8bpp_rvv_i32 : 496.0 224.7 --- libavcodec/riscv/vp9_mc_rvv.S | 75 ++++++++++++++++++++++++++++++++++ libavcodec/riscv/vp9dsp_init.c | 8 ++++ 2 files changed, 83 insertions(+) diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S index 22ae194367..958460d165 100644 --- a/libavcodec/riscv/vp9_mc_rvv.S +++ b/libavcodec/riscv/vp9_mc_rvv.S @@ -362,6 +362,77 @@ func ff_\op\()_8tap_\name\()_\len\()\type\()_rvv\vlen\(), zve32x endfunc .endm +#if __riscv_xlen == 64 +.macro epel_hv_once len name op + sub a2, a2, a3 + sub a2, a2, a3 + sub a2, a2, a3 + .irp n 0 2 4 6 8 10 12 14 + epel_load_inc v\n \len put \name h 1 t + .endr + addi a4, a4, -1 +1: + addi a4, a4, -1 + epel_load v30 \len \op \name v 0 s + vse8.v v30, (a0) + vmv.v.v v0, v2 + vmv.v.v v2, v4 + vmv.v.v v4, v6 + vmv.v.v v6, v8 + vmv.v.v v8, v10 + vmv.v.v v10, v12 + vmv.v.v v12, v14 + epel_load v14 \len put \name h 1 t + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + epel_load v30 \len \op \name v 0 s + vse8.v v30, (a0) +.endm + +.macro epel_hv op name len vlen +func ff_\op\()_8tap_\name\()_\len\()hv_rvv\vlen\(), zve32x + addi sp, sp, -64 + .irp n 0,1,2,3,4,5,6,7 + sd s\n, \n\()<<3(sp) + .endr +.if \len == 64 && \vlen < 256 + addi sp, sp, -48 + .irp n 0,1,2,3,4,5 + sd a\n, \n\()<<3(sp) + .endr +.endif +.ifc \op,avg + csrwi vxrm, 0 +.endif + epel_filter \name h t + epel_filter \name v s +.if \vlen < 256 + vsetvlstatic8 \len a6 32 m2 +.else + vsetvlstatic8 \len a6 64 m2 +.endif + epel_hv_once \len \name \op +.if \len == 64 && \vlen < 256 + .irp n 
0,1,2,3,4,5 + ld a\n, \n\()<<3(sp) + .endr + addi sp, sp, 48 + addi a0, a0, 32 + addi a2, a2, 32 + epel_filter \name h t + epel_hv_once \len \name \op +.endif + .irp n 0,1,2,3,4,5,6,7 + ld s\n, \n\()<<3(sp) + .endr + addi sp, sp, 64 + + ret +endfunc +.endm +#endif + .irp len 64, 32, 16, 8, 4 copy_avg \len .irp op put avg @@ -373,6 +444,10 @@ endfunc epel \len \op \name \type 128 epel \len \op \name \type 256 .endr + #if __riscv_xlen == 64 + epel_hv \op \name \len 128 + epel_hv \op \name \len 256 + #endif .endr .endr .endr diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c index f3e9302a73..cc5878f414 100644 --- a/libavcodec/riscv/vp9dsp_init.c +++ b/libavcodec/riscv/vp9dsp_init.c @@ -118,6 +118,10 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) if (flags & AV_CPU_FLAG_RVB_ADDR) { init_subpel2(0, 0, 1, v, put, 128); init_subpel2(1, 0, 1, v, avg, 128); +# if __riscv_xlen == 64 + init_subpel2(0, 1, 1, hv, put, 128); + init_subpel2(1, 1, 1, hv, avg, 128); +# endif } } @@ -128,6 +132,10 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) if (flags & AV_CPU_FLAG_RVB_ADDR) { init_subpel2(0, 0, 1, v, put, 256); init_subpel2(1, 0, 1, v, avg, 256); +# if __riscv_xlen == 64 + init_subpel2(0, 1, 1, hv, put, 256); + init_subpel2(1, 1, 1, hv, avg, 256); +# endif } } }
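For readers checking the RVV output against scalar behaviour, here is a minimal sketch of what one row of the horizontal 8-tap path appears to compute, read off the assembly above: a bias of 64 (`li a5, 64`), an arithmetic shift by 7 (`vnsra.wi ..., 7`), a clamp to 0..255 (`vmax.vx` plus `vnclipu.wi`), and for `avg` a rounding average with the destination (`vaaddu.vv` with `vxrm` = 0). The names `ref_8tap_h_row`, `clip_u8` and `smooth_mx1` are hypothetical and do not appear in the patch. The coefficient row is copied from the second row of `subpel_filters_smooth`. This is an illustration under those assumptions, not the checkasm reference.

```c
#include <assert.h>
#include <stdint.h>

/* Second row (mx = 1) of subpel_filters_smooth, copied from the patch. */
static const int8_t smooth_mx1[8] = { -3, -1, 32, 64, 38, 1, -3, 0 };

/* Clamp a non-negative intermediate to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar model of one output row of the horizontal filter.
 * src must have 3 readable pixels to the left and 4 to the right of the
 * w pixels being filtered, as the 8 taps cover src[x - 3] .. src[x + 4].
 * avg != 0 selects the "avg" variant, which rounds (dst + px + 1) >> 1.
 */
static void ref_8tap_h_row(uint8_t *dst, const uint8_t *src, int w,
                           const int8_t *f, int avg)
{
    assert(w > 0);
    for (int x = 0; x < w; x++) {
        int sum = 64;                       /* rounding bias before >> 7 */
        for (int k = 0; k < 8; k++)
            sum += f[k] * src[x + k - 3];
        /* Negative sums clamp to 0, matching vmax.vx with zero. */
        int px = sum < 0 ? 0 : clip_u8(sum >> 7);
        dst[x] = avg ? (uint8_t)((dst[x] + px + 1) >> 1) : (uint8_t)px;
    }
}
```

Each coefficient row sums to 128, so a flat input row should come out unchanged in the `put` case. That makes a quick sanity check when comparing against the vector code.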