From patchwork Sun May 12 10:03:23 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48807
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 12 May 2024 18:03:23 +0800
X-OQ-MSGID: <20240512100331.995415-1-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
Subject: [FFmpeg-devel] [PATCH v3 1/9] lavc/vp9dsp: R-V ipred vert
Cc: sunyuechi

From: sunyuechi

C908:
vp9_vert_8x8_8bpp_c: 22.0
vp9_vert_8x8_8bpp_rvi: 15.7
vp9_vert_16x16_8bpp_c: 71.2
vp9_vert_16x16_8bpp_rvi: 39.0
vp9_vert_32x32_8bpp_c: 300.2
vp9_vert_32x32_8bpp_rvi: 135.2
---
 libavcodec/riscv/Makefile        |  1 +
 libavcodec/riscv/vp9_intra_rvi.S | 71 ++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h        |  6 +++
 libavcodec/riscv/vp9dsp_init.c   | 63 ++++++++++++++++------------
 4 files changed, 114 insertions(+), 27 deletions(-)
 create mode 100644 libavcodec/riscv/vp9_intra_rvi.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 89273b1cad..ccd060c666 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -62,6 +62,7 @@ OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_init.o
 RV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvi.o
 RVV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvv.o
 OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9dsp_init.o
+RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o
 RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o
 OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o
 RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o
diff --git a/libavcodec/riscv/vp9_intra_rvi.S b/libavcodec/riscv/vp9_intra_rvi.S
new file mode 100644
index 0000000000..16b6bdb25a
--- /dev/null
+++ b/libavcodec/riscv/vp9_intra_rvi.S
@@ -0,0 +1,71 @@
+/*
+ * Copyright (c) 2024 Institue of Software Chinese Academy of Sciences (ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/riscv/asm.S"
+
+#if __riscv_xlen >= 64
+func ff_v_32x32_rvi
+        ld      t0, (a3)
+        ld      t1, 8(a3)
+        ld      t2, 16(a3)
+        ld      t3, 24(a3)
+        .rept 16
+        add     a7, a0, a1
+        sd      t0, (a0)
+        sd      t1, 8(a0)
+        sd      t2, 16(a0)
+        sd      t3, 24(a0)
+        sh1add  a0, a1, a0
+        sd      t0, (a7)
+        sd      t1, 8(a7)
+        sd      t2, 16(a7)
+        sd      t3, 24(a7)
+        .endr
+
+        ret
+endfunc
+
+func ff_v_16x16_rvi
+        ld      t0, (a3)
+        ld      t1, 8(a3)
+        .rept 8
+        add     a7, a0, a1
+        sd      t0, (a0)
+        sd      t1, 8(a0)
+        sh1add  a0, a1, a0
+        sd      t0, (a7)
+        sd      t1, 8(a7)
+        .endr
+
+        ret
+endfunc
+
+func ff_v_8x8_rvi
+        ld      t0, (a3)
+        .rept 4
+        add     a7, a0, a1
+        sd      t0, (a0)
+        sh1add  a0, a1, a0
+        sd      t0, (a7)
+        .endr
+
+        ret
+endfunc
+#endif
diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h
index 25047ed507..f8bc6563a5 100644
--- a/libavcodec/riscv/vp9dsp.h
+++ b/libavcodec/riscv/vp9dsp.h
@@ -60,6 +60,12 @@ void ff_dc_129_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
                          const uint8_t *a);
 void ff_dc_129_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
                        const uint8_t *a);
+void ff_v_32x32_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                    const uint8_t *a);
+void ff_v_16x16_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                    const uint8_t *a);
+void ff_v_8x8_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                  const uint8_t *a);
 
 #define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx) \
 void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 69ab39004c..e377d377e3 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -24,38 +24,47 @@
 #include "libavcodec/vp9dsp.h"
 #include "vp9dsp.h"
 
-static av_cold void vp9dsp_intrapred_init_rvv(VP9DSPContext *dsp, int bpp)
+static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp)
 {
-    #if HAVE_RVV
-        int flags = av_get_cpu_flags();
+#if HAVE_RV
+    int flags = av_get_cpu_flags();
 
-        if (bpp == 8 && flags & AV_CPU_FLAG_RVV_I64 && ff_get_rv_vlenb() >= 16) {
-            dsp->intra_pred[TX_8X8][DC_PRED] = ff_dc_8x8_rvv;
-            dsp->intra_pred[TX_8X8][LEFT_DC_PRED] = ff_dc_left_8x8_rvv;
-            dsp->intra_pred[TX_8X8][DC_127_PRED] = ff_dc_127_8x8_rvv;
-            dsp->intra_pred[TX_8X8][DC_128_PRED] = ff_dc_128_8x8_rvv;
-            dsp->intra_pred[TX_8X8][DC_129_PRED] = ff_dc_129_8x8_rvv;
-            dsp->intra_pred[TX_8X8][TOP_DC_PRED] = ff_dc_top_8x8_rvv;
-        }
+    if (bpp == 8 && (flags & AV_CPU_FLAG_RVI) && (flags & AV_CPU_FLAG_RVB_ADDR)) {
+# if __riscv_xlen >= 64
+        dsp->intra_pred[TX_32X32][VERT_PRED] = ff_v_32x32_rvi;
+        dsp->intra_pred[TX_16X16][VERT_PRED] = ff_v_16x16_rvi;
+        dsp->intra_pred[TX_8X8][VERT_PRED] = ff_v_8x8_rvi;
+# endif
+    }
+#if HAVE_RVV
+    if (bpp == 8 && flags & AV_CPU_FLAG_RVV_I64 && ff_get_rv_vlenb() >= 16) {
+        dsp->intra_pred[TX_8X8][DC_PRED] = ff_dc_8x8_rvv;
+        dsp->intra_pred[TX_8X8][LEFT_DC_PRED] = ff_dc_left_8x8_rvv;
+        dsp->intra_pred[TX_8X8][DC_127_PRED] = ff_dc_127_8x8_rvv;
+        dsp->intra_pred[TX_8X8][DC_128_PRED] = ff_dc_128_8x8_rvv;
+        dsp->intra_pred[TX_8X8][DC_129_PRED] = ff_dc_129_8x8_rvv;
+        dsp->intra_pred[TX_8X8][TOP_DC_PRED] = ff_dc_top_8x8_rvv;
+    }
 
-        if (bpp == 8 && flags & AV_CPU_FLAG_RVV_I32 && ff_get_rv_vlenb() >= 16) {
-            dsp->intra_pred[TX_32X32][DC_PRED] = ff_dc_32x32_rvv;
-            dsp->intra_pred[TX_16X16][DC_PRED] = ff_dc_16x16_rvv;
-            dsp->intra_pred[TX_32X32][LEFT_DC_PRED] = ff_dc_left_32x32_rvv;
-            dsp->intra_pred[TX_16X16][LEFT_DC_PRED] = ff_dc_left_16x16_rvv;
-            dsp->intra_pred[TX_32X32][DC_127_PRED] = ff_dc_127_32x32_rvv;
-            dsp->intra_pred[TX_16X16][DC_127_PRED] = ff_dc_127_16x16_rvv;
-            dsp->intra_pred[TX_32X32][DC_128_PRED] = ff_dc_128_32x32_rvv;
-            dsp->intra_pred[TX_16X16][DC_128_PRED] = ff_dc_128_16x16_rvv;
-            dsp->intra_pred[TX_32X32][DC_129_PRED] = ff_dc_129_32x32_rvv;
-            dsp->intra_pred[TX_16X16][DC_129_PRED] = ff_dc_129_16x16_rvv;
-            dsp->intra_pred[TX_32X32][TOP_DC_PRED] = ff_dc_top_32x32_rvv;
-            dsp->intra_pred[TX_16X16][TOP_DC_PRED] = ff_dc_top_16x16_rvv;
-        }
-    #endif
+    if (bpp == 8 && flags & AV_CPU_FLAG_RVV_I32 && ff_get_rv_vlenb() >= 16) {
+        dsp->intra_pred[TX_32X32][DC_PRED] = ff_dc_32x32_rvv;
+        dsp->intra_pred[TX_16X16][DC_PRED] = ff_dc_16x16_rvv;
+        dsp->intra_pred[TX_32X32][LEFT_DC_PRED] = ff_dc_left_32x32_rvv;
+        dsp->intra_pred[TX_16X16][LEFT_DC_PRED] = ff_dc_left_16x16_rvv;
+        dsp->intra_pred[TX_32X32][DC_127_PRED] = ff_dc_127_32x32_rvv;
+        dsp->intra_pred[TX_16X16][DC_127_PRED] = ff_dc_127_16x16_rvv;
+        dsp->intra_pred[TX_32X32][DC_128_PRED] = ff_dc_128_32x32_rvv;
+        dsp->intra_pred[TX_16X16][DC_128_PRED] = ff_dc_128_16x16_rvv;
+        dsp->intra_pred[TX_32X32][DC_129_PRED] = ff_dc_129_32x32_rvv;
+        dsp->intra_pred[TX_16X16][DC_129_PRED] = ff_dc_129_16x16_rvv;
+        dsp->intra_pred[TX_32X32][TOP_DC_PRED] = ff_dc_top_32x32_rvv;
+        dsp->intra_pred[TX_16X16][TOP_DC_PRED] = ff_dc_top_16x16_rvv;
+    }
+#endif
+#endif
 }
 
 av_cold void ff_vp9dsp_init_riscv(VP9DSPContext *dsp, int bpp, int bitexact)
 {
-    vp9dsp_intrapred_init_rvv(dsp, bpp);
+    vp9dsp_intrapred_init_riscv(dsp, bpp);
 }

From patchwork Sun May 12 10:03:24 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48804
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 12 May 2024 18:03:24 +0800
X-OQ-MSGID: <20240512100331.995415-2-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240512100331.995415-1-uk7b@foxmail.com>
References: <20240512100331.995415-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v3 2/9] lavc/vp9dsp: R-V mc copy
Cc: sunyuechi

From: sunyuechi

C908:
vp9_put4_8bpp_c: 0.7
vp9_put4_8bpp_rvi: 0.5
vp9_put8_8bpp_c: 2.5
vp9_put8_8bpp_rvi: 0.5
vp9_put16_8bpp_c: 16.7
vp9_put16_8bpp_rvi: 1.5
vp9_put32_8bpp_c: 37.2
vp9_put32_8bpp_rvi: 5.7
vp9_put64_8bpp_c: 107.5
vp9_put64_8bpp_rvi: 21.7
---
 libavcodec/riscv/Makefile      |   3 +-
 libavcodec/riscv/vp9_mc_rvi.S  | 105 +++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h      |   3 +
 libavcodec/riscv/vp9dsp_init.c |  28 +++++++++
 4 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/riscv/vp9_mc_rvi.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index ccd060c666..0cd900104f 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -62,7 +62,8 @@ OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_init.o
 RV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvi.o
 RVV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvv.o
 OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9dsp_init.o
-RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o
+RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o \
+                                 riscv/vp9_mc_rvi.o
 RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o
 OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o
 RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o
diff --git a/libavcodec/riscv/vp9_mc_rvi.S b/libavcodec/riscv/vp9_mc_rvi.S
new file mode 100644
index 0000000000..0db14e83c7
--- /dev/null
+++ b/libavcodec/riscv/vp9_mc_rvi.S
@@ -0,0 +1,105 @@
+/*
+ * Copyright (c) 2024 Institue of Software Chinese Academy of Sciences (ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/riscv/asm.S" + +#if __riscv_xlen >= 64 +func ff_copy64_rvi +1: + addi a4, a4, -1 + ld t0, (a2) + ld t1, 8(a2) + ld t2, 16(a2) + ld t3, 24(a2) + ld t4, 32(a2) + ld t5, 40(a2) + ld t6, 48(a2) + ld a7, 56(a2) + sd t0, (a0) + sd t1, 8(a0) + sd t2, 16(a0) + sd t3, 24(a0) + sd t4, 32(a0) + sd t5, 40(a0) + sd t6, 48(a0) + sd a7, 56(a0) + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + + ret +endfunc + +func ff_copy32_rvi +1: + addi a4, a4, -1 + ld t0, (a2) + ld t1, 8(a2) + ld t2, 16(a2) + ld t3, 24(a2) + sd t0, (a0) + sd t1, 8(a0) + sd t2, 16(a0) + sd t3, 24(a0) + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + + ret +endfunc + +func ff_copy16_rvi +1: + addi a4, a4, -1 + ld t0, (a2) + ld t1, 8(a2) + sd t0, (a0) + sd t1, 8(a0) + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + + ret +endfunc + +func ff_copy8_rvi +1: + addi a4, a4, -1 + ld t0, (a2) + sd t0, (a0) + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + + ret +endfunc +#endif + +func ff_copy4_rvi +1: + addi a4, a4, -1 + lw t0, (a2) + sw t0, (a0) + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + + ret +endfunc diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h index f8bc6563a5..b8ff282f8a 100644 --- a/libavcodec/riscv/vp9dsp.h +++ b/libavcodec/riscv/vp9dsp.h @@ -167,6 +167,9 @@ void ff_copy##SIZE##_rvi(uint8_t *dst, ptrdiff_t dststride, \ const uint8_t *src, ptrdiff_t srcstride, \ int h, int mx, int my); +VP9_COPY_RISCV_RVI_FUNC(64); +VP9_COPY_RISCV_RVI_FUNC(32); +VP9_COPY_RISCV_RVI_FUNC(16); VP9_COPY_RISCV_RVI_FUNC(8); VP9_COPY_RISCV_RVI_FUNC(4); diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c index e377d377e3..fa9c3f4d8c 100644 --- a/libavcodec/riscv/vp9dsp_init.c +++ b/libavcodec/riscv/vp9dsp_init.c @@ -24,6 +24,33 @@ #include 
"libavcodec/vp9dsp.h" #include "vp9dsp.h" +static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) +{ +#if HAVE_RV + int flags = av_get_cpu_flags(); + +# if __riscv_xlen >= 64 + if (bpp == 8 && flags & AV_CPU_FLAG_RVI) { + +#define init_fpel(idx1, sz) \ + dsp->mc[idx1][FILTER_8TAP_SMOOTH ][0][0][0] = ff_copy##sz##_rvi; \ + dsp->mc[idx1][FILTER_8TAP_REGULAR][0][0][0] = ff_copy##sz##_rvi; \ + dsp->mc[idx1][FILTER_8TAP_SHARP ][0][0][0] = ff_copy##sz##_rvi; \ + dsp->mc[idx1][FILTER_BILINEAR ][0][0][0] = ff_copy##sz##_rvi + + init_fpel(0, 64); + init_fpel(1, 32); + init_fpel(2, 16); + init_fpel(3, 8); + init_fpel(4, 4); + +#undef init_fpel + } +# endif + +#endif +} + static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp) { #if HAVE_RV @@ -67,4 +94,5 @@ static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp) av_cold void ff_vp9dsp_init_riscv(VP9DSPContext *dsp, int bpp, int bitexact) { vp9dsp_intrapred_init_riscv(dsp, bpp); + vp9dsp_mc_init_riscv(dsp, bpp); } From patchwork Sun May 12 10:03:25 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: uk7b@foxmail.com X-Patchwork-Id: 48805 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a21:1706:b0:1af:cdee:28c5 with SMTP id nv6csp505687pzb; Sun, 12 May 2024 03:04:33 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCWJs2bCCha1knVxaP9P5yFn+VQiQ6cEJC8EutVyX2RkOmm+dKZwOR42I+1OfUljeZCJ9uU6fb8cme2pQBOgfc+DT01UP2LvWn+1hQ== X-Google-Smtp-Source: AGHT+IFqD+qJ5IptkxGRyEpQQRjyZfFzGVDW1woJsMdORCt+P1ybqUZjA3zQxAMLDO5vGXNEvduW X-Received: by 2002:a17:907:50e:b0:a59:a0b7:1850 with SMTP id a640c23a62f3a-a5a2d6871dfmr494220366b.5.1715508273238; Sun, 12 May 2024 03:04:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1715508273; cv=none; d=google.com; s=arc-20160816; b=pONUK69y482KmIJgjpcbTjjZ/UOWML/qbjIO/xKMEIYX0WouC9T0u1gGTPtGbKCsx+ pSa2a1If+9c4iCmJKj3XstpHk9sRAVxy0SSRAeWe81PYON/VtrQBNr4u7caR8S3i7imn 
dAHx8Xat4OLMYGWZhckYzRzCsITYdV/hAN6Fvd+nnbFEOgFGQNfeOK63fs/pJsHJtshK FSkuz1JlcEkv2f7u2GTxvpisC26bh9k+ir7VJjlZlhcTYK5R89FkRyNHt7TeURBilYkr ISdtIf3wk8HO3jVreHBIFeMBVkSq6GUdnpVlxxD070HeXPyHBygVCUq1RPljZ0T73TED Jqkw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to:date :to:from:message-id:dkim-signature:delivered-to; bh=o+B1ypo4NfWJf5rBN5mjYoaB/DHDVugtoA6NxVRR9a4=; fh=D0bFwGkf4X22/D/bfeDVrXKIx7S6kcXsNzy10j8ORbQ=; b=d3kd8Jz1Xh70j1z6KMk+KeGx9cx7fsARv5QFBNGoBQDnNEpH24ym5guJTvzy+6BN60 5u6uUBxPIRf5fs0RmMmkHWVjdptHkR4zlfZE6kroV4HTHsjZ4M+FBK07MdoIbVVa3J6Q HOU+fV9HpYThvbjl+wrrPX5gJaWh7jNnb3vPNDcxUlxwQQj6XhjfAmoVYLDpCEikil9c CHFslQVZtSzb1Se59vMLBLtG/fc9OWx9LW65cZD6Qku2LAJUZoQwsVJ1zoYLR3iiec5B 4zLnBoAHMmq5cPbesi84VyxSe6E3iZb8R4Iy+uQ3hy1JkkDVGu+aHdLHPV/8gKfxCKR8 yDhQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=i0EMOHbl; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id a640c23a62f3a-a5a17c04388si378231466b.892.2024.05.12.03.04.32; Sun, 12 May 2024 03:04:33 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=i0EMOHbl; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 6178768D611; Sun, 12 May 2024 13:03:58 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from out203-205-251-82.mail.qq.com (out203-205-251-82.mail.qq.com [203.205.251.82]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id E1A5668D58E for ; Sun, 12 May 2024 13:03:46 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=foxmail.com; s=s201512; t=1715508216; bh=QVq6QO4knhsHm6j0IM1npleh19SoXkEbBwyZMdkGy2g=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=i0EMOHblhi/QjfzQeU6bw22SskHAuBJWpwAeZyRr1rJDXh9wnV+JEoj9o2BldEfez LZANiG89Mp/XW+H0TxODGomRiFcfIprD6uA1tkGO6CzFAFr5yR50JrNIEDi0vp/jpM /RZBI+AsK8hpBmHbeRnuM9Qmqoxm8CNsvdiHUdJQ= Received: from localhost.localdomain ([42.56.223.122]) by newxmesmtplogicsvrszb9-0.qq.com (NewEsmtp) with SMTP id E135E56; Sun, 12 May 2024 18:03:33 +0800 X-QQ-mid: xmsmtpt1715508215taiwoz9f6 Message-ID: X-QQ-XMAILINFO: NyTsQ4JOu2J28b3BSHM5N8gYiN/0Nf4+n8rXh9Ij2pFkvR/fCil3Rd/a8jFMb+ 4R3SjhiOWfDlj6fScXN1k/EQNDyrD7NquPJdUDW7/6u4ABel1Zfju/1FI0WnZ9vLe6vf0yG6v+xK up1i89dY8SVV8Ex7tfcmqoOx/BFnLMdp/LkX5U0sdTEPX0dWDLXT0IQj73DLv+OrOgn3HPI4yFZW f7UwZ29zdDd3RnGs1FJinPj4KJExRwBA53ibxOkiuSdo7cNlb4QRbtLLBoOmTQIJEQActG2VcA1X 
M/dYzWcdjC1jqf98pkekSghXFsVN0bHxdtvq1oL+XatUbB2dBbokErW8uXUDKWB6NHiffdMoijLP iEDgF73/6eBemt/XZwkHozpcfRiycmd+bfvxVg7HxQcw7CbcFzskg+u20o1VWnWpUiUqgtU514wN wgIXc5TwcLVfLEwWu8r4vEc2YS9VfDHCyRVoPrr4lra3zwzoFe9OdDZB3XN+R8BL4lr4FTD1ARoz AgUpDR+CWYmMMkIsg+ckNHpOo90658LaPeGMaNlw/c2xioo3+U3nW85lSNBaoZi4MiKCy7z9sZUJ obNhOsEV3+iNGC/YbT4Z65hQSEnOn/M2kP9j8LqQWpJ+6apEXQjcKtBxlq8afpmwpsgzWRrehv9t 8YpibxfZh7VPqbF6CKX9f3Z93WNAuyCOMcZuCc52PQTbeWE1gSdZvB0zi3aUpTOrsZKs7Vk517Xv cdKv08TWG86MkUS90V8CTUpKfnAD/dHTiuRHIqt0s4jjzLNWCFZBqJcuf8uO18+Qw3fFBATiXZhV Ol1CS18/f/tlLgILUkktwtoVZNtKHhS1uoinW4rIHFJo/GAC98KeQ4LkPsGdJndiDX3PbqH0ULEz GmUJytaqujkDrRZUDgayN2s4ODpvn2IzAgOggjnrGtqTRDOi8jizU= X-QQ-XMRINFO: MSVp+SPm3vtS1Vd6Y4Mggwc= From: uk7b@foxmail.com To: ffmpeg-devel@ffmpeg.org Date: Sun, 12 May 2024 18:03:25 +0800 X-OQ-MSGID: <20240512100331.995415-3-uk7b@foxmail.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240512100331.995415-1-uk7b@foxmail.com> References: <20240512100331.995415-1-uk7b@foxmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v3 3/9] lavc/vp9dsp: R-V V ipred hor X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: sunyuechi Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: Z5ZlZ8KcTc8V From: sunyuechi C908: vp9_hor_8x8_8bpp_c: 74.7 vp9_hor_8x8_8bpp_rvv_i32: 35.7 vp9_hor_16x16_8bpp_c: 175.5 vp9_hor_16x16_8bpp_rvv_i32: 80.2 vp9_hor_32x32_8bpp_c: 510.2 vp9_hor_32x32_8bpp_rvv_i32: 264.0 --- libavcodec/riscv/vp9_intra_rvv.S | 56 ++++++++++++++++++++++++++++++++ libavcodec/riscv/vp9dsp.h | 6 ++++ libavcodec/riscv/vp9dsp_init.c | 3 ++ 3 files changed, 65 insertions(+) diff --git a/libavcodec/riscv/vp9_intra_rvv.S b/libavcodec/riscv/vp9_intra_rvv.S index 40e38ba83e..ca156d65cd 100644 --- a/libavcodec/riscv/vp9_intra_rvv.S +++ 
b/libavcodec/riscv/vp9_intra_rvv.S @@ -117,3 +117,59 @@ func_dc dc_left 8 left 3 0 zve64x func_dc dc_top 32 top 5 1 zve32x func_dc dc_top 16 top 4 1 zve32x func_dc dc_top 8 top 3 0 zve64x + +func ff_h_32x32_rvv, zve32x + li t0, 32 + addi a2, a2, 31 + vsetvli zero, t0, e8, m2, ta, ma + + .rept 2 + .irp n 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 + lbu t1, (a2) + addi a2, a2, -1 + vmv.v.x v\n, t1 + .endr + .irp n 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 + vse8.v v\n, (a0) + add a0, a0, a1 + .endr + .endr + + ret +endfunc + +func ff_h_16x16_rvv, zve32x + addi a2, a2, 15 + vsetivli zero, 16, e8, m1, ta, ma + + .irp n 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23 + lbu t1, (a2) + addi a2, a2, -1 + vmv.v.x v\n, t1 + .endr + .irp n 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 + vse8.v v\n, (a0) + add a0, a0, a1 + .endr + vse8.v v23, (a0) + + ret +endfunc + +func ff_h_8x8_rvv, zve32x + addi a2, a2, 7 + vsetivli zero, 8, e8, mf2, ta, ma + + .irp n 8, 9, 10, 11, 12, 13, 14, 15 + lbu t1, (a2) + addi a2, a2, -1 + vmv.v.x v\n, t1 + .endr + .irp n 8, 9, 10, 11, 12, 13, 14 + vse8.v v\n, (a0) + add a0, a0, a1 + .endr + vse8.v v15, (a0) + + ret +endfunc diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h index b8ff282f8a..0ad961c7e0 100644 --- a/libavcodec/riscv/vp9dsp.h +++ b/libavcodec/riscv/vp9dsp.h @@ -66,6 +66,12 @@ void ff_v_16x16_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a); void ff_v_8x8_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a); +void ff_h_32x32_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); +void ff_h_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); +void ff_h_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); #define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx) \ void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \ diff --git 
a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c index fa9c3f4d8c..513e62721f 100644 --- a/libavcodec/riscv/vp9dsp_init.c +++ b/libavcodec/riscv/vp9dsp_init.c @@ -86,6 +86,9 @@ static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp) dsp->intra_pred[TX_16X16][DC_129_PRED] = ff_dc_129_16x16_rvv; dsp->intra_pred[TX_32X32][TOP_DC_PRED] = ff_dc_top_32x32_rvv; dsp->intra_pred[TX_16X16][TOP_DC_PRED] = ff_dc_top_16x16_rvv; + dsp->intra_pred[TX_32X32][HOR_PRED] = ff_h_32x32_rvv; + dsp->intra_pred[TX_16X16][HOR_PRED] = ff_h_16x16_rvv; + dsp->intra_pred[TX_8X8][HOR_PRED] = ff_h_8x8_rvv; } #endif #endif
From patchwork Sun May 12 10:03:26 2024 X-Patchwork-Submitter: uk7b@foxmail.com X-Patchwork-Id: 48803
From: uk7b@foxmail.com To: ffmpeg-devel@ffmpeg.org Date: Sun, 12 May 2024 18:03:26 +0800 X-OQ-MSGID: <20240512100331.995415-4-uk7b@foxmail.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240512100331.995415-1-uk7b@foxmail.com> References: <20240512100331.995415-1-uk7b@foxmail.com> Subject: [FFmpeg-devel] [PATCH v3 4/9] lavc/vp9dsp: R-V V ipred tm Cc: sunyuechi
From: sunyuechi C908: vp9_tm_4x4_8bpp_c: 116.5 vp9_tm_4x4_8bpp_rvv_i32: 43.5 vp9_tm_8x8_8bpp_c: 416.2 vp9_tm_8x8_8bpp_rvv_i32: 86.0 vp9_tm_16x16_8bpp_c: 1665.5 vp9_tm_16x16_8bpp_rvv_i32: 187.2 vp9_tm_32x32_8bpp_c: 6974.2 vp9_tm_32x32_8bpp_rvv_i32: 625.7 --- libavcodec/riscv/vp9_intra_rvv.S | 141 +++++++++++++++++++++++++++++++ libavcodec/riscv/vp9dsp.h | 8 ++ libavcodec/riscv/vp9dsp_init.c | 4 + 3 files changed, 153 insertions(+) diff --git a/libavcodec/riscv/vp9_intra_rvv.S b/libavcodec/riscv/vp9_intra_rvv.S index
ca156d65cd..7e1046bc13 100644 --- a/libavcodec/riscv/vp9_intra_rvv.S +++ b/libavcodec/riscv/vp9_intra_rvv.S @@ -173,3 +173,144 @@ func ff_h_8x8_rvv, zve32x ret endfunc + +.macro tm_sum dst, top, offset + lbu t3, \offset(a2) + sub t3, t3, a4 + vadd.vx \dst, \top, t3 +.endm + +func ff_tm_32x32_rvv, zve32x + lbu a4, -1(a3) + li t5, 32 + + .macro tm_sum32 n1,n2,n3,n4,n5,n6,n7,n8 + vsetvli zero, t5, e16, m4, ta, ma + vle8.v v8, (a3) + vzext.vf2 v28, v8 + + tm_sum v0, v28, \n1 + tm_sum v4, v28, \n2 + tm_sum v8, v28, \n3 + tm_sum v12, v28, \n4 + tm_sum v16, v28, \n5 + tm_sum v20, v28, \n6 + tm_sum v24, v28, \n7 + tm_sum v28, v28, \n8 + + .irp n 0, 4, 8, 12, 16, 20, 24, 28 + vmax.vx v\n, v\n, zero + .endr + + vsetvli zero, zero, e8, m2, ta, ma + .irp n 0, 4, 8, 12, 16, 20, 24, 28 + vnclipu.wi v\n, v\n, 0 + vse8.v v\n, (a0) + add a0, a0, a1 + .endr + .endm + + tm_sum32 31, 30, 29, 28, 27, 26, 25, 24 + tm_sum32 23, 22, 21, 20, 19, 18, 17, 16 + tm_sum32 15, 14, 13, 12, 11, 10, 9, 8 + tm_sum32 7, 6, 5, 4, 3, 2, 1, 0 + + ret +endfunc + +func ff_tm_16x16_rvv, zve32x + vsetivli zero, 16, e16, m2, ta, ma + vle8.v v8, (a3) + vzext.vf2 v30, v8 + lbu a4, -1(a3) + + tm_sum v0, v30, 15 + tm_sum v2, v30, 14 + tm_sum v4, v30, 13 + tm_sum v6, v30, 12 + tm_sum v8, v30, 11 + tm_sum v10, v30, 10 + tm_sum v12, v30, 9 + tm_sum v14, v30, 8 + tm_sum v16, v30, 7 + tm_sum v18, v30, 6 + tm_sum v20, v30, 5 + tm_sum v22, v30, 4 + tm_sum v24, v30, 3 + tm_sum v26, v30, 2 + tm_sum v28, v30, 1 + tm_sum v30, v30, 0 + + .irp n 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 + vmax.vx v\n, v\n, zero + .endr + + vsetvli zero, zero, e8, m1, ta, ma + .irp n 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28 + vnclipu.wi v\n, v\n, 0 + vse8.v v\n, (a0) + add a0, a0, a1 + .endr + vnclipu.wi v30, v30, 0 + vse8.v v30, (a0) + + ret +endfunc + +func ff_tm_8x8_rvv, zve32x + vsetivli zero, 8, e16, m1, ta, ma + vle8.v v8, (a3) + vzext.vf2 v28, v8 + lbu a4, -1(a3) + + tm_sum v16, v28, 7 + tm_sum v17, v28, 6 
+ tm_sum v18, v28, 5 + tm_sum v19, v28, 4 + tm_sum v20, v28, 3 + tm_sum v21, v28, 2 + tm_sum v22, v28, 1 + tm_sum v23, v28, 0 + + .irp n 16, 17, 18, 19, 20, 21, 22, 23 + vmax.vx v\n, v\n, zero + .endr + + vsetvli zero, zero, e8, mf2, ta, ma + .irp n 16, 17, 18, 19, 20, 21, 22 + vnclipu.wi v\n, v\n, 0 + vse8.v v\n, (a0) + add a0, a0, a1 + .endr + vnclipu.wi v24, v23, 0 + vse8.v v24, (a0) + + ret +endfunc + +func ff_tm_4x4_rvv, zve32x + vsetivli zero, 4, e16, mf2, ta, ma + vle8.v v8, (a3) + vzext.vf2 v28, v8 + lbu a4, -1(a3) + + tm_sum v16, v28, 3 + tm_sum v17, v28, 2 + tm_sum v18, v28, 1 + tm_sum v19, v28, 0 + + .irp n 16, 17, 18, 19 + vmax.vx v\n, v\n, zero + .endr + + vsetvli zero, zero, e8, mf4, ta, ma + .irp n 16, 17, 18 + vnclipu.wi v\n, v\n, 0 + vse8.v v\n, (a0) + add a0, a0, a1 + .endr + vnclipu.wi v24, v19, 0 + vse8.v v24, (a0) + + ret +endfunc diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h index 0ad961c7e0..79330b4968 100644 --- a/libavcodec/riscv/vp9dsp.h +++ b/libavcodec/riscv/vp9dsp.h @@ -72,6 +72,14 @@ void ff_h_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a); void ff_h_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a); +void ff_tm_32x32_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); +void ff_tm_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); +void ff_tm_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); +void ff_tm_4x4_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, + const uint8_t *a); #define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx) \ void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \ diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c index 513e62721f..1f9a3bcd24 100644 --- a/libavcodec/riscv/vp9dsp_init.c +++ b/libavcodec/riscv/vp9dsp_init.c @@ -89,6 +89,10 @@ static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext 
*dsp, int bpp) dsp->intra_pred[TX_32X32][HOR_PRED] = ff_h_32x32_rvv; dsp->intra_pred[TX_16X16][HOR_PRED] = ff_h_16x16_rvv; dsp->intra_pred[TX_8X8][HOR_PRED] = ff_h_8x8_rvv; + dsp->intra_pred[TX_32X32][TM_VP8_PRED] = ff_tm_32x32_rvv; + dsp->intra_pred[TX_16X16][TM_VP8_PRED] = ff_tm_16x16_rvv; + dsp->intra_pred[TX_8X8][TM_VP8_PRED] = ff_tm_8x8_rvv; + dsp->intra_pred[TX_4X4][TM_VP8_PRED] = ff_tm_4x4_rvv; } #endif #endif
From patchwork Sun May 12 10:03:27 2024 X-Patchwork-Submitter: uk7b@foxmail.com X-Patchwork-Id: 48802
From: uk7b@foxmail.com To: ffmpeg-devel@ffmpeg.org Date: Sun, 12 May 2024 18:03:27 +0800 X-OQ-MSGID: <20240512100331.995415-5-uk7b@foxmail.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240512100331.995415-1-uk7b@foxmail.com> References: <20240512100331.995415-1-uk7b@foxmail.com> Subject: [FFmpeg-devel] [PATCH v3 5/9] lavc/vp9dsp: R-V V mc avg Cc: sunyuechi
From: sunyuechi C908: vp9_avg4_8bpp_c: 1.2 vp9_avg4_8bpp_rvv_i64: 1.0 vp9_avg8_8bpp_c: 3.7 vp9_avg8_8bpp_rvv_i64: 1.5 vp9_avg16_8bpp_c: 14.7 vp9_avg16_8bpp_rvv_i64: 3.5 vp9_avg32_8bpp_c: 57.7 vp9_avg32_8bpp_rvv_i64: 10.0 vp9_avg64_8bpp_c: 229.0 vp9_avg64_8bpp_rvv_i64: 31.7 --- libavcodec/riscv/Makefile | 3 +- libavcodec/riscv/vp9_mc_rvv.S | 58 ++++++++++++++++++++++++++++++++++ libavcodec/riscv/vp9dsp_init.c | 18 +++++++++++ 3 files changed, 78 insertions(+), 1 deletion(-) create mode 100644 libavcodec/riscv/vp9_mc_rvv.S diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile index 0cd900104f..1183357b37 100644 --- a/libavcodec/riscv/Makefile +++ b/libavcodec/riscv/Makefile @@ -64,6 +64,7 @@ RVV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvv.o OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9dsp_init.o RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o \ riscv/vp9_mc_rvi.o -RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o +RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o \ + riscv/vp9_mc_rvv.o OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S new file mode 100644 index 0000000000..5d917e7b98 --- /dev/null +++ b/libavcodec/riscv/vp9_mc_rvv.S @@ -0,0 +1,58 @@ +/* + * Copyright (c) 2024 Institue of Software Chinese Academy of Sciences (ISCAS). + * + * This file is part of FFmpeg.
+ * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/riscv/asm.S" + +.macro vsetvlstatic8 len an maxlen mn=m4 +.if \len == 4 + vsetivli zero, \len, e8, mf4, ta, ma +.elseif \len == 8 + vsetivli zero, \len, e8, mf2, ta, ma +.elseif \len == 16 + vsetivli zero, \len, e8, m1, ta, ma +.elseif \len == 32 + li \an, \len + vsetvli zero, \an, e8, m2, ta, ma +.elseif \len == 64 + li \an, \maxlen + vsetvli zero, \an, e8, \mn, ta, ma +.endif +.endm + +.macro copy_avg len +func ff_avg\len\()_rvv, zve32x + csrwi vxrm, 0 + vsetvlstatic8 \len t0 64 +1: + addi a4, a4, -1 + vle8.v v8, (a2) + vle8.v v16, (a0) + vaaddu.vv v8, v8, v16 + vse8.v v8, (a0) + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + ret +endfunc +.endm + +.irp len 64, 32, 16, 8, 4 + copy_avg \len +.endr diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c index 1f9a3bcd24..832b015166 100644 --- a/libavcodec/riscv/vp9dsp_init.c +++ b/libavcodec/riscv/vp9dsp_init.c @@ -48,6 +48,24 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) } # endif +#if HAVE_RVV + if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I32) && ff_get_rv_vlenb() >= 16) { + +#define init_fpel(idx1, sz) \ + dsp->mc[idx1][FILTER_8TAP_SMOOTH ][1][0][0] = ff_avg##sz##_rvv; \ + dsp->mc[idx1][FILTER_8TAP_REGULAR][1][0][0] 
= ff_avg##sz##_rvv; \ + dsp->mc[idx1][FILTER_8TAP_SHARP ][1][0][0] = ff_avg##sz##_rvv; \ + dsp->mc[idx1][FILTER_BILINEAR ][1][0][0] = ff_avg##sz##_rvv + + init_fpel(0, 64); + init_fpel(1, 32); + init_fpel(2, 16); + init_fpel(3, 8); + init_fpel(4, 4); + +#undef init_fpel + } +#endif #endif }
From patchwork Sun May 12 10:03:28 2024 X-Patchwork-Submitter: uk7b@foxmail.com X-Patchwork-Id: 48806
From: uk7b@foxmail.com To: ffmpeg-devel@ffmpeg.org Date: Sun, 12 May 2024 18:03:28 +0800 X-OQ-MSGID: <20240512100331.995415-6-uk7b@foxmail.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240512100331.995415-1-uk7b@foxmail.com> References: <20240512100331.995415-1-uk7b@foxmail.com> Subject: [FFmpeg-devel] [PATCH v3 6/9] lavc/vp9dsp: R-V V mc bilin h v Cc: sunyuechi
From: sunyuechi C908: vp9_avg_bilin_4h_8bpp_c: 5.2 vp9_avg_bilin_4h_8bpp_rvv_i64: 2.2 vp9_avg_bilin_4v_8bpp_c: 5.5 vp9_avg_bilin_4v_8bpp_rvv_i64: 2.2 vp9_avg_bilin_8h_8bpp_c: 20.0 vp9_avg_bilin_8h_8bpp_rvv_i64: 4.5 vp9_avg_bilin_8v_8bpp_c: 21.0 vp9_avg_bilin_8v_8bpp_rvv_i64: 4.2 vp9_avg_bilin_16h_8bpp_c: 78.2 vp9_avg_bilin_16h_8bpp_rvv_i64: 9.0 vp9_avg_bilin_16v_8bpp_c: 82.0 vp9_avg_bilin_16v_8bpp_rvv_i64: 9.0 vp9_avg_bilin_32h_8bpp_c: 325.5 vp9_avg_bilin_32h_8bpp_rvv_i64: 26.2 vp9_avg_bilin_32v_8bpp_c: 326.2 vp9_avg_bilin_32v_8bpp_rvv_i64: 26.2 vp9_avg_bilin_64h_8bpp_c: 1265.7 vp9_avg_bilin_64h_8bpp_rvv_i64: 91.5 vp9_avg_bilin_64v_8bpp_c: 1317.0 vp9_avg_bilin_64v_8bpp_rvv_i64: 91.2 vp9_put_bilin_4h_8bpp_c: 4.5 vp9_put_bilin_4h_8bpp_rvv_i64: 1.7 vp9_put_bilin_4v_8bpp_c: 4.7 vp9_put_bilin_4v_8bpp_rvv_i64: 1.7 vp9_put_bilin_8h_8bpp_c: 17.0 vp9_put_bilin_8h_8bpp_rvv_i64: 3.5 vp9_put_bilin_8v_8bpp_c: 18.0 vp9_put_bilin_8v_8bpp_rvv_i64: 3.5 vp9_put_bilin_16h_8bpp_c: 65.2 vp9_put_bilin_16h_8bpp_rvv_i64: 7.5 vp9_put_bilin_16v_8bpp_c: 85.7 vp9_put_bilin_16v_8bpp_rvv_i64: 7.5 vp9_put_bilin_32h_8bpp_c: 257.5 vp9_put_bilin_32h_8bpp_rvv_i64: 23.5 vp9_put_bilin_32v_8bpp_c: 274.5 vp9_put_bilin_32v_8bpp_rvv_i64: 23.5 vp9_put_bilin_64h_8bpp_c: 1040.5 vp9_put_bilin_64h_8bpp_rvv_i64: 82.5 vp9_put_bilin_64v_8bpp_c: 1108.7 vp9_put_bilin_64v_8bpp_rvv_i64: 82.2 --- libavcodec/riscv/vp9_mc_rvv.S | 43 ++++++++++++++++++++++++++++++++++ libavcodec/riscv/vp9dsp_init.c | 21 +++++++++++++++++ 2 files changed, 64 insertions(+) diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S index 5d917e7b98..986cc3760d 100644 --- a/libavcodec/riscv/vp9_mc_rvv.S +++ b/libavcodec/riscv/vp9_mc_rvv.S @@ -53,6 +53,49 @@ func ff_avg\len\()_rvv, zve32x endfunc .endm +.macro bilin_load dst len op type mn +.ifc \type,v +
add t5, a2, a3 +.elseif \type == h + addi t5, a2, 1 +.endif + vle8.v v8, (a2) + vle8.v v0, (t5) + vwmulu.vx v16, v0, \mn + vwmaccsu.vx v16, t1, v8 + vwadd.wx v16, v16, t4 + vnsra.wi v16, v16, 4 + vadd.vv \dst, v16, v8 +.ifc \op,avg + vle8.v v16, (a0) + vaaddu.vv \dst, \dst, v16 +.endif +.endm + +.macro bilin_h_v len op type mn +func ff_\op\()_bilin_\len\()\type\()_rvv, zve32x +.ifc \op,avg + csrwi vxrm, 0 +.endif + vsetvlstatic8 \len t0 64 + li t4, 8 + neg t1, \mn +1: + addi a4, a4, -1 + bilin_load v0, \len, \op, \type, \mn + vse8.v v0, (a0) + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + + ret +endfunc +.endm + .irp len 64, 32, 16, 8, 4 copy_avg \len + .irp op put avg + bilin_h_v \len \op h a5 + bilin_h_v \len \op v a6 + .endr .endr diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c index 832b015166..31120a7893 100644 --- a/libavcodec/riscv/vp9dsp_init.c +++ b/libavcodec/riscv/vp9dsp_init.c @@ -63,6 +63,27 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) init_fpel(3, 8); init_fpel(4, 4); + dsp->mc[0][FILTER_BILINEAR ][0][0][1] = ff_put_bilin_64v_rvv; + dsp->mc[0][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_64h_rvv; + dsp->mc[0][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_64v_rvv; + dsp->mc[0][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_64h_rvv; + dsp->mc[1][FILTER_BILINEAR ][0][0][1] = ff_put_bilin_32v_rvv; + dsp->mc[1][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_32h_rvv; + dsp->mc[1][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_32v_rvv; + dsp->mc[1][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_32h_rvv; + dsp->mc[2][FILTER_BILINEAR ][0][0][1] = ff_put_bilin_16v_rvv; + dsp->mc[2][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_16h_rvv; + dsp->mc[2][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_16v_rvv; + dsp->mc[2][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_16h_rvv; + dsp->mc[3][FILTER_BILINEAR ][0][0][1] = ff_put_bilin_8v_rvv; + dsp->mc[3][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_8h_rvv; + dsp->mc[3][FILTER_BILINEAR ][1][0][1] = 
ff_avg_bilin_8v_rvv; + dsp->mc[3][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_8h_rvv; + dsp->mc[4][FILTER_BILINEAR ][0][0][1] = ff_put_bilin_4v_rvv; + dsp->mc[4][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_4h_rvv; + dsp->mc[4][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_4v_rvv; + dsp->mc[4][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_4h_rvv; + #undef init_fpel } #endif
From patchwork Sun May 12 10:03:29 2024 X-Patchwork-Submitter: uk7b@foxmail.com X-Patchwork-Id: 48810
From: uk7b@foxmail.com To: ffmpeg-devel@ffmpeg.org Date: Sun, 12 May 2024 18:03:29 +0800 X-OQ-MSGID: <20240512100331.995415-7-uk7b@foxmail.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240512100331.995415-1-uk7b@foxmail.com> References: <20240512100331.995415-1-uk7b@foxmail.com> Subject: [FFmpeg-devel] [PATCH v3 7/9] lavc/vp9dsp: R-V V mc tap h v List-Id: FFmpeg development discussions and
patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: sunyuechi Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: YtvuJ0xsEwyX From: sunyuechi C908 X60 vp9_avg_8tap_smooth_4h_8bpp_c : 13.0 11.2 vp9_avg_8tap_smooth_4h_8bpp_rvv_i32 : 5.0 4.2 vp9_avg_8tap_smooth_4v_8bpp_c : 13.7 12.5 vp9_avg_8tap_smooth_4v_8bpp_rvv_i32 : 5.0 4.2 vp9_avg_8tap_smooth_8h_8bpp_c : 49.5 42.2 vp9_avg_8tap_smooth_8h_8bpp_rvv_i32 : 9.2 8.5 vp9_avg_8tap_smooth_8v_8bpp_c : 66.5 45.0 vp9_avg_8tap_smooth_8v_8bpp_rvv_i32 : 9.5 8.5 vp9_avg_8tap_smooth_16h_8bpp_c : 192.7 166.5 vp9_avg_8tap_smooth_16h_8bpp_rvv_i32 : 21.2 18.7 vp9_avg_8tap_smooth_16v_8bpp_c : 192.2 175.7 vp9_avg_8tap_smooth_16v_8bpp_rvv_i32 : 21.5 19.0 vp9_avg_8tap_smooth_32h_8bpp_c : 780.2 663.7 vp9_avg_8tap_smooth_32h_8bpp_rvv_i32 : 83.5 60.0 vp9_avg_8tap_smooth_32v_8bpp_c : 770.5 689.2 vp9_avg_8tap_smooth_32v_8bpp_rvv_i32 : 67.2 60.0 vp9_avg_8tap_smooth_64h_8bpp_c : 3115.5 2647.2 vp9_avg_8tap_smooth_64h_8bpp_rvv_i32 : 283.5 119.2 vp9_avg_8tap_smooth_64v_8bpp_c : 3082.2 2729.0 vp9_avg_8tap_smooth_64v_8bpp_rvv_i32 : 305.2 119.0 vp9_put_8tap_smooth_4h_8bpp_c : 11.2 9.7 vp9_put_8tap_smooth_4h_8bpp_rvv_i32 : 4.2 4.0 vp9_put_8tap_smooth_4v_8bpp_c : 11.7 10.7 vp9_put_8tap_smooth_4v_8bpp_rvv_i32 : 4.2 4.0 vp9_put_8tap_smooth_8h_8bpp_c : 42.0 37.5 vp9_put_8tap_smooth_8h_8bpp_rvv_i32 : 8.5 7.7 vp9_put_8tap_smooth_8v_8bpp_c : 44.2 38.7 vp9_put_8tap_smooth_8v_8bpp_rvv_i32 : 8.5 7.7 vp9_put_8tap_smooth_16h_8bpp_c : 165.7 147.2 vp9_put_8tap_smooth_16h_8bpp_rvv_i32 : 19.5 17.5 vp9_put_8tap_smooth_16v_8bpp_c : 169.0 149.7 vp9_put_8tap_smooth_16v_8bpp_rvv_i32 : 19.7 17.5 vp9_put_8tap_smooth_32h_8bpp_c : 659.7 586.7 vp9_put_8tap_smooth_32h_8bpp_rvv_i32 : 64.2 57.2 vp9_put_8tap_smooth_32v_8bpp_c : 680.5 591.2 vp9_put_8tap_smooth_32v_8bpp_rvv_i32 : 64.2 57.2 vp9_put_8tap_smooth_64h_8bpp_c : 2681.5 2339.0 vp9_put_8tap_smooth_64h_8bpp_rvv_i32 : 
255.5 114.2 vp9_put_8tap_smooth_64v_8bpp_c : 2709.7 2348.7 vp9_put_8tap_smooth_64v_8bpp_rvv_i32 : 255.5 114.0 --- libavcodec/riscv/vp9_mc_rvv.S | 243 +++++++++++++++++++++++++++++++++ libavcodec/riscv/vp9dsp.h | 72 ++++++---- libavcodec/riscv/vp9dsp_init.c | 40 +++++- 3 files changed, 329 insertions(+), 26 deletions(-) diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S index 986cc3760d..c633809675 100644 --- a/libavcodec/riscv/vp9_mc_rvv.S +++ b/libavcodec/riscv/vp9_mc_rvv.S @@ -36,6 +36,18 @@ .endif .endm +.macro vsetvlstatic16 len +.ifc \len,4 + vsetvli zero, zero, e16, mf2, ta, ma +.elseif \len == 8 + vsetvli zero, zero, e16, m1, ta, ma +.elseif \len == 16 + vsetvli zero, zero, e16, m2, ta, ma +.else + vsetvli zero, zero, e16, m4, ta, ma +.endif +.endm + .macro copy_avg len func ff_avg\len\()_rvv, zve32x csrwi vxrm, 0 @@ -92,10 +104,241 @@ func ff_\op\()_bilin_\len\()\type\()_rvv, zve32x endfunc .endm +const subpel_filters_regular + .byte 0, 0, 0, 128, 0, 0, 0, 0 + .byte 0, 1, -5, 126, 8, -3, 1, 0 + .byte -1, 3, -10, 122, 18, -6, 2, 0 + .byte -1, 4, -13, 118, 27, -9, 3, -1 + .byte -1, 4, -16, 112, 37, -11, 4, -1 + .byte -1, 5, -18, 105, 48, -14, 4, -1 + .byte -1, 5, -19, 97, 58, -16, 5, -1 + .byte -1, 6, -19, 88, 68, -18, 5, -1 + .byte -1, 6, -19, 78, 78, -19, 6, -1 + .byte -1, 5, -18, 68, 88, -19, 6, -1 + .byte -1, 5, -16, 58, 97, -19, 5, -1 + .byte -1, 4, -14, 48, 105, -18, 5, -1 + .byte -1, 4, -11, 37, 112, -16, 4, -1 + .byte -1, 3, -9, 27, 118, -13, 4, -1 + .byte 0, 2, -6, 18, 122, -10, 3, -1 + .byte 0, 1, -3, 8, 126, -5, 1, 0 +subpel_filters_sharp: + .byte 0, 0, 0, 128, 0, 0, 0, 0 + .byte -1, 3, -7, 127, 8, -3, 1, 0 + .byte -2, 5, -13, 125, 17, -6, 3, -1 + .byte -3, 7, -17, 121, 27, -10, 5, -2 + .byte -4, 9, -20, 115, 37, -13, 6, -2 + .byte -4, 10, -23, 108, 48, -16, 8, -3 + .byte -4, 10, -24, 100, 59, -19, 9, -3 + .byte -4, 11, -24, 90, 70, -21, 10, -4 + .byte -4, 11, -23, 80, 80, -23, 11, -4 + .byte -4, 10, -21, 70, 90, -24, 11, -4 
+        .byte -3, 9, -19, 59, 100, -24, 10, -4
+        .byte -3, 8, -16, 48, 108, -23, 10, -4
+        .byte -2, 6, -13, 37, 115, -20, 9, -4
+        .byte -2, 5, -10, 27, 121, -17, 7, -3
+        .byte -1, 3, -6, 17, 125, -13, 5, -2
+        .byte 0, 1, -3, 8, 127, -7, 3, -1
+subpel_filters_smooth:
+        .byte 0, 0, 0, 128, 0, 0, 0, 0
+        .byte -3, -1, 32, 64, 38, 1, -3, 0
+        .byte -2, -2, 29, 63, 41, 2, -3, 0
+        .byte -2, -2, 26, 63, 43, 4, -4, 0
+        .byte -2, -3, 24, 62, 46, 5, -4, 0
+        .byte -2, -3, 21, 60, 49, 7, -4, 0
+        .byte -1, -4, 18, 59, 51, 9, -4, 0
+        .byte -1, -4, 16, 57, 53, 12, -4, -1
+        .byte -1, -4, 14, 55, 55, 14, -4, -1
+        .byte -1, -4, 12, 53, 57, 16, -4, -1
+        .byte 0, -4, 9, 51, 59, 18, -4, -1
+        .byte 0, -4, 7, 49, 60, 21, -3, -2
+        .byte 0, -4, 5, 46, 62, 24, -3, -2
+        .byte 0, -4, 4, 43, 63, 26, -2, -2
+        .byte 0, -3, 2, 41, 63, 29, -2, -2
+        .byte 0, -3, 1, 38, 64, 32, -1, -3
+endconst
+
+.macro epel_filter name type regtype
+        lla \regtype\()2, subpel_filters_\name
+        li \regtype\()1, 8
+.ifc \type,v
+        mul \regtype\()0, a6, \regtype\()1
+.elseif \type == h
+        mul \regtype\()0, a5, \regtype\()1
+.endif
+        add \regtype\()0, \regtype\()0, \regtype\()2
+        .irp n 1,2,3,4,5,6
+        lb \regtype\n, \n(\regtype\()0)
+        .endr
+.ifc \regtype,t
+        lb a7, 7(\regtype\()0)
+.elseif \regtype == s
+        lb s7, 7(\regtype\()0)
+.endif
+        lb \regtype\()0, 0(\regtype\()0)
+.endm
+
+.macro epel_load dst len op name type from_mem regtype
+        li a5, 64
+.ifc \from_mem, 1
+        vle8.v v22, (a2)
+.ifc \type,v
+        sub a2, a2, a3
+        vle8.v v20, (a2)
+        sh1add a2, a3, a2
+        vle8.v v24, (a2)
+        add a2, a2, a3
+        vle8.v v26, (a2)
+        add a2, a2, a3
+        vle8.v v28, (a2)
+        add a2, a2, a3
+        vle8.v v30, (a2)
+.elseif \type == h
+        addi a2, a2, -1
+        vle8.v v20, (a2)
+        addi a2, a2, 2
+        vle8.v v24, (a2)
+        addi a2, a2, 1
+        vle8.v v26, (a2)
+        addi a2, a2, 1
+        vle8.v v28, (a2)
+        addi a2, a2, 1
+        vle8.v v30, (a2)
+.endif
+
+.ifc \name,smooth
+        vwmulu.vx v16, v24, \regtype\()4
+        vwmaccu.vx v16, \regtype\()2, v20
+        vwmaccu.vx v16, \regtype\()5, v26
+        vwmaccsu.vx v16, \regtype\()6, v28
+.else
+        vwmulu.vx v16, v28, \regtype\()6
+        vwmaccsu.vx v16, \regtype\()2, v20
+        vwmaccsu.vx v16, \regtype\()5, v26
+.endif
+
+.ifc \regtype,t
+        vwmaccsu.vx v16, a7, v30
+.elseif \regtype == s
+        vwmaccsu.vx v16, s7, v30
+.endif
+
+.ifc \type,v
+        .rept 6
+        sub a2, a2, a3
+        .endr
+        vle8.v v28, (a2)
+        sub a2, a2, a3
+        vle8.v v26, (a2)
+        sh1add a2, a3, a2
+        add a2, a2, a3
+.elseif \type == h
+        addi a2, a2, -6
+        vle8.v v28, (a2)
+        addi a2, a2, -1
+        vle8.v v26, (a2)
+        addi a2, a2, 3
+.endif
+
+.ifc \name,smooth
+        vwmaccsu.vx v16, \regtype\()1, v28
+.else
+        vwmaccu.vx v16, \regtype\()1, v28
+        vwmulu.vx v28, v24, \regtype\()4
+.endif
+        vwmaccsu.vx v16, \regtype\()0, v26
+        vwmulu.vx v20, v22, \regtype\()3
+.else
+.ifc \name,smooth
+        vwmulu.vx v16, v8, \regtype\()4
+        vwmaccu.vx v16, \regtype\()2, v4
+        vwmaccu.vx v16, \regtype\()5, v10
+        vwmaccsu.vx v16, \regtype\()6, v12
+        vwmaccsu.vx v16, \regtype\()1, v2
+.else
+        vwmulu.vx v16, v2, \regtype\()1
+        vwmaccu.vx v16, \regtype\()6, v12
+        vwmaccsu.vx v16, \regtype\()5, v10
+        vwmaccsu.vx v16, \regtype\()2, v4
+        vwmulu.vx v28, v8, \regtype\()4
+.endif
+        vwmaccsu.vx v16, \regtype\()0, v0
+        vwmulu.vx v20, v6, \regtype\()3
+
+.ifc \regtype,t
+        vwmaccsu.vx v16, a7, v14
+.elseif \regtype == s
+        vwmaccsu.vx v16, s7, v14
+.endif
+
+.endif
+        vwadd.wx v16, v16, a5
+        vsetvlstatic16 \len
+
+.ifc \name,smooth
+        vwadd.vv v24, v16, v20
+.else
+        vwadd.vv v24, v16, v28
+        vwadd.wv v24, v24, v20
+.endif
+        vnsra.wi v24, v24, 7
+        vmax.vx v24, v24, zero
+        vsetvlstatic8 \len, zero, 32, m2
+
+        vnclipu.wi \dst, v24, 0
+.ifc \op,avg
+        vle8.v v24, (a0)
+        vaaddu.vv \dst, \dst, v24
+.endif
+
+.endm
+
+.macro epel_load_inc dst len op name type from_mem regtype
+        epel_load \dst \len \op \name \type \from_mem \regtype
+        add a2, a2, a3
+.endm
+
+.macro epel len op name type vlen
+func ff_\op\()_8tap_\name\()_\len\()\type\()_rvv\vlen\(), zve32x
+        epel_filter \name \type t
+.if \vlen < 256
+        vsetvlstatic8 \len a5 32 m2
+.else
+        vsetvlstatic8 \len a5 64 m2
+.endif
+.ifc \op,avg
+        csrwi vxrm, 0
+.endif
+
+1:
+        addi a4, a4, -1
+        epel_load v30 \len \op \name \type 1 t
+        vse8.v v30, (a0)
+.if \len == 64 && \vlen < 256
+        addi a0, a0, 32
+        addi a2, a2, 32
+        epel_load v30 \len \op \name \type 1 t
+        vse8.v v30, (a0)
+        addi a0, a0, -32
+        addi a2, a2, -32
+.endif
+        add a2, a2, a3
+        add a0, a0, a1
+        bnez a4, 1b
+
+        ret
+endfunc
+.endm
+
 .irp len 64, 32, 16, 8, 4
         copy_avg \len
         .irp op put avg
                 bilin_h_v \len \op h a5
                 bilin_h_v \len \op v a6
+                .irp name regular sharp smooth
+                        .irp type h v
+                                epel \len \op \name \type 128
+                                epel \len \op \name \type 256
+                        .endr
+                .endr
         .endr
 .endr
diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h
index 79330b4968..1638daaae3 100644
--- a/libavcodec/riscv/vp9dsp.h
+++ b/libavcodec/riscv/vp9dsp.h
@@ -81,33 +81,39 @@ void ff_tm_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
 void ff_tm_4x4_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
                    const uint8_t *a);

-#define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx) \
-void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \
+#define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx, min_vlen) \
+void ff_put_8tap_##type##_##SIZE##h_rvv##min_vlen(uint8_t *dst, \
+                                                  ptrdiff_t dststride, \
                                         const uint8_t *src, \
                                         ptrdiff_t srcstride, \
                                         int h, int mx, int my); \
                                                                 \
-void ff_put_8tap_##type##_##SIZE##v_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_put_8tap_##type##_##SIZE##v_rvv##min_vlen(uint8_t *dst, \
+                                                  ptrdiff_t dststride, \
                                         const uint8_t *src, \
                                         ptrdiff_t srcstride, \
                                         int h, int mx, int my); \
                                                                 \
-void ff_put_8tap_##type##_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_put_8tap_##type##_##SIZE##hv_rvv##min_vlen(uint8_t *dst, \
+                                                   ptrdiff_t dststride, \
                                          const uint8_t *src, \
                                          ptrdiff_t srcstride, \
                                          int h, int mx, int my); \
                                                                  \
-void ff_avg_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_avg_8tap_##type##_##SIZE##h_rvv##min_vlen(uint8_t *dst, \
+                                                  ptrdiff_t dststride, \
                                         const uint8_t *src, \
                                         ptrdiff_t srcstride, \
                                         int h, int mx, int my); \
                                                                 \
-void ff_avg_8tap_##type##_##SIZE##v_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_avg_8tap_##type##_##SIZE##v_rvv##min_vlen(uint8_t *dst, \
+                                                  ptrdiff_t dststride, \
                                         const uint8_t *src, \
                                         ptrdiff_t srcstride, \
                                         int h, int mx, int my); \
                                                                 \
-void ff_avg_8tap_##type##_##SIZE##hv_rvv(uint8_t *dst, ptrdiff_t dststride, \
+void ff_avg_8tap_##type##_##SIZE##hv_rvv##min_vlen(uint8_t *dst, \
+                                                   ptrdiff_t dststride, \
                                          const uint8_t *src, \
                                          ptrdiff_t srcstride, \
                                          int h, int mx, int my);
@@ -146,23 +152,41 @@ void ff_avg##SIZE##_rvv(uint8_t *dst, ptrdiff_t dststride, \
                         const uint8_t *src, ptrdiff_t srcstride, \
                         int h, int mx, int my);

-VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR);
-VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR);
-VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR);
-VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR);
-VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR);
-
-VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP);
-VP9_8TAP_RISCV_RVV_FUNC(32, sharp, FILTER_8TAP_SHARP);
-VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP);
-VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP);
-VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP);
-
-VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH);
-VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH);
-VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH);
-VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH);
-VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH);
+VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR, 128);
+VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR, 128);
+VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR, 128);
+VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR, 128);
+VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR, 128);
+
+VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP, 128);
+VP9_8TAP_RISCV_RVV_FUNC(32, sharp, FILTER_8TAP_SHARP, 128);
+VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP, 128);
+VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP, 128);
+VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP, 128);
+
+VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH, 128);
+VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH, 128);
+VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH, 128);
+VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH, 128);
+VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH, 128);
+
+VP9_8TAP_RISCV_RVV_FUNC(64, regular, FILTER_8TAP_REGULAR, 256);
+VP9_8TAP_RISCV_RVV_FUNC(32, regular, FILTER_8TAP_REGULAR, 256);
+VP9_8TAP_RISCV_RVV_FUNC(16, regular, FILTER_8TAP_REGULAR, 256);
+VP9_8TAP_RISCV_RVV_FUNC(8, regular, FILTER_8TAP_REGULAR, 256);
+VP9_8TAP_RISCV_RVV_FUNC(4, regular, FILTER_8TAP_REGULAR, 256);
+
+VP9_8TAP_RISCV_RVV_FUNC(64, sharp, FILTER_8TAP_SHARP, 256);
+VP9_8TAP_RISCV_RVV_FUNC(32, sharp, FILTER_8TAP_SHARP, 256);
+VP9_8TAP_RISCV_RVV_FUNC(16, sharp, FILTER_8TAP_SHARP, 256);
+VP9_8TAP_RISCV_RVV_FUNC(8, sharp, FILTER_8TAP_SHARP, 256);
+VP9_8TAP_RISCV_RVV_FUNC(4, sharp, FILTER_8TAP_SHARP, 256);
+
+VP9_8TAP_RISCV_RVV_FUNC(64, smooth, FILTER_8TAP_SMOOTH, 256);
+VP9_8TAP_RISCV_RVV_FUNC(32, smooth, FILTER_8TAP_SMOOTH, 256);
+VP9_8TAP_RISCV_RVV_FUNC(16, smooth, FILTER_8TAP_SMOOTH, 256);
+VP9_8TAP_RISCV_RVV_FUNC(8, smooth, FILTER_8TAP_SMOOTH, 256);
+VP9_8TAP_RISCV_RVV_FUNC(4, smooth, FILTER_8TAP_SMOOTH, 256);

 VP9_BILINEAR_RISCV_RVV_FUNC(64);
 VP9_BILINEAR_RISCV_RVV_FUNC(32);
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 31120a7893..0ae14879ea 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -49,7 +49,8 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
 # endif

 #if HAVE_RVV
-    if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I32) && ff_get_rv_vlenb() >= 16) {
+    if (bpp == 8 && (flags & AV_CPU_FLAG_RVV_I32)) {
+    if (ff_get_rv_vlenb() >= 16) {

 #define init_fpel(idx1, sz) \
     dsp->mc[idx1][FILTER_8TAP_SMOOTH ][1][0][0] = ff_avg##sz##_rvv; \
@@ -63,6 +64,26 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
     init_fpel(3, 8);
     init_fpel(4, 4);

+#undef init_fpel
+
+#define init_subpel1(idx1, idx2, idxh, idxv, sz, dir, type, vlen) \
+    dsp->mc[idx1][FILTER_8TAP_SMOOTH ][idx2][idxh][idxv] = \
+        ff_##type##_8tap_smooth_##sz##dir##_rvv##vlen; \
+    dsp->mc[idx1][FILTER_8TAP_REGULAR][idx2][idxh][idxv] = \
+        ff_##type##_8tap_regular_##sz##dir##_rvv##vlen; \
+    dsp->mc[idx1][FILTER_8TAP_SHARP ][idx2][idxh][idxv] = \
+        ff_##type##_8tap_sharp_##sz##dir##_rvv##vlen;
+
+#define init_subpel2(idx, idxh, idxv, dir, type, vlen) \
+    init_subpel1(0, idx, idxh, idxv, 64, dir, type, vlen); \
+    init_subpel1(1, idx, idxh, idxv, 32, dir, type, vlen); \
+    init_subpel1(2, idx, idxh, idxv, 16, dir, type, vlen); \
+    init_subpel1(3, idx, idxh, idxv, 8, dir, type, vlen); \
+    init_subpel1(4, idx, idxh, idxv, 4, dir, type, vlen)
+
+    init_subpel2(0, 1, 0, h, put, 128);
+    init_subpel2(1, 1, 0, h, avg, 128);
+
     dsp->mc[0][FILTER_BILINEAR ][0][0][1] = ff_put_bilin_64v_rvv;
     dsp->mc[0][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_64h_rvv;
     dsp->mc[0][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_64v_rvv;
@@ -84,8 +105,23 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
     dsp->mc[4][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_4v_rvv;
     dsp->mc[4][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_4h_rvv;

-#undef init_fpel
+    if (flags & AV_CPU_FLAG_RVB_ADDR) {
+        init_subpel2(0, 0, 1, v, put, 128);
+        init_subpel2(1, 0, 1, v, avg, 128);
     }
+
+    }
+    if (ff_get_rv_vlenb() >= 32) {
+        init_subpel2(0, 1, 0, h, put, 256);
+        init_subpel2(1, 1, 0, h, avg, 256);
+
+        if (flags & AV_CPU_FLAG_RVB_ADDR) {
+            init_subpel2(0, 0, 1, v, put, 256);
+            init_subpel2(1, 0, 1, v, avg, 256);
+        }
+    }
+    }
+
 #endif
 #endif
 }

From patchwork Sun May 12 10:03:30 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48808
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 12 May 2024 18:03:30 +0800
X-OQ-MSGID: <20240512100331.995415-8-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240512100331.995415-1-uk7b@foxmail.com>
References: <20240512100331.995415-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v3 8/9] lavc/vp9dsp: R-V V mc bilin hv
Cc: sunyuechi

From: sunyuechi

C908:
vp9_avg_bilin_4hv_8bpp_c: 11.0
vp9_avg_bilin_4hv_8bpp_rvv_i64: 3.7
vp9_avg_bilin_8hv_8bpp_c: 38.7
vp9_avg_bilin_8hv_8bpp_rvv_i64: 7.2
vp9_avg_bilin_16hv_8bpp_c: 147.0
vp9_avg_bilin_16hv_8bpp_rvv_i64: 14.2 vp9_avg_bilin_32hv_8bpp_c: 574.5 vp9_avg_bilin_32hv_8bpp_rvv_i64: 42.7 vp9_avg_bilin_64hv_8bpp_c: 2311.5 vp9_avg_bilin_64hv_8bpp_rvv_i64: 201.7 vp9_put_bilin_4hv_8bpp_c: 10.0 vp9_put_bilin_4hv_8bpp_rvv_i64: 3.2 vp9_put_bilin_8hv_8bpp_c: 35.2 vp9_put_bilin_8hv_8bpp_rvv_i64: 6.5 vp9_put_bilin_16hv_8bpp_c: 133.7 vp9_put_bilin_16hv_8bpp_rvv_i64: 13.0 vp9_put_bilin_32hv_8bpp_c: 538.2 vp9_put_bilin_32hv_8bpp_rvv_i64: 39.7 vp9_put_bilin_64hv_8bpp_c: 2114.0 vp9_put_bilin_64hv_8bpp_rvv_i64: 153.7 --- libavcodec/riscv/vp9_mc_rvv.S | 34 ++++++++++++++++++++++++++++++++++ libavcodec/riscv/vp9dsp_init.c | 10 ++++++++++ 2 files changed, 44 insertions(+) diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S index c633809675..22ae194367 100644 --- a/libavcodec/riscv/vp9_mc_rvv.S +++ b/libavcodec/riscv/vp9_mc_rvv.S @@ -104,6 +104,39 @@ func ff_\op\()_bilin_\len\()\type\()_rvv, zve32x endfunc .endm +.macro bilin_hv len op +func ff_\op\()_bilin_\len\()hv_rvv, zve32x +.ifc \op,avg + csrwi vxrm, 0 +.endif + vsetvlstatic8 \len t0 64 + neg t1, a5 + neg t2, a6 + li t4, 8 + bilin_load v24, \len, put, h, a5 + add a2, a2, a3 +1: + addi a4, a4, -1 + bilin_load v4, \len, put, h, a5 + vwmulu.vx v16, v4, a6 + vwmaccsu.vx v16, t2, v24 + vwadd.wx v16, v16, t4 + vnsra.wi v16, v16, 4 + vadd.vv v0, v16, v24 +.ifc \op,avg + vle8.v v16, (a0) + vaaddu.vv v0, v0, v16 +.endif + vse8.v v0, (a0) + vmv.v.v v24, v4 + add a2, a2, a3 + add a0, a0, a1 + bnez a4, 1b + + ret +endfunc +.endm + const subpel_filters_regular .byte 0, 0, 0, 128, 0, 0, 0, 0 .byte 0, 1, -5, 126, 8, -3, 1, 0 @@ -334,6 +367,7 @@ endfunc .irp op put avg bilin_h_v \len \op h a5 bilin_h_v \len \op v a6 + bilin_hv \len \op .irp name regular sharp smooth .irp type h v epel \len \op \name \type 128 diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c index 0ae14879ea..a8e5759cd8 100644 --- a/libavcodec/riscv/vp9dsp_init.c +++ b/libavcodec/riscv/vp9dsp_init.c @@ 
-104,6 +104,16 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp) dsp->mc[4][FILTER_BILINEAR ][0][1][0] = ff_put_bilin_4h_rvv; dsp->mc[4][FILTER_BILINEAR ][1][0][1] = ff_avg_bilin_4v_rvv; dsp->mc[4][FILTER_BILINEAR ][1][1][0] = ff_avg_bilin_4h_rvv; + dsp->mc[0][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_64hv_rvv; + dsp->mc[0][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_64hv_rvv; + dsp->mc[1][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_32hv_rvv; + dsp->mc[1][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_32hv_rvv; + dsp->mc[2][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_16hv_rvv; + dsp->mc[2][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_16hv_rvv; + dsp->mc[3][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_8hv_rvv; + dsp->mc[3][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_8hv_rvv; + dsp->mc[4][FILTER_BILINEAR ][0][1][1] = ff_put_bilin_4hv_rvv; + dsp->mc[4][FILTER_BILINEAR ][1][1][1] = ff_avg_bilin_4hv_rvv; if (flags & AV_CPU_FLAG_RVB_ADDR) { init_subpel2(0, 0, 1, v, put, 128); From patchwork Sun May 12 10:03:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: uk7b@foxmail.com X-Patchwork-Id: 48809 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a21:1706:b0:1af:cdee:28c5 with SMTP id nv6csp505973pzb; Sun, 12 May 2024 03:05:14 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCXrMD3XeoMjIy/3J3Dv8DbVdq79bnoi9rf/q3DYIDVp8w4/JxpRTClhi3FEjPUzXCjoUYrym4UYlgDU4dxIk7nLfr6nxlBL/N4ndw== X-Google-Smtp-Source: AGHT+IHHZD4B8E45ILggvqsguOpSIYjMleVXT+62Ix/CwN+i4j1OJfoDoGiPPlmARFV7QZKpAIVT X-Received: by 2002:a17:906:5393:b0:a59:bacc:b07f with SMTP id a640c23a62f3a-a5a2d672f6bmr487847566b.52.1715508314281; Sun, 12 May 2024 03:05:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1715508314; cv=none; d=google.com; s=arc-20160816; b=xj1OLXbouCcTw5BYOeP4TrXxQ5CiJEM9J0ogeQ4cMPQQmoc01mSdDWcss+3MNe3cn6 CAbTcB3HReRkZ50eu9/okFqte/a8lvJOeFqvvbG93dXd835K/y2uyPZsB9f4mAsmxS/a 
pSVk2yE+/vtJ23Y/fO9F/jwdSSW1FKySZvfbrX3wQ+1I9QFAzdCYFU5u9l3+GRku7Eua vIeHKkKSvxxVH0LN387EB+La0oo3beTRi7gZ3RzDJSREgkM69fr7DmHDasPjJ4bTjE0w JLtdPLuNFQZET8IWxHt3BA7vuEsEcVDmQayd0RN39iyQAmez9kE3Bz4xTCpdgPPeuHLo mKWg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to:date :to:from:message-id:dkim-signature:delivered-to; bh=bwQ3sKQgB2a9abt+fsCU3HBtSR/5K+JnLfLxjHSJkRw=; fh=D0bFwGkf4X22/D/bfeDVrXKIx7S6kcXsNzy10j8ORbQ=; b=Zw1cdlgbtGLBsLApUvhi53VwpMPqwBZO6lkXeeKh6sbDjyxTSZ1UC/oUEtvRaPlHaP pg/8i/rz/YY4PHp5t6azncD1OgKPVcZbbiWQ/9neJlypaVqyLS8gFAB9okOWVPpuL3zM gakQqsMOIgjaeZ9/Sr1kGLrsqgB9pgGgernAR12mfTS0MMg3YOEv196ovuH5YviBzhKJ oHDkAAiyBXI/zOYHB3KhM9Envr0YR8R6F/PSTNeofdnPFXD8uc/eV2RfNgzzuPaUz/Fq 0L5LvdxTGFxo9s257bluv6/URT9oYcEEuYsIlTsRgFKoSTw/FldFKml4LP5kfAeZ3CTr OIuQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=oWtAlbsH; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id a640c23a62f3a-a5a17c2c71dsi396744766b.969.2024.05.12.03.05.13; Sun, 12 May 2024 03:05:14 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=oWtAlbsH; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id B09A768D67C; Sun, 12 May 2024 13:04:03 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from out203-205-221-231.mail.qq.com (out203-205-221-231.mail.qq.com [203.205.221.231]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id A4CED68D611 for ; Sun, 12 May 2024 13:03:49 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=foxmail.com; s=s201512; t=1715508221; bh=MWSrv+DL8NhtsCs4tqLHk+bHf2Pf4zrqJjMrI/9BHNU=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=oWtAlbsH+yCmuhwYhCs8Zq/bFb33+OVNiJlMRVGmJ7G1cnHV+ZDH7B9choEEn/Qbc eipMsxzXdOIq/xkQ44PpvJ+AvDf9DpYr8lCeAVc+1C/PCtWRegHOTRaw1DQosbZ3iK IbEXOJeTK7/IOqceoxGeDjtuDVTEHCKim5z5dVx4= Received: from localhost.localdomain ([42.56.223.122]) by newxmesmtplogicsvrszb9-0.qq.com (NewEsmtp) with SMTP id E135E56; Sun, 12 May 2024 18:03:33 +0800 X-QQ-mid: xmsmtpt1715508220tc0q91kxv Message-ID: X-QQ-XMAILINFO: OdIVOfqOaVcrBqXpKDhomChooL7l1N0NK7ZBuJYuJ0q9Zp0d8iTPISypYTPxV2 PTGxcRl6eMAVvQf3RZI73w6fgSYZigT8JaOKAx34nPLml0OMgpDmps39StLHub5YR/7oZO70KxVL KTz2SfEuqYbSBxc9mTyGBgv2K6X1O5e1uwucHkiEYj+d7Haxt1FbRYOg3i9Du55/tm/p345V73mn mdUtC1B/OeBlnzzuWuAuS3s0T6K3bcOIdGGQgkUAUtRSbOQuPAIc9o43/0ifiu3TVL7dyhrpV+FX 
hROpZYVFIikKwzQcN9yb+8hblYH4CKhtrHGVfczO4qi8Fdxuu/bIMQKp+b5A/fuLA05Nyn9km+CV s9F3pTtiqN1/bHo1obH8Z8b2VYN5hiTfKEO5XhcHIUzNl92GdpVLjjl3RebeM3aIoZqr4E7Fze6x YYdSeyF9iOxsQnKPgELUVCQ0bkzFofRAAS2u3oZObZfnqJUh/0y7vHDHNPaZVqEKhYKqiOiwp8Lt OiATbbHU6ASiTyjT8x9TOb16wvbLuFoR4hpyFo4IhwBPzwgcc0NGSr6dGzqhJcmonJZUysAFnI9k n2JiOyB1/aYHJsg+445KCaGW5gnHxjjlNJHeUeX/l79Wvh4CoZPrUlppo1fg9YRH0wKEAipm74H9 RIebjk8pHhbAu+g+XjlX1Xhbic62UJQ6pX2k21SkcVhdcZYMTw35+wNJctF3iFiU2zx6vWU2yUP/ p6NtAzCGEaQdowOnwQrdBMzPiFv6Ndtdiyf93pgtGWE72W90pNSmvyixvpf7458SruoSHbzcNJns zu/w/DhVX+fEcJun68yC5WtV/q6SKPoIAZ2EBZo/55zDYB2F5pmlRyjdI+fm2HuexTFlCS2dBEo9 C1OS34PpYZ/fvlFTaTjxVTtIQ2wRjbo/4SDvYBJfuZMvQn3d4YgdcgWKwPOFcbpe4lFA7+ylWNqZ T+nI95zsmMUYK2xGam+Q==
X-QQ-XMRINFO: M/715EihBoGSf6IYSX1iLFg=
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 12 May 2024 18:03:31 +0800
X-OQ-MSGID: <20240512100331.995415-9-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240512100331.995415-1-uk7b@foxmail.com>
References: <20240512100331.995415-1-uk7b@foxmail.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v3 9/9] lavc/vp9dsp: R-V V mc tap hv
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Cc: sunyuechi
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: s397alzfDkDZ

From: sunyuechi

                                           C908     X60
vp9_avg_8tap_smooth_4hv_8bpp_c        :    32.0    28.2
vp9_avg_8tap_smooth_4hv_8bpp_rvv_i32  :    15.0    13.2
vp9_avg_8tap_smooth_8hv_8bpp_c        :    98.0    86.2
vp9_avg_8tap_smooth_8hv_8bpp_rvv_i32  :    23.7    21.0
vp9_avg_8tap_smooth_16hv_8bpp_c       :   355.5   297.0
vp9_avg_8tap_smooth_16hv_8bpp_rvv_i32 :    62.7    41.2
vp9_avg_8tap_smooth_32hv_8bpp_c       :  1273.0  1099.7
vp9_avg_8tap_smooth_32hv_8bpp_rvv_i32 :   133.7   119.2
vp9_avg_8tap_smooth_64hv_8bpp_c       :  4933.0  4240.5
vp9_avg_8tap_smooth_64hv_8bpp_rvv_i32 :   506.7   227.0
vp9_put_8tap_smooth_4hv_8bpp_c        :    30.2    27.0
vp9_put_8tap_smooth_4hv_8bpp_rvv_i32  :    14.5    12.7
vp9_put_8tap_smooth_8hv_8bpp_c        :    91.2    81.2
vp9_put_8tap_smooth_8hv_8bpp_rvv_i32  :    22.7    20.2
vp9_put_8tap_smooth_16hv_8bpp_c       :   329.2   277.7
vp9_put_8tap_smooth_16hv_8bpp_rvv_i32 :    44.7    40.0
vp9_put_8tap_smooth_32hv_8bpp_c       :  1183.7  1022.7
vp9_put_8tap_smooth_32hv_8bpp_rvv_i32 :   130.7   116.5
vp9_put_8tap_smooth_64hv_8bpp_c       :  4502.7  3954.5
vp9_put_8tap_smooth_64hv_8bpp_rvv_i32 :   496.0   224.7
---
 libavcodec/riscv/vp9_mc_rvv.S  | 75 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp_init.c |  8 ++++
 2 files changed, 83 insertions(+)

diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
index 22ae194367..958460d165 100644
--- a/libavcodec/riscv/vp9_mc_rvv.S
+++ b/libavcodec/riscv/vp9_mc_rvv.S
@@ -362,6 +362,77 @@ func ff_\op\()_8tap_\name\()_\len\()\type\()_rvv\vlen\(), zve32x
 endfunc
 .endm
 
+#if __riscv_xlen == 64
+.macro epel_hv_once len name op
+        sub             a2, a2, a3
+        sub             a2, a2, a3
+        sub             a2, a2, a3
+        .irp n 0 2 4 6 8 10 12 14
+        epel_load_inc   v\n \len put \name h 1 t
+        .endr
+        addi            a4, a4, -1
+1:
+        addi            a4, a4, -1
+        epel_load       v30 \len \op \name v 0 s
+        vse8.v          v30, (a0)
+        vmv.v.v         v0, v2
+        vmv.v.v         v2, v4
+        vmv.v.v         v4, v6
+        vmv.v.v         v6, v8
+        vmv.v.v         v8, v10
+        vmv.v.v         v10, v12
+        vmv.v.v         v12, v14
+        epel_load       v14 \len put \name h 1 t
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+        epel_load       v30 \len \op \name v 0 s
+        vse8.v          v30, (a0)
+.endm
+
+.macro epel_hv op name len vlen
+func ff_\op\()_8tap_\name\()_\len\()hv_rvv\vlen\(), zve32x
+        addi            sp, sp, -64
+        .irp n 0,1,2,3,4,5,6,7
+        sd              s\n, \n\()<<3(sp)
+        .endr
+.if \len == 64 && \vlen < 256
+        addi            sp, sp, -48
+        .irp n 0,1,2,3,4,5
+        sd              a\n, \n\()<<3(sp)
+        .endr
+.endif
+.ifc \op,avg
+        csrwi           vxrm, 0
+.endif
+        epel_filter     \name h t
+        epel_filter     \name v s
+.if \vlen < 256
+        vsetvlstatic8   \len a6 32 m2
+.else
+        vsetvlstatic8   \len a6 64 m2
+.endif
+        epel_hv_once    \len \name \op
+.if \len == 64 && \vlen < 256
+        .irp n 0,1,2,3,4,5
+        ld              a\n, \n\()<<3(sp)
+        .endr
+        addi            sp, sp, 48
+        addi            a0, a0, 32
+        addi            a2, a2, 32
+        epel_filter     \name h t
+        epel_hv_once    \len \name \op
+.endif
+        .irp n 0,1,2,3,4,5,6,7
+        ld              s\n, \n\()<<3(sp)
+        .endr
+        addi            sp, sp, 64
+
+        ret
+endfunc
+.endm
+#endif
+
 .irp len 64, 32, 16, 8, 4
         copy_avg \len
         .irp op put avg
@@ -373,6 +444,10 @@ endfunc
                         epel \len \op \name \type 128
                         epel \len \op \name \type 256
                         .endr
+                        #if __riscv_xlen == 64
+                        epel_hv \op \name \len 128
+                        epel_hv \op \name \len 256
+                        #endif
                 .endr
         .endr
 .endr
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index a8e5759cd8..513532f73b 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -118,6 +118,10 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
         if (flags & AV_CPU_FLAG_RVB_ADDR) {
             init_subpel2(0, 0, 1, v, put, 128);
             init_subpel2(1, 0, 1, v, avg, 128);
+# if __riscv_xlen == 64
+            init_subpel2(0, 1, 1, hv, put, 128);
+            init_subpel2(1, 1, 1, hv, avg, 128);
+# endif
         }
     }
 
@@ -128,6 +132,10 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
         if (flags & AV_CPU_FLAG_RVB_ADDR) {
             init_subpel2(0, 0, 1, v, put, 256);
             init_subpel2(1, 0, 1, v, avg, 256);
+# if __riscv_xlen == 64
+            init_subpel2(0, 1, 1, hv, put, 256);
+            init_subpel2(1, 1, 1, hv, avg, 256);
+# endif
         }
     }
 }