From patchwork Sun May 12 10:03:23 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48807
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 12 May 2024 18:03:23 +0800
X-OQ-MSGID: <20240512100331.995415-1-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
Subject: [FFmpeg-devel] [PATCH v3 1/9] lavc/vp9dsp: R-V ipred vert
Cc: sunyuechi

From: sunyuechi

C908:
vp9_vert_8x8_8bpp_c: 22.0
vp9_vert_8x8_8bpp_rvi: 15.7
vp9_vert_16x16_8bpp_c: 71.2
vp9_vert_16x16_8bpp_rvi: 39.0
vp9_vert_32x32_8bpp_c: 300.2
vp9_vert_32x32_8bpp_rvi: 135.2
---
 libavcodec/riscv/Makefile        |  1 +
 libavcodec/riscv/vp9_intra_rvi.S | 71 ++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp9dsp.h        |  6 +++
 libavcodec/riscv/vp9dsp_init.c   | 63 ++++++++++++++++------------
 4 files changed, 114 insertions(+), 27 deletions(-)
 create mode 100644 libavcodec/riscv/vp9_intra_rvi.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 89273b1cad..ccd060c666 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -62,6 +62,7 @@ OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_init.o
 RV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvi.o
 RVV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvv.o
 OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9dsp_init.o
+RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o
 RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o
 OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o
 RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o
diff --git a/libavcodec/riscv/vp9_intra_rvi.S b/libavcodec/riscv/vp9_intra_rvi.S
new file mode 100644
index 0000000000..16b6bdb25a
--- /dev/null
+++ b/libavcodec/riscv/vp9_intra_rvi.S
@@ -0,0 +1,71 @@
+/*
+ * Copyright (c) 2024 Institue of Software Chinese Academy of Sciences (ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/riscv/asm.S"
+
+#if __riscv_xlen >= 64
+func ff_v_32x32_rvi
+        ld      t0, (a3)
+        ld      t1, 8(a3)
+        ld      t2, 16(a3)
+        ld      t3, 24(a3)
+        .rept 16
+        add     a7, a0, a1
+        sd      t0, (a0)
+        sd      t1, 8(a0)
+        sd      t2, 16(a0)
+        sd      t3, 24(a0)
+        sh1add  a0, a1, a0
+        sd      t0, (a7)
+        sd      t1, 8(a7)
+        sd      t2, 16(a7)
+        sd      t3, 24(a7)
+        .endr
+
+        ret
+endfunc
+
+func ff_v_16x16_rvi
+        ld      t0, (a3)
+        ld      t1, 8(a3)
+        .rept 8
+        add     a7, a0, a1
+        sd      t0, (a0)
+        sd      t1, 8(a0)
+        sh1add  a0, a1, a0
+        sd      t0, (a7)
+        sd      t1, 8(a7)
+        .endr
+
+        ret
+endfunc
+
+func ff_v_8x8_rvi
+        ld      t0, (a3)
+        .rept 4
+        add     a7, a0, a1
+        sd      t0, (a0)
+        sh1add  a0, a1, a0
+        sd      t0, (a7)
+        .endr
+
+        ret
+endfunc
+#endif
diff --git a/libavcodec/riscv/vp9dsp.h b/libavcodec/riscv/vp9dsp.h
index 25047ed507..f8bc6563a5 100644
--- a/libavcodec/riscv/vp9dsp.h
+++ b/libavcodec/riscv/vp9dsp.h
@@ -60,6 +60,12 @@ void ff_dc_129_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
                          const uint8_t *a);
 void ff_dc_129_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
                        const uint8_t *a);
+void ff_v_32x32_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                    const uint8_t *a);
+void ff_v_16x16_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                    const uint8_t *a);
+void ff_v_8x8_rvi(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
+                  const uint8_t *a);
 
 #define VP9_8TAP_RISCV_RVV_FUNC(SIZE, type, type_idx) \
 void ff_put_8tap_##type##_##SIZE##h_rvv(uint8_t *dst, ptrdiff_t dststride, \
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 69ab39004c..e377d377e3 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -24,38 +24,47 @@
 #include "libavcodec/vp9dsp.h"
 #include "vp9dsp.h"
 
-static av_cold void vp9dsp_intrapred_init_rvv(VP9DSPContext *dsp, int bpp)
+static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp)
 {
-    #if HAVE_RVV
-        int flags = av_get_cpu_flags();
+#if HAVE_RV
+    int flags = av_get_cpu_flags();
 
-        if (bpp == 8 && flags & AV_CPU_FLAG_RVV_I64 && ff_get_rv_vlenb() >= 16) {
-            dsp->intra_pred[TX_8X8][DC_PRED] = ff_dc_8x8_rvv;
-            dsp->intra_pred[TX_8X8][LEFT_DC_PRED] = ff_dc_left_8x8_rvv;
-            dsp->intra_pred[TX_8X8][DC_127_PRED] = ff_dc_127_8x8_rvv;
-            dsp->intra_pred[TX_8X8][DC_128_PRED] = ff_dc_128_8x8_rvv;
-            dsp->intra_pred[TX_8X8][DC_129_PRED] = ff_dc_129_8x8_rvv;
-            dsp->intra_pred[TX_8X8][TOP_DC_PRED] = ff_dc_top_8x8_rvv;
-        }
+    if (bpp == 8 && (flags & AV_CPU_FLAG_RVI) && (flags & AV_CPU_FLAG_RVB_ADDR)) {
+# if __riscv_xlen >= 64
+        dsp->intra_pred[TX_32X32][VERT_PRED] = ff_v_32x32_rvi;
+        dsp->intra_pred[TX_16X16][VERT_PRED] = ff_v_16x16_rvi;
+        dsp->intra_pred[TX_8X8][VERT_PRED] = ff_v_8x8_rvi;
+# endif
+    }
+#if HAVE_RVV
+    if (bpp == 8 && flags & AV_CPU_FLAG_RVV_I64 && ff_get_rv_vlenb() >= 16) {
+        dsp->intra_pred[TX_8X8][DC_PRED] = ff_dc_8x8_rvv;
+        dsp->intra_pred[TX_8X8][LEFT_DC_PRED] = ff_dc_left_8x8_rvv;
+        dsp->intra_pred[TX_8X8][DC_127_PRED] = ff_dc_127_8x8_rvv;
+        dsp->intra_pred[TX_8X8][DC_128_PRED] = ff_dc_128_8x8_rvv;
+        dsp->intra_pred[TX_8X8][DC_129_PRED] = ff_dc_129_8x8_rvv;
+        dsp->intra_pred[TX_8X8][TOP_DC_PRED] = ff_dc_top_8x8_rvv;
+    }
 
-        if (bpp == 8 && flags & AV_CPU_FLAG_RVV_I32 && ff_get_rv_vlenb() >= 16) {
-            dsp->intra_pred[TX_32X32][DC_PRED] = ff_dc_32x32_rvv;
-            dsp->intra_pred[TX_16X16][DC_PRED] = ff_dc_16x16_rvv;
-            dsp->intra_pred[TX_32X32][LEFT_DC_PRED] = ff_dc_left_32x32_rvv;
-            dsp->intra_pred[TX_16X16][LEFT_DC_PRED] = ff_dc_left_16x16_rvv;
-            dsp->intra_pred[TX_32X32][DC_127_PRED] = ff_dc_127_32x32_rvv;
-            dsp->intra_pred[TX_16X16][DC_127_PRED] = ff_dc_127_16x16_rvv;
-            dsp->intra_pred[TX_32X32][DC_128_PRED] = ff_dc_128_32x32_rvv;
-            dsp->intra_pred[TX_16X16][DC_128_PRED] = ff_dc_128_16x16_rvv;
-            dsp->intra_pred[TX_32X32][DC_129_PRED] = ff_dc_129_32x32_rvv;
-            dsp->intra_pred[TX_16X16][DC_129_PRED] = ff_dc_129_16x16_rvv;
-            dsp->intra_pred[TX_32X32][TOP_DC_PRED] = ff_dc_top_32x32_rvv;
-            dsp->intra_pred[TX_16X16][TOP_DC_PRED] = ff_dc_top_16x16_rvv;
-        }
-    #endif
+    if (bpp == 8 && flags & AV_CPU_FLAG_RVV_I32 && ff_get_rv_vlenb() >= 16) {
+        dsp->intra_pred[TX_32X32][DC_PRED] = ff_dc_32x32_rvv;
+        dsp->intra_pred[TX_16X16][DC_PRED] = ff_dc_16x16_rvv;
+        dsp->intra_pred[TX_32X32][LEFT_DC_PRED] = ff_dc_left_32x32_rvv;
+        dsp->intra_pred[TX_16X16][LEFT_DC_PRED] = ff_dc_left_16x16_rvv;
+        dsp->intra_pred[TX_32X32][DC_127_PRED] = ff_dc_127_32x32_rvv;
+        dsp->intra_pred[TX_16X16][DC_127_PRED] = ff_dc_127_16x16_rvv;
+        dsp->intra_pred[TX_32X32][DC_128_PRED] = ff_dc_128_32x32_rvv;
+        dsp->intra_pred[TX_16X16][DC_128_PRED] = ff_dc_128_16x16_rvv;
+        dsp->intra_pred[TX_32X32][DC_129_PRED] = ff_dc_129_32x32_rvv;
+        dsp->intra_pred[TX_16X16][DC_129_PRED] = ff_dc_129_16x16_rvv;
+        dsp->intra_pred[TX_32X32][TOP_DC_PRED] = ff_dc_top_32x32_rvv;
+        dsp->intra_pred[TX_16X16][TOP_DC_PRED] = ff_dc_top_16x16_rvv;
+    }
+#endif
+#endif
 }
 
 av_cold void ff_vp9dsp_init_riscv(VP9DSPContext *dsp, int bpp, int bitexact)
 {
-    vp9dsp_intrapred_init_rvv(dsp, bpp);
+    vp9dsp_intrapred_init_riscv(dsp, bpp);
 }
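
For readers less familiar with VERT_PRED, a rough C model of what the three ff_v_*_rvi routines appear to compute may help. This is only a sketch and not code from the patch: vert_pred_ref and its size parameter are hypothetical names introduced here, and the argument order is assumed to match the prototypes added to vp9dsp.h (dst, stride, l, a). Every one of the size output rows is a copy of the size bytes at a; the left edge l is not read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical reference model of 8-bit vertical intra prediction.
 * Not part of the patch: the name and the explicit size argument are
 * assumptions made for illustration only.
 */
static void vert_pred_ref(uint8_t *dst, ptrdiff_t stride,
                          const uint8_t *l, const uint8_t *a, int size)
{
    (void)l;                      /* the left edge is unused in this mode */
    assert(size == 8 || size == 16 || size == 32);

    /* Copy the row above the block into each output row. */
    for (int y = 0; y < size; y++)
        memcpy(dst + y * stride, a, (size_t)size);
}
```

The assembly in the patch reaches the same result differently: it loads the top row once with 64-bit ld instructions (one to four registers depending on block width), then each unrolled .rept iteration stores that row to two consecutive lines, using sh1add to advance dst by two strides.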