From patchwork Sun May 26 07:29:02 2024
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 49267
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 26 May 2024 10:29:02 +0300
Message-ID: <20240526072902.10274-1-remi@remlab.net>
X-Mailer: git-send-email 2.45.1
Subject: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V vp8_luma_dc_wht
List-Id: FFmpeg development discussions and patches

This is not great as transposition is poorly supported, but it works:
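For readers less familiar with this DSP function: the patch vectorises VP8's inverse 4x4 Walsh-Hadamard transform on the luma DC coefficients. A rough scalar sketch of the operation (reconstructed here to follow the shape of vp8_luma_dc_wht_c, so details such as exact variable names may differ from the FFmpeg source) is:

```c
#include <stdint.h>

/* Sketch of the inverse 4x4 WHT on the luma DC plane: a vertical
 * butterfly pass over columns, then a horizontal pass over rows with
 * the "+3, >>3" rounding that the RVV code performs manually with
 * vadd.vi/vnsra.wi. Results land in the DC slot (index 0) of each of
 * the 16 4x4 luma blocks, and the dc[] input is cleared, matching the
 * sd zero, (a1) stores in the assembly. */
static void luma_dc_wht_ref(int16_t block[4][4][16], int16_t dc[16])
{
    int t0, t1, t2, t3;

    for (int i = 0; i < 4; i++) {        /* vertical pass, per column */
        t0 = dc[0 * 4 + i] + dc[3 * 4 + i];
        t1 = dc[1 * 4 + i] + dc[2 * 4 + i];
        t2 = dc[1 * 4 + i] - dc[2 * 4 + i];
        t3 = dc[0 * 4 + i] - dc[3 * 4 + i];

        dc[0 * 4 + i] = t0 + t1;
        dc[1 * 4 + i] = t3 + t2;
        dc[2 * 4 + i] = t0 - t1;
        dc[3 * 4 + i] = t3 - t2;
    }

    for (int i = 0; i < 4; i++) {        /* horizontal pass, per row */
        t0 = dc[i * 4 + 0] + dc[i * 4 + 3] + 3; /* manual rounding */
        t1 = dc[i * 4 + 1] + dc[i * 4 + 2];
        t2 = dc[i * 4 + 1] - dc[i * 4 + 2];
        t3 = dc[i * 4 + 0] - dc[i * 4 + 3] + 3;

        dc[i * 4 + 0] = dc[i * 4 + 1] = dc[i * 4 + 2] = dc[i * 4 + 3] = 0;

        block[i][0][0] = (t0 + t1) >> 3;
        block[i][1][0] = (t3 + t2) >> 3;
        block[i][2][0] = (t0 - t1) >> 3;
        block[i][3][0] = (t3 - t2) >> 3;
    }
}
```

In the assembly below, the first pass maps onto vwadd.vv/vwsub.vv (widening to 32 bits to avoid overflow), and the strided vsse16.v stores scatter the four per-row results to the block DC slots 16*2 bytes apart.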
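The stack round-trip in the middle of the function is the transposition the commit message alludes to: the four row vectors are stored contiguously to scratch stack space (vse32.v), then reloaded with a 4-field segmented load (vlseg4e32.v), which de-interleaves each 4-element row so that field f of lane i comes from element i*4+f, i.e. the reload returns the columns of what was stored as rows. A hypothetical scalar model of that trick (names are illustrative, not from the patch):

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the vse32.v + vlseg4e32.v transpose: four 4-lane
 * "registers" v[0..3] are written row after row into a flat scratch
 * buffer, then a 4-field segmented load reads one 4-field struct per
 * lane, so v[f][i] = scratch[i*4 + f] -- the transpose of the stored
 * layout scratch[r*4 + c] = v[r][c]. */
static void transpose4_via_seg_load(int32_t v[4][4])
{
    int32_t scratch[16];

    for (int r = 0; r < 4; r++)                 /* vse32.v, one row each */
        memcpy(&scratch[r * 4], v[r], sizeof v[r]);

    for (int i = 0; i < 4; i++)                 /* vlseg4e32.v */
        for (int f = 0; f < 4; f++)
            v[f][i] = scratch[i * 4 + f];
}
```

This costs a store/reload per pass, which is why the commit message calls transposition "poorly supported"; there is no in-register 4x4 transpose primitive in RVV 1.0.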
vp8_luma_dc_wht_c:       2.5
vp8_luma_dc_wht_rvv_i32: 1.7
---
 libavcodec/riscv/vp8dsp_init.c |  2 ++
 libavcodec/riscv/vp8dsp_rvv.S  | 55 ++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/libavcodec/riscv/vp8dsp_init.c b/libavcodec/riscv/vp8dsp_init.c
index 2413fbf449..d48fe08560 100644
--- a/libavcodec/riscv/vp8dsp_init.c
+++ b/libavcodec/riscv/vp8dsp_init.c
@@ -26,6 +26,7 @@
 #include "libavcodec/vp8dsp.h"
 #include "vp8dsp.h"
 
+void ff_vp8_luma_dc_wht_rvv(int16_t block[4][4][16], int16_t dc[16]);
 void ff_vp8_idct_dc_add_rvv(uint8_t *dst, int16_t block[16], ptrdiff_t stride);
 void ff_vp8_idct_dc_add4y_rvv(uint8_t *dst, int16_t block[4][16], ptrdiff_t stride);
 void ff_vp8_idct_dc_add4uv_rvv(uint8_t *dst, int16_t block[4][16], ptrdiff_t stride);
@@ -110,6 +111,7 @@ av_cold void ff_vp8dsp_init_riscv(VP8DSPContext *c)
     int flags = av_get_cpu_flags();
 
     if (flags & AV_CPU_FLAG_RVV_I32 && ff_rv_vlen_least(128)) {
+        c->vp8_luma_dc_wht = ff_vp8_luma_dc_wht_rvv;
         c->vp8_idct_dc_add = ff_vp8_idct_dc_add_rvv;
         c->vp8_idct_dc_add4y = ff_vp8_idct_dc_add4y_rvv;
         if (flags & AV_CPU_FLAG_RVB_ADDR) {
diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index 21c8985b04..f57852c82b 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -32,6 +32,61 @@
 .endif
 .endm
 
+#if __riscv_xlen >= 64
+func ff_vp8_luma_dc_wht_rvv, zve64x
+        vsetivli      zero, 1, e64, m1, ta, ma
+        vlseg4e64.v   v4, (a1)
+        vsetivli      zero, 4, e16, mf2, ta, ma
+        vwadd.vv      v1, v5, v6
+        addi          t1, sp, -48
+        vwadd.vv      v0, v4, v7
+        addi          t2, sp, -32
+        vwsub.vv      v2, v5, v6
+        addi          t3, sp, -16
+        vwsub.vv      v3, v4, v7
+        addi          sp, sp, -64
+        vsetvli       zero, zero, e32, m1, ta, ma
+        vadd.vv       v4, v0, v1
+        vadd.vv       v5, v3, v2
+        vse32.v       v4, (sp)
+        vsub.vv       v6, v0, v1
+        vse32.v       v5, (t1)
+        vsub.vv       v7, v3, v2
+        vse32.v       v6, (t2)
+        vse32.v       v7, (t3)
+        vlseg4e32.v   v4, (sp)
+        vadd.vv       v0, v4, v7
+        sd            zero, (a1)
+        vadd.vv       v1, v5, v6
+        sd            zero, 8(a1)
+        vsub.vv       v2, v5, v6
+        sd            zero, 16(a1)
+        vsub.vv       v3, v4, v7
+        sd            zero, 24(a1)
+        vadd.vi       v0, v0, 3 # rounding mode not supported, do it manually
+        li            t0, 4 * 16 * 2
+        vadd.vi       v3, v3, 3
+        addi          t1, a0, 16 * 2
+        vadd.vv       v4, v0, v1
+        addi          t2, a0, 16 * 2 * 2
+        vadd.vv       v5, v3, v2
+        addi          t3, a0, 16 * 2 * 3
+        vsub.vv       v6, v0, v1
+        vsub.vv       v7, v3, v2
+        vsetvli       zero, zero, e16, mf2, ta, ma
+        vnsra.wi      v0, v4, 3
+        addi          sp, sp, 64
+        vnsra.wi      v1, v5, 3
+        vsse16.v      v0, (a0), t0
+        vnsra.wi      v2, v6, 3
+        vsse16.v      v1, (t1), t0
+        vnsra.wi      v3, v7, 3
+        vsse16.v      v2, (t2), t0
+        vsse16.v      v3, (t3), t0
+        ret
+endfunc
+#endif
+
 .macro vp8_idct_dc_add
         vlse32.v      v0, (a0), a2
         lh            a5, 0(a1)