From patchwork Fri Oct 27 19:25:36 2023
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 44391
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 27 Oct 2023 22:25:36 +0300
Message-ID: <20231027192540.27373-2-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/6] lavc/pixblockdsp: aligned R-V V 8-bit functions

If the scan lines are aligned, we can load each 8-byte row as a single
64-bit element, thus avoiding segmented loads. We can then factor the
widening conversion or subtraction into one pass over the whole block.
In principle, the same optimisation should be possible for high depth,
but would require 128-bit elements, for which no FFmpeg CPU flag exists.
---
 libavcodec/riscv/pixblockdsp_init.c | 11 +++++++++++
 libavcodec/riscv/pixblockdsp_rvv.S  | 21 +++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/libavcodec/riscv/pixblockdsp_init.c b/libavcodec/riscv/pixblockdsp_init.c
index 8f24281217..7d259a032f 100644
--- a/libavcodec/riscv/pixblockdsp_init.c
+++ b/libavcodec/riscv/pixblockdsp_init.c
@@ -32,10 +32,14 @@ void ff_get_pixels_8_rvi(int16_t *block, const uint8_t *pixels,
 void ff_get_pixels_16_rvi(int16_t *block, const uint8_t *pixels,
                           ptrdiff_t stride);
 
+void ff_get_pixels_8_rvv(int16_t *block, const uint8_t *pixels,
+                         ptrdiff_t stride);
 void ff_get_pixels_unaligned_8_rvv(int16_t *block, const uint8_t *pixels,
                                    ptrdiff_t stride);
 void ff_get_pixels_unaligned_16_rvv(int16_t *block, const uint8_t *pixels,
                                     ptrdiff_t stride);
+void ff_diff_pixels_rvv(int16_t *block, const uint8_t *s1,
+                        const uint8_t *s2, ptrdiff_t stride);
 void ff_diff_pixels_unaligned_rvv(int16_t *block, const uint8_t *s1,
                                   const uint8_t *s2, ptrdiff_t stride);
 
@@ -64,6 +68,13 @@ av_cold void ff_pixblockdsp_init_riscv(PixblockDSPContext *c,
 
         c->diff_pixels = ff_diff_pixels_unaligned_rvv;
         c->diff_pixels_unaligned = ff_diff_pixels_unaligned_rvv;
+
+        if (cpu_flags & AV_CPU_FLAG_RVV_I64) {
+            if (!high_bit_depth)
+                c->get_pixels = ff_get_pixels_8_rvv;
+
+            c->diff_pixels = ff_diff_pixels_rvv;
+        }
     }
 #endif
 }
diff --git a/libavcodec/riscv/pixblockdsp_rvv.S b/libavcodec/riscv/pixblockdsp_rvv.S
index e3a2fcc6ef..80c7415acf 100644
--- a/libavcodec/riscv/pixblockdsp_rvv.S
+++ b/libavcodec/riscv/pixblockdsp_rvv.S
@@ -20,6 +20,16 @@
 
 #include "libavutil/riscv/asm.S"
 
+func ff_get_pixels_8_rvv, zve64x
+        vsetivli      zero, 8, e8, mf2, ta, ma
+        li            t0, 8 * 8
+        vlse64.v      v16, (a1), a2
+        vsetvli       zero, t0, e8, m4, ta, ma
+        vwcvtu.x.x.v  v8, v16
+        vse16.v       v8, (a0)
+        ret
+endfunc
+
 func ff_get_pixels_unaligned_8_rvv, zve32x
         vsetivli      zero, 8, e8, mf2, ta, ma
         vlsseg8e8.v   v16, (a1), a2
@@ -42,6 +52,17 @@ func ff_get_pixels_unaligned_16_rvv, zve32x
         ret
 endfunc
 
+func ff_diff_pixels_rvv, zve64x
+        vsetivli      zero, 8, e8, mf2, ta, ma
+        li            t0, 8 * 8
+        vlse64.v      v16, (a1), a3
+        vlse64.v      v24, (a2), a3
+        vsetvli       zero, t0, e8, m4, ta, ma
+        vwsubu.vv     v8, v16, v24
+        vse16.v       v8, (a0)
+        ret
+endfunc
+
 func ff_diff_pixels_unaligned_rvv, zve32x
         vsetivli      zero, 8, e8, mf2, ta, ma
         vlsseg8e8.v   v16, (a1), a3
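For reference, the scalar behaviour that both the aligned and unaligned
vector paths implement can be sketched as below. This is an illustrative
sketch (not FFmpeg's actual C fallback): get_pixels zero-extends an 8x8
block of unsigned bytes to int16_t, and diff_pixels stores the bytewise
difference of two 8x8 blocks. The function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy an 8x8 block of 8-bit pixels into a 16-bit block,
 * zero-extending each byte (what ff_get_pixels_8_rvv vectorises). */
static void get_pixels_ref(int16_t *block, const uint8_t *pixels,
                           ptrdiff_t stride)
{
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++)
            block[j] = pixels[j];      /* u8 -> i16, always non-negative */
        block  += 8;
        pixels += stride;
    }
}

/* Store the difference of two 8x8 blocks into a 16-bit block
 * (what ff_diff_pixels_rvv vectorises with vwsubu.vv). */
static void diff_pixels_ref(int16_t *block, const uint8_t *s1,
                            const uint8_t *s2, ptrdiff_t stride)
{
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++)
            block[j] = s1[j] - s2[j];  /* widening subtract, may be negative */
        block += 8;
        s1    += stride;
        s2    += stride;
    }
}
```

The aligned vector versions replace the inner byte loop with one strided
64-bit load per row (`vlse64.v`), then widen or subtract all 64 elements
in a single vector operation.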