From patchwork Wed Nov 1 21:03:29 2023
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 44479
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 1 Nov 2023 23:03:29 +0200
Message-ID: <20231101210329.499350-1-remi@remlab.net>
X-Mailer: git-send-email 2.42.0
Subject: [FFmpeg-devel] [PATCH] lavc/pixblockdsp: rework R-V V get_pixels_unaligned

As in the aligned case, we can use VLSE64.V, though the way of doing so
gets more convoluted, so the performance gains are more modest:

get_pixels_unaligned_c:       126.7
get_pixels_unaligned_rvv_i32: 145.5 (before)
get_pixels_unaligned_rvv_i64:  62.2 (after)

For reference, these are the aligned benchmarks (unchanged) on the
same T-Head C908 hardware:

get_pixels_c:                 126.7
get_pixels_rvi:                85.7
get_pixels_rvv_i64:            33.2
---
 libavcodec/riscv/pixblockdsp_init.c | 17 +++++++----------
 libavcodec/riscv/pixblockdsp_rvv.S  | 28 +++++++++++++++++-----------
 2 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/libavcodec/riscv/pixblockdsp_init.c b/libavcodec/riscv/pixblockdsp_init.c
index 6b1efd16f8..3c623a9473 100644
--- a/libavcodec/riscv/pixblockdsp_init.c
+++ b/libavcodec/riscv/pixblockdsp_init.c
@@ -56,20 +56,17 @@ av_cold void ff_pixblockdsp_init_riscv(PixblockDSPContext *c,
 
 #if HAVE_RVV
     if ((cpu_flags & AV_CPU_FLAG_RVV_I32) && ff_get_rv_vlenb() >= 16) {
-        if (!high_bit_depth) {
-            c->get_pixels = ff_get_pixels_unaligned_8_rvv;
-            c->get_pixels_unaligned = ff_get_pixels_unaligned_8_rvv;
-        }
-
         c->diff_pixels = ff_diff_pixels_unaligned_rvv;
         c->diff_pixels_unaligned = ff_diff_pixels_unaligned_rvv;
+    }
 
-        if (cpu_flags & AV_CPU_FLAG_RVV_I64) {
-            if (!high_bit_depth)
-                c->get_pixels = ff_get_pixels_8_rvv;
-
-            c->diff_pixels = ff_diff_pixels_rvv;
+    if ((cpu_flags & AV_CPU_FLAG_RVV_I64) && ff_get_rv_vlenb() >= 16) {
+        if (!high_bit_depth) {
+            c->get_pixels = ff_get_pixels_8_rvv;
+            c->get_pixels_unaligned = ff_get_pixels_unaligned_8_rvv;
         }
+
+        c->diff_pixels = ff_diff_pixels_rvv;
     }
 #endif
 }
diff --git a/libavcodec/riscv/pixblockdsp_rvv.S b/libavcodec/riscv/pixblockdsp_rvv.S
index 7e35fc5b46..4213cd1b85 100644
--- a/libavcodec/riscv/pixblockdsp_rvv.S
+++ b/libavcodec/riscv/pixblockdsp_rvv.S
@@ -23,6 +23,7 @@
 func ff_get_pixels_8_rvv, zve64x
         vsetivli    zero, 8, e8, mf2, ta, ma
         li          t0, 8 * 8
+1:
         vlse64.v    v16, (a1), a2
         vsetvli     zero, t0, e8, m4, ta, ma
         vwcvtu.x.x.v v8, v16
@@ -30,18 +31,23 @@ func ff_get_pixels_8_rvv, zve64x
         ret
 endfunc
 
-func ff_get_pixels_unaligned_8_rvv, zve32x
-        vsetivli    zero, 8, e8, mf2, ta, ma
-        vlsseg8e8.v v16, (a1), a2
+func ff_get_pixels_unaligned_8_rvv, zve64x
+        andi        t1, a1, 7
+        vsetivli    zero, 8, e64, m4, ta, ma
+        li          t0, 8 * 8
+        beqz        t1, 1b
+        andi        a1, a1, -8
+        slli        t2, t1, 3
+        addi        t1, a1, 8
+        sub         t3, t0, t2
+        vlse64.v    v16, (a1), a2
+        vlse64.v    v24, (t1), a2
+        vsrl.vx     v16, v16, t2
+        vsll.vx     v24, v24, t3
+        vor.vv      v16, v16, v24
+        vsetvli     zero, t0, e8, m4, ta, ma
         vwcvtu.x.x.v v8, v16
-        vwcvtu.x.x.v v9, v17
-        vwcvtu.x.x.v v10, v18
-        vwcvtu.x.x.v v11, v19
-        vwcvtu.x.x.v v12, v20
-        vwcvtu.x.x.v v13, v21
-        vwcvtu.x.x.v v14, v22
-        vwcvtu.x.x.v v15, v23
-        vsseg8e16.v v8, (a0)
+        vse16.v     v8, (a0)
         ret
 endfunc
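
For readers who do not read RVV assembly, the new unaligned path can be
approximated by the scalar C sketch below. It is only an illustration of the
shift-and-merge idea, not code from the patch: the names load_le64,
load_row_from_aligned_words and get_pixels_unaligned_model are hypothetical.
Each 8-pixel row is fetched as two 8-byte-aligned words, the low word is
shifted right by the misalignment in bits, the high word is shifted left by
the complement, and the two are ORed together before the bytes are widened to
16 bits. One difference to keep in mind: the sketch recomputes the
misalignment for every row, whereas the assembly computes it once, which
appears to assume that the stride is a multiple of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble 8 bytes into a 64-bit value, lowest address in the lowest bits
 * (the byte order RISC-V uses), independent of the host's endianness. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Hypothetical helper: return the 8 bytes at src, touching memory only in
 * 8-byte-aligned words. Like the assembly, this reads up to 7 bytes before
 * and after the row, so those bytes must be readable. */
static uint64_t load_row_from_aligned_words(const uint8_t *src)
{
    unsigned misalign = (unsigned)((uintptr_t)src & 7); /* andi t1, a1, 7  */
    const uint8_t *base = src - misalign;               /* andi a1, a1, -8 */
    uint64_t lo = load_le64(base);

    if (misalign == 0)   /* beqz t1, 1b: fall back to the aligned path */
        return lo;       /* also avoids an undefined 64-bit shift by 64 */

    unsigned shift = misalign * 8;                /* slli t2, t1, 3  */
    uint64_t hi = load_le64(base + 8);            /* addi t1, a1, 8  */
    return (lo >> shift) | (hi << (64 - shift));  /* vsrl, vsll, vor */
}

/* Scalar model of get_pixels_unaligned for 8-bit content: copy an 8x8 block
 * of bytes into 64 consecutive 16-bit values. */
static void get_pixels_unaligned_model(int16_t *block, const uint8_t *pixels,
                                       ptrdiff_t stride)
{
    assert(block && pixels);
    for (int y = 0; y < 8; y++) {
        uint64_t row = load_row_from_aligned_words(pixels + y * stride);
        for (int x = 0; x < 8; x++)  /* zero-extend, as vwcvtu.x.x.v does */
            block[y * 8 + x] = (int16_t)((row >> (8 * x)) & 0xff);
    }
}
```

The early return for a zero misalignment mirrors the branch to label 1 in
ff_get_pixels_8_rvv. In C it is also required for correctness, since the left
shift would otherwise be by 64 bits.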