From patchwork Wed Jul 10 20:52:25 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 50458
From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 10 Jul 2024 23:52:25 +0300
Message-ID: <20240710205225.50112-2-remi@remlab.net>
In-Reply-To: <20240710205225.50112-1-remi@remlab.net>
References: <20240710205225.50112-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/2] lavc/h264dsp: R-V V high-depth h264_idct_add

T-Head C908 (cycles):
h264_idct4_add_9bpp_c:        248.2
h264_idct4_add_9bpp_rvv_i32:  128.7
h264_idct4_add_10bpp_c:       256.7
h264_idct4_add_10bpp_rvv_i32: 128.7
h264_idct4_add_12bpp_c:       252.5
h264_idct4_add_12bpp_rvv_i32: 129.7
h264_idct4_add_14bpp_c:       258.0
h264_idct4_add_14bpp_rvv_i32: 129.7
---
 libavcodec/riscv/h264dsp_init.c | 19 +++++++++-
 libavcodec/riscv/h264idct_rvv.S | 63 +++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/libavcodec/riscv/h264dsp_init.c b/libavcodec/riscv/h264dsp_init.c
index 5d1eddbab4..28e042e91f 100644
--- a/libavcodec/riscv/h264dsp_init.c
+++ b/libavcodec/riscv/h264dsp_init.c
@@ -52,6 +52,11 @@ void ff_h264_idct8_add4_8_rvv(uint8_t *dst, const int *blockoffset,
                               int16_t *block, int stride,
                               const uint8_t nnzc[5 * 8]);
 
+void ff_h264_idct_add_9_rvv(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct_add_10_rvv(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct_add_12_rvv(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct_add_14_rvv(uint8_t *dst, int16_t *block, int stride);
+
 extern int ff_startcode_find_candidate_rvb(const uint8_t *, int);
 extern int ff_startcode_find_candidate_rvv(const uint8_t *, int);
 
@@ -65,7 +70,9 @@ av_cold void ff_h264dsp_init_riscv(H264DSPContext *dsp, const int bit_depth,
         dsp->startcode_find_candidate = ff_startcode_find_candidate_rvb;
 # if HAVE_RVV
     if (flags & AV_CPU_FLAG_RVV_I32) {
-        if (bit_depth == 8 && ff_rv_vlen_least(128)) {
+        const bool zvl128b = ff_rv_vlen_least(128);
+
+        if (bit_depth == 8 && zvl128b) {
             for (int i = 0; i < 4; i++) {
                 dsp->weight_h264_pixels_tab[i] =
                     ff_h264_weight_funcs_8_rvv[i].weight;
@@ -86,6 +93,16 @@ av_cold void ff_h264dsp_init_riscv(H264DSPContext *dsp, const int bit_depth,
             dsp->h264_idct8_add4 = ff_h264_idct8_add4_8_rvv;
 # endif
         }
+
+        if (bit_depth == 9 && zvl128b)
+            dsp->h264_idct_add = ff_h264_idct_add_9_rvv;
+        if (bit_depth == 10 && zvl128b)
+            dsp->h264_idct_add = ff_h264_idct_add_10_rvv;
+        if (bit_depth == 12 && zvl128b)
+            dsp->h264_idct_add = ff_h264_idct_add_12_rvv;
+        if (bit_depth == 14 && zvl128b)
+            dsp->h264_idct_add = ff_h264_idct_add_14_rvv;
+
         dsp->startcode_find_candidate = ff_startcode_find_candidate_rvv;
     }
 # endif
diff --git a/libavcodec/riscv/h264idct_rvv.S b/libavcodec/riscv/h264idct_rvv.S
index 6f17df66cc..23dcf45437 100644
--- a/libavcodec/riscv/h264idct_rvv.S
+++ b/libavcodec/riscv/h264idct_rvv.S
@@ -105,6 +105,69 @@ func ff_h264_idct_add_8_rvv, zve32x
         ret
 endfunc
 
+func ff_h264_idct_add_16_rvv, zve32x
+        csrwi       vxrm, 0
+        vsetivli    zero, 4, e32, m1, ta, ma
+        addi        t1, a1, 1 * 4 * 4
+        vle32.v     v0, (a1)
+        addi        t2, a1, 2 * 4 * 4
+        vle32.v     v1, (t1)
+        addi        t3, a1, 3 * 4 * 4
+        vle32.v     v2, (t2)
+        vle32.v     v3, (t3)
+        jal         t0, ff_h264_idct4_rvv
+        vse32.v     v0, (a1)
+        vse32.v     v1, (t1)
+        vse32.v     v2, (t2)
+        vse32.v     v3, (t3)
+        vlseg4e32.v v0, (a1)
+        .equ    offset, 0
+        .rept   512 / __riscv_xlen
+        sx      zero, offset(a1)
+        .equ    offset, offset + (__riscv_xlen / 8)
+        .endr
+        jal         t0, ff_h264_idct4_rvv
+        add         t1, a0, a2
+        vle16.v     v4, (a0)
+        add         t2, t1, a2
+        vle16.v     v5, (t1)
+        add         t3, t2, a2
+        vle16.v     v6, (t2)
+        vle16.v     v7, (t3)
+        .irp    n,0,1,2,3
+        vssra.vi    v\n, v\n, 6
+        .endr
+        vsetvli     zero, zero, e16, mf2, ta, ma
+        vwaddu.wv   v0, v0, v4
+        vwaddu.wv   v1, v1, v5
+        vwaddu.wv   v2, v2, v6
+        vwaddu.wv   v3, v3, v7
+        vsetvli     zero, zero, e32, m1, ta, ma
+        .irp    n,0,1,2,3
+        vmax.vx     v\n, v\n, zero
+        .endr
+        .irp    n,0,1,2,3
+        vmin.vx     v\n, v\n, a3
+        .endr
+        vsetvli     zero, zero, e16, mf2, ta, ma
+        vncvt.x.x.w v4, v0
+        vncvt.x.x.w v5, v1
+        vncvt.x.x.w v6, v2
+        vncvt.x.x.w v7, v3
+        vse16.v     v4, (a0)
+        vse16.v     v5, (t1)
+        vse16.v     v6, (t2)
+        vse16.v     v7, (t3)
+        ret
+endfunc
+
+.irp    depth, 9, 10, 12, 14
+func ff_h264_idct_add_\depth\()_rvv, zve32x
+        li      a3, (1 << \depth) - 1
+        j       ff_h264_idct_add_16_rvv
+endfunc
+.endr
+
         .variant_cc ff_h264_idct8_rvv
 func ff_h264_idct8_rvv, zve32x
         vsra.vi v9, v7, 1
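
Reader's note, not part of the patch: the tail of ff_h264_idct_add_16_rvv (the vssra.vi / vwaddu.wv / vmax.vx / vmin.vx / vncvt.x.x.w sequence) can be modelled per pixel in scalar C roughly as below. This is only a sketch under stated assumptions. The helper name add_clip_pixel is hypothetical and does not exist in FFmpeg. The rounding term assumes vxrm = 0 selects round-to-nearest-up, as set by the csrwi at the top of the function. The code also assumes that >> on a negative int32_t is an arithmetic shift, which ISO C leaves implementation-defined.

```c
#include <stdint.h>

/*
 * Hypothetical scalar model of the final add-and-clip stage.
 *   pixel    - existing 16-bit sample read from dst
 *   residual - 32-bit output of the second IDCT pass, before scaling
 *   depth    - bit depth (9, 10, 12 or 14 in the patch)
 */
static inline uint16_t add_clip_pixel(uint16_t pixel, int32_t residual,
                                      int depth)
{
    const int32_t max = (1 << depth) - 1;  /* value loaded into a3 */
    /* vssra.vi by 6 with vxrm = 0: add half an LSB, then shift right.
     * Assumes arithmetic right shift for negative values. */
    int32_t sum = (int32_t)pixel + ((residual + 32) >> 6);

    if (sum < 0)        /* vmax.vx with zero */
        sum = 0;
    if (sum > max)      /* vmin.vx with a3 */
        sum = max;
    return (uint16_t)sum;  /* vncvt.x.x.w narrows back to 16 bits */
}
```

With depth = 10, a pixel of 1000 and a residual of 6400 would saturate at 1023, which is why the per-depth entry points only differ in the constant they load into a3 before jumping to the shared body.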