From patchwork Wed Jul 3 10:47:29 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 50302
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 3 Jul 2024 12:47:29 +0200
Message-ID: <20240703104730.883009-1-jdek@itanimul.li>
X-Mailer: git-send-email 2.44.1
Subject: [FFmpeg-devel] [PATCH v2] avcodec/riscv: add h264 dc idct rvv

checkasm: bench runs 131072 (1 << 17)
h264_idct4_add_dc_8bpp_c:        1.5
h264_idct4_add_dc_8bpp_rvv_i64:  0.7
h264_idct4_add_dc_9bpp_c:        1.5
h264_idct4_add_dc_9bpp_rvv_i64:  0.7
h264_idct4_add_dc_10bpp_c:       1.5
h264_idct4_add_dc_10bpp_rvv_i64: 0.7
h264_idct4_add_dc_12bpp_c:       1.2
h264_idct4_add_dc_12bpp_rvv_i64: 0.7
h264_idct4_add_dc_14bpp_c:       1.2
h264_idct4_add_dc_14bpp_rvv_i64: 0.7
h264_idct8_add_dc_8bpp_c:        5.2
h264_idct8_add_dc_8bpp_rvv_i64:  1.5
h264_idct8_add_dc_9bpp_c:        5.5
h264_idct8_add_dc_9bpp_rvv_i64:  1.2
h264_idct8_add_dc_10bpp_c:       5.5
h264_idct8_add_dc_10bpp_rvv_i64: 1.2
h264_idct8_add_dc_12bpp_c:       4.2
h264_idct8_add_dc_12bpp_rvv_i64: 1.2
h264_idct8_add_dc_14bpp_c:       4.2
h264_idct8_add_dc_14bpp_rvv_i64: 1.2

Signed-off-by: J. Dekker
---
rdcycle always returns 0 on my board, clock_gettime() seems as noisy as
rdtime (just with bigger numbers).

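For readers comparing the assembly below against scalar code, here is a minimal C sketch of what a DC-only IDCT add computes, as read from the instruction sequence in this patch: round the DC coefficient with (block[0] + 32) >> 6, clear it, add it to every pixel and clamp. It is not part of the patch and is not FFmpeg's C implementation. The names clip_pixel, idct_dc_add_ref8 and idct_dc_add_ref16 are hypothetical, and the sketch assumes a stride in bytes, an arithmetic right shift for negative values, and plain int arithmetic (so it ignores any 16-bit lane wraparound in the vector code).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp v to the range [0, max]. */
static int clip_pixel(int v, int max)
{
    return v < 0 ? 0 : (v > max ? max : v);
}

/* Hypothetical 8-bit reference: size is 4 or 8, stride is in bytes. */
static void idct_dc_add_ref8(uint8_t *dst, int16_t *block,
                             ptrdiff_t stride, int size)
{
    assert(size == 4 || size == 8);
    /* Rounded DC term; assumes >> on a negative int is arithmetic. */
    int dc = (block[0] + 32) >> 6;

    block[0] = 0; /* the coefficient is consumed, as in the asm */
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = (uint8_t)clip_pixel(dst[x] + dc, 255);
        dst += stride;
    }
}

/* Hypothetical high-bit-depth reference: 16-bit pixels, 32-bit coefficients,
 * clamped to (1 << depth) - 1. dst must be aligned for uint16_t access. */
static void idct_dc_add_ref16(uint8_t *dst, int32_t *block,
                              ptrdiff_t stride, int size, int depth)
{
    assert(size == 4 || size == 8);
    int dc  = (block[0] + 32) >> 6;
    int max = (1 << depth) - 1;

    block[0] = 0;
    for (int y = 0; y < size; y++) {
        uint16_t *row = (uint16_t *)(dst + y * stride);
        for (int x = 0; x < size; x++)
            row[x] = (uint16_t)clip_pixel(row[x] + dc, max);
    }
}
```

Calling idct_dc_add_ref8(dst, block, stride, 4) on a 4x4 block would correspond to the 8-bit idct4 case; the depth argument of idct_dc_add_ref16 plays the role of the per-depth limit that the wrappers load into a5.
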
 libavcodec/riscv/Makefile       |   1 +
 libavcodec/riscv/h264dsp_init.c |  42 +++++++-
 libavcodec/riscv/h264dsp_rvv.S  | 176 ++++++++++++++++++++++++++++++++
 3 files changed, 216 insertions(+), 3 deletions(-)
 create mode 100644 libavcodec/riscv/h264dsp_rvv.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index c180223141..a1510e8c6e 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -31,6 +31,7 @@ RVV-OBJS-$(CONFIG_H263DSP) += riscv/h263dsp_rvv.o
 OBJS-$(CONFIG_H264CHROMA) += riscv/h264_chroma_init_riscv.o
 RVV-OBJS-$(CONFIG_H264CHROMA) += riscv/h264_mc_chroma.o
 OBJS-$(CONFIG_H264DSP) += riscv/h264dsp_init.o
+RVV-OBJS-$(CONFIG_H264DSP) += riscv/h264dsp_rvv.o
 OBJS-$(CONFIG_HUFFYUV_DECODER) += riscv/huffyuvdsp_init.o
 RVV-OBJS-$(CONFIG_HUFFYUV_DECODER) += riscv/huffyuvdsp_rvv.o
 OBJS-$(CONFIG_IDCTDSP) += riscv/idctdsp_init.o
diff --git a/libavcodec/riscv/h264dsp_init.c b/libavcodec/riscv/h264dsp_init.c
index dbbf3db400..8c77303ec6 100644
--- a/libavcodec/riscv/h264dsp_init.c
+++ b/libavcodec/riscv/h264dsp_init.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (c) 2024 J. Dekker
  * Copyright © 2024 Rémi Denis-Courmont.
  *
  * This file is part of FFmpeg.
@@ -24,22 +25,57 @@
 
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
+#include "libavutil/riscv/cpu.h"
 #include "libavcodec/h264dsp.h"
 
 extern int ff_startcode_find_candidate_rvb(const uint8_t *, int);
 extern int ff_startcode_find_candidate_rvv(const uint8_t *, int);
+void ff_h264_idct4_dc_add_8_rvv(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct8_dc_add_8_rvv(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct4_dc_add_9_rvv(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct8_dc_add_9_rvv(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct4_dc_add_10_rvv(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct8_dc_add_10_rvv(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct4_dc_add_12_rvv(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct8_dc_add_12_rvv(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct4_dc_add_14_rvv(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct8_dc_add_14_rvv(uint8_t *dst, int16_t *block, int stride);
 
-av_cold void ff_h264dsp_init_riscv(H264DSPContext *dsp, const int bit_depth,
+av_cold void ff_h264dsp_init_riscv(H264DSPContext *c, const int bit_depth,
                                    const int chroma_format_idc)
 {
 #if HAVE_RV
     int flags = av_get_cpu_flags();
 
     if (flags & AV_CPU_FLAG_RVB_BASIC)
-        dsp->startcode_find_candidate = ff_startcode_find_candidate_rvb;
+        c->startcode_find_candidate = ff_startcode_find_candidate_rvb;
 # if HAVE_RVV
     if (flags & AV_CPU_FLAG_RVV_I32)
-        dsp->startcode_find_candidate = ff_startcode_find_candidate_rvv;
+        c->startcode_find_candidate = ff_startcode_find_candidate_rvv;
 # endif
+    if ((flags & AV_CPU_FLAG_RVV_I64) && ff_rv_vlen_least(16)) {
+        switch(bit_depth) {
+        case 8:
+            c->h264_idct_dc_add = ff_h264_idct4_dc_add_8_rvv;
+            c->h264_idct8_dc_add = ff_h264_idct8_dc_add_8_rvv;
+            break;
+        case 9:
+            c->h264_idct_dc_add = ff_h264_idct4_dc_add_9_rvv;
+            c->h264_idct8_dc_add = ff_h264_idct8_dc_add_9_rvv;
+            break;
+        case 10:
+            c->h264_idct_dc_add = ff_h264_idct4_dc_add_10_rvv;
+            c->h264_idct8_dc_add = ff_h264_idct8_dc_add_10_rvv;
+            break;
+        case 12:
+            c->h264_idct_dc_add = ff_h264_idct4_dc_add_12_rvv;
+            c->h264_idct8_dc_add = ff_h264_idct8_dc_add_12_rvv;
+            break;
+        case 14:
+            c->h264_idct_dc_add = ff_h264_idct4_dc_add_14_rvv;
+            c->h264_idct8_dc_add = ff_h264_idct8_dc_add_14_rvv;
+            break;
+        }
+    }
 #endif
 }
diff --git a/libavcodec/riscv/h264dsp_rvv.S b/libavcodec/riscv/h264dsp_rvv.S
new file mode 100644
index 0000000000..57f0433f7c
--- /dev/null
+++ b/libavcodec/riscv/h264dsp_rvv.S
@@ -0,0 +1,176 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright (c) 2024 J. Dekker
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "libavutil/riscv/asm.S"
+
+.macro idct_dc_add8 width
+func ff_h264_idct\width\()_dc_add_8_rvv, zve64x, zba
+        vsetivli    zero, \width, e16, m1, ta, ma
+        lh          a3, 0(a1)
+        addi        a3, a3, 32
+        srai        a3, a3, 6
+        sh          zero, 0(a1)
+.if \width == 8
+        vlsseg8e8.v v24, (a0), a2
+.else
+        vlsseg4e8.v v24, (a0), a2
+.endif
+        vzext.vf2   v0, v24
+        vzext.vf2   v2, v25
+        vzext.vf2   v4, v26
+        vzext.vf2   v6, v27
+.if \width == 8
+        vzext.vf2   v10, v28
+        vzext.vf2   v12, v29
+        vzext.vf2   v14, v30
+        vzext.vf2   v16, v31
+.endif
+        vadd.vx     v0, v0, a3
+        vadd.vx     v2, v2, a3
+        vadd.vx     v4, v4, a3
+        vadd.vx     v6, v6, a3
+.if \width == 8
+        vadd.vx     v10, v10, a3
+        vadd.vx     v12, v12, a3
+        vadd.vx     v14, v14, a3
+        vadd.vx     v16, v16, a3
+.endif
+        vmax.vx     v0, v0, zero
+        vmax.vx     v2, v2, zero
+        vmax.vx     v4, v4, zero
+        vmax.vx     v6, v6, zero
+.if \width == 8
+        vmax.vx     v10, v10, zero
+        vmax.vx     v12, v12, zero
+        vmax.vx     v14, v14, zero
+        vmax.vx     v16, v16, zero
+.endif
+        vsetvli     zero, zero, e8, mf2, ta, ma
+        vnclipu.wi  v24, v0, 0
+        vnclipu.wi  v25, v2, 0
+        vnclipu.wi  v26, v4, 0
+        vnclipu.wi  v27, v6, 0
+.if \width == 8
+        vnclipu.wi  v28, v10, 0
+        vnclipu.wi  v29, v12, 0
+        vnclipu.wi  v30, v14, 0
+        vnclipu.wi  v31, v16, 0
+        vssseg8e8.v v24, (a0), a2
+.else
+        vssseg4e8.v v24, (a0), a2
+.endif
+        ret
+endfunc
+.endm
+
+idct_dc_add8 4
+idct_dc_add8 8
+
+.macro idct_dc_add width
+func ff_h264_idct\width\()_dc_add_16_rvv, zve64x, zba
+        vsetivli    zero, \width, e16, m1, ta, ma
+        lw          a3, 0(a1)
+        addi        a3, a3, 32
+        srai        a3, a3, 6
+        sw          zero, 0(a1)
+        add         t4, a0, a2
+        sh1add      t5, a2, a0
+        sh1add      t6, a2, t4
+.if \width == 8
+        sh2add      t0, a2, a0
+        sh2add      t1, a2, t4
+        sh2add      t2, a2, t5
+        sh2add      t3, a2, t6
+.endif
+        vle16.v     v0, (a0)
+        vle16.v     v2, (t4)
+        vle16.v     v4, (t5)
+        vle16.v     v6, (t6)
+.if \width == 8
+        vle16.v     v10, (t0)
+        vle16.v     v12, (t1)
+        vle16.v     v14, (t2)
+        vle16.v     v16, (t3)
+.endif
+        vadd.vx     v0, v0, a3
+        vadd.vx     v2, v2, a3
+        vadd.vx     v4, v4, a3
+        vadd.vx     v6, v6, a3
+.if \width == 8
+        vadd.vx     v10, v10, a3
+        vadd.vx     v12, v12, a3
+        vadd.vx     v14, v14, a3
+        vadd.vx     v16, v16, a3
+.endif
+        vmax.vx     v0, v0, zero
+        vmax.vx     v2, v2, zero
+        vmax.vx     v4, v4, zero
+        vmax.vx     v6, v6, zero
+.if \width == 8
+        vmax.vx     v10, v10, zero
+        vmax.vx     v12, v12, zero
+        vmax.vx     v14, v14, zero
+        vmax.vx     v16, v16, zero
+.endif
+        vmin.vx     v0, v0, a5
+        vmin.vx     v2, v2, a5
+        vmin.vx     v4, v4, a5
+        vmin.vx     v6, v6, a5
+.if \width == 8
+        vmin.vx     v10, v10, a5
+        vmin.vx     v12, v12, a5
+        vmin.vx     v14, v14, a5
+        vmin.vx     v16, v16, a5
+.endif
+        vse16.v     v0, (a0)
+        vse16.v     v2, (t4)
+        vse16.v     v4, (t5)
+        vse16.v     v6, (t6)
+.if \width == 8
+        vse16.v     v10, (t0)
+        vse16.v     v12, (t1)
+        vse16.v     v14, (t2)
+        vse16.v     v16, (t3)
+.endif
+        ret
+endfunc
+.endm
+
+idct_dc_add 4
+idct_dc_add 8
+
+.irp depth,9,10,12,14
+func ff_h264_idct4_dc_add_\depth\()_rvv, zve64x
+        li          a5, (1 << \depth) - 1
+        j           ff_h264_idct4_dc_add_16_rvv
+endfunc
+
+func ff_h264_idct8_dc_add_\depth\()_rvv, zve64x
+        li          a5, (1 << \depth) - 1
+        j           ff_h264_idct8_dc_add_16_rvv
+endfunc
+.endr