From patchwork Mon Jul 15 05:50:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 50540
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 15 Jul 2024 07:50:38 +0200
Message-ID: <20240715055039.592571-1-jdek@itanimul.li>
X-Mailer: git-send-email 2.44.1
Subject: [FFmpeg-devel] [PATCH v3] avcodec/riscv: add h264 dc idct rvv
List-Id: FFmpeg development discussions and patches

checkasm: bench runs 131072 (1 << 17)
h264_idct4_add_dc_8bpp_c:        1.5
h264_idct4_add_dc_8bpp_rvv_i64:  0.7
h264_idct4_add_dc_9bpp_c:        1.5
h264_idct4_add_dc_9bpp_rvv_i64:  0.7
h264_idct4_add_dc_10bpp_c:       1.5
h264_idct4_add_dc_10bpp_rvv_i64: 0.7
h264_idct4_add_dc_12bpp_c:       1.2
h264_idct4_add_dc_12bpp_rvv_i64: 0.7
h264_idct4_add_dc_14bpp_c:       1.2
h264_idct4_add_dc_14bpp_rvv_i64: 0.7
h264_idct8_add_dc_8bpp_c:        5.2
h264_idct8_add_dc_8bpp_rvv_i64:  1.5
h264_idct8_add_dc_9bpp_c:        5.5
h264_idct8_add_dc_9bpp_rvv_i64:  1.2
h264_idct8_add_dc_10bpp_c:       5.5
h264_idct8_add_dc_10bpp_rvv_i64: 1.2
h264_idct8_add_dc_12bpp_c:       4.2
h264_idct8_add_dc_12bpp_rvv_i64: 1.2
h264_idct8_add_dc_14bpp_c:       4.2
h264_idct8_add_dc_14bpp_rvv_i64: 1.2

Signed-off-by: J. Dekker
---
 libavcodec/riscv/Makefile       |   1 +
 libavcodec/riscv/h264dsp_init.c |  44 ++++++++++-
 libavcodec/riscv/h264dsp_rvv.S  | 130 ++++++++++++++++++++++++++++++++
 3 files changed, 172 insertions(+), 3 deletions(-)
 create mode 100644 libavcodec/riscv/h264dsp_rvv.S

 As Remi mentioned, high bit-depth 4x4 could be done in the same way as low
 bit-depth. I've left it with the high bit-depth intentionally since this
 eases templating.

 Use of segments removed and repeated instructions changed to use m8.

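 For readers less familiar with the DC-only path, here is a minimal scalar
 sketch in C of what the 8-bit routines in this patch appear to compute: round
 the DC coefficient, clear it, then add it to every pixel with saturation. It
 is not part of the patch. The names clip_u8 and idct_dc_add_8_ref and the
 extra size parameter are made up for illustration, and the sketch assumes
 that >> on a negative int is an arithmetic shift.

```c
#include <stdint.h>

/* Clamp an int to the 8-bit pixel range [0, 255]. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar reference, not taken from the patch.
 * Adds the rounded DC term to a size x size block of 8-bit pixels.
 * size is 4 or 8; stride is the distance between rows in bytes.
 * Assumes arithmetic right shift for negative values.
 */
static void idct_dc_add_8_ref(uint8_t *dst, int16_t *block, int stride, int size)
{
    int dc = (block[0] + 32) >> 6;   /* corresponds to addi 32 / srai 6 */

    block[0] = 0;                    /* corresponds to sh zero, 0(a1) */
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = clip_u8(dst[x] + dc);  /* widen, add, clamp, narrow */
        dst += stride;
    }
}
```

 Calling idct_dc_add_8_ref(dst, block, stride, 4) stands in for the 4x4 entry
 point and size 8 for the 8x8 one. The vector code does the same per-pixel
 work on whole rows at once.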
diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index c180223141..a1510e8c6e 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -31,6 +31,7 @@ RVV-OBJS-$(CONFIG_H263DSP) += riscv/h263dsp_rvv.o
 OBJS-$(CONFIG_H264CHROMA) += riscv/h264_chroma_init_riscv.o
 RVV-OBJS-$(CONFIG_H264CHROMA) += riscv/h264_mc_chroma.o
 OBJS-$(CONFIG_H264DSP) += riscv/h264dsp_init.o
+RVV-OBJS-$(CONFIG_H264DSP) += riscv/h264dsp_rvv.o
 OBJS-$(CONFIG_HUFFYUV_DECODER) += riscv/huffyuvdsp_init.o
 RVV-OBJS-$(CONFIG_HUFFYUV_DECODER) += riscv/huffyuvdsp_rvv.o
 OBJS-$(CONFIG_IDCTDSP) += riscv/idctdsp_init.o
diff --git a/libavcodec/riscv/h264dsp_init.c b/libavcodec/riscv/h264dsp_init.c
index dbbf3db400..3256199303 100644
--- a/libavcodec/riscv/h264dsp_init.c
+++ b/libavcodec/riscv/h264dsp_init.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (c) 2024 J. Dekker
  * Copyright © 2024 Rémi Denis-Courmont.
  *
  * This file is part of FFmpeg.
@@ -24,22 +25,59 @@
 
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
+#include "libavutil/riscv/cpu.h"
 #include "libavcodec/h264dsp.h"
 
 extern int ff_startcode_find_candidate_rvb(const uint8_t *, int);
 extern int ff_startcode_find_candidate_rvv(const uint8_t *, int);
+void ff_h264_idct4_dc_add_8_rvv(uint8_t *, int16_t *, int);
+void ff_h264_idct8_dc_add_8_rvv(uint8_t *, int16_t *, int);
+void ff_h264_idct4_dc_add_9_rvv(uint8_t *, int16_t *, int);
+void ff_h264_idct8_dc_add_9_rvv(uint8_t *, int16_t *, int);
+void ff_h264_idct4_dc_add_10_rvv(uint8_t *, int16_t *, int);
+void ff_h264_idct8_dc_add_10_rvv(uint8_t *, int16_t *, int);
+void ff_h264_idct4_dc_add_12_rvv(uint8_t *, int16_t *, int);
+void ff_h264_idct8_dc_add_12_rvv(uint8_t *, int16_t *, int);
+void ff_h264_idct4_dc_add_14_rvv(uint8_t *, int16_t *, int);
+void ff_h264_idct8_dc_add_14_rvv(uint8_t *, int16_t *, int);
 
-av_cold void ff_h264dsp_init_riscv(H264DSPContext *dsp, const int bit_depth,
+av_cold void ff_h264dsp_init_riscv(H264DSPContext *c, const int bit_depth,
                                    const int chroma_format_idc)
 {
 #if HAVE_RV
     int flags = av_get_cpu_flags();
 
     if (flags & AV_CPU_FLAG_RVB_BASIC)
-        dsp->startcode_find_candidate = ff_startcode_find_candidate_rvb;
+        c->startcode_find_candidate = ff_startcode_find_candidate_rvb;
 # if HAVE_RVV
     if (flags & AV_CPU_FLAG_RVV_I32)
-        dsp->startcode_find_candidate = ff_startcode_find_candidate_rvv;
+        c->startcode_find_candidate = ff_startcode_find_candidate_rvv;
 # endif
+    if (ff_rv_vlen_least(128)) {
+        switch(bit_depth) {
+        case 8:
+            if (flags & AV_CPU_FLAG_RVV_I64) {
+                c->h264_idct_dc_add = ff_h264_idct4_dc_add_8_rvv;
+                c->h264_idct8_dc_add = ff_h264_idct8_dc_add_8_rvv;
+            }
+            break;
+        case 9:
+            c->h264_idct_dc_add = ff_h264_idct4_dc_add_9_rvv;
+            c->h264_idct8_dc_add = ff_h264_idct8_dc_add_9_rvv;
+            break;
+        case 10:
+            c->h264_idct_dc_add = ff_h264_idct4_dc_add_10_rvv;
+            c->h264_idct8_dc_add = ff_h264_idct8_dc_add_10_rvv;
+            break;
+        case 12:
+            c->h264_idct_dc_add = ff_h264_idct4_dc_add_12_rvv;
+            c->h264_idct8_dc_add = ff_h264_idct8_dc_add_12_rvv;
+            break;
+        case 14:
+            c->h264_idct_dc_add = ff_h264_idct4_dc_add_14_rvv;
+            c->h264_idct8_dc_add = ff_h264_idct8_dc_add_14_rvv;
+            break;
+        }
+    }
 #endif
 }
diff --git a/libavcodec/riscv/h264dsp_rvv.S b/libavcodec/riscv/h264dsp_rvv.S
new file mode 100644
index 0000000000..0e6c2e49e9
--- /dev/null
+++ b/libavcodec/riscv/h264dsp_rvv.S
@@ -0,0 +1,130 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright (c) 2024 J. Dekker
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "libavutil/riscv/asm.S"
+
+.macro idct_dc_add8 width
+func ff_h264_idct\width\()_dc_add_8_rvv, zve64x, zba
+.if \width == 8
+        vsetivli        zero, \width, e16, m1, ta, ma
+.else
+        vsetivli        zero, \width, e16, mf2, ta, ma
+.endif
+        lh              a3, 0(a1)
+        addi            a3, a3, 32
+        srai            a3, a3, 6
+        sh              zero, 0(a1)
+.if \width == 8
+        vlse64.v        v24, (a0), a2
+        vsetvli         t0, zero, e16, m8, ta, ma
+.else
+        vlse32.v        v24, (a0), a2
+        vsetvli         t0, zero, e16, m4, ta, ma
+.endif
+        vzext.vf2       v0, v24
+        vadd.vx         v0, v0, a3
+        vmax.vx         v0, v0, zero
+.if \width == 8
+        vsetvli         zero, zero, e8, m4, ta, ma
+.else
+        vsetvli         zero, zero, e8, m2, ta, ma
+.endif
+        vnclipu.wi      v24, v0, 0
+        vsetivli        zero, \width, e8, m1, ta, ma
+.if \width == 8
+        vsse64.v        v24, (a0), a2
+.else
+        vsse32.v        v24, (a0), a2
+.endif
+        ret
+endfunc
+.endm
+
+idct_dc_add8 4
+idct_dc_add8 8
+
+.macro idct_dc_add width
+func ff_h264_idct\width\()_dc_add_16_rvv, zve64x, zba
+        vsetivli        zero, \width, e16, m1, ta, ma
+        lw              a3, 0(a1)
+        addi            a3, a3, 32
+        srai            a3, a3, 6
+        sw              zero, 0(a1)
+        add             t4, a0, a2
+        sh1add          t5, a2, a0
+        sh1add          t6, a2, t4
+.if \width == 8
+        sh2add          t0, a2, a0
+        sh2add          t1, a2, t4
+        sh2add          t2, a2, t5
+        sh2add          t3, a2, t6
+.endif
+        vle16.v         v0, (a0)
+        vle16.v         v1, (t4)
+        vle16.v         v2, (t5)
+        vle16.v         v3, (t6)
+.if \width == 8
+        vle16.v         v4, (t0)
+        vle16.v         v5, (t1)
+        vle16.v         v6, (t2)
+        vle16.v         v7, (t3)
+        vsetvli         a6, zero, e16, m8, ta, ma
+.else
+        vsetvli         a6, zero, e16, m4, ta, ma
+.endif
+        vadd.vx         v0, v0, a3
+        vmax.vx         v0, v0, zero
+        vmin.vx         v0, v0, a5
+        vsetivli        zero, \width, e16, m1, ta, ma
+        vse16.v         v0, (a0)
+        vse16.v         v1, (t4)
+        vse16.v         v2, (t5)
+        vse16.v         v3, (t6)
+.if \width == 8
+        vse16.v         v4, (t0)
+        vse16.v         v5, (t1)
+        vse16.v         v6, (t2)
+        vse16.v         v7, (t3)
+.endif
+        ret
+endfunc
+.endm
+
+idct_dc_add 4
+idct_dc_add 8
+
+.irp depth,9,10,12,14
+func ff_h264_idct4_dc_add_\depth\()_rvv, zve64x
+        li              a5, (1 << \depth) - 1
+        j               ff_h264_idct4_dc_add_16_rvv
+endfunc
+
+func ff_h264_idct8_dc_add_\depth\()_rvv, zve64x
+        li              a5, (1 << \depth) - 1
+        j               ff_h264_idct8_dc_add_16_rvv
+endfunc
+.endr
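
The per-depth entry points in the .irp block only load the clip limit
(1 << depth) - 1 into a5 and jump to the shared 16-bit body. A rough C
equivalent of that arrangement is sketched below. It is not part of the patch:
the function names are hypothetical, and unlike the assembly (which takes a
byte pointer and a byte stride) the sketch uses typed pointers and a stride
counted in pixels to keep the C well-defined. It also assumes arithmetic right
shift for negative values.

```c
#include <stdint.h>

/*
 * Hypothetical scalar sketch of the shared high bit-depth body.
 * dst holds 16-bit pixels, coef holds 32-bit coefficients (the assembly
 * reads the DC term with lw), stride is in pixels, size is 4 or 8, and
 * pixel_max plays the role of the value passed in a5.
 */
static void idct_dc_add_hbd_ref(uint16_t *dst, int32_t *coef, int stride,
                                int size, int pixel_max)
{
    int dc = (coef[0] + 32) >> 6;    /* same rounding as the 8-bit path */

    coef[0] = 0;
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            int v = dst[x] + dc;
            if (v < 0)               /* corresponds to vmax.vx with zero */
                v = 0;
            if (v > pixel_max)       /* corresponds to vmin.vx with a5 */
                v = pixel_max;
            dst[x] = (uint16_t)v;
        }
        dst += stride;
    }
}

/* One thin wrapper per bit depth, shown here for 4x4 blocks at 10 bits. */
static void idct4_dc_add_10_ref(uint16_t *dst, int32_t *coef, int stride)
{
    idct_dc_add_hbd_ref(dst, coef, stride, 4, (1 << 10) - 1);
}
```

Keeping the clip limit as a parameter is what lets one body serve 9, 10, 12
and 14 bits, at the cost of one extra jump per call.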