From patchwork Wed Dec 27 04:50:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?b?6YeR5rOi?=
X-Patchwork-Id: 45342
From: jinbo
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 27 Dec 2023 12:50:14 +0800
Message-Id: <20231227045019.25078-3-jinbo@loongson.cn>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20231227045019.25078-1-jinbo@loongson.cn>
References: <20231227045019.25078-1-jinbo@loongson.cn>
Subject: [FFmpeg-devel] [PATCH v2 2/7] avcodec/hevc: Add add_residual_4/8/16/32 asm opt
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: jinbo

After this patch, the performance of decoding H265 4K 30FPS 30Mbps
on 3A6000 with 8 threads improves by 2fps (45fps-->47fps).
---
 libavcodec/loongarch/Makefile                 |   3 +-
 libavcodec/loongarch/hevc_add_res.S           | 162 ++++++++++++++++++
 libavcodec/loongarch/hevcdsp_init_loongarch.c |   5 +
 libavcodec/loongarch/hevcdsp_lsx.h            |   5 +
 4 files changed, 174 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/loongarch/hevc_add_res.S

diff --git a/libavcodec/loongarch/Makefile b/libavcodec/loongarch/Makefile
index 06cfab5c20..07ea97f803 100644
--- a/libavcodec/loongarch/Makefile
+++ b/libavcodec/loongarch/Makefile
@@ -27,7 +27,8 @@ LSX-OBJS-$(CONFIG_HEVC_DECODER)       += loongarch/hevcdsp_lsx.o \
                                          loongarch/hevc_lpf_sao_lsx.o \
                                          loongarch/hevc_mc_bi_lsx.o \
                                          loongarch/hevc_mc_uni_lsx.o \
-                                         loongarch/hevc_mc_uniw_lsx.o
+                                         loongarch/hevc_mc_uniw_lsx.o \
+                                         loongarch/hevc_add_res.o
 LSX-OBJS-$(CONFIG_H264DSP)            += loongarch/h264idct.o \
                                          loongarch/h264idct_loongarch.o \
                                          loongarch/h264dsp.o
diff --git a/libavcodec/loongarch/hevc_add_res.S b/libavcodec/loongarch/hevc_add_res.S
new file mode 100644
index 0000000000..dd2d820af8
--- /dev/null
+++ b/libavcodec/loongarch/hevc_add_res.S
@@ -0,0 +1,162 @@
+/*
+ * Loongson LSX optimized add_residual functions for HEVC decoding
+ *
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ * Contributed by jinbo
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "loongson_asm.S"
+
+/*
+ * void ff_hevc_add_residual4x4_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
+ */
+.macro ADD_RES_LSX_4x4_8
+    vldrepl.w      vr0,    a0,     0
+    add.d          t0,     a0,     a2
+    vldrepl.w      vr1,    t0,     0
+    vld            vr2,    a1,     0
+
+    vilvl.w        vr1,    vr1,    vr0
+    vsllwil.hu.bu  vr1,    vr1,    0
+    vadd.h         vr1,    vr1,    vr2
+    vssrani.bu.h   vr1,    vr1,    0
+
+    vstelm.w       vr1,    a0,     0,    0
+    vstelm.w       vr1,    t0,     0,    1
+.endm
+
+function ff_hevc_add_residual4x4_8_lsx
+    ADD_RES_LSX_4x4_8
+    alsl.d         a0,     a2,     a0,   1
+    addi.d         a1,     a1,     16
+    ADD_RES_LSX_4x4_8
+endfunc
+
+/*
+ * void ff_hevc_add_residual8x8_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
+ */
+.macro ADD_RES_LSX_8x8_8
+    vldrepl.d      vr0,    a0,     0
+    add.d          t0,     a0,     a2
+    vldrepl.d      vr1,    t0,     0
+    add.d          t1,     t0,     a2
+    vldrepl.d      vr2,    t1,     0
+    add.d          t2,     t1,     a2
+    vldrepl.d      vr3,    t2,     0
+
+    vld            vr4,    a1,     0
+    addi.d         t3,     zero,   16
+    vldx           vr5,    a1,     t3
+    addi.d         t4,     a1,     32
+    vld            vr6,    t4,     0
+    vldx           vr7,    t4,     t3
+
+    vsllwil.hu.bu  vr0,    vr0,    0
+    vsllwil.hu.bu  vr1,    vr1,    0
+    vsllwil.hu.bu  vr2,    vr2,    0
+    vsllwil.hu.bu  vr3,    vr3,    0
+    vadd.h         vr0,    vr0,    vr4
+    vadd.h         vr1,    vr1,    vr5
+    vadd.h         vr2,    vr2,    vr6
+    vadd.h         vr3,    vr3,    vr7
+    vssrani.bu.h   vr1,    vr0,    0
+    vssrani.bu.h   vr3,    vr2,    0
+
+    vstelm.d       vr1,    a0,     0,    0
+    vstelm.d       vr1,    t0,     0,    1
+    vstelm.d       vr3,    t1,     0,    0
+    vstelm.d       vr3,    t2,     0,    1
+.endm
+
+function ff_hevc_add_residual8x8_8_lsx
+    ADD_RES_LSX_8x8_8
+    alsl.d         a0,     a2,     a0,   2
+    addi.d         a1,     a1,     64
+    ADD_RES_LSX_8x8_8
+endfunc
+
+/*
+ * void ff_hevc_add_residual16x16_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
+ */
+function ff_hevc_add_residual16x16_8_lsx
+.rept 8
+    vld            vr0,    a0,     0
+    vldx           vr2,    a0,     a2
+
+    vld            vr4,    a1,     0
+    addi.d         t0,     zero,   16
+    vldx           vr5,    a1,     t0
+    addi.d         t1,     a1,     32
+    vld            vr6,    t1,     0
+    vldx           vr7,    t1,     t0
+
+    vexth.hu.bu    vr1,    vr0
+    vsllwil.hu.bu  vr0,    vr0,    0
+    vexth.hu.bu    vr3,    vr2
+    vsllwil.hu.bu  vr2,    vr2,    0
+    vadd.h         vr0,    vr0,    vr4
+    vadd.h         vr1,    vr1,    vr5
+    vadd.h         vr2,    vr2,    vr6
+    vadd.h         vr3,    vr3,    vr7
+
+    vssrani.bu.h   vr1,    vr0,    0
+    vssrani.bu.h   vr3,    vr2,    0
+
+    vst            vr1,    a0,     0
+    vstx           vr3,    a0,     a2
+
+    alsl.d         a0,     a2,     a0,   1
+    addi.d         a1,     a1,     64
+.endr
+endfunc
+
+/*
+ * void ff_hevc_add_residual32x32_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
+ */
+function ff_hevc_add_residual32x32_8_lsx
+.rept 32
+    vld            vr0,    a0,     0
+    addi.w         t0,     zero,   16
+    vldx           vr2,    a0,     t0
+
+    vld            vr4,    a1,     0
+    vldx           vr5,    a1,     t0
+    addi.d         t1,     a1,     32
+    vld            vr6,    t1,     0
+    vldx           vr7,    t1,     t0
+
+    vexth.hu.bu    vr1,    vr0
+    vsllwil.hu.bu  vr0,    vr0,    0
+    vexth.hu.bu    vr3,    vr2
+    vsllwil.hu.bu  vr2,    vr2,    0
+    vadd.h         vr0,    vr0,    vr4
+    vadd.h         vr1,    vr1,    vr5
+    vadd.h         vr2,    vr2,    vr6
+    vadd.h         vr3,    vr3,    vr7
+
+    vssrani.bu.h   vr1,    vr0,    0
+    vssrani.bu.h   vr3,    vr2,    0
+
+    vst            vr1,    a0,     0
+    vstx           vr3,    a0,     t0
+
+    add.d          a0,     a0,     a2
+    addi.d         a1,     a1,     64
+.endr
+endfunc
diff --git a/libavcodec/loongarch/hevcdsp_init_loongarch.c b/libavcodec/loongarch/hevcdsp_init_loongarch.c
index 5a96f3a4c9..a8f753dc86 100644
--- a/libavcodec/loongarch/hevcdsp_init_loongarch.c
+++ b/libavcodec/loongarch/hevcdsp_init_loongarch.c
@@ -189,6 +189,11 @@ void ff_hevc_dsp_init_loongarch(HEVCDSPContext *c, const int bit_depth)
             c->idct[1] = ff_hevc_idct_8x8_lsx;
             c->idct[2] = ff_hevc_idct_16x16_lsx;
             c->idct[3] = ff_hevc_idct_32x32_lsx;
+
+            c->add_residual[0] = ff_hevc_add_residual4x4_8_lsx;
+            c->add_residual[1] = ff_hevc_add_residual8x8_8_lsx;
+            c->add_residual[2] = ff_hevc_add_residual16x16_8_lsx;
+            c->add_residual[3] = ff_hevc_add_residual32x32_8_lsx;
         }
     }
 }
diff --git a/libavcodec/loongarch/hevcdsp_lsx.h b/libavcodec/loongarch/hevcdsp_lsx.h
index 0d54196caf..ac509984fd 100644
--- a/libavcodec/loongarch/hevcdsp_lsx.h
+++ b/libavcodec/loongarch/hevcdsp_lsx.h
@@ -227,4 +227,9 @@ void ff_hevc_idct_8x8_lsx(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_16x16_lsx(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_32x32_lsx(int16_t *coeffs, int col_limit);
 
+void ff_hevc_add_residual4x4_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
+void ff_hevc_add_residual8x8_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
+void ff_hevc_add_residual16x16_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
+void ff_hevc_add_residual32x32_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
+
 #endif // #ifndef AVCODEC_LOONGARCH_HEVCDSP_LSX_H
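
Note for reviewers: the four LSX routines above all appear to follow the same pattern: widen the 8-bit destination pixels to 16 bits (vsllwil.hu.bu / vexth.hu.bu), add the 16-bit residuals (vadd.h), then narrow back with unsigned saturation (vssrani.bu.h with shift 0). A minimal scalar sketch of that operation, useful for checking the assembly against, might look like the following. The names clip_uint8 and add_residual_ref and the size parameter are hypothetical and not part of this patch; the sketch also assumes the residuals are stored row by row with size entries per row, which is what the pointer increments in the assembly suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 8-bit pixel range [0, 255]. */
uint8_t clip_uint8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/*
 * Hypothetical scalar reference (not from the patch): add a size x size
 * block of 16-bit residuals to 8-bit pixels, clamping each result.
 * dst advances by stride bytes per row; res is assumed to be packed,
 * size entries per row.
 */
void add_residual_ref(uint8_t *dst, const int16_t *res,
                      ptrdiff_t stride, int size)
{
    assert(size > 0);
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = clip_uint8(dst[x] + res[x]);
        dst += stride;  /* next row of pixels */
        res += size;    /* next row of residuals */
    }
}
```

Calling add_residual_ref(dst, res, stride, 4), and likewise with 8, 16 and 32, should correspond to the four entries assigned to c->add_residual[0..3]. One caveat: the vector code adds in 16-bit lanes, so the sketch only matches it when dst + res fits in an int16_t, which should hold for residuals from a conforming 8-bit stream.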