From patchwork Fri Dec 22 10:52:10 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?b?6YeR5rOi?=
X-Patchwork-Id: 45283
From: jinbo
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 22 Dec 2023 18:52:10 +0800
Message-Id: <20231222105214.15168-3-jinbo@loongson.cn>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20231222105214.15168-1-jinbo@loongson.cn>
References: <20231222105214.15168-1-jinbo@loongson.cn>
Subject: [FFmpeg-devel] [PATCH v1 2/6] avcodec/hevc: Add add_residual_4/8/16/32 asm opt
Cc: jinbo

After this patch, the performance of decoding H265 4K 30FPS 30Mbps
on 3A6000 with 8 threads improves by 2fps (45fps-->47fps).
---
 libavcodec/loongarch/Makefile                 |   3 +-
 libavcodec/loongarch/hevc_add_res.S           | 162 ++++++++++++++++++
 libavcodec/loongarch/hevcdsp_init_loongarch.c |   5 +
 libavcodec/loongarch/hevcdsp_lsx.h            |   5 +
 4 files changed, 174 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/loongarch/hevc_add_res.S

diff --git a/libavcodec/loongarch/Makefile b/libavcodec/loongarch/Makefile
index 06cfab5c20..07ea97f803 100644
--- a/libavcodec/loongarch/Makefile
+++ b/libavcodec/loongarch/Makefile
@@ -27,7 +27,8 @@ LSX-OBJS-$(CONFIG_HEVC_DECODER)       += loongarch/hevcdsp_lsx.o \
                                          loongarch/hevc_lpf_sao_lsx.o \
                                          loongarch/hevc_mc_bi_lsx.o \
                                          loongarch/hevc_mc_uni_lsx.o \
-                                         loongarch/hevc_mc_uniw_lsx.o
+                                         loongarch/hevc_mc_uniw_lsx.o \
+                                         loongarch/hevc_add_res.o
 LSX-OBJS-$(CONFIG_H264DSP)            += loongarch/h264idct.o \
                                          loongarch/h264idct_loongarch.o \
                                          loongarch/h264dsp.o
diff --git a/libavcodec/loongarch/hevc_add_res.S b/libavcodec/loongarch/hevc_add_res.S
new file mode 100644
index 0000000000..dd2d820af8
--- /dev/null
+++ b/libavcodec/loongarch/hevc_add_res.S
@@ -0,0 +1,162 @@
+/*
+ * Loongson LSX optimized add_residual functions for HEVC decoding
+ *
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ * Contributed by jinbo
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "loongson_asm.S"
+
+/*
+ * void ff_hevc_add_residual4x4_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
+ */
+.macro ADD_RES_LSX_4x4_8
+    vldrepl.w       vr0,    a0,     0
+    add.d           t0,     a0,     a2
+    vldrepl.w       vr1,    t0,     0
+    vld             vr2,    a1,     0
+
+    vilvl.w         vr1,    vr1,    vr0
+    vsllwil.hu.bu   vr1,    vr1,    0
+    vadd.h          vr1,    vr1,    vr2
+    vssrani.bu.h    vr1,    vr1,    0
+
+    vstelm.w        vr1,    a0,     0,    0
+    vstelm.w        vr1,    t0,     0,    1
+.endm
+
+function ff_hevc_add_residual4x4_8_lsx
+    ADD_RES_LSX_4x4_8
+    alsl.d          a0,     a2,     a0,   1
+    addi.d          a1,     a1,     16
+    ADD_RES_LSX_4x4_8
+endfunc
+
+/*
+ * void ff_hevc_add_residual8x8_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
+ */
+.macro ADD_RES_LSX_8x8_8
+    vldrepl.d       vr0,    a0,     0
+    add.d           t0,     a0,     a2
+    vldrepl.d       vr1,    t0,     0
+    add.d           t1,     t0,     a2
+    vldrepl.d       vr2,    t1,     0
+    add.d           t2,     t1,     a2
+    vldrepl.d       vr3,    t2,     0
+
+    vld             vr4,    a1,     0
+    addi.d          t3,     zero,   16
+    vldx            vr5,    a1,     t3
+    addi.d          t4,     a1,     32
+    vld             vr6,    t4,     0
+    vldx            vr7,    t4,     t3
+
+    vsllwil.hu.bu   vr0,    vr0,    0
+    vsllwil.hu.bu   vr1,    vr1,    0
+    vsllwil.hu.bu   vr2,    vr2,    0
+    vsllwil.hu.bu   vr3,    vr3,    0
+    vadd.h          vr0,    vr0,    vr4
+    vadd.h          vr1,    vr1,    vr5
+    vadd.h          vr2,    vr2,    vr6
+    vadd.h          vr3,    vr3,    vr7
+    vssrani.bu.h    vr1,    vr0,    0
+    vssrani.bu.h    vr3,    vr2,    0
+
+    vstelm.d        vr1,    a0,     0,    0
+    vstelm.d        vr1,    t0,     0,    1
+    vstelm.d        vr3,    t1,     0,    0
+    vstelm.d        vr3,    t2,     0,    1
+.endm
+
+function ff_hevc_add_residual8x8_8_lsx
+    ADD_RES_LSX_8x8_8
+    alsl.d          a0,     a2,     a0,   2
+    addi.d          a1,     a1,     64
+    ADD_RES_LSX_8x8_8
+endfunc
+
+/*
+ * void ff_hevc_add_residual16x16_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
+ */
+function ff_hevc_add_residual16x16_8_lsx
+.rept 8
+    vld             vr0,    a0,     0
+    vldx            vr2,    a0,     a2
+
+    vld             vr4,    a1,     0
+    addi.d          t0,     zero,   16
+    vldx            vr5,    a1,     t0
+    addi.d          t1,     a1,     32
+    vld             vr6,    t1,     0
+    vldx            vr7,    t1,     t0
+
+    vexth.hu.bu     vr1,    vr0
+    vsllwil.hu.bu   vr0,    vr0,    0
+    vexth.hu.bu     vr3,    vr2
+    vsllwil.hu.bu   vr2,    vr2,    0
+    vadd.h          vr0,    vr0,    vr4
+    vadd.h          vr1,    vr1,    vr5
+    vadd.h          vr2,    vr2,    vr6
+    vadd.h          vr3,    vr3,    vr7
+
+    vssrani.bu.h    vr1,    vr0,    0
+    vssrani.bu.h    vr3,    vr2,    0
+
+    vst             vr1,    a0,     0
+    vstx            vr3,    a0,     a2
+
+    alsl.d          a0,     a2,     a0,   1
+    addi.d          a1,     a1,     64
+.endr
+endfunc
+
+/*
+ * void ff_hevc_add_residual32x32_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
+ */
+function ff_hevc_add_residual32x32_8_lsx
+.rept 32
+    vld             vr0,    a0,     0
+    addi.w          t0,     zero,   16
+    vldx            vr2,    a0,     t0
+
+    vld             vr4,    a1,     0
+    vldx            vr5,    a1,     t0
+    addi.d          t1,     a1,     32
+    vld             vr6,    t1,     0
+    vldx            vr7,    t1,     t0
+
+    vexth.hu.bu     vr1,    vr0
+    vsllwil.hu.bu   vr0,    vr0,    0
+    vexth.hu.bu     vr3,    vr2
+    vsllwil.hu.bu   vr2,    vr2,    0
+    vadd.h          vr0,    vr0,    vr4
+    vadd.h          vr1,    vr1,    vr5
+    vadd.h          vr2,    vr2,    vr6
+    vadd.h          vr3,    vr3,    vr7
+
+    vssrani.bu.h    vr1,    vr0,    0
+    vssrani.bu.h    vr3,    vr2,    0
+
+    vst             vr1,    a0,     0
+    vstx            vr3,    a0,     t0
+
+    add.d           a0,     a0,     a2
+    addi.d          a1,     a1,     64
+.endr
+endfunc
diff --git a/libavcodec/loongarch/hevcdsp_init_loongarch.c b/libavcodec/loongarch/hevcdsp_init_loongarch.c
index 5a96f3a4c9..a8f753dc86 100644
--- a/libavcodec/loongarch/hevcdsp_init_loongarch.c
+++ b/libavcodec/loongarch/hevcdsp_init_loongarch.c
@@ -189,6 +189,11 @@ void ff_hevc_dsp_init_loongarch(HEVCDSPContext *c, const int bit_depth)
             c->idct[1] = ff_hevc_idct_8x8_lsx;
             c->idct[2] = ff_hevc_idct_16x16_lsx;
             c->idct[3] = ff_hevc_idct_32x32_lsx;
+
+            c->add_residual[0] = ff_hevc_add_residual4x4_8_lsx;
+            c->add_residual[1] = ff_hevc_add_residual8x8_8_lsx;
+            c->add_residual[2] = ff_hevc_add_residual16x16_8_lsx;
+            c->add_residual[3] = ff_hevc_add_residual32x32_8_lsx;
         }
     }
 }
diff --git a/libavcodec/loongarch/hevcdsp_lsx.h b/libavcodec/loongarch/hevcdsp_lsx.h
index 0d54196caf..ac509984fd 100644
--- a/libavcodec/loongarch/hevcdsp_lsx.h
+++ b/libavcodec/loongarch/hevcdsp_lsx.h
@@ -227,4 +227,9 @@ void ff_hevc_idct_8x8_lsx(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_16x16_lsx(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_32x32_lsx(int16_t *coeffs, int col_limit);
 
+void ff_hevc_add_residual4x4_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
+void ff_hevc_add_residual8x8_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
+void ff_hevc_add_residual16x16_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
+void ff_hevc_add_residual32x32_8_lsx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
+
 #endif // #ifndef AVCODEC_LOONGARCH_HEVCDSP_LSX_H
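
Note for reviewers: the operation that the four LSX routines above appear to implement (widen each 8-bit pixel to 16 bits, add the 16-bit residual, saturate back to 0..255) can be sketched as a scalar C model. This is only an illustrative sketch, not code from this patch or from FFmpeg. The names clip_u8 and add_residual_ref and the explicit size parameter are hypothetical, and the model assumes 8-bit pixels, a stride given in bytes, and residuals stored row by row with size entries per row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a widened sum to the 8-bit pixel range [0, 255]. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar reference (not an FFmpeg function):
 * dst[y][x] = clip(dst[y][x] + res[y][x]) for a size x size block.
 * The sum is formed in int, so it cannot wrap the way a 16-bit lane could.
 */
static void add_residual_ref(uint8_t *dst, const int16_t *res,
                             ptrdiff_t stride, int size)
{
    /* Only the block sizes covered by the patch are modelled. */
    assert(size == 4 || size == 8 || size == 16 || size == 32);

    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = clip_u8(dst[x] + res[x]);
        dst += stride;  /* next row of pixels */
        res += size;    /* next row of residuals */
    }
}
```

Comparing the output of such a model against the assembly on random inputs (for instance with FFmpeg's checkasm tool) would be one way to sanity-check each block size.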