From patchwork Thu Apr 18 07:36:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: pengxu
X-Patchwork-Id: 48124
From: pengxu
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 18 Apr 2024 15:36:08 +0800
Message-Id: <20240418073609.19365-2-pengxu@loongson.cn>
In-Reply-To: <20240418073609.19365-1-pengxu@loongson.cn>
References: <20240418073609.19365-1-pengxu@loongson.cn>
Subject: [FFmpeg-devel] [PATCH v2 1/2] avutil/loongarch: add LSX optimization for aac audio decode
List-Id: FFmpeg development discussions and patches

Add functions:
  vector_fmul_window_lsx
  butterflies_float_lsx
  vector_fmul_scalar_lsx

./ffmpeg -i ../../1.aac -f null -
before:482x
after:523x
---
 libavutil/float_dsp.c                         |   2 +
 libavutil/float_dsp.h                         |   1 +
 libavutil/loongarch/Makefile                  |   5 +-
 libavutil/loongarch/float_dsp.S               | 287 ++++++++++++++++++
 libavutil/loongarch/float_dsp.h               |  32 ++
 .../loongarch/float_dsp_init_loongarch.c      |  35 +++
 6 files changed, 361 insertions(+), 1 deletion(-)
 create mode 100644 libavutil/loongarch/float_dsp.S
 create mode 100644 libavutil/loongarch/float_dsp.h
 create mode 100644 libavutil/loongarch/float_dsp_init_loongarch.c

diff --git a/libavutil/float_dsp.c b/libavutil/float_dsp.c
index e9fb023466..7128ff3f96 100644
--- a/libavutil/float_dsp.c
+++ b/libavutil/float_dsp.c
@@ -162,6 +162,8 @@ av_cold AVFloatDSPContext *avpriv_float_dsp_alloc(int bit_exact)
     ff_float_dsp_init_x86(fdsp);
 #elif ARCH_MIPS
     ff_float_dsp_init_mips(fdsp);
+#elif ARCH_LOONGARCH64
+    ff_float_dsp_init_loongarch(fdsp);
 #endif
     return fdsp;
 }
diff --git a/libavutil/float_dsp.h b/libavutil/float_dsp.h
index 342a8715c5..679a930eab 100644
--- a/libavutil/float_dsp.h
+++ b/libavutil/float_dsp.h
@@ -206,6 +206,7 @@ void ff_float_dsp_init_ppc(AVFloatDSPContext *fdsp, int strict);
 void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_x86(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_mips(AVFloatDSPContext *fdsp);
+void ff_float_dsp_init_loongarch(AVFloatDSPContext *fdsp);
 
 /**
  * Allocate a float DSP context.
diff --git a/libavutil/loongarch/Makefile b/libavutil/loongarch/Makefile
index 2addd9351c..ae710f0515 100644
--- a/libavutil/loongarch/Makefile
+++ b/libavutil/loongarch/Makefile
@@ -1 +1,4 @@
-OBJS += loongarch/cpu.o
+OBJS += loongarch/cpu.o \
+        loongarch/float_dsp_init_loongarch.o
+
+LSX-OBJS += loongarch/float_dsp.o
diff --git a/libavutil/loongarch/float_dsp.S b/libavutil/loongarch/float_dsp.S
new file mode 100644
index 0000000000..5073c8424f
--- /dev/null
+++ b/libavutil/loongarch/float_dsp.S
@@ -0,0 +1,287 @@
+/*
+ * Loongarch LASX/LSX optimized dsp
+ *
+ * Copyright (c) 2024 Loongson Technology Corporation Limited
+ * Contributed by PengXu
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavcodec/loongarch/loongson_asm.S" + + +/* void vector_fmul_window(float *dst, const float *src0, + const float *src1, const float *win, int len) */ +function vector_fmul_window_lsx + addi.d sp, sp, -8 + st.d $r23, sp, 0 + + move t4, a0 + move t5, a1 + move t6, a2 + move t7, a3 + move t8, a4 + slli.d t8, t8, 2 + + add.d t4, t4, t8 + add.d t7, t7, t8 + add.d t5, t5, t8 + + add.d a6, $r0, t8 + addi.d a7, t8, -4 + + move a5, $r0 + srai.d t0, a4, 2 + beq a5, t0, .VFW02 + +.VFW01: + sub.d t1, t5, a6 + addi.d t2, a7, -12 + vld vr1, t1, 0x00 //s0 + vldx vr2, a2, t2 //s1 + + sub.d t1, t7, a6 + vld vr3, t1, 0x00 //wi + vldx vr4, t7, t2 //wj + + vpermi.w vr2, vr2, 0x1b + vpermi.w vr4, vr4, 0x1b + + vfmul.s vr5, vr2, vr3 + vfmsub.s vr5, vr1, vr4, vr5 //dsti + + vfmul.s vr6, vr2, vr4 + vfmadd.s vr6, vr1, vr3, vr6 //dstj + + vpermi.w vr6, vr6, 0x1b + + sub.d t1, t4, a6 + vst vr5, t1, 0x00 + vstx vr6, t4, t2 + + addi.d a6, a6, -16 + addi.d a7, a7, -16 + + addi.d a5, a5, 1 + blt a5, t0, .VFW01 + +.VFW02: + andi t0, a4, 2 + beq $r0, t0, .VFW03 + + sub.d t0, t5, a6 + addi.d t1, a7, -4 + add.d t1, t5, t1 + + sub.d t2, t7, a6 + addi.d t3, a7, -4 + add.d t3, t7, t3 + + fld.s f0, t0, 0x00 //s0 + fld.s f1, t0, 0x04 + + fld.s f2, t1, 0x04 //s1 + fld.s f3, t1, 0x00 + + fld.s f4, t2, 0x00 //wi + fld.s f5, t2, 0x04 + + fld.s f6, t3, 0x04 //wj + fld.s f7, t3, 0x00 + + fmul.s f8, f2, f4 + fmsub.s f8, f0, f6, f8 //dsti + fmul.s f9, f3, f5 + fmsub.s f9, f1, f7, f9 + + fmul.s f10, f2, f6 + fmadd.s f10, f0, f4, 
f10 //dstj + fmul.s f11, f3, f7 + fmadd.s f11, f1, f5, f11 + + sub.d t2, t4, a6 + add.d t3, t4, a7 + addi.d t3, t3, -4 + + fst.s f8, t2, 0x00 + fst.s f9, t2, 0x04 + fst.s f10, t3, 0x04 + fst.s f11, t3, 0x00 + + addi.d a6, a6, -2 + addi.d a7, a7, -2 + +.VFW03: + andi t0, a4, 1 + beq $r0, t0, .VFW04 + + sub.d t0, t5, a6 + + fldx.s f0, t5, t0 //s0 + fldx.s f2, t6, a7 //s1 + fldx.s f4, t7, t0 //wi + fldx.s f6, t7, a7 //wj + + fmul.s f8, f2, f4 + fmsub.s f8, f0, f6, f8 //dsti + + fmul.s f10, f2, f6 + fmadd.s f10, f0, f4, f10 //dstj + + sub.d t0, t4, a6 + + fst.s f8, t0, 0x00 + fstx.s f10, t4, a7 + + addi.d a6, a6, -1 + addi.d a7, a7, -1 + +.VFW04: + ld.d $r23, sp, 0 + addi.d sp, sp, 8 + +endfunc + + +/* void butterflies_float(float *restrict v1, float *restrict v2, + int len) */ +function butterflies_float_lsx + move a6, $r0 + move a7, $r0 + + move t4, a0 + move t5, a1 + move t6, a2 + + srai.d t0, t6, 2 + beq a6, t0, .BFL02 + +.BFL01: + vldx vr0, t4, a7 + vldx vr1, t5, a7 + + vfsub.s vr3, vr0, vr1 + vfadd.s vr4, vr0, vr1 + + vstx vr4, t4, a7 + vstx vr3, t5, a7 + + addi.d a7, a7, 16 + addi.d a6, a6, 1 + blt a6, t0, .BFL01 + +.BFL02: + andi t0, t6, 2 + beq $r0, t0, .BFL03 + + add.d t1, t4, a7 + add.d t2, t5, a7 + + fld.s f0, t1, 0x00 + fld.s f1, t1, 0x04 + fld.s f2, t2, 0x00 + fld.s f3, t2, 0x04 + + fsub.s f4, f0, f2 + fsub.s f5, f1, f3 + fadd.s f6, f0, f2 + fadd.s f7, f1, f3 + + fst.s f6, t1, 0x00 + fst.s f7, t1, 0x04 + fst.s f4, t2, 0x00 + fst.s f5, t2, 0x04 + + addi.d a7, a7, 8 + +.BFL03: + andi t0, t6, 1 + beq $r0, t0, .BFL04 + + fldx.s f0, t4, a7 + fldx.s f2, t5, a7 + + fsub.s f4, f0, f2 + fadd.s f6, f0, f2 + + fstx.s f6, t4, a7 + fstx.s f4, t5, a7 + + addi.d a7, a7, 4 + +.BFL04: +endfunc + + +/* void vector_fmul_scalar_lsx(float *dst, const float *src, float mul, + int len) */ +function vector_fmul_scalar_lsx + move a6, $r0 + move a7, $r0 + + move t4, a0 + move t5, a1 + move t6, a2 + + vpermi.w vr0, vr0, 0x00 + + srai.d t0, t6, 2 + beq a6, t0, .BFS02 + +.BFS01: + 
vldx vr1, t5, a7 + + vfmul.s vr2, vr1, vr0 + + vstx vr2, t4, a7 + + addi.d a7, a7, 16 + addi.d a6, a6, 1 + blt a6, t0, .BFS01 + +.BFS02: + andi t0, t6, 2 + beq $r0, t0, .BFS03 + + add.d t1, t5, a7 + add.d t2, t4, a7 + + fld.s f1, t1, 0x00 + fld.s f2, t1, 0x04 + + fmul.s f3, f1, f0 + fmul.s f4, f2, f0 + + fst.s f3, t2, 0x00 + fst.s f4, t2, 0x04 + + addi.d a7, a7, 8 + +.BFS03: + andi t0, t6, 1 + beq $r0, t0, .BFS04 + + fldx.s f1, t5, a7 + + fmul.s f3, f1, f0 + + fstx.s f3, t4, a7 + + addi.d a7, a7, 4 + +.BFS04: +endfunc \ No newline at end of file diff --git a/libavutil/loongarch/float_dsp.h b/libavutil/loongarch/float_dsp.h new file mode 100644 index 0000000000..644c1f3713 --- /dev/null +++ b/libavutil/loongarch/float_dsp.h @@ -0,0 +1,32 @@ +/* + * Copyright (c) 2024 Loongson Technology Corporation Limited + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef AVUTIL_LOONGARCH_FLOAT_DSP_H +#define AVUTIL_LOONGARCH_FLOAT_DSP_H + +#include "libavutil/float_dsp.h" + +void vector_fmul_window_lsx(float *dst, const float *src0, + const float *src1, const float *win, int len); + +void butterflies_float_lsx(float *restrict v1, float *restrict v2, int len); + +void vector_fmul_scalar_lsx(float *dst, const float *src, float mul, int len); + +#endif /* AVUTIL_LOONGARCH_FLOAT_DSP_H */ \ No newline at end of file diff --git a/libavutil/loongarch/float_dsp_init_loongarch.c b/libavutil/loongarch/float_dsp_init_loongarch.c new file mode 100644 index 0000000000..592ba78058 --- /dev/null +++ b/libavutil/loongarch/float_dsp_init_loongarch.c @@ -0,0 +1,35 @@ +/* + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "float_dsp.h"
+#include "libavutil/loongarch/cpu.h"
+
+av_cold void ff_float_dsp_init_loongarch(AVFloatDSPContext *fdsp)
+{
+    int cpu_flags = av_get_cpu_flags();
+
+    if (have_lsx(cpu_flags)) {
+        fdsp->vector_fmul_window = vector_fmul_window_lsx;
+        fdsp->butterflies_float  = butterflies_float_lsx;
+        fdsp->vector_fmul_scalar = vector_fmul_scalar_lsx;
+    }
+}

From patchwork Thu Apr 18 07:36:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: pengxu
X-Patchwork-Id: 48125
From: pengxu
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 18 Apr 2024 15:36:09 +0800
Message-Id: <20240418073609.19365-3-pengxu@loongson.cn>
In-Reply-To: <20240418073609.19365-1-pengxu@loongson.cn>
References: <20240418073609.19365-1-pengxu@loongson.cn>
Subject: [FFmpeg-devel] [PATCH v2 2/2] avcodec/loongarch: add LSX optimization for aac audio encode
List-Id: FFmpeg development discussions and patches

Add functions:
  ff_abs_pow34_lsx
  ff_aac_quantize_bands_lsx

./ffmpeg -f s16le -ac 2 -i ../../1.pcm -c:a aac -f null -
before:37.5x
after:48.1x
---
 libavcodec/aacencdsp.h                        |   3 +
 libavcodec/loongarch/Makefile                 |   2 +
 .../loongarch/aacencdsp_init_loongarch.c      |  33 +++
 libavcodec/loongarch/aacencdsp_loongarch.S    | 254 ++++++++++++++++++
 libavcodec/loongarch/aacencdsp_loongarch.h    |  35 +++
 5 files changed, 327 insertions(+)
 create mode 100644 libavcodec/loongarch/aacencdsp_init_loongarch.c
 create mode 100644 libavcodec/loongarch/aacencdsp_loongarch.S
 create mode 100644 libavcodec/loongarch/aacencdsp_loongarch.h

diff --git a/libavcodec/aacencdsp.h b/libavcodec/aacencdsp.h
index 67836d8cf7..5db27a95a9 100644
--- a/libavcodec/aacencdsp.h
+++ b/libavcodec/aacencdsp.h
@@ -34,6 +34,7 @@ typedef struct AACEncDSPContext {
 
 void ff_aacenc_dsp_init_riscv(AACEncDSPContext *s);
 void ff_aacenc_dsp_init_x86(AACEncDSPContext *s);
+void ff_aacenc_dsp_init_loongarch(AACEncDSPContext *s);
 
 static inline void abs_pow34_v(float *out, const float *in, const int size)
 {
@@ -66,6 +67,8 @@ static inline void ff_aacenc_dsp_init(AACEncDSPContext *s)
     ff_aacenc_dsp_init_riscv(s);
 #elif ARCH_X86
     ff_aacenc_dsp_init_x86(s);
+#elif ARCH_LOONGARCH64
+    ff_aacenc_dsp_init_loongarch(s);
 #endif
 }
 
diff --git a/libavcodec/loongarch/Makefile b/libavcodec/loongarch/Makefile
index 07da2964e4..068fd61810 100644
--- a/libavcodec/loongarch/Makefile
+++ b/libavcodec/loongarch/Makefile
@@ -9,6 +9,7 @@ OBJS-$(CONFIG_HPELDSP) += loongarch/hpeldsp_init_loongarch.o
 OBJS-$(CONFIG_IDCTDSP) += loongarch/idctdsp_init_loongarch.o
 OBJS-$(CONFIG_VIDEODSP) += loongarch/videodsp_init.o
 OBJS-$(CONFIG_HEVC_DECODER) += loongarch/hevcdsp_init_loongarch.o
+OBJS-$(CONFIG_AAC_ENCODER) += loongarch/aacencdsp_init_loongarch.o
 LASX-OBJS-$(CONFIG_H264QPEL) += loongarch/h264qpel_lasx.o
LASX-OBJS-$(CONFIG_H264DSP) += loongarch/h264dsp_lasx.o \ loongarch/h264_deblock_lasx.o @@ -38,3 +39,4 @@ LSX-OBJS-$(CONFIG_H264QPEL) += loongarch/h264qpel.o \ loongarch/h264qpel_lsx.o LSX-OBJS-$(CONFIG_H264CHROMA) += loongarch/h264chroma.o LSX-OBJS-$(CONFIG_H264PRED) += loongarch/h264intrapred.o +LSX-OBJS-$(CONFIG_AAC_ENCODER) += loongarch/aacencdsp_loongarch.o diff --git a/libavcodec/loongarch/aacencdsp_init_loongarch.c b/libavcodec/loongarch/aacencdsp_init_loongarch.c new file mode 100644 index 0000000000..5f67a5857d --- /dev/null +++ b/libavcodec/loongarch/aacencdsp_init_loongarch.c @@ -0,0 +1,33 @@ +/* + * AAC encoder assembly optimizations + * Copyright (c) 2024 Loongson Technology Corporation Limited + * Contributed by PengXu + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "aacencdsp_loongarch.h"
+
+av_cold void ff_aacenc_dsp_init_loongarch(AACEncDSPContext *s)
+{
+    int cpu_flags = av_get_cpu_flags();
+
+    if (have_lsx(cpu_flags)) {
+        s->abs_pow34   = ff_abs_pow34_lsx;
+        s->quant_bands = ff_aac_quantize_bands_lsx;
+    }
+}
\ No newline at end of file
diff --git a/libavcodec/loongarch/aacencdsp_loongarch.S b/libavcodec/loongarch/aacencdsp_loongarch.S
new file mode 100644
index 0000000000..b80bb98aa9
--- /dev/null
+++ b/libavcodec/loongarch/aacencdsp_loongarch.S
@@ -0,0 +1,254 @@
+/*
+ * Loongarch LASX/LSX optimized AAC encoder DSP functions
+ *
+ * Copyright (c) 2024 Loongson Technology Corporation Limited
+ * Contributed by PengXu
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "loongson_asm.S"
+
+
+/* void ff_abs_pow34_lsx(float *out, const float *in, const int size); */
+// Param, out:a0, in:a1, size:a2
+function ff_abs_pow34_lsx
+    move            t0,     zero    //loop param
+    move            t1,     zero    //data index
+
+    srai.d          t2,     a2,     2
+    beq             zero,   t2,     .FAPL02
+
+.FAPL01:
+    add.d           t3,     a1,     t1
+    fld.s           f0,     t3,     0x00
+    fld.s           f1,     t3,     0x04
+    fld.s           f2,     t3,     0x08
+    fld.s           f3,     t3,     0x0c
+
+    fabs.s          f0,     f0
+    fabs.s          f1,     f1
+    fabs.s          f2,     f2
+    fabs.s          f3,     f3
+
+    vextrins.w      vr0,    vr1,    0x10
+    vextrins.w      vr0,    vr2,    0x20
+    vextrins.w      vr0,    vr3,    0x30
+
+    vfsqrt.s        vr4,    vr0
+    vfmul.s         vr5,    vr0,    vr4
+    vfsqrt.s        vr6,    vr5
+
+    vstx            vr6,    a0,     t1
+
+    addi.d          t1,     t1,     16
+    addi.d          t0,     t0,     1
+    blt             t0,     t2,     .FAPL01
+
+.FAPL02: /* &2 */
+    andi            t0,     a2,     2
+    beq             zero,   t0,     .FAPL03
+
+    add.d           t3,     a1,     t1
+    add.d           t4,     a0,     t1
+
+    fld.s           f0,     t3,     0x00
+    fld.s           f1,     t3,     0x04
+
+    fabs.s          f0,     f0
+    fabs.s          f1,     f1
+
+    fsqrt.s         f2,     f0
+    fsqrt.s         f3,     f1
+
+    fmul.s          f4,     f0,     f2
+    fmul.s          f5,     f1,     f3
+
+    fsqrt.s         f6,     f4
+    fsqrt.s         f7,     f5
+
+    fst.s           f6,     t4,     0x00
+    fst.s           f7,     t4,     0x04
+
+    addi.d          t1,     t1,     8
+
+.FAPL03: /* &1 */
+    andi            t0,     a2,     1
+    beq             zero,   t0,     .FAPL04
+
+    fldx.s          f0,     a1,     t1
+
+    fabs.s          f0,     f0
+    fsqrt.s         f2,     f0
+    fmul.s          f4,     f0,     f2
+    fsqrt.s         f6,     f4
+
+    fstx.s          f6,     a0,     t1
+
+    addi.d          t1,     t1,     4
+
+.FAPL04:
+endfunc
+
+
+
+/* void ff_aac_quantize_bands_lsx(int *out, const float *in, const float *scaled,
+                         int size, int is_signed, int maxval, const float Q34,
+                         const float rounding) */
+// param:
+// out: a0
+// in: a1
+// scaled: a2
+// size: a3
+// is_signed: a4
+// maxval: a5
+// Q34: f0
+// rounding: f1
+function ff_aac_quantize_bands_lsx
+    move            t0,     zero    //loop param
+    move            t1,     zero    //data index
+
+    vpermi.w        vr0,    vr0,    0x00    //Q34
+    vpermi.w        vr1,    vr1,    0x00    //rounding
+
+    srai.d          t2,     a3,     2       ////loop max
+    beq             zero,
t2, .FAQBL02 + +.FAQBL01: /* /4 */ + vldx vr2, a2, t1 + vfmul.s vr3, vr2, vr0 //qc + vfadd.s vr4, vr3, vr1 + + movgr2fr.w f5, a5 + ffint.s.w f5, f5 + vpermi.w vr5, vr5, 0x00 //maxval + vfmin.s vr6, vr4, vr5 + vfrintrz.s vr7, vr6 //(float .0)tmp + + beq a4, zero, .S4ISEND + + fsub.s f8, f0, f0 + vshuf4i.w vr8, vr8, 0x00 //0.0f + vldx vr9, a1, t1 //in + vextrins.w vr10, vr9, 0x01 + vextrins.w vr11, vr9, 0x02 + vextrins.w vr12, vr9, 0x03 +.S4IS00: + fcmp.clt.s $fcc0, f9, f8 + bceqz $fcc0, .S4IS01 + vextrins.w vr13, vr7, 0x00 + fneg.s f13, f13 + vextrins.w vr7, vr13, 0x00 +.S4IS01: + fcmp.clt.s $fcc1, f10, f8 + bceqz $fcc1, .S4IS02 + vextrins.w vr13, vr7, 0x01 + fneg.s f13, f13 + vextrins.w vr7, vr13, 0x10 +.S4IS02: + fcmp.clt.s $fcc2, f11, f8 + bceqz $fcc2, .S4IS03 + vextrins.w vr13, vr7, 0x02 + fneg.s f13, f13 + vextrins.w vr7, vr13, 0x20 +.S4IS03: + fcmp.clt.s $fcc3, f12, f8 + bceqz $fcc3, .S4ISEND + vextrins.w vr13, vr7, 0x03 + fneg.s f13, f13 + vextrins.w vr7, vr13, 0x30 +.S4ISEND: + vftintrz.w.s vr14, vr7 + vstx vr14, a0, t1 + addi.d t1, t1, 16 + addi.d t0, t0, 1 + blt t0, t2, .FAQBL01 + +.FAQBL02: /* /2 */ + andi t2, a3, 2 + beq $r0, t2, .FAQBL03 + + vldx vr2, a2, t1 + vfmul.s vr3, vr2, vr0 //qc + vfadd.s vr4, vr3, vr1 + + movgr2fr.w f5, a5 + ffint.s.w f5, f5 + vpermi.w vr5, vr5, 0x00 //maxval + vfmin.s vr6, vr4, vr5 + vfrintrz.s vr7, vr6 //(float .0)tmp + + beq a4, zero, .S2ISEND + + fsub.s f8, f0, f0 + vshuf4i.w vr8, vr8, 0x00 //0.0f + vldx vr9, a1, t1 //in + vextrins.w vr10, vr9, 0x01 +.S2IS00: + fcmp.clt.s $fcc0, f9, f8 + bceqz $fcc0, .S2IS01 + vextrins.w vr13, vr7, 0x00 + fneg.s f13, f13 + vextrins.w vr7, vr13, 0x00 +.S2IS01: + fcmp.clt.s $fcc1, f10, f8 + bceqz $fcc1, .S2ISEND + vextrins.w vr13, vr7, 0x01 + fneg.s f13, f13 + vextrins.w vr7, vr13, 0x10 +.S2ISEND: + vftintrz.w.s vr14, vr7 + vpickve2gr.w t3, vr14, 0 + vpickve2gr.w t4, vr14, 1 + add.d t7, a0, t1 + st.w t3, t7, 0x00 + st.w t4, t7, 0x04 + addi.d t1, t1, 8 + +.FAQBL03: /* /1 */ + andi t2, a3, 1 + 
beq $r0, t2, .FAQBL04 + + vldx vr2, a2, t1 + vfmul.s vr3, vr2, vr0 //qc + vfadd.s vr4, vr3, vr1 + + movgr2fr.w f5, a5 + ffint.s.w f5, f5 + vpermi.w vr5, vr5, 0x00 //maxval + vfmin.s vr6, vr4, vr5 + vfrintrz.s vr7, vr6 //(float .0)tmp + + beq a4, zero, .S1ISEND + + fsub.s f8, f0, f0 + vshuf4i.w vr8, vr8, 0x00 //0.0f + vldx vr9, a1, t1 //in +.S1IS00: + fcmp.clt.s $fcc0, f9, f8 + bceqz $fcc0, .S1ISEND + vextrins.w vr13, vr7, 0x00 + fneg.s f13, f13 + vextrins.w vr7, vr13, 0x00 +.S1ISEND: + vftintrz.w.s vr14, vr7 + vpickve2gr.w t3, vr14, 0 + stx.w t3, a0, t1 + addi.d t1, t1, 4 + +.FAQBL04: +endfunc \ No newline at end of file diff --git a/libavcodec/loongarch/aacencdsp_loongarch.h b/libavcodec/loongarch/aacencdsp_loongarch.h new file mode 100644 index 0000000000..076cd4d247 --- /dev/null +++ b/libavcodec/loongarch/aacencdsp_loongarch.h @@ -0,0 +1,35 @@ +/* + * AAC encoder assembly optimizations + * Copyright (c) 2024 Loongson Technology Corporation Limited + * Contributed by PengXu + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef AVCODEC_LOONGARCH_AACENC_H +#define AVCODEC_LOONGARCH_AACENC_H + +#include "libavutil/float_dsp.h" +#include "libavutil/loongarch/cpu.h" +#include "libavcodec/aacenc.h" + +void ff_abs_pow34_lsx(float *out, const float *in, const int size); +void ff_aac_quantize_bands_lsx(int *out, const float *in, const float *scaled, + int size, int is_signed, int maxval, const float Q34, + const float rounding); + +#endif /* AVCODEC_LOONGARCH_AACENC_H */ \ No newline at end of file