From patchwork Tue Sep 20 08:13:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lynne X-Patchwork-Id: 38067 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:3b1c:b0:96:9ee8:5cfd with SMTP id c28csp1770566pzh; Tue, 20 Sep 2022 01:13:32 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5wR51SXRarptNR6eMeHKGPCAWdiPew/85qZkIEV2au+LkLey78DsiN43vY7O8B7Ass/Vmg X-Received: by 2002:a17:906:eeca:b0:730:6880:c398 with SMTP id wu10-20020a170906eeca00b007306880c398mr16290577ejb.706.1663661612538; Tue, 20 Sep 2022 01:13:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1663661612; cv=none; d=google.com; s=arc-20160816; b=I7YbrVCJznbFgilJbfL+3FmodfsY4PN01qO4seIWOQJQaeHAGXsbAfNjdNPneCKiyJ VsSnHtJsGLR+V+bt8eP+YRVYYmDA73OUnKH87wMP29WZBpkteI2/XFNgzSZnILA+sfJJ U2bmooI77+MNDfbjzqxX2hGWn0gOyf383sUVyZwvpGc9Hy1ZqsdZdQt2Vh5FDY811/6n In9mauRee9TAXba4bSwOiw1XLYmeHHl0wMfUIff5CxxDvEe6bg3Hha54zeCI9b5cg6VZ +9suzsaHmBKyZcNIPnNIbhVko+I3V/MfOPvG9b8ApA31M2aVOoUk7WMT1d3oJ4dkp6fr Kz1A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence:subject :mime-version:message-id:to:from:date:dkim-signature:delivered-to; bh=3JEAuTEYGIB/RW/VqtCF7QSYLgR0IAfj5T/PO7stedc=; b=A0LcuABU8Q+PQXxC0UJlxyOfElguWfF6R8Fade8jWRC68Z1lLGZjMJ0ZHhnk/KgwCh Vt1C2X+3bk0g58QIqrMvP+Ao8G87leJaNGUbEgZORPyf+SOLNnrOQLZ4HbL8nsAlomHx xDmTG44oFZ8d9hHQWpv4ANGTr+CCUZugW1RxmvRkFJYhfqBr0z/g+boDGkFxmoVvMF67 wDRpPjUQ1bdl+oc4+T2O9fOI8SeiczAJsJyN671lrWTdUZzXEzj6IMM52o33vugD3lGU BW+Nx04P3+zEmw1IKAPUe9eh632c8AYajVfNP89CxE14S7FyFA4cKInvU+OAAU2HfwZg Erbg== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@lynne.ee header.s=s1 header.b=ymNoxfqx; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=lynne.ee Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id la12-20020a170906ad8c00b0072ef52514f6si614030ejb.629.2022.09.20.01.13.31; Tue, 20 Sep 2022 01:13:32 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@lynne.ee header.s=s1 header.b=ymNoxfqx; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=lynne.ee Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id E533B68BB01; Tue, 20 Sep 2022 11:13:26 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from w4.tutanota.de (w4.tutanota.de [81.3.6.165]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id CF61E68B29E for ; Tue, 20 Sep 2022 11:13:20 +0300 (EEST) Received: from w3.tutanota.de (unknown [192.168.1.164]) by w4.tutanota.de (Postfix) with ESMTP id 005E71060313 for ; Tue, 20 Sep 2022 08:13:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1663661599; s=s1; d=lynne.ee; h=From:From:To:To:Subject:Subject:Content-Description:Content-ID:Content-Type:Content-Type:Content-Transfer-Encoding:Cc:Date:Date:In-Reply-To:MIME-Version:MIME-Version:Message-ID:Message-ID:Reply-To:References:Sender; bh=7ZFkPbZhcMBOVfNHef4mH3x5m8U/bnacfhhjqIXBKCw=; b=ymNoxfqx5L0Jo3+Q0L/K9L9d1i9H3DlJAzzP9eVW6iBDfd2+L6RHfc/SM1ldPZtB gaZBpx/nFQ8n3cGsfu/zPCGWAQ6a+tcKkaJyRaiK+qV1zMZNz1pL/qO5sxyimxtDwlw VOWPhTSoe7CVIcigsmudJpAJgiOKTtzHJPq71TVssD2yvigaNyDV7586Aqk5lGHd7sp 9GPKHWozzcutSNREzuh2h+Qns/hd5nnOst8m2vZxbWEQRahmbEmtg8gLbxlgODTUVJN 8ze+eRUZg1FOvEEP5k8Vut8+4pXNSOKWHc1JhVRziNE7OSh7ukTTxG6tn5MQhbFD8wk rHMbvnaTGA== Date: Tue, 20 Sep 2022 10:13:19 +0200 (CEST) From: Lynne To: Ffmpeg Devel Message-ID: MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH] x86/lpc: write a new Welch windowing function X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: saR8+lsGLhlq Old one was written with the assumption only even inputs would be given. This very messy replacement supports even and odd inputs, and supports AVX2 for extra speed. The buffers given are usually quite big (4k samples), so the speedup is worth it. The new SSE version is still faster than the old inline asm version. The only place where this function mismatches the C version is with very small odd buffers (3 samples when using AVX2). I haven't found such scenarios possible. Also checkasm is provided to make sure this monstrosity works. This fixes some FATE tests. Patch attached. From 23d18909e3561b7f6ecf04dfbf97b6e7e410a312 Mon Sep 17 00:00:00 2001 From: Lynne Date: Mon, 19 Sep 2022 23:48:53 +0200 Subject: [PATCH] x86/lpc: write a new Welch windowing function Old one was written with the assumption only even inputs would be given. This very messy replacement supports even and odd inputs, and supports AVX2 for extra speed. The buffers given are usually quite big (4k samples), so the speedup is worth it. The new SSE version is still faster than the old inline asm version. The only place where this function mismatches the C version is with very small odd buffers (3 samples when using AVX2). I haven't found such scenarios possible. Also checkasm is provided to make sure this monstrosity works. This fixes some FATE tests. --- libavcodec/lpc.c | 30 +--- libavcodec/lpc.h | 3 + libavcodec/x86/Makefile | 3 +- libavcodec/x86/lpc.asm | 232 +++++++++++++++++++++++++++ libavcodec/x86/{lpc.c => lpc_init.c} | 72 ++------- tests/checkasm/Makefile | 1 + tests/checkasm/checkasm.c | 3 + tests/checkasm/checkasm.h | 1 + tests/checkasm/lpc.c | 76 +++++++++ 9 files changed, 336 insertions(+), 85 deletions(-) create mode 100644 libavcodec/x86/lpc.asm rename libavcodec/x86/{lpc.c => lpc_init.c} (64%) create mode 100644 tests/checkasm/lpc.c diff --git a/libavcodec/lpc.c b/libavcodec/lpc.c index 3238ad5fc8..ce2fe102ea 100644 --- a/libavcodec/lpc.c +++ b/libavcodec/lpc.c @@ -31,8 +31,8 @@ /** * Apply Welch window function to audio block */ -static void lpc_apply_welch_window_c(const int32_t *data, int len, - double *w_data) +void ff_lpc_apply_welch_window_c(const int32_t *data, int len, + double *w_data) { int i, n2; double w; @@ -247,30 +247,6 @@ int ff_lpc_calc_coefs(LPCContext *s, for(j=0; j>pass) + fabs(eval - var[0]); - inv = 1/eval; - rinv = sqrt(inv); - for(j=0; j<=max_order; j++) - var[j] *= rinv; - weight += inv; - }else - weight++; - - m[pass&1].update_lls(&m[pass&1], var); - } - avpriv_solve_lls(&m[pass&1], 0.001, 0); - } for(i=0; iwindowed_samples = s->windowed_buffer + FFALIGN(max_order, 4); - s->lpc_apply_welch_window = lpc_apply_welch_window_c; + s->lpc_apply_welch_window = ff_lpc_apply_welch_window_c; s->lpc_compute_autocorr = lpc_compute_autocorr_c; #if ARCH_X86 diff --git a/libavcodec/lpc.h b/libavcodec/lpc.h index e1b41bfd9b..a734bb44c0 100644 --- a/libavcodec/lpc.h +++ b/libavcodec/lpc.h @@ -110,6 +110,9 @@ int ff_lpc_init(LPCContext *s, int blocksize, int max_order, enum FFLPCType lpc_type); void ff_lpc_init_x86(LPCContext *s); +void ff_lpc_apply_welch_window_c(const int32_t *data, int len, + double *w_data); + /** * Uninitialize LPCContext. */ diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile index 4e448623af..e1120b7e15 100644 --- a/libavcodec/x86/Makefile +++ b/libavcodec/x86/Makefile @@ -23,7 +23,7 @@ OBJS-$(CONFIG_LLVIDENCDSP) += x86/lossless_videoencdsp_init.o OBJS-$(CONFIG_HUFFYUVDSP) += x86/huffyuvdsp_init.o OBJS-$(CONFIG_HUFFYUVENCDSP) += x86/huffyuvencdsp_init.o OBJS-$(CONFIG_IDCTDSP) += x86/idctdsp_init.o -OBJS-$(CONFIG_LPC) += x86/lpc.o +OBJS-$(CONFIG_LPC) += x86/lpc_init.o OBJS-$(CONFIG_MDCT15) += x86/mdct15_init.o OBJS-$(CONFIG_ME_CMP) += x86/me_cmp_init.o OBJS-$(CONFIG_MPEGAUDIODSP) += x86/mpegaudiodsp.o @@ -127,6 +127,7 @@ X86ASM-OBJS-$(CONFIG_IDCTDSP) += x86/idctdsp.o X86ASM-OBJS-$(CONFIG_LLAUDDSP) += x86/lossless_audiodsp.o X86ASM-OBJS-$(CONFIG_LLVIDDSP) += x86/lossless_videodsp.o X86ASM-OBJS-$(CONFIG_LLVIDENCDSP) += x86/lossless_videoencdsp.o +X86ASM-OBJS-$(CONFIG_LPC) += x86/lpc.o X86ASM-OBJS-$(CONFIG_MDCT15) += x86/mdct15.o X86ASM-OBJS-$(CONFIG_ME_CMP) += x86/me_cmp.o X86ASM-OBJS-$(CONFIG_MPEGAUDIODSP) += x86/imdct36.o diff --git a/libavcodec/x86/lpc.asm b/libavcodec/x86/lpc.asm new file mode 100644 index 0000000000..1ec337b0d4 --- /dev/null +++ b/libavcodec/x86/lpc.asm @@ -0,0 +1,232 @@ +;****************************************************************************** +;* Copyright (c) Lynne +;* +;* This file is part of FFmpeg. +;* +;* FFmpeg is free software; you can redistribute it and/or +;* modify it under the terms of the GNU Lesser General Public +;* License as published by the Free Software Foundation; either +;* version 2.1 of the License, or (at your option) any later version. +;* +;* FFmpeg is distributed in the hope that it will be useful, +;* but WITHOUT ANY WARRANTY; without even the implied warranty of +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +;* Lesser General Public License for more details. +;* +;* You should have received a copy of the GNU Lesser General Public +;* License along with FFmpeg; if not, write to the Free Software +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA +;****************************************************************************** + +%include "libavutil/x86/x86util.asm" + +SECTION_RODATA 32 + +one_tab: times 4 dq 1.0 +seq_tab_avx2: dq 3.0, 2.0, 1.0, 0.0 +sub_tab: dq -1.0, -2.0, -3.0, -4.0 +add_tab_avx2: times 4 dq 4.0 +dec_tab_avx2: times 4 dq -4.0 +add_tab_sse2: times 2 dq 2.0 +dec_tab_sse2: times 2 dq -2.0 +dec_tab_scalar: times 2 dq -1.0 +seq_tab_sse2: dq 1.0, 0.0 + +SECTION .text + +%macro APPLY_WELCH_FN 0 +cglobal lpc_apply_welch_window, 3, 6, 8, data, len, out, off1, off2, lbak + cmp lenq, 0 + je .end + + movapd m6, [one_tab] + + movd xm1, lend + cvtdq2pd xm1, xm1 ; len +%if cpuflag(avx2) + vbroadcastsd m1, xm1 +%else + shufpd m1, m1, 00b +%endif + + addpd m0, m6, m6 ; 2.0 + subpd m1, m6 ; len - 1 + divpd m0, m1 ; 2.0 / (len - 1) + + mov off1q, lenq + and off1q, 1 + je .even + + addpd m0, [sub_tab] +%if cpuflag(avx2) + movapd m7, [dec_tab_avx2] +%else + movapd m7, [dec_tab_sse2] +%endif + + lea off2q, [lenq*4 - mmsize/2] + sub lenq, mmsize/4 + xor off1q, off1q + + cmp lenq, mmsize/4 + jl .scalar_o + +.loop_o: + movapd m1, m6 + mulpd m2, m0, m0 + subpd m1, m2 +%if cpuflag(avx2) + vpermpd m2, m1, q0123 +%else + shufpd m2, m1, m1, 01b +%endif + + cvtdq2pd m3, [dataq + off1q] + cvtdq2pd m4, [dataq + off2q] + + mulpd m1, m3 + mulpd m2, m4 + + movupd [outq + off1q*2], m1 + movupd [outq + off2q*2], m2 + + addpd m0, m7 + add off1q, mmsize/2 + sub off2q, mmsize/2 + sub lenq, mmsize/4 + jg .loop_o + +.scalar_o: +%if !cpuflag(avx2) + subpd m0, m7 +%endif + subpd m0, m7 + movapd m7, [dec_tab_scalar] + subpd m0, m7 + + sub off1q, (mmsize/2) - 8*cpuflag(avx2) + add off2q, (mmsize/2) + 4 + + addpd xm0, [sub_tab] + +.loop_o_scalar: + movapd xm1, xm6 + mulpd xm2, xm0, xm0 + subpd xm1, xm2 + + cvtdq2pd m3, [dataq + off1q - 4] + cvtdq2pd m4, [dataq + off2q - 4] + + mulpd m3, m1 + mulpd m4, m1 + + movhpd [outq + off1q*2], xm3 + movhpd [outq + off2q*2], xm4 + + addpd xm0, xm7 + + add off1q, 4 + sub off2q, 4 + cmp off1q, off2q + jne .loop_o_scalar + RET + +.even: +%if cpuflag(avx2) + addpd m0, [seq_tab_avx2] +%else + addpd m0, [seq_tab_sse2] +%endif + + mov off1d, lend + shr off1d, 1 + movd xm1, off1d + cvtdq2pd xm1, xm1 ; len/2 +%if cpuflag(avx2) + vbroadcastsd m1, xm1 +%else + shufpd m1, m1, 00b +%endif + subpd m0, m1 + +%if cpuflag(avx2) + movapd m7, [add_tab_avx2] +%else + movapd m7, [add_tab_sse2] +%endif + + lea off2q, [lenq*2] + lea off1q, [lenq*2 - mmsize/2] + sub lenq, mmsize/4 + + cmp lenq, mmsize/4 + jl .scalar_e + +.loop_e: + movapd m1, m6 + mulpd m2, m0, m0 + subpd m1, m2 +%if cpuflag(avx2) + vpermpd m2, m1, q0123 +%else + shufpd m2, m1, m1, 01b +%endif + + cvtdq2pd m3, [dataq + off1q] + cvtdq2pd m4, [dataq + off2q] + + mulpd m1, m3 + mulpd m2, m4 + + movupd [outq + off1q*2], m1 + movupd [outq + off2q*2], m2 + + addpd m0, m7 + add off2q, mmsize/2 + sub off1q, mmsize/2 + sub lenq, mmsize/4 + jge .loop_e + + je .end + +.scalar_e: + subpd m0, m7 + movapd m7, [dec_tab_scalar] + subpd m0, m7 + subpd m0, m7 + subpd m0, m7 + + add off1q, (mmsize/2) + sub off2q, (mmsize/2) - 4 - 8*cpuflag(avx2) + + addpd xm0, [sub_tab] + +.loop_e_scalar: + movapd xm1, xm6 + mulpd xm2, xm0, xm0 + subpd xm1, xm2 + + cvtdq2pd m3, [dataq + off1q - 4] + cvtdq2pd m4, [dataq + off2q - 4] + + mulpd m3, m1 + mulpd m4, m1 + + movhpd [outq + off1q*2], xm3 + movhpd [outq + off2q*2], xm4 + + subpd xm0, xm7 + + add off2q, 4 + sub off1q, 4 + jge .loop_e_scalar + +.end: + RET +%endmacro + +INIT_XMM sse2 +APPLY_WELCH_FN + +INIT_YMM avx2 +APPLY_WELCH_FN diff --git a/libavcodec/x86/lpc.c b/libavcodec/x86/lpc_init.c similarity index 64% rename from libavcodec/x86/lpc.c rename to libavcodec/x86/lpc_init.c index 40fc29fc0f..df77c966c6 100644 --- a/libavcodec/x86/lpc.c +++ b/libavcodec/x86/lpc_init.c @@ -20,65 +20,19 @@ */ #include "libavutil/attributes.h" -#include "libavutil/cpu.h" -#include "libavutil/mem_internal.h" #include "libavutil/x86/asm.h" #include "libavutil/x86/cpu.h" #include "libavcodec/lpc.h" +void ff_lpc_apply_welch_window_sse2(const int32_t *data, int len, + double *w_data); +void ff_lpc_apply_welch_window_avx2(const int32_t *data, int len, + double *w_data); + DECLARE_ASM_CONST(16, double, pd_1)[2] = { 1.0, 1.0 }; -DECLARE_ASM_CONST(16, double, pd_2)[2] = { 2.0, 2.0 }; #if HAVE_SSE2_INLINE -static void lpc_apply_welch_window_sse2(const int32_t *data, int len, - double *w_data) -{ - double c = 2.0 / (len-1.0); - int n2 = len>>1; - x86_reg i = -n2*sizeof(int32_t); - x86_reg j = n2*sizeof(int32_t); - __asm__ volatile( - "movsd %4, %%xmm7 \n\t" - "movapd "MANGLE(pd_1)", %%xmm6 \n\t" - "movapd "MANGLE(pd_2)", %%xmm5 \n\t" - "movlhps %%xmm7, %%xmm7 \n\t" - "subpd %%xmm5, %%xmm7 \n\t" - "addsd %%xmm6, %%xmm7 \n\t" - "test $1, %5 \n\t" - "jz 2f \n\t" -#define WELCH(MOVPD, offset)\ - "1: \n\t"\ - "movapd %%xmm7, %%xmm1 \n\t"\ - "mulpd %%xmm1, %%xmm1 \n\t"\ - "movapd %%xmm6, %%xmm0 \n\t"\ - "subpd %%xmm1, %%xmm0 \n\t"\ - "pshufd $0x4e, %%xmm0, %%xmm1 \n\t"\ - "cvtpi2pd (%3,%0), %%xmm2 \n\t"\ - "cvtpi2pd "#offset"*4(%3,%1), %%xmm3 \n\t"\ - "mulpd %%xmm0, %%xmm2 \n\t"\ - "mulpd %%xmm1, %%xmm3 \n\t"\ - "movapd %%xmm2, (%2,%0,2) \n\t"\ - MOVPD" %%xmm3, "#offset"*8(%2,%1,2) \n\t"\ - "subpd %%xmm5, %%xmm7 \n\t"\ - "sub $8, %1 \n\t"\ - "add $8, %0 \n\t"\ - "jl 1b \n\t"\ - - WELCH("movupd", -1) - "jmp 3f \n\t" - "2: \n\t" - WELCH("movapd", -2) - "3: \n\t" - :"+&r"(i), "+&r"(j) - :"r"(w_data+n2), "r"(data+n2), "m"(c), "r"(len) - NAMED_CONSTRAINTS_ARRAY_ADD(pd_1,pd_2) - XMM_CLOBBERS_ONLY("%xmm0", "%xmm1", "%xmm2", "%xmm3", - "%xmm5", "%xmm6", "%xmm7") - ); -#undef WELCH -} - static void lpc_compute_autocorr_sse2(const double *data, int len, int lag, double *autoc) { @@ -151,12 +105,16 @@ static void lpc_compute_autocorr_sse2(const double *data, int len, int lag, av_cold void ff_lpc_init_x86(LPCContext *c) { -#if HAVE_SSE2_INLINE int cpu_flags = av_get_cpu_flags(); - if (INLINE_SSE2_SLOW(cpu_flags)) { - c->lpc_apply_welch_window = lpc_apply_welch_window_sse2; - c->lpc_compute_autocorr = lpc_compute_autocorr_sse2; - } -#endif /* HAVE_SSE2_INLINE */ +#if HAVE_SSE2_INLINE + if (INLINE_SSE2_SLOW(cpu_flags)) + c->lpc_compute_autocorr = lpc_compute_autocorr_sse2; +#endif + + if (EXTERNAL_SSE2(cpu_flags)) + c->lpc_apply_welch_window = ff_lpc_apply_welch_window_sse2; + + if (EXTERNAL_AVX2(cpu_flags)) + c->lpc_apply_welch_window = ff_lpc_apply_welch_window_avx2; } diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile index 1ac170491b..add6ed7e43 100644 --- a/tests/checkasm/Makefile +++ b/tests/checkasm/Makefile @@ -11,6 +11,7 @@ AVCODECOBJS-$(CONFIG_H264QPEL) += h264qpel.o AVCODECOBJS-$(CONFIG_IDCTDSP) += idctdsp.o AVCODECOBJS-$(CONFIG_LLVIDDSP) += llviddsp.o AVCODECOBJS-$(CONFIG_LLVIDENCDSP) += llviddspenc.o +AVCODECOBJS-$(CONFIG_LPC) += lpc.o AVCODECOBJS-$(CONFIG_ME_CMP) += motion.o AVCODECOBJS-$(CONFIG_VC1DSP) += vc1dsp.o AVCODECOBJS-$(CONFIG_VP8DSP) += vp8dsp.o diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index e56fd3850e..50e3d877e2 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -135,6 +135,9 @@ static const struct { #if CONFIG_LLVIDENCDSP { "llviddspenc", checkasm_check_llviddspenc }, #endif + #if CONFIG_LPC + { "lpc", checkasm_check_lpc }, + #endif #if CONFIG_ME_CMP { "motion", checkasm_check_motion }, #endif diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h index d7645d3730..16571d8c3a 100644 --- a/tests/checkasm/checkasm.h +++ b/tests/checkasm/checkasm.h @@ -68,6 +68,7 @@ void checkasm_check_idctdsp(void); void checkasm_check_jpeg2000dsp(void); void checkasm_check_llviddsp(void); void checkasm_check_llviddspenc(void); +void checkasm_check_lpc(void); void checkasm_check_motion(void); void checkasm_check_nlmeans(void); void checkasm_check_opusdsp(void); diff --git a/tests/checkasm/lpc.c b/tests/checkasm/lpc.c new file mode 100644 index 0000000000..c94fe1d261 --- /dev/null +++ b/tests/checkasm/lpc.c @@ -0,0 +1,76 @@ +/* + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include "libavutil/mem_internal.h" + +#include "libavcodec/lpc.h" + +#include "checkasm.h" + +#define randomize_int32(buf, len) \ + do { \ + for (int i = 0; i < len; i++) { \ + int32_t f = rnd() >> 16; \ + buf[i] = f; \ + } \ + } while (0) + +#define EPS 0.005 + +static void test_window(int len) +{ + LOCAL_ALIGNED(16, int32_t, src, [len]); + LOCAL_ALIGNED(16, double, dst0, [len]); + LOCAL_ALIGNED(16, double, dst1, [len]); + + declare_func(void, int32_t *in, int len, double *out); + + randomize_int32(src, len); + + call_ref(src, len, dst0); + call_new(src, len, dst1); + + if (!double_near_abs_eps_array(dst0, dst1, EPS, len)) + fail(); + + bench_new(src, len, dst1); +} + +void checkasm_check_lpc(void) +{ + LPCContext ctx; + ff_lpc_init(&ctx, 32, 16, FF_LPC_TYPE_DEFAULT); + + if (check_func(ctx.lpc_apply_welch_window, "apply_welch_window_32")) + test_window(32); + report("apply_welch_window_32"); + + if (check_func(ctx.lpc_apply_welch_window, "apply_welch_window_33")) + test_window(33); + report("apply_welch_window_33"); + + if (check_func(ctx.lpc_apply_welch_window, "apply_welch_window_4096")) + test_window(4096); + report("apply_welch_window_4096"); + + if (check_func(ctx.lpc_apply_welch_window, "apply_welch_window_4095")) + test_window(29); + report("apply_welch_window_4095"); + + ff_lpc_end(&ctx); +} -- 2.37.2.609.g9ff673ca1a