From patchwork Sun Feb 28 14:47:05 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thilo Borgmann
X-Patchwork-Id: 26026
To: FFmpeg development discussions and patches
From: Thilo Borgmann
Message-ID: <680e2122-47b1-008e-6ae2-cab6e3043bd4@mail.de>
Date: Sun, 28 Feb 2021 15:47:05 +0100
Subject: [FFmpeg-devel] [PATCH] lavc/alsdec: Add NEON optimizations
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

Hi,

this is my first attempt at writing assembly, so it might still include some don'ts of the asm world...

Tested with gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

Speed-wise, the NEON version is slower than the C version for small prediction orders, up to around 10 or 11. The maximum prediction order, however, is 1023.
I therefore checked with the "real-world" samples from the fate-suite, which suggests low prediction orders are non-dominant:

pred_order = 9, gain: -6%
als_reconstruct_all_c: 15898.2
als_reconstruct_all_neon: 16460.0

pred_order = 15, gain: 35%
als_reconstruct_all_c: 34843.7
als_reconstruct_all_neon: 22840.5

pred_order = {7..17}, gain: 23%
als_reconstruct_all_c: 26645.2
als_reconstruct_all_neon: 20635.2

patched:
TEST mpeg4-als-conformance-00
TEST mpeg4-als-conformance-01
TEST mpeg4-als-conformance-02
TEST mpeg4-als-conformance-03
TEST mpeg4-als-conformance-04
TEST mpeg4-als-conformance-05
TEST mpeg4-als-conformance-09

real 0m1.006s
user 0m0.903s
sys 0m0.112s

real 0m1.007s
user 0m0.889s
sys 0m0.127s

real 0m1.005s
user 0m0.897s
sys 0m0.117s

unpatched:
TEST mpeg4-als-conformance-00
TEST mpeg4-als-conformance-01
TEST mpeg4-als-conformance-02
TEST mpeg4-als-conformance-03
TEST mpeg4-als-conformance-04
TEST mpeg4-als-conformance-05
TEST mpeg4-als-conformance-09

real 0m1.204s
user 0m1.122s
sys 0m0.091s

real 0m1.204s
user 0m1.098s
sys 0m0.115s

real 0m1.205s
user 0m1.077s
sys 0m0.137s

-Thilo

From 42a4d5f581570b0d292b63bb193e3e8da9645fcd Mon Sep 17 00:00:00 2001
From: Thilo Borgmann
Date: Sun, 28 Feb 2021 14:13:32 +0000
Subject: [PATCH] lavc/alsdec: Add NEON optimizations

---
 configure                                |   3 +-
 libavcodec/Makefile                      |   1 +
 libavcodec/aarch64/Makefile              |   2 +
 libavcodec/aarch64/alsdsp_init_aarch64.c |  35 +++++
 libavcodec/aarch64/alsdsp_neon.S         | 155 +++++++++++++++++++++++
 libavcodec/alsdec.c                      |  13 +-
 libavcodec/alsdsp.c                      |  49 +++++++
 libavcodec/alsdsp.h                      |  35 +++++
 tests/checkasm/Makefile                  |   1 +
 tests/checkasm/alsdsp.c                  |  81 ++++++++++++
 tests/checkasm/checkasm.c                |   3 +
 tests/checkasm/checkasm.h                |   1 +
 12 files changed, 370 insertions(+), 9 deletions(-)
 create mode 100644 libavcodec/aarch64/alsdsp_init_aarch64.c
 create mode 100644 libavcodec/aarch64/alsdsp_neon.S
 create mode 100644 libavcodec/alsdsp.c
 create mode 100644 libavcodec/alsdsp.h
 create mode 100644 tests/checkasm/alsdsp.c

diff --git a/configure b/configure
index 900505756b..30875f87f2 100755
--- a/configure
+++ b/configure
@@ -2345,6 +2345,7 @@ CONFIG_EXTRA="
     aandcttables
     ac3dsp
     adts_header
+    alsdsp
     atsc_a53
     audio_frame_queue
     audiodsp
@@ -2664,7 +2665,7 @@ adpcm_g722_decoder_select="g722dsp"
 adpcm_g722_encoder_select="g722dsp"
 aic_decoder_select="golomb idctdsp"
 alac_encoder_select="lpc"
-als_decoder_select="bswapdsp"
+als_decoder_select="bswapdsp alsdsp"
 amrnb_decoder_select="lsp"
 amrwb_decoder_select="lsp"
 amv_decoder_select="sp5x_decoder exif"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 35318f4f4d..8a23ab8ea0 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -62,6 +62,7 @@ OBJS = ac3_parser.o \
 OBJS-$(CONFIG_AANDCTTABLES) += aandcttab.o
 OBJS-$(CONFIG_AC3DSP) += ac3dsp.o ac3.o ac3tab.o
 OBJS-$(CONFIG_ADTS_HEADER) += adts_header.o mpeg4audio.o
+OBJS-$(CONFIG_ALSDSP) += alsdsp.o
 OBJS-$(CONFIG_AMF) += amfenc.o
 OBJS-$(CONFIG_AUDIO_FRAME_QUEUE) += audio_frame_queue.o
 OBJS-$(CONFIG_ATSC_A53) += atsc_a53.o
diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index f6434e40da..a7493c7c2b 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -1,4 +1,5 @@
 # subsystems
+OBJS-$(CONFIG_ALSDSP) += aarch64/alsdsp_init_aarch64.o
 OBJS-$(CONFIG_FFT) += aarch64/fft_init_aarch64.o
 OBJS-$(CONFIG_FMTCONVERT) += aarch64/fmtconvert_init.o
 OBJS-$(CONFIG_H264CHROMA) += aarch64/h264chroma_init_aarch64.o
@@ -52,6 +53,7 @@ NEON-OBJS-$(CONFIG_VP8DSP) += aarch64/vp8dsp_neon.o
 
 # decoders/encoders
 NEON-OBJS-$(CONFIG_AAC_DECODER) += aarch64/aacpsdsp_neon.o
+NEON-OBJS-$(CONFIG_ALS_DECODER) += aarch64/alsdsp_neon.o
 NEON-OBJS-$(CONFIG_DCA_DECODER) += aarch64/synth_filter_neon.o
 NEON-OBJS-$(CONFIG_OPUS_DECODER) += aarch64/opusdsp_neon.o
 NEON-OBJS-$(CONFIG_VORBIS_DECODER) += aarch64/vorbisdsp_neon.o
diff --git a/libavcodec/aarch64/alsdsp_init_aarch64.c b/libavcodec/aarch64/alsdsp_init_aarch64.c
new file mode 100644
index 0000000000..130b1a615e
--- /dev/null
+++ b/libavcodec/aarch64/alsdsp_init_aarch64.c
@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2021 Thilo Borgmann
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+
+#include "libavutil/aarch64/cpu.h"
+#include "libavcodec/alsdsp.h"
+
+void ff_alsdsp_reconstruct_all_neon(int32_t *samples, int32_t *samples_end, int32_t *coeffs, unsigned int opt_order);
+
+av_cold void ff_alsdsp_init_neon(ALSDSPContext *s)
+{
+    int cpu_flags = av_get_cpu_flags();
+
+    if (have_neon(cpu_flags)) {
+        s->reconstruct_all = ff_alsdsp_reconstruct_all_neon;
+    }
+}
diff --git a/libavcodec/aarch64/alsdsp_neon.S b/libavcodec/aarch64/alsdsp_neon.S
new file mode 100644
index 0000000000..fe95eaf843
--- /dev/null
+++ b/libavcodec/aarch64/alsdsp_neon.S
@@ -0,0 +1,155 @@
+/*
+ * Copyright (c) 2021 Thilo Borgmann
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+#include "neon.S"
+
+//void ff_alsdsp_reconstruct_all_neon(int32_t *samples, int32_t *samples_end, int32_t *coeffs, unsigned int opt_order);
+// x0: int32_t *samples
+// x1: int32_t *samples_end
+// x2: int32_t *coeffs
+// w3: unsigned int opt_order
+function ff_alsdsp_reconstruct_all_neon, export = 1
+        sub             sp, sp, #128
+        st1             {v8.4s - v11.4s}, [sp], #64
+        st1             {v12.4s - v15.4s}, [sp], #64
+// avoid 32-bit clobber from register
+        lsl             x3, x3, #32
+        neg             x3, x3, lsr #32
+// x10 counts the bytes left to read, set to 4 * -opt_order
+        lsl             x10, x3, #2
+
+// loop x0 .. x1
+1:      cmp             x0, x1
+        b.eq            4f
+
+// samples - opt_order, coeffs - opt_order
+        add             x4, x0, x10
+        add             x5, x2, x10
+// reset local counter: count -opt_order .. 0
+        mov             x6, x3
+
+// reset local acc
+        movi            v8.2d, #0
+        movi            v9.2d, #0
+        movi            v10.2d, #0
+        movi            v11.2d, #0
+        movi            v12.2d, #0
+        movi            v13.2d, #0
+        movi            v14.2d, #0
+        movi            v15.2d, #0
+
+// loop over 16 samples while >= 16 more to read
+        adds            x6, x6, #16
+        b.gt            3f
+
+2:      ld1             {v0.4s - v3.4s}, [x4], #64
+        ld1             {v4.4s - v7.4s}, [x5], #64
+
+        smlal           v8.2d, v0.2s, v4.2s
+        smlal2          v9.2d, v0.4s, v4.4s
+        smlal           v10.2d, v1.2s, v5.2s
+        smlal2          v11.2d, v1.4s, v5.4s
+        smlal           v12.2d, v2.2s, v6.2s
+        smlal2          v13.2d, v2.4s, v6.4s
+        smlal           v14.2d, v3.2s, v7.2s
+        smlal2          v15.2d, v3.4s, v7.4s
+
+        adds            x6, x6, #16
+        b.le            2b
+
+// reduce to four NEON registers
+// acc values into register
+3:      subs            x6, x6, #16
+
+        add             v4.2d, v8.2d, v9.2d
+        add             v5.2d, v10.2d, v11.2d
+        add             v6.2d, v12.2d, v13.2d
+        add             v7.2d, v14.2d, v15.2d
+
+// next 8 samples
+        cmn             x6, #8
+        b.gt            3f
+
+        ld1             {v0.4s - v1.4s}, [x4], #32
+        ld1             {v2.4s - v3.4s}, [x5], #32
+
+        smlal           v4.2d, v0.2s, v2.2s
+        smlal2          v5.2d, v0.4s, v2.4s
+        smlal           v6.2d, v1.2s, v3.2s
+        smlal2          v7.2d, v1.4s, v3.4s
+
+        adds            x6, x6, #8
+
+// reduce to two NEON registers
+// acc values into register
+3:      add             v2.2d, v4.2d, v5.2d
+        add             v3.2d, v6.2d, v7.2d
+
+// next 4 samples
+        cmn             x6, #4
+        b.gt            3f
+
+        ld1             {v0.4s}, [x4], #16
+        ld1             {v1.4s}, [x5], #16
+
+        smlal           v2.2d, v0.2s, v1.2s
+        smlal2          v3.2d, v0.4s, v1.4s
+
+        adds            x6, x6, #4
+
+// reduce to A64 registers
+// acc values into register
+3:      add             v2.2d, v2.2d, v3.2d
+        mov             x7, v2.2d[0]
+        mov             x8, v2.2d[1]
+        add             x7, x7, x8
+
+        cmn             x6, #0
+        b.eq            3f
+
+// loop over the remaining < 4 samples to read
+2:      ldrsw           x8, [x4], #4
+        ldrsw           x9, [x5], #4
+
+        madd            x7, x8, x9, x7
+        adds            x6, x6, #1
+        b.lt            2b
+
+// add 1<<19 and store s-=X>>20
+3:      mov             x9, #1
+        lsl             x9, x9, #19
+        add             x7, x7, x9
+        neg             x7, x7, asr #20
+
+        ldrsw           x9, [x4]
+        add             x9, x9, x7
+        str             w9, [x4]
+
+// increment samples and loop
+        add             x0, x0, #4
+        b               1b
+
+4:      sub             sp, sp, #128
+        ld1             {v8.4s - v11.4s}, [sp], #64
+        ld1             {v12.4s - v15.4s}, [sp], #64
+
+        ret
+endfunc
diff --git a/libavcodec/alsdec.c b/libavcodec/alsdec.c
index b3c444c54f..044e372b87 100644
--- a/libavcodec/alsdec.c
+++ b/libavcodec/alsdec.c
@@ -32,6 +32,7 @@
 #include "unary.h"
 #include "mpeg4audio.h"
 #include "bgmc.h"
+#include "alsdsp.h"
 #include "bswapdsp.h"
 #include "internal.h"
 #include "mlz.h"
@@ -195,6 +196,7 @@ typedef struct ALSDecContext {
     AVCodecContext *avctx;
     ALSSpecificConfig sconf;
     GetBitContext gb;
+    ALSDSPContext dsp;
     BswapDSPContext bdsp;
     const AVCRC *crc_table;
     uint32_t crc_org; ///< CRC value of the original input data
@@ -903,6 +905,7 @@ static int read_var_block_data(ALSDecContext *ctx, ALSBlockData *bd)
 static int decode_var_block_data(ALSDecContext *ctx, ALSBlockData *bd)
 {
     ALSSpecificConfig *sconf = &ctx->sconf;
+    ALSDSPContext *dsp = &ctx->dsp;
     unsigned int block_length = bd->block_length;
     unsigned int smp = 0;
     unsigned int k;
@@ -987,14 +990,7 @@ static int decode_var_block_data(ALSDecContext *ctx, ALSBlockData *bd)
     raw_samples = bd->raw_samples + smp;
     lpc_cof = lpc_cof_reversed + opt_order;
 
-    for (; raw_samples < raw_samples_end; raw_samples++) {
-        y = 1 << 19;
-
-        for (sb = -opt_order; sb < 0; sb++)
-            y += (uint64_t)MUL64(lpc_cof[sb], raw_samples[sb]);
-
-        *raw_samples -= y >> 20;
-    }
+    dsp->reconstruct_all(raw_samples, raw_samples_end, lpc_cof, opt_order);
 
     raw_samples = bd->raw_samples;
 
@@ -2150,6 +2146,7 @@ static av_cold int decode_init(AVCodecContext *avctx)
         }
     }
 
+    ff_alsdsp_init(&ctx->dsp);
     ff_bswapdsp_init(&ctx->bdsp);
 
     return 0;
diff --git a/libavcodec/alsdsp.c b/libavcodec/alsdsp.c
new file mode 100644
index 0000000000..00270bb5e6
--- /dev/null
+++ b/libavcodec/alsdsp.c
@@ -0,0 +1,49 @@
+/*
+ * Copyright (c) 2021 Thilo Borgmann
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/attributes.h"
+#include "libavutil/samplefmt.h"
+#include "mathops.h"
+#include "alsdsp.h"
+#include "config.h"
+
+static void als_reconstruct_all_c(int32_t *raw_samples, int32_t *raw_samples_end, int32_t *lpc_cof, unsigned int opt_order)
+{
+    int64_t y;
+    int sb;
+
+    for (; raw_samples < raw_samples_end; raw_samples++) {
+        y = 1 << 19;
+
+        for (sb = -opt_order; sb < 0; sb++)
+            y += (uint64_t)MUL64(lpc_cof[sb], raw_samples[sb]);
+
+        *raw_samples -= y >> 20;
+    }
+}
+
+
+av_cold void ff_alsdsp_init(ALSDSPContext *ctx)
+{
+    ctx->reconstruct_all = als_reconstruct_all_c;
+
+    if (ARCH_AARCH64)
+        ff_alsdsp_init_neon(ctx);
+}
diff --git a/libavcodec/alsdsp.h b/libavcodec/alsdsp.h
new file mode 100644
index 0000000000..b285edbe6e
--- /dev/null
+++ b/libavcodec/alsdsp.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2021 Thilo Borgmann
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_ALSDSP_H
+#define AVCODEC_ALSDSP_H
+
+#include <stdint.h>
+#include "libavutil/internal.h"
+#include "libavutil/samplefmt.h"
+
+typedef struct ALSDSPContext {
+    void (*reconstruct_all)(int32_t *raw_samples, int32_t *raw_samples_end, int32_t *lpc_cof, unsigned int opt_order);
+} ALSDSPContext;
+
+void ff_alsdsp_init(ALSDSPContext *c);
+void ff_alsdsp_init_neon(ALSDSPContext *c);
+
+#endif /* AVCODEC_ALSDSP_H */
diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 9e9569777b..2f1c03d78c 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -1,6 +1,7 @@
 # libavcodec tests
 # subsystems
 AVCODECOBJS-$(CONFIG_AUDIODSP) += audiodsp.o
+AVCODECOBJS-$(CONFIG_ALSDSP) += alsdsp.o
 AVCODECOBJS-$(CONFIG_BLOCKDSP) += blockdsp.o
 AVCODECOBJS-$(CONFIG_BSWAPDSP) += bswapdsp.o
 AVCODECOBJS-$(CONFIG_FLACDSP) += flacdsp.o
diff --git a/tests/checkasm/alsdsp.c b/tests/checkasm/alsdsp.c
new file mode 100644
index 0000000000..f35c7d49be
--- /dev/null
+++ b/tests/checkasm/alsdsp.c
@@ -0,0 +1,81 @@
+/*
+ * Copyright (c) 2021 Thilo Borgmann
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <string.h>
+#include "checkasm.h"
+#include "libavcodec/alsdsp.h"
+#include "libavutil/common.h"
+#include "libavutil/internal.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/mem_internal.h"
+
+#define NUM 1024
+
+#define randomize_buffers() \
+    do { \
+        int i; \
+        for (i = 0; i < NUM; i++) { \
+            uint32_t r = rnd(); \
+            AV_WN32A(&ref_coeffs[i], r); \
+            AV_WN32A(&new_coeffs[i], r); \
+            r = rnd(); \
+            AV_WN32A(&ref_samples[i], r); \
+            AV_WN32A(&new_samples[i], r); \
+        } \
+    } while (0)
+
+
+void checkasm_check_alsdsp(void)
+{
+    LOCAL_ALIGNED_16(uint32_t, ref_samples, [1024]);
+    LOCAL_ALIGNED_16(uint32_t, ref_coeffs, [1024]);
+    LOCAL_ALIGNED_16(uint32_t, new_samples, [1024]);
+    LOCAL_ALIGNED_16(uint32_t, new_coeffs, [1024]);
+
+    ALSDSPContext dsp;
+    ff_alsdsp_init(&dsp);
+
+    if (check_func(dsp.reconstruct_all, "als_reconstruct_all")) {
+        declare_func(void, int32_t *samples, int32_t *samples_end, int32_t *coeffs, unsigned int opt_order);
+        int32_t *s, *c, *e;
+        unsigned int o;
+        unsigned int O[] = {7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17};
+        for (int k = 0; k < 11; k++) {
+            o = O[k];
+
+            randomize_buffers();
+
+            s = (int32_t*)(ref_samples + o);
+            e = (int32_t*)(ref_samples + 1024);
+            c = (int32_t*)(ref_coeffs + o);
+            call_ref(s, e, c, o);
+
+            s = (int32_t*)(new_samples + o);
+            e = (int32_t*)(new_samples + 1024);
+            c = (int32_t*)(new_coeffs + o);
+            call_new(s, e, c, o);
+
+            if (memcmp(ref_samples, new_samples, NUM * sizeof(*ref_samples)) || memcmp(ref_coeffs, new_coeffs, NUM * sizeof(*ref_coeffs)))
+                fail();
+            bench_new(s, e, c, o);
+        }
+    }
+    report("reconstruct_all");
+}
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index b3ac76c325..c847ae28f5 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -80,6 +80,9 @@ static const struct {
     #if CONFIG_ALAC_DECODER
         { "alacdsp", checkasm_check_alacdsp },
     #endif
+    #if CONFIG_ALSDSP
+        { "alsdsp", checkasm_check_alsdsp },
+    #endif
     #if CONFIG_AUDIODSP
         { "audiodsp", checkasm_check_audiodsp },
     #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 0190bc912c..da9f9c73fe 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -42,6 +42,7 @@
 void checkasm_check_aacpsdsp(void);
 void checkasm_check_afir(void);
 void checkasm_check_alacdsp(void);
+void checkasm_check_alsdsp(void);
 void checkasm_check_audiodsp(void);
 void checkasm_check_blend(void);
 void checkasm_check_blockdsp(void);
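
For anyone who wants to check the arithmetic of reconstruct_all without building FFmpeg, here is a minimal standalone sketch. It is not part of the patch. All sketch_* names are hypothetical and exist only for this illustration. The scalar loop mirrors als_reconstruct_all_c from the patch. The second variant splits the sum into four partial accumulators, loosely in the spirit of the NEON code (which keeps eight vector accumulators and reduces them step by step), and finishes with a scalar tail. The assumption being illustrated is that 64-bit addition wraps modulo 2^64 and is therefore order-independent, so both variants should give identical output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference with the same arithmetic as als_reconstruct_all_c.
 * cof points one past the last coefficient, so cof[-order .. -1] are read.
 * The sum is kept in uint64_t so that wrap-around is well defined; the
 * arithmetic right shift of a negative value is assumed (true for GCC/Clang). */
static void sketch_reconstruct_scalar(int32_t *s, const int32_t *end,
                                      const int32_t *cof, unsigned order)
{
    assert(s <= end);
    for (; s < end; s++) {
        uint64_t y = 1u << 19;                       /* rounding offset */
        for (int sb = -(int)order; sb < 0; sb++)
            y += (uint64_t)((int64_t)cof[sb] * s[sb]);
        *s = (int32_t)(*s - ((int64_t)y >> 20));
    }
}

/* Same result, but accumulated in four independent partial sums over
 * blocks of four taps, followed by a scalar tail for the remainder. */
static void sketch_reconstruct_lanes(int32_t *s, const int32_t *end,
                                     const int32_t *cof, unsigned order)
{
    assert(s <= end);
    for (; s < end; s++) {
        const int32_t *ps = s - order;               /* oldest sample      */
        const int32_t *pc = cof - order;             /* matching coeff     */
        uint64_t acc[4] = { 0, 0, 0, 0 };
        unsigned i = 0;

        for (; i + 4 <= order; i += 4)               /* blocked part       */
            for (unsigned l = 0; l < 4; l++)
                acc[l] += (uint64_t)((int64_t)pc[i + l] * ps[i + l]);

        uint64_t y = acc[0] + acc[1] + acc[2] + acc[3];
        for (; i < order; i++)                       /* fewer than 4 left  */
            y += (uint64_t)((int64_t)pc[i] * ps[i]);

        y += 1u << 19;                               /* rounding offset    */
        *s = (int32_t)(*s - ((int64_t)y >> 20));
    }
}

/* Runs both variants on identical deterministic data for the same orders
 * the checkasm test uses (7..17). Returns 1 if all outputs match. */
int sketch_selfcheck(void)
{
    enum { N = 64, MAX_ORDER = 17 };
    for (unsigned order = 7; order <= MAX_ORDER; order++) {
        int32_t a[N], b[N], cof[MAX_ORDER];
        for (int i = 0; i < MAX_ORDER; i++)
            cof[i] = (i * 7919) % 60000 - 30000;     /* small, stable taps */
        for (int i = 0; i < N; i++)
            a[i] = b[i] = (i * 2654435) % 65536 - 32768;
        sketch_reconstruct_scalar(a + order, a + N, cof + order, order);
        sketch_reconstruct_lanes(b + order, b + N, cof + order, order);
        if (memcmp(a, b, sizeof(a)))
            return 0;
    }
    return 1;
}
```

Calling sketch_selfcheck() from a small main() is enough to try it. The test data is arbitrary and deliberately small so the filter stays bounded; it says nothing about speed and does not replace the checkasm test in the patch.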