From patchwork Tue Jan 12 18:12:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Lynne X-Patchwork-Id: 24925 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id 17FB844A3C0 for ; Tue, 12 Jan 2021 20:12:18 +0200 (EET) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id E300168A94D; Tue, 12 Jan 2021 20:12:17 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from w4.tutanota.de (w4.tutanota.de [81.3.6.165]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id C1BCE68A94D for ; Tue, 12 Jan 2021 20:12:11 +0200 (EET) Received: from w3.tutanota.de (unknown [192.168.1.164]) by w4.tutanota.de (Postfix) with ESMTP id 1C2271060308 for ; Tue, 12 Jan 2021 18:12:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1610475130; s=s1; d=lynne.ee; h=From:From:To:To:Subject:Subject:Content-Description:Content-ID:Content-Type:Content-Type:Content-Transfer-Encoding:Cc:Date:Date:In-Reply-To:MIME-Version:MIME-Version:Message-ID:Message-ID:Reply-To:References:Sender; bh=33mrJ+0cEZo62bc5B8jWfSVrZa8JtpptUTw7nJmh+xc=; b=DQq/ycM6Dyad8rInGbBBYEQJx+7sUBzWDf1TF5HFaUvd8rxj4rQMCQGeH+xiCkWu /0T3HPfyGcbIHSvQ7Y6V5wAB28+Jf9kngf6HzGwhvHQxEM8nT7H8eXZuxjs44diUFZe +Z252YYM8ZJl5ZKpD5LbREfPRl4aQhdwa1x/lSf4SRPKK1JzKHLKEwrcHslWf4x+9E6 qwSg23/k4U+3HaNIKIcftPzaetaQAQS5Ff0yIMuvFpysow4zU9OoHc86RGKIqTjBcN1 Rc22tAAZ69KaCq/K9N9jpX42BturNcx3xVy2C8+yKANFg0hX8PrbINzLsWk+X/IBaI5 ePDb/NAIZw== Date: Tue, 12 Jan 2021 19:12:10 +0100 (CET) From: Lynne To: Ffmpeg Devel Message-ID: MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 1/5] ac3enc_fixed: convert to 32-bit sample format X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list 
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

The AC3 encoder used to be a separate library called "Aften", which got merged
into libavcodec (literally, SVN commits and all).
The merge preserved as many features from the library as possible.

The code had two versions - a fixed-point version and a floating-point version.
FFmpeg had floating-point DSP code used by other codecs, including the AC3
decoder, so the floating-point DSP was simply replaced with FFmpeg's own
functions.
However, FFmpeg had no fixed-point audio code at that point. So the encoder
brought along its own fixed-point DSP functions, including a fixed-point MDCT.

The fixed-point MDCT itself is trivially just a float MDCT with a different
type and each multiply being a fixed-point multiply. So over time, it got
refactored, and the FFT used for all other codecs was templated.

Due to design decisions at the time, the fixed-point version of the encoder
operates at 16 bits of precision. Although convenient, this was inadequate and
inefficient even at the time. The encoder is noisy, does not produce output
comparable to the float encoder, and even rings at higher frequencies due to
the badly approximated window function.

Enter MIPS (owned by Imagination Technologies at the time). They wanted quick
fixed-point decoding on their FPU-less cores. So they contributed patches to
template the AC3 decoder so it had both a fixed-point and a floating-point
version. They also did the same for the AAC decoder. They, however, used 32-bit
samples, not 16-bit ones. And we did not have 32-bit fixed-point DSP functions,
including an MDCT. But instead of templating our MDCT to output 3 versions
(float, 32-bit fixed and 16-bit fixed), they simply copy-pasted their own MDCT
into ours, and completely ifdeffed our own MDCT code out if a 32-bit fixed-point
MDCT was selected.

This is also the status quo nowadays - 2 separate MDCTs, one which produces
floating-point and 16-bit fixed-point versions, and one sort-of integrated one
which produces a 32-bit MDCT.

MIPS weren't all that interested in encoding, so they left the encoder as-is,
and they didn't care much about the ifdeffery, mess or quality - it wasn't
their problem.

So the MDCT/FFT code has always been a thorn in the eye of anyone looking to
clean up the code.

Backstory over. Internally AC3 operates on 25-bit fixed-point coefficients.
So for the floating-point version, the encoder simply runs the float MDCT,
and converts the resulting coefficients to 25-bit fixed-point, as AC3 is
inherently a fixed-point codec. For the fixed-point version, the input is
16-bit samples, so to maximize precision the frame samples are analyzed and
the highest set bit is detected via ac3_max_msb_abs_int16(), and the samples
are then scaled up via ac3_lshift_int16(), so the input for the FFT is always
at least 14 bits, computed in normalize_samples(). After the FFT, the
coefficients are scaled up to 25 bits.

This patch simply changes the encoder to accept 32-bit samples, reusing the
already well-optimized 32-bit MDCT code, allowing us to clean up and drop
a large part of some very messy code of ours, as well as prepare for the future
lavu/tx conversion. The coefficients are simply scaled down to 25 bits during
windowing, skipping 2 separate scalings, as the hacks to extend precision are
simply no longer necessary. There's no point in always running the MDCT at
32 bits when you're going to drop 6 bits off anyway: the headroom is plenty,
and the MDCT rounds properly.

This also makes the encoder slightly more accurate than the float version,
as there's no coefficient conversion step necessary.

SIZE SAVINGS (sizes in bytes):

ARM32:
HARDCODED TABLES:
BASE           - 10709590
DROP  DSP      - 10702872 - diff:   -6.56KiB
DROP  MDCT     - 10667932 - diff:  -34.12KiB - both:   -40.68KiB
DROP  FFT      - 10336652 - diff: -323.52KiB - all:   -364.20KiB
SOFTCODED TABLES:
BASE           -  9685096
DROP  DSP      -  9678378 - diff:   -6.56KiB
DROP  MDCT     -  9643466 - diff:  -34.09KiB - both:   -40.65KiB
DROP  FFT      -  9573918 - diff:  -67.92KiB - all:   -108.57KiB

ARM64:
HARDCODED TABLES:
BASE           - 14641112
DROP  DSP      - 14633806 - diff:   -7.13KiB
DROP  MDCT     - 14604812 - diff:  -28.31KiB - both:   -35.45KiB
DROP  FFT      - 14286826 - diff: -310.53KiB - all:   -345.98KiB
SOFTCODED TABLES:
BASE           - 13636238
DROP  DSP      - 13628932 - diff:   -7.13KiB
DROP  MDCT     - 13599866 - diff:  -28.38KiB - both:   -35.52KiB
DROP  FFT      - 13542080 - diff:  -56.43KiB - all:    -91.95KiB

x86:
HARDCODED TABLES:
BASE           - 12367336
DROP  DSP      - 12354698 - diff:  -12.34KiB
DROP  MDCT     - 12331024 - diff:  -23.12KiB - both:   -35.46KiB
DROP  FFT      - 12029788 - diff: -294.18KiB - all:   -329.64KiB
SOFTCODED TABLES:
BASE           - 11358094
DROP  DSP      - 11345456 - diff:  -12.34KiB
DROP  MDCT     - 11321742 - diff:  -23.16KiB - both:   -35.50KiB
DROP  FFT      - 11276946 - diff:  -43.75KiB - all:    -79.25KiB

PERFORMANCE (10min random s32le):
ARM32 - before -  39.9x - 0m15.046s
ARM32 - after  -  28.2x - 0m21.525s
                       Speed:  -30%
ARM64 - before -  36.1x - 0m16.637s
ARM64 - after  -  36.0x - 0m16.727s
                       Speed: -0.5%
x86   - before -  184x -  0m3.277s
x86   - after  -  190x -  0m3.187s
                       Speed:   +3%

New patch attached.
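The windowing step described above can be illustrated with a small sketch. It quantizes a float window to Q22 (the `1 << 22` scale that the patch's ac3_fixed_mdct_init() uses for its integer window) and applies it to 32-bit samples with a rounding multiply-and-shift. The helper names and the `>> 31` rounding multiply are assumptions made for this illustration; they model what a fixed-point vector multiply might do and are not copied from FFmpeg's AVFixedDSPContext code. The exact bit budget in the encoder also depends on the MDCT scaling, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: quantize a window with values in [0, 1] to Q22.
 * The patch rounds with lrintf(); adding 0.5 and truncating is used here
 * only so that the sketch does not depend on libm. */
static void quantize_window_q22(int32_t *iwin, const float *fwin, int len)
{
    for (int i = 0; i < len; i++) {
        assert(fwin[i] >= 0.0f && fwin[i] <= 1.0f);
        iwin[i] = (int32_t)(fwin[i] * (float)(1 << 22) + 0.5f);
    }
}

/* Hypothetical helper: multiply a 32-bit sample by a Q22 window value and
 * shift the 64-bit product down by 31 bits with rounding. A near-full-scale
 * sample times a window value of 1.0 therefore lands at about 2^22. */
static int32_t mul_round_shift31(int32_t sample, int32_t win)
{
    int64_t prod = (int64_t)sample * win;
    /* Right-shifting a negative value is implementation-defined in C;
     * common compilers perform an arithmetic shift. */
    return (int32_t)((prod + (INT64_C(1) << 30)) >> 31);
}

/* Apply the quantized window to a run of 32-bit samples. */
static void apply_window_q22(int32_t *dst, const int32_t *src,
                             const int32_t *iwin, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = mul_round_shift31(src[i], iwin[i]);
}
```

Because the scaling happens inside the window multiply, no separate normalization or post-transform shift is needed in this model, which is the simplification the message argues for.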
Subject: [PATCH v2 1/5] ac3enc_fixed: convert to 32-bit sample format

---
 doc/encoders.texi                 |  7 ++--
 libavcodec/Makefile               |  2 +-
 libavcodec/ac3enc.c               |  1 +
 libavcodec/ac3enc.h               | 11 +++---
 libavcodec/ac3enc_fixed.c         | 59 ++++++++++++-------------------
 libavcodec/ac3enc_float.c         |  1 -
 libavcodec/ac3enc_template.c      | 21 ++++-------
 libavcodec/version.h              |  2 +-
 tests/fate/ac3.mak                |  2 +-
 tests/fate/ffmpeg.mak             |  2 +-
 tests/ref/fate/unknown_layout-ac3 |  2 +-
 tests/ref/lavf/rm                 |  2 +-
 12 files changed, 45 insertions(+), 67 deletions(-)

diff --git a/doc/encoders.texi b/doc/encoders.texi
index 0b1c69e982..60e763a704 100644
--- a/doc/encoders.texi
+++ b/doc/encoders.texi
@@ -151,10 +151,9 @@ the undocumented RealAudio 3 (a.k.a. dnet).
 The @var{ac3} encoder uses floating-point math, while the @var{ac3_fixed}
 encoder only uses fixed-point integer math. This does not mean that one is
 always faster, just that one or the other may be better suited to a
-particular system. The floating-point encoder will generally produce better
-quality audio for a given bitrate. The @var{ac3_fixed} encoder is not the
-default codec for any of the output formats, so it must be specified explicitly
-using the option @code{-acodec ac3_fixed} in order to use it.
+particular system. The @var{ac3_fixed} encoder is not the default codec for
+any of the output formats, so it must be specified explicitly using the option
+@code{-acodec ac3_fixed} in order to use it.

 @subsection AC-3 Metadata

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 35318f4f4d..0546e6f6c5 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -181,7 +181,7 @@ OBJS-$(CONFIG_AC3_DECODER) += ac3dec_float.o ac3dec_data.o ac3.o kbd
 OBJS-$(CONFIG_AC3_FIXED_DECODER) += ac3dec_fixed.o ac3dec_data.o ac3.o kbdwin.o ac3tab.o
 OBJS-$(CONFIG_AC3_ENCODER) += ac3enc_float.o ac3enc.o ac3tab.o \
                               ac3.o kbdwin.o
-OBJS-$(CONFIG_AC3_FIXED_ENCODER) += ac3enc_fixed.o ac3enc.o ac3tab.o ac3.o
+OBJS-$(CONFIG_AC3_FIXED_ENCODER) += ac3enc_fixed.o ac3enc.o ac3tab.o ac3.o kbdwin.o
 OBJS-$(CONFIG_AC3_MF_ENCODER) += mfenc.o mf_utils.o
 OBJS-$(CONFIG_ACELP_KELVIN_DECODER) += g729dec.o lsp.o celp_math.o celp_filters.o acelp_filters.o acelp_pitch_delay.o acelp_vectors.o g729postfilter.o
 OBJS-$(CONFIG_AGM_DECODER) += agm.o
diff --git a/libavcodec/ac3enc.c b/libavcodec/ac3enc.c
index b2e3b2bb4b..9dafe0ef55 100644
--- a/libavcodec/ac3enc.c
+++ b/libavcodec/ac3enc.c
@@ -2047,6 +2047,7 @@ av_cold int ff_ac3_encode_close(AVCodecContext *avctx)
     int blk, ch;
     AC3EncodeContext *s = avctx->priv_data;

+    av_freep(&s->mdct_window);
     av_freep(&s->windowed_samples);
     if (s->planar_samples)
     for (ch = 0; ch < s->channels; ch++)
diff --git a/libavcodec/ac3enc.h b/libavcodec/ac3enc.h
index 044564ecb4..ba62891371 100644
--- a/libavcodec/ac3enc.h
+++ b/libavcodec/ac3enc.h
@@ -30,8 +30,6 @@

 #include

-#include "libavutil/float_dsp.h"
-
 #include "ac3.h"
 #include "ac3dsp.h"
 #include "avcodec.h"
@@ -53,6 +51,7 @@
 #define AC3ENC_TYPE_EAC3 2

 #if AC3ENC_FLOAT
+#include "libavutil/float_dsp.h"
 #define AC3_NAME(x) ff_ac3_float_ ## x
 #define MAC_COEF(d,a,b) ((d)+=(a)*(b))
 #define COEF_MIN (-16777215.0/16777216.0)
@@ -62,12 +61,13 @@ typedef float SampleType;
 typedef float CoefType;
 typedef float CoefSumType;
 #else
+#include "libavutil/fixed_dsp.h"
 #define AC3_NAME(x) ff_ac3_fixed_ ## x
 #define MAC_COEF(d,a,b) MAC64(d,a,b)
 #define COEF_MIN -16777215
 #define COEF_MAX 16777215
 #define NEW_CPL_COORD_THRESHOLD 503317
-typedef int16_t SampleType;
+typedef int32_t SampleType;
 typedef int32_t CoefType;
 typedef int64_t CoefSumType;
 #endif
@@ -141,7 +141,6 @@ typedef struct AC3Block {
     uint16_t **qmant;                   ///< quantized mantissas
     uint8_t **cpl_coord_exp;            ///< coupling coord exponents (cplcoexp)
     uint8_t **cpl_coord_mant;           ///< coupling coord mantissas (cplcomant)
-    uint8_t coeff_shift[AC3_MAX_CHANNELS]; ///< fixed-point coefficient shift values
     uint8_t new_rematrixing_strategy;   ///< send new rematrixing flags in this block
     int num_rematrixing_bands;          ///< number of rematrixing bands
     uint8_t rematrixing_flags[4];       ///< rematrixing flags
@@ -165,7 +164,11 @@ typedef struct AC3EncodeContext {
     AVCodecContext *avctx;              ///< parent AVCodecContext
     PutBitContext pb;                   ///< bitstream writer context
     AudioDSPContext adsp;
+#if AC3ENC_FLOAT
     AVFloatDSPContext *fdsp;
+#else
+    AVFixedDSPContext *fdsp;
+#endif
     MECmpContext mecc;
     AC3DSPContext ac3dsp;               ///< AC-3 optimized functions
     FFTContext mdct;                    ///< FFT context for MDCT calculation
diff --git a/libavcodec/ac3enc_fixed.c b/libavcodec/ac3enc_fixed.c
index 7818dd8c35..3b302d40df 100644
--- a/libavcodec/ac3enc_fixed.c
+++ b/libavcodec/ac3enc_fixed.c
@@ -26,12 +26,14 @@
  * fixed-point AC-3 encoder.
  */

-#define FFT_FLOAT 0
 #define AC3ENC_FLOAT 0
+#define FFT_FLOAT 0
+#define FFT_FIXED_32 1
 #include "internal.h"
 #include "audiodsp.h"
 #include "ac3enc.h"
 #include "eac3enc.h"
+#include "kbdwin.h"

 #define AC3ENC_TYPE AC3ENC_TYPE_AC3_FIXED
 #include "ac3enc_opts_template.c"
@@ -43,37 +45,6 @@ static const AVClass ac3enc_class = {
     .version = LIBAVUTIL_VERSION_INT,
 };

-/*
- * Normalize the input samples to use the maximum available precision.
- * This assumes signed 16-bit input samples.
- */
-static int normalize_samples(AC3EncodeContext *s)
-{
-    int v = s->ac3dsp.ac3_max_msb_abs_int16(s->windowed_samples, AC3_WINDOW_SIZE);
-    v = 14 - av_log2(v);
-    if (v > 0)
-        s->ac3dsp.ac3_lshift_int16(s->windowed_samples, AC3_WINDOW_SIZE, v);
-    /* +6 to right-shift from 31-bit to 25-bit */
-    return v + 6;
-}
-
-
-/*
- * Scale MDCT coefficients to 25-bit signed fixed-point.
- */
-static void scale_coefficients(AC3EncodeContext *s)
-{
-    int blk, ch;
-
-    for (blk = 0; blk < s->num_blocks; blk++) {
-        AC3Block *block = &s->blocks[blk];
-        for (ch = 1; ch <= s->channels; ch++) {
-            s->ac3dsp.ac3_rshift_int32(block->mdct_coef[ch], AC3_MAX_COEFS,
-                                       block->coeff_shift[ch]);
-        }
-    }
-}
-
 static void sum_square_butterfly(AC3EncodeContext *s, int64_t sum[4],
                                  const int32_t *coef0, const int32_t *coef1,
                                  int len)
@@ -120,7 +91,6 @@ static av_cold void ac3_fixed_mdct_end(AC3EncodeContext *s)
     ff_mdct_end(&s->mdct);
 }

-
 /**
  * Initialize MDCT tables.
  *
@@ -129,9 +99,24 @@ static av_cold void ac3_fixed_mdct_end(AC3EncodeContext *s)
  */
 static av_cold int ac3_fixed_mdct_init(AC3EncodeContext *s)
 {
-    int ret = ff_mdct_init(&s->mdct, 9, 0, -1.0);
-    s->mdct_window = ff_ac3_window;
-    return ret;
+    float fwin[AC3_BLOCK_SIZE];
+
+    int32_t *iwin = av_malloc_array(AC3_WINDOW_SIZE, sizeof(*iwin));
+    if (!iwin)
+        return AVERROR(ENOMEM);
+
+    ff_kbd_window_init(fwin, 5.0, AC3_WINDOW_SIZE/2);
+
+    for (int i = 0; i < AC3_WINDOW_SIZE/2; i++)
+        iwin[AC3_WINDOW_SIZE-1-i] = lrintf(fwin[i] * (1 << 22));
+
+    s->mdct_window = iwin;
+
+    s->fdsp = avpriv_alloc_fixed_dsp(s->avctx->flags & AV_CODEC_FLAG_BITEXACT);
+    if (!s->fdsp)
+        return AVERROR(ENOMEM);
+
+    return ff_mdct_init(&s->mdct, 9, 0, -1.0);
 }


@@ -155,7 +140,7 @@ AVCodec ff_ac3_fixed_encoder = {
     .init = ac3_fixed_encode_init,
     .encode2 = ff_ac3_fixed_encode_frame,
     .close = ff_ac3_encode_close,
-    .sample_fmts = (const enum AVSampleFormat[]){ AV_SAMPLE_FMT_S16P,
+    .sample_fmts = (const enum AVSampleFormat[]){ AV_SAMPLE_FMT_S32P,
                                                   AV_SAMPLE_FMT_NONE },
     .priv_class = &ac3enc_class,
     .caps_internal = FF_CODEC_CAP_INIT_THREADSAFE | FF_CODEC_CAP_INIT_CLEANUP,
diff --git a/libavcodec/ac3enc_float.c b/libavcodec/ac3enc_float.c
index 45bfed34f9..b17b3a2365 100644
--- a/libavcodec/ac3enc_float.c
+++ b/libavcodec/ac3enc_float.c
@@ -97,7 +97,6 @@ static void sum_square_butterfly(AC3EncodeContext *s, float sum[4],
 static av_cold void ac3_float_mdct_end(AC3EncodeContext *s)
 {
     ff_mdct_end(&s->mdct);
-    av_freep(&s->mdct_window);
 }


diff --git a/libavcodec/ac3enc_template.c b/libavcodec/ac3enc_template.c
index 0fdc95b968..de6eba71d8 100644
--- a/libavcodec/ac3enc_template.c
+++ b/libavcodec/ac3enc_template.c
@@ -91,18 +91,11 @@ static void apply_mdct(AC3EncodeContext *s)
             AC3Block *block = &s->blocks[blk];
             const SampleType *input_samples = &s->planar_samples[ch][blk * AC3_BLOCK_SIZE];

-#if AC3ENC_FLOAT
             s->fdsp->vector_fmul(s->windowed_samples, input_samples,
-                                 s->mdct_window, AC3_WINDOW_SIZE);
-#else
-            s->ac3dsp.apply_window_int16(s->windowed_samples, input_samples,
-                                         s->mdct_window, AC3_WINDOW_SIZE);
-
-            block->coeff_shift[ch + 1] = normalize_samples(s);
-#endif
+                                 s->mdct_window, AC3_WINDOW_SIZE);

-            s->mdct.mdct_calcw(&s->mdct, block->mdct_coef[ch+1],
-                               s->windowed_samples);
+            s->mdct.mdct_calc(&s->mdct, block->mdct_coef[ch+1],
+                              s->windowed_samples);
         }
     }
 }
@@ -390,9 +383,6 @@ int AC3_NAME(encode_frame)(AVCodecContext *avctx, AVPacket *avpkt,

     apply_mdct(s);

-    if (!AC3ENC_FLOAT)
-        scale_coefficients(s);
-
     clip_coefficients(&s->adsp, s->blocks[0].mdct_coef[1],
                       AC3_MAX_COEFS * s->num_blocks * s->channels);

@@ -404,8 +394,9 @@ int AC3_NAME(encode_frame)(AVCodecContext *avctx, AVPacket *avpkt,

     compute_rematrixing_strategy(s);

-    if (AC3ENC_FLOAT)
-        scale_coefficients(s);
+#if AC3ENC_FLOAT
+    scale_coefficients(s);
+#endif

     return ff_ac3_encode_frame_common_end(avctx, avpkt, frame, got_packet_ptr);
 }
diff --git a/libavcodec/version.h b/libavcodec/version.h
index 1420439044..cd871f0fa0 100644
--- a/libavcodec/version.h
+++ b/libavcodec/version.h
@@ -28,7 +28,7 @@
 #include "libavutil/version.h"

 #define LIBAVCODEC_VERSION_MAJOR 58
-#define LIBAVCODEC_VERSION_MINOR 116
+#define LIBAVCODEC_VERSION_MINOR 117
 #define LIBAVCODEC_VERSION_MICRO 100

 #define LIBAVCODEC_VERSION_INT AV_VERSION_INT(LIBAVCODEC_VERSION_MAJOR, \
diff --git a/tests/fate/ac3.mak b/tests/fate/ac3.mak
index 757cd51cf2..d76e22bade 100644
--- a/tests/fate/ac3.mak
+++ b/tests/fate/ac3.mak
@@ -90,7 +90,7 @@ fate-ac3-fixed-encode: tests/data/asynth-44100-2.wav
 fate-ac3-fixed-encode: SRC = $(TARGET_PATH)/tests/data/asynth-44100-2.wav
 fate-ac3-fixed-encode: CMD = md5 -i $(SRC) -c ac3_fixed -ab 128k -f ac3 -flags +bitexact -af aresample
 fate-ac3-fixed-encode: CMP = oneline
-fate-ac3-fixed-encode: REF = a1d1fc116463b771abf5aef7ed37d7b1
+fate-ac3-fixed-encode: REF = 1f548175e11a95e62ce20e442fcc8d08

 FATE_EAC3-$(call ALLYES, EAC3_DEMUXER EAC3_MUXER EAC3_CORE_BSF) += fate-eac3-core-bsf
 fate-eac3-core-bsf: CMD = md5pipe -i $(TARGET_SAMPLES)/eac3/the_great_wall_7.1.eac3 -c:a copy -bsf:a eac3_core -fflags +bitexact -f eac3
diff --git a/tests/fate/ffmpeg.mak b/tests/fate/ffmpeg.mak
index c6d8dc2e5c..4dfb77d250 100644
--- a/tests/fate/ffmpeg.mak
+++ b/tests/fate/ffmpeg.mak
@@ -83,7 +83,7 @@ fate-unknown_layout-pcm: CMD = md5 \
 FATE_FFMPEG-$(call ALLYES, PCM_S16LE_DEMUXER AC3_MUXER PCM_S16LE_DECODER AC3_FIXED_ENCODER) += fate-unknown_layout-ac3
 fate-unknown_layout-ac3: $(AREF)
 fate-unknown_layout-ac3: CMD = md5 -auto_conversion_filters \
-  -guess_layout_max 0 -f s16le -ac 1 -ar 44100 -i $(TARGET_PATH)/$(AREF) \
+  -guess_layout_max 0 -f s32le -ac 1 -ar 44100 -i $(TARGET_PATH)/$(AREF) \
   -f ac3 -flags +bitexact -c ac3_fixed


diff --git a/tests/ref/fate/unknown_layout-ac3 b/tests/ref/fate/unknown_layout-ac3
index d332efcec4..719a44aacf 100644
--- a/tests/ref/fate/unknown_layout-ac3
+++ b/tests/ref/fate/unknown_layout-ac3
@@ -1 +1 @@
-bbb7550d6d93973c10f4ee13c87cf799
+febdb165cfd6cba375aa086195e61213
diff --git a/tests/ref/lavf/rm b/tests/ref/lavf/rm
index 43ea4c7897..fc2a6564a2 100644
--- a/tests/ref/lavf/rm
+++ b/tests/ref/lavf/rm
@@ -1,2 +1,2 @@
-e30681d05d6f3d24108d3614600bf116 *tests/data/lavf/lavf.rm
+8dfb8d4556d61d3615e0d0012ffe540c *tests/data/lavf/lavf.rm
 346424 tests/data/lavf/lavf.rm

From patchwork Tue Jan 12 18:12:39 2021
X-Patchwork-Submitter: Lynne
X-Patchwork-Id: 24926
Date: Tue, 12 Jan 2021 19:12:39 +0100 (CET)
From: Lynne
To: Ffmpeg Devel
Subject: [FFmpeg-devel] [PATCH v2 2/5] ac3enc: do not clip coefficients after transforms

In either encoder, it's impossible for the coefficients to go past 25 bits
right after the MDCT. Our MDCT is numerically stable.
For the floating-point encoder, if a coefficient is NaN, lrintf() will raise
a floating-point exception during the conversion.

Patch attached.

Subject: [PATCH v2 2/5] ac3enc: do not clip coefficients after transforms

---
 libavcodec/ac3enc_template.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/libavcodec/ac3enc_template.c b/libavcodec/ac3enc_template.c
index de6eba71d8..4f1e181e0b 100644
--- a/libavcodec/ac3enc_template.c
+++ b/libavcodec/ac3enc_template.c
@@ -383,9 +383,6 @@ int AC3_NAME(encode_frame)(AVCodecContext *avctx, AVPacket *avpkt,

     apply_mdct(s);

-    clip_coefficients(&s->adsp, s->blocks[0].mdct_coef[1],
-                      AC3_MAX_COEFS * s->num_blocks * s->channels);
-
     s->cpl_on = s->cpl_enabled;
     ff_ac3_compute_coupling_strategy(s);


From patchwork Tue Jan 12 18:13:14 2021
X-Patchwork-Submitter: Lynne
X-Patchwork-Id: 24927
Date: Tue, 12 Jan 2021 19:13:14 +0100 (CET)
From: Lynne
To: Ffmpeg Devel
Subject: [FFmpeg-devel] [PATCH v2 3/5] ac3enc: halve the MDCT window size by using vector_fmul_reverse

This brings the encoder in line with the rest of ours and saves a bit of
memory.

Patch attached.

Subject: [PATCH v2 3/5] ac3enc: halve the MDCT window size by using vector_fmul_reverse
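A minimal sketch of the half-window idea behind this patch: because the window is symmetric, only its rising half needs to be stored. The first half of each frame is multiplied by that half in forward order and the second half of the frame by the same table in reverse order. The functions below are plain-C stand-ins written for this illustration with hypothetical names; they are not FFmpeg's DSP implementations.

```c
#include <assert.h>

/* Stand-in for a forward vector multiply: dst[i] = src[i] * win[i]. */
static void fmul(float *dst, const float *src, const float *win, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * win[i];
}

/* Stand-in for a reversed vector multiply:
 * dst[i] = src[i] * win[len - 1 - i]. */
static void fmul_reverse(float *dst, const float *src, const float *win, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * win[len - 1 - i];
}

/* Window 2 * half samples using only the rising half of a symmetric window. */
static void apply_half_window(float *dst, const float *src,
                              const float *half_win, int half)
{
    assert(half > 0);
    fmul(dst, src, half_win, half);
    fmul_reverse(dst + half, src + half, half_win, half);
}
```

With `half` set to AC3_BLOCK_SIZE this mirrors the pair of vector_fmul() and vector_fmul_reverse() calls in the patch's apply_mdct(), which appear to window AC3_WINDOW_SIZE samples from a single AC3_BLOCK_SIZE-entry table.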
---
 libavcodec/ac3enc_fixed.c    |  9 ++++-----
 libavcodec/ac3enc_float.c    | 15 ++++-----------
 libavcodec/ac3enc_template.c |  5 ++++-
 3 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/libavcodec/ac3enc_fixed.c b/libavcodec/ac3enc_fixed.c
index 3b302d40df..7a8a77fb93 100644
--- a/libavcodec/ac3enc_fixed.c
+++ b/libavcodec/ac3enc_fixed.c
@@ -101,14 +101,13 @@ static av_cold int ac3_fixed_mdct_init(AC3EncodeContext *s)
 {
     float fwin[AC3_BLOCK_SIZE];

-    int32_t *iwin = av_malloc_array(AC3_WINDOW_SIZE, sizeof(*iwin));
+    int32_t *iwin = av_malloc_array(AC3_BLOCK_SIZE, sizeof(*iwin));
     if (!iwin)
         return AVERROR(ENOMEM);

-    ff_kbd_window_init(fwin, 5.0, AC3_WINDOW_SIZE/2);
-
-    for (int i = 0; i < AC3_WINDOW_SIZE/2; i++)
-        iwin[AC3_WINDOW_SIZE-1-i] = lrintf(fwin[i] * (1 << 22));
+    ff_kbd_window_init(fwin, 5.0, AC3_BLOCK_SIZE);
+    for (int i = 0; i < AC3_BLOCK_SIZE; i++)
+        iwin[i] = lrintf(fwin[i] * (1 << 22));

     s->mdct_window = iwin;

diff --git a/libavcodec/ac3enc_float.c b/libavcodec/ac3enc_float.c
index b17b3a2365..74f3ab8d86 100644
--- a/libavcodec/ac3enc_float.c
+++ b/libavcodec/ac3enc_float.c
@@ -108,23 +108,16 @@ static av_cold void ac3_float_mdct_end(AC3EncodeContext *s)
  */
 static av_cold int ac3_float_mdct_init(AC3EncodeContext *s)
 {
-    float *window;
-    int i, n, n2;
-
-    n = 1 << 9;
-    n2 = n >> 1;
-
-    window = av_malloc_array(n, sizeof(*window));
+    float *window = av_malloc_array(AC3_BLOCK_SIZE, sizeof(*window));
     if (!window) {
         av_log(s->avctx, AV_LOG_ERROR, "Cannot allocate memory.\n");
         return AVERROR(ENOMEM);
     }
-    ff_kbd_window_init(window, 5.0, n2);
-    for (i = 0; i < n2; i++)
-        window[n-1-i] = window[i];
+
+    ff_kbd_window_init(window, 5.0, AC3_BLOCK_SIZE);
     s->mdct_window = window;

-    return ff_mdct_init(&s->mdct, 9, 0, -2.0 / n);
+    return ff_mdct_init(&s->mdct, 9, 0, -2.0 / AC3_WINDOW_SIZE);
 }


diff --git a/libavcodec/ac3enc_template.c b/libavcodec/ac3enc_template.c
index 4f1e181e0b..5ecef3b178 100644
--- a/libavcodec/ac3enc_template.c
+++ b/libavcodec/ac3enc_template.c
@@ -92,7 +92,10 @@ static void apply_mdct(AC3EncodeContext *s)
             const SampleType *input_samples = &s->planar_samples[ch][blk * AC3_BLOCK_SIZE];

             s->fdsp->vector_fmul(s->windowed_samples, input_samples,
-                                 s->mdct_window, AC3_WINDOW_SIZE);
+                                 s->mdct_window, AC3_BLOCK_SIZE);
+            s->fdsp->vector_fmul_reverse(s->windowed_samples + AC3_BLOCK_SIZE,
+                                         &input_samples[AC3_BLOCK_SIZE],
+                                         s->mdct_window, AC3_BLOCK_SIZE);

             s->mdct.mdct_calc(&s->mdct, block->mdct_coef[ch+1],
                               s->windowed_samples);

From patchwork Tue Jan 12 18:13:46 2021
X-Patchwork-Submitter: Lynne
X-Patchwork-Id: 24928
Date: Tue, 12 Jan 2021 19:13:46 +0100 (CET)
From: Lynne
To: Ffmpeg Devel
Subject: [FFmpeg-devel] [PATCH v2 4/5] ac3enc_fixed: drop unnecessary fixed-point DSP code

Patch attached.

Subject: [PATCH v2 4/5] ac3enc_fixed: drop unnecessary fixed-point DSP code

---
 libavcodec/ac3dsp.c              |  60 -------
 libavcodec/ac3dsp.h              |  47 ------
 libavcodec/ac3tab.c              |  38 -----
 libavcodec/ac3tab.h              |   1 -
 libavcodec/arm/ac3dsp_init_arm.c |   9 --
 libavcodec/x86/ac3dsp.asm        | 258 -------------------------------
 libavcodec/x86/ac3dsp_init.c     |  52 +------
 7 files changed, 1 insertion(+), 464 deletions(-)

diff --git a/libavcodec/ac3dsp.c b/libavcodec/ac3dsp.c
index 382f87c05f..85c721dd3b 100644
--- a/libavcodec/ac3dsp.c
+++ b/libavcodec/ac3dsp.c
@@ -46,49 +46,6 @@ static void ac3_exponent_min_c(uint8_t *exp, int num_reuse_blocks, int nb_coefs)
     }
 }

-static int ac3_max_msb_abs_int16_c(const int16_t *src, int len)
-{
-    int i, v = 0;
-    for (i = 0; i < len; i++)
-        v |= abs(src[i]);
-    return v;
-}
-
-static void ac3_lshift_int16_c(int16_t *src, unsigned int len,
-                               unsigned int shift)
-{
-    uint32_t *src32 = (uint32_t *)src;
-    const uint32_t mask = ~(((1 << shift) - 1) << 16);
-    int i;
-    len >>= 1;
-    for (i = 0; i < len; i += 8) {
-        src32[i ] = (src32[i ] << shift) & mask;
-        src32[i+1] = (src32[i+1] << shift) & mask;
-        src32[i+2] = (src32[i+2] << shift) & mask;
-        src32[i+3] = (src32[i+3] << shift) & mask;
-        src32[i+4] = (src32[i+4] << shift) & mask;
-        src32[i+5] = (src32[i+5] << shift) & mask;
-        src32[i+6] = (src32[i+6] << shift) & mask;
-        src32[i+7] = (src32[i+7] << shift) & mask;
-    }
-}
-
-static void ac3_rshift_int32_c(int32_t *src, unsigned int len,
-                               unsigned int shift)
-{
-    do {
-        *src++ >>= shift;
-        *src++ >>= shift;
-        *src++ >>= shift;
-        *src++ >>= shift;
-        *src++ >>= shift;
-        *src++ >>= shift;
-        *src++ >>= shift;
-        *src++ >>= shift;
-        len -= 8;
-    } while (len > 0);
-}
-
 static void float_to_fixed24_c(int32_t *dst, const float *src, unsigned int len)
 {
     const float scale = 1 << 24;
@@ -376,19 +333,6 @@ void ff_ac3dsp_downmix_fixed(AC3DSPContext *c, int32_t **samples, int16_t **matr
     ac3_downmix_c_fixed(samples, matrix, out_ch, in_ch, len);
 }

-static void apply_window_int16_c(int16_t *output, const int16_t *input,
-                                 const int16_t *window, unsigned int len)
-{
-    int i;
-    int len2 = len >> 1;
-
-    for (i = 0; i < len2; i++) {
-        int16_t w = window[i];
-        output[i] = (MUL16(input[i], w) + (1 << 14)) >> 15;
-        output[len-i-1] = (MUL16(input[len-i-1], w) + (1 << 14)) >> 15;
-    }
-}
-
 void ff_ac3dsp_downmix(AC3DSPContext *c, float **samples, float **matrix,
                        int out_ch, int in_ch, int len)
 {
@@ -424,9 +368,6 @@ void ff_ac3dsp_downmix(AC3DSPContext *c, float **samples, float **matrix,
 av_cold void ff_ac3dsp_init(AC3DSPContext *c, int bit_exact)
 {
     c->ac3_exponent_min = ac3_exponent_min_c;
-    c->ac3_max_msb_abs_int16 = ac3_max_msb_abs_int16_c;
-    c->ac3_lshift_int16 = ac3_lshift_int16_c;
-    c->ac3_rshift_int32 = ac3_rshift_int32_c;
     c->float_to_fixed24 = float_to_fixed24_c;
     c->bit_alloc_calc_bap = ac3_bit_alloc_calc_bap_c;
     c->update_bap_counts = ac3_update_bap_counts_c;
@@ -438,7 +379,6 @@ av_cold void ff_ac3dsp_init(AC3DSPContext *c, int bit_exact)
     c->out_channels = 0;
     c->downmix = NULL;
     c->downmix_fixed = NULL;
-    c->apply_window_int16 = apply_window_int16_c;

     if (ARCH_ARM)
         ff_ac3dsp_init_arm(c, bit_exact);
diff --git a/libavcodec/ac3dsp.h
b/libavcodec/ac3dsp.h index 161de4cb86..a23b11526e 100644 --- a/libavcodec/ac3dsp.h +++ b/libavcodec/ac3dsp.h @@ -42,39 +42,6 @@ typedef struct AC3DSPContext { */ void (*ac3_exponent_min)(uint8_t *exp, int num_reuse_blocks, int nb_coefs); - /** - * Calculate the maximum MSB of the absolute value of each element in an - * array of int16_t. - * @param src input array - * constraints: align 16. values must be in range [-32767,32767] - * @param len number of values in the array - * constraints: multiple of 16 greater than 0 - * @return a value with the same MSB as max(abs(src[])) - */ - int (*ac3_max_msb_abs_int16)(const int16_t *src, int len); - - /** - * Left-shift each value in an array of int16_t by a specified amount. - * @param src input array - * constraints: align 16 - * @param len number of values in the array - * constraints: multiple of 32 greater than 0 - * @param shift left shift amount - * constraints: range [0,15] - */ - void (*ac3_lshift_int16)(int16_t *src, unsigned int len, unsigned int shift); - - /** - * Right-shift each value in an array of int32_t by a specified amount. - * @param src input array - * constraints: align 16 - * @param len number of values in the array - * constraints: multiple of 16 greater than 0 - * @param shift right shift amount - * constraints: range [0,31] - */ - void (*ac3_rshift_int32)(int32_t *src, unsigned int len, unsigned int shift); - /** * Convert an array of float in range [-1.0,1.0] to int32_t with range * [-(1<<24),(1<<24)] @@ -136,20 +103,6 @@ typedef struct AC3DSPContext { int in_channels; void (*downmix)(float **samples, float **matrix, int len); void (*downmix_fixed)(int32_t **samples, int16_t **matrix, int len); - - /** - * Apply symmetric window in 16-bit fixed-point. 
-     * @param output destination array
-     *               constraints: 16-byte aligned
-     * @param input source array
-     *               constraints: 16-byte aligned
-     * @param window window array
-     *               constraints: 16-byte aligned, at least len/2 elements
-     * @param len full window length
-     *               constraints: multiple of ? greater than zero
-     */
-    void (*apply_window_int16)(int16_t *output, const int16_t *input,
-                               const int16_t *window, unsigned int len);
 } AC3DSPContext;

 void ff_ac3dsp_init (AC3DSPContext *c, int bit_exact);
diff --git a/libavcodec/ac3tab.c b/libavcodec/ac3tab.c
index d018110331..99307218cc 100644
--- a/libavcodec/ac3tab.c
+++ b/libavcodec/ac3tab.c
@@ -147,44 +147,6 @@ const uint8_t ff_eac3_default_cpl_band_struct[18] = {
     0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1
 };

-/* AC-3 MDCT window */
-
-/* MDCT window */
-DECLARE_ALIGNED(16, const int16_t, ff_ac3_window)[AC3_WINDOW_SIZE/2] = {
-    4,    7,   12,   16,   21,   28,   34,   42,
-   51,   61,   72,   84,   97,  111,  127,  145,
-  164,  184,  207,  231,  257,  285,  315,  347,
-  382,  419,  458,  500,  544,  591,  641,  694,
-  750,  810,  872,  937, 1007, 1079, 1155, 1235,
- 1318, 1406, 1497, 1593, 1692, 1796, 1903, 2016,
- 2132, 2253, 2379, 2509, 2644, 2783, 2927, 3076,
- 3230, 3389, 3552, 3721, 3894, 4072, 4255, 4444,
- 4637, 4835, 5038, 5246, 5459, 5677, 5899, 6127,
- 6359, 6596, 6837, 7083, 7334, 7589, 7848, 8112,
- 8380, 8652, 8927, 9207, 9491, 9778,10069,10363,
-10660,10960,11264,11570,11879,12190,12504,12820,
-13138,13458,13780,14103,14427,14753,15079,15407,
-15735,16063,16392,16720,17049,17377,17705,18032,
-18358,18683,19007,19330,19651,19970,20287,20602,
-20914,21225,21532,21837,22139,22438,22733,23025,
-23314,23599,23880,24157,24430,24699,24964,25225,
-25481,25732,25979,26221,26459,26691,26919,27142,
-27359,27572,27780,27983,28180,28373,28560,28742,
-28919,29091,29258,29420,29577,29729,29876,30018,
-30155,30288,30415,30538,30657,30771,30880,30985,
-31086,31182,31274,31363,31447,31528,31605,31678,
-31747,31814,31877,31936,31993,32046,32097,32145,
-32190,32232,32272,32310,32345,32378,32409,32438,
-32465,32490,32513,32535,32556,32574,32592,32608,
-32623,32636,32649,32661,32671,32681,32690,32698,
-32705,32712,32718,32724,32729,32733,32737,32741,
-32744,32747,32750,32752,32754,32756,32757,32759,
-32760,32761,32762,32763,32764,32764,32765,32765,
-32766,32766,32766,32766,32767,32767,32767,32767,
-32767,32767,32767,32767,32767,32767,32767,32767,
-32767,32767,32767,32767,32767,32767,32767,32767,
-};
-
 const uint8_t ff_ac3_log_add_tab[260]= {
 0x40,0x3f,0x3e,0x3d,0x3c,0x3b,0x3a,0x39,0x38,0x37,
 0x36,0x35,0x34,0x34,0x33,0x32,0x31,0x30,0x2f,0x2f,
diff --git a/libavcodec/ac3tab.h b/libavcodec/ac3tab.h
index 1d1264e3fc..a0036a301b 100644
--- a/libavcodec/ac3tab.h
+++ b/libavcodec/ac3tab.h
@@ -37,7 +37,6 @@ extern const int ff_ac3_sample_rate_tab[];
 extern const uint16_t ff_ac3_bitrate_tab[19];
 extern const uint8_t ff_ac3_rematrix_band_tab[5];
 extern const uint8_t ff_eac3_default_cpl_band_struct[18];
-extern const int16_t ff_ac3_window[AC3_WINDOW_SIZE/2];
 extern const uint8_t ff_ac3_log_add_tab[260];
 extern const uint16_t ff_ac3_hearing_threshold_tab[AC3_CRITICAL_BANDS][3];
 extern const uint8_t ff_ac3_bap_tab[64];
diff --git a/libavcodec/arm/ac3dsp_init_arm.c b/libavcodec/arm/ac3dsp_init_arm.c
index a3c32ff407..9217a7d0c2 100644
--- a/libavcodec/arm/ac3dsp_init_arm.c
+++ b/libavcodec/arm/ac3dsp_init_arm.c
@@ -26,13 +26,8 @@
 #include "config.h"

 void ff_ac3_exponent_min_neon(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
-int ff_ac3_max_msb_abs_int16_neon(const int16_t *src, int len);
-void ff_ac3_lshift_int16_neon(int16_t *src, unsigned len, unsigned shift);
-void ff_ac3_rshift_int32_neon(int32_t *src, unsigned len, unsigned shift);
 void ff_float_to_fixed24_neon(int32_t *dst, const float *src, unsigned int len);
 void ff_ac3_extract_exponents_neon(uint8_t *exp, int32_t *coef, int nb_coefs);
-void ff_apply_window_int16_neon(int16_t *dst, const int16_t *src,
-                                const int16_t *window, unsigned n);
 void ff_ac3_sum_square_butterfly_int32_neon(int64_t sum[4],
                                             const int32_t *coef0,
                                             const int32_t *coef1,
@@ -61,12 +56,8 @@ av_cold void ff_ac3dsp_init_arm(AC3DSPContext *c, int bit_exact)

     if (have_neon(cpu_flags)) {
         c->ac3_exponent_min = ff_ac3_exponent_min_neon;
-        c->ac3_max_msb_abs_int16 = ff_ac3_max_msb_abs_int16_neon;
-        c->ac3_lshift_int16 = ff_ac3_lshift_int16_neon;
-        c->ac3_rshift_int32 = ff_ac3_rshift_int32_neon;
         c->float_to_fixed24 = ff_float_to_fixed24_neon;
         c->extract_exponents = ff_ac3_extract_exponents_neon;
-        c->apply_window_int16 = ff_apply_window_int16_neon;
         c->sum_square_butterfly_int32 = ff_ac3_sum_square_butterfly_int32_neon;
         c->sum_square_butterfly_float = ff_ac3_sum_square_butterfly_float_neon;
     }
diff --git a/libavcodec/x86/ac3dsp.asm b/libavcodec/x86/ac3dsp.asm
index 675ade3101..4ddaa94320 100644
--- a/libavcodec/x86/ac3dsp.asm
+++ b/libavcodec/x86/ac3dsp.asm
@@ -35,10 +35,6 @@ pw_bap_mul2: dw 5, 7, 0, 7, 5, 7, 0, 7
 cextern pd_1
 pd_151: times 4 dd 151

-; used in ff_apply_window_int16()
-pb_revwords: SHUFFLE_MASK_W 7, 6, 5, 4, 3, 2, 1, 0
-pd_16384: times 4 dd 16384
-
 SECTION .text

 ;-----------------------------------------------------------------------------
@@ -81,133 +77,6 @@ AC3_EXPONENT_MIN
 %endif
 %undef LOOP_ALIGN

-;-----------------------------------------------------------------------------
-; int ff_ac3_max_msb_abs_int16(const int16_t *src, int len)
-;
-; This function uses 2 different methods to calculate a valid result.
-; 1) logical 'or' of abs of each element
-;        This is used for ssse3 because of the pabsw instruction.
-;        It is also used for mmx because of the lack of min/max instructions.
-; 2) calculate min/max for the array, then or(abs(min),abs(max))
-;        This is used for mmxext and sse2 because they have pminsw/pmaxsw.
-;-----------------------------------------------------------------------------
-
-; logical 'or' of 4 or 8 words in an mmx or xmm register into the low word
-%macro OR_WORDS_HORIZ 2 ; src, tmp
-%if cpuflag(sse2)
-    movhlps %2, %1
-    por %1, %2
-    pshuflw %2, %1, q0032
-    por %1, %2
-    pshuflw %2, %1, q0001
-    por %1, %2
-%elif cpuflag(mmxext)
-    pshufw %2, %1, q0032
-    por %1, %2
-    pshufw %2, %1, q0001
-    por %1, %2
-%else ; mmx
-    movq %2, %1
-    psrlq %2, 32
-    por %1, %2
-    movq %2, %1
-    psrlq %2, 16
-    por %1, %2
-%endif
-%endmacro
-
-%macro AC3_MAX_MSB_ABS_INT16 1
-cglobal ac3_max_msb_abs_int16, 2,2,5, src, len
-    pxor m2, m2
-    pxor m3, m3
-.loop:
-%ifidn %1, min_max
-    mova m0, [srcq]
-    mova m1, [srcq+mmsize]
-    pminsw m2, m0
-    pminsw m2, m1
-    pmaxsw m3, m0
-    pmaxsw m3, m1
-%else ; or_abs
-%if notcpuflag(ssse3)
-    mova m0, [srcq]
-    mova m1, [srcq+mmsize]
-    ABS2 m0, m1, m3, m4
-%else ; ssse3
-    ; using memory args is faster for ssse3
-    pabsw m0, [srcq]
-    pabsw m1, [srcq+mmsize]
-%endif
-    por m2, m0
-    por m2, m1
-%endif
-    add srcq, mmsize*2
-    sub lend, mmsize
-    ja .loop
-%ifidn %1, min_max
-    ABS2 m2, m3, m0, m1
-    por m2, m3
-%endif
-    OR_WORDS_HORIZ m2, m0
-    movd eax, m2
-    and eax, 0xFFFF
-    RET
-%endmacro
-
-INIT_MMX mmx
-AC3_MAX_MSB_ABS_INT16 or_abs
-INIT_MMX mmxext
-AC3_MAX_MSB_ABS_INT16 min_max
-INIT_XMM sse2
-AC3_MAX_MSB_ABS_INT16 min_max
-INIT_XMM ssse3
-AC3_MAX_MSB_ABS_INT16 or_abs
-
-;-----------------------------------------------------------------------------
-; macro used for ff_ac3_lshift_int16() and ff_ac3_rshift_int32()
-;-----------------------------------------------------------------------------
-
-%macro AC3_SHIFT 3 ; l/r, 16/32, shift instruction, instruction set
-cglobal ac3_%1shift_int%2, 3, 3, 5, src, len, shift
-    movd m0, shiftd
-.loop:
-    mova m1, [srcq ]
-    mova m2, [srcq+mmsize ]
-    mova m3, [srcq+mmsize*2]
-    mova m4, [srcq+mmsize*3]
-    %3 m1, m0
-    %3 m2, m0
-    %3 m3, m0
-    %3 m4, m0
-    mova [srcq ], m1
-    mova [srcq+mmsize ], m2
-    mova [srcq+mmsize*2], m3
-    mova [srcq+mmsize*3], m4
-    add srcq, mmsize*4
-    sub lend, mmsize*32/%2
-    ja .loop
-.end:
-    REP_RET
-%endmacro
-
-;-----------------------------------------------------------------------------
-; void ff_ac3_lshift_int16(int16_t *src, unsigned int len, unsigned int shift)
-;-----------------------------------------------------------------------------
-
-INIT_MMX mmx
-AC3_SHIFT l, 16, psllw
-INIT_XMM sse2
-AC3_SHIFT l, 16, psllw
-
-;-----------------------------------------------------------------------------
-; void ff_ac3_rshift_int32(int32_t *src, unsigned int len, unsigned int shift)
-;-----------------------------------------------------------------------------
-
-INIT_MMX mmx
-AC3_SHIFT r, 32, psrad
-INIT_XMM sse2
-AC3_SHIFT r, 32, psrad
-
 ;-----------------------------------------------------------------------------
 ; void ff_float_to_fixed24(int32_t *dst, const float *src, unsigned int len)
 ;-----------------------------------------------------------------------------
@@ -423,130 +292,3 @@ AC3_EXTRACT_EXPONENTS
 INIT_XMM ssse3
 AC3_EXTRACT_EXPONENTS
 %endif
-
-;-----------------------------------------------------------------------------
-; void ff_apply_window_int16(int16_t *output, const int16_t *input,
-;                            const int16_t *window, unsigned int len)
-;-----------------------------------------------------------------------------
-
-%macro REVERSE_WORDS 1-2
-%if cpuflag(ssse3) && notcpuflag(atom)
-    pshufb %1, %2
-%elif cpuflag(sse2)
-    pshuflw %1, %1, 0x1B
-    pshufhw %1, %1, 0x1B
-    pshufd %1, %1, 0x4E
-%elif cpuflag(mmxext)
-    pshufw %1, %1, 0x1B
-%endif
-%endmacro
-
-%macro MUL16FIXED 3
-%if cpuflag(ssse3) ; dst, src, unused
-; dst = ((dst * src) + (1<<14)) >> 15
-    pmulhrsw %1, %2
-%elif cpuflag(mmxext) ; dst, src, temp
-; dst = (dst * src) >> 15
-; pmulhw cuts off the bottom bit, so we have to lshift by 1 and add it back
-; in from the pmullw result.
-    mova %3, %1
-    pmulhw %1, %2
-    pmullw %3, %2
-    psrlw %3, 15
-    psllw %1, 1
-    por %1, %3
-%endif
-%endmacro
-
-%macro APPLY_WINDOW_INT16 1 ; %1 bitexact version
-%if %1
-cglobal apply_window_int16, 4,5,6, output, input, window, offset, offset2
-%else
-cglobal apply_window_int16_round, 4,5,6, output, input, window, offset, offset2
-%endif
-    lea offset2q, [offsetq-mmsize]
-%if cpuflag(ssse3) && notcpuflag(atom)
-    mova m5, [pb_revwords]
-    ALIGN 16
-%elif %1
-    mova m5, [pd_16384]
-%endif
-.loop:
-%if cpuflag(ssse3)
-    ; This version does the 16x16->16 multiplication in-place without expanding
-    ; to 32-bit. The ssse3 version is bit-identical.
-    mova m0, [windowq+offset2q]
-    mova m1, [ inputq+offset2q]
-    pmulhrsw m1, m0
-    REVERSE_WORDS m0, m5
-    pmulhrsw m0, [ inputq+offsetq ]
-    mova [outputq+offset2q], m1
-    mova [outputq+offsetq ], m0
-%elif %1
-    ; This version expands 16-bit to 32-bit, multiplies by the window,
-    ; adds 16384 for rounding, right shifts 15, then repacks back to words to
-    ; save to the output. The window is reversed for the second half.
-    mova m3, [windowq+offset2q]
-    mova m4, [ inputq+offset2q]
-    pxor m0, m0
-    punpcklwd m0, m3
-    punpcklwd m1, m4
-    pmaddwd m0, m1
-    paddd m0, m5
-    psrad m0, 15
-    pxor m2, m2
-    punpckhwd m2, m3
-    punpckhwd m1, m4
-    pmaddwd m2, m1
-    paddd m2, m5
-    psrad m2, 15
-    packssdw m0, m2
-    mova [outputq+offset2q], m0
-    REVERSE_WORDS m3
-    mova m4, [ inputq+offsetq]
-    pxor m0, m0
-    punpcklwd m0, m3
-    punpcklwd m1, m4
-    pmaddwd m0, m1
-    paddd m0, m5
-    psrad m0, 15
-    pxor m2, m2
-    punpckhwd m2, m3
-    punpckhwd m1, m4
-    pmaddwd m2, m1
-    paddd m2, m5
-    psrad m2, 15
-    packssdw m0, m2
-    mova [outputq+offsetq], m0
-%else
-    ; This version does the 16x16->16 multiplication in-place without expanding
-    ; to 32-bit. The mmxext and sse2 versions do not use rounding, and
-    ; therefore are not bit-identical to the C version.
-    mova m0, [windowq+offset2q]
-    mova m1, [ inputq+offset2q]
-    mova m2, [ inputq+offsetq ]
-    MUL16FIXED m1, m0, m3
-    REVERSE_WORDS m0
-    MUL16FIXED m2, m0, m3
-    mova [outputq+offset2q], m1
-    mova [outputq+offsetq ], m2
-%endif
-    add offsetd, mmsize
-    sub offset2d, mmsize
-    jae .loop
-    REP_RET
-%endmacro
-
-INIT_MMX mmxext
-APPLY_WINDOW_INT16 0
-INIT_XMM sse2
-APPLY_WINDOW_INT16 0
-
-INIT_MMX mmxext
-APPLY_WINDOW_INT16 1
-INIT_XMM sse2
-APPLY_WINDOW_INT16 1
-INIT_XMM ssse3
-APPLY_WINDOW_INT16 1
-INIT_XMM ssse3, atom
-APPLY_WINDOW_INT16 1
diff --git a/libavcodec/x86/ac3dsp_init.c b/libavcodec/x86/ac3dsp_init.c
index 2e7e2fb6da..2ae762af46 100644
--- a/libavcodec/x86/ac3dsp_init.c
+++ b/libavcodec/x86/ac3dsp_init.c
@@ -30,17 +30,6 @@ void ff_ac3_exponent_min_mmx (uint8_t *exp, int num_reuse_blocks, int nb_coefs
 void ff_ac3_exponent_min_mmxext(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
 void ff_ac3_exponent_min_sse2 (uint8_t *exp, int num_reuse_blocks, int nb_coefs);

-int ff_ac3_max_msb_abs_int16_mmx (const int16_t *src, int len);
-int ff_ac3_max_msb_abs_int16_mmxext(const int16_t *src, int len);
-int ff_ac3_max_msb_abs_int16_sse2 (const int16_t *src, int len);
-int ff_ac3_max_msb_abs_int16_ssse3(const int16_t *src, int len);
-
-void ff_ac3_lshift_int16_mmx (int16_t *src, unsigned int len, unsigned int shift);
-void ff_ac3_lshift_int16_sse2(int16_t *src, unsigned int len, unsigned int shift);
-
-void ff_ac3_rshift_int32_mmx (int32_t *src, unsigned int len, unsigned int shift);
-void ff_ac3_rshift_int32_sse2(int32_t *src, unsigned int len, unsigned int shift);
-
 void ff_float_to_fixed24_3dnow(int32_t *dst, const float *src, unsigned int len);
 void ff_float_to_fixed24_sse (int32_t *dst, const float *src, unsigned int len);
 void ff_float_to_fixed24_sse2 (int32_t *dst, const float *src, unsigned int len);
@@ -50,28 +39,12 @@ int ff_ac3_compute_mantissa_size_sse2(uint16_t mant_cnt[6][16]);
 void ff_ac3_extract_exponents_sse2 (uint8_t *exp, int32_t *coef, int nb_coefs);
 void ff_ac3_extract_exponents_ssse3(uint8_t *exp, int32_t *coef, int nb_coefs);

-void ff_apply_window_int16_round_mmxext(int16_t *output, const int16_t *input,
-                                        const int16_t *window, unsigned int len);
-void ff_apply_window_int16_round_sse2(int16_t *output, const int16_t *input,
-                                      const int16_t *window, unsigned int len);
-void ff_apply_window_int16_mmxext(int16_t *output, const int16_t *input,
-                                  const int16_t *window, unsigned int len);
-void ff_apply_window_int16_sse2(int16_t *output, const int16_t *input,
-                                const int16_t *window, unsigned int len);
-void ff_apply_window_int16_ssse3(int16_t *output, const int16_t *input,
-                                 const int16_t *window, unsigned int len);
-void ff_apply_window_int16_ssse3_atom(int16_t *output, const int16_t *input,
-                                      const int16_t *window, unsigned int len);
-
 av_cold void ff_ac3dsp_init_x86(AC3DSPContext *c, int bit_exact)
 {
     int cpu_flags = av_get_cpu_flags();

     if (EXTERNAL_MMX(cpu_flags)) {
         c->ac3_exponent_min = ff_ac3_exponent_min_mmx;
-        c->ac3_max_msb_abs_int16 = ff_ac3_max_msb_abs_int16_mmx;
-        c->ac3_lshift_int16 = ff_ac3_lshift_int16_mmx;
-        c->ac3_rshift_int32 = ff_ac3_rshift_int32_mmx;
     }
     if (EXTERNAL_AMD3DNOW(cpu_flags)) {
         if (!bit_exact) {
@@ -80,43 +53,20 @@ av_cold void ff_ac3dsp_init_x86(AC3DSPContext *c, int bit_exact)
     }
     if (EXTERNAL_MMXEXT(cpu_flags)) {
         c->ac3_exponent_min = ff_ac3_exponent_min_mmxext;
-        c->ac3_max_msb_abs_int16 = ff_ac3_max_msb_abs_int16_mmxext;
-        if (bit_exact) {
-            c->apply_window_int16 = ff_apply_window_int16_mmxext;
-        } else {
-            c->apply_window_int16 = ff_apply_window_int16_round_mmxext;
-        }
     }
     if (EXTERNAL_SSE(cpu_flags)) {
         c->float_to_fixed24 = ff_float_to_fixed24_sse;
     }
     if (EXTERNAL_SSE2(cpu_flags)) {
         c->ac3_exponent_min = ff_ac3_exponent_min_sse2;
-        c->ac3_max_msb_abs_int16 = ff_ac3_max_msb_abs_int16_sse2;
         c->float_to_fixed24 = ff_float_to_fixed24_sse2;
         c->compute_mantissa_size = ff_ac3_compute_mantissa_size_sse2;
         c->extract_exponents = ff_ac3_extract_exponents_sse2;
-        if (bit_exact) {
-            c->apply_window_int16 = ff_apply_window_int16_sse2;
-        }
-    }
-
-    if (EXTERNAL_SSE2_FAST(cpu_flags)) {
-        c->ac3_lshift_int16 = ff_ac3_lshift_int16_sse2;
-        c->ac3_rshift_int32 = ff_ac3_rshift_int32_sse2;
-        if (!bit_exact) {
-            c->apply_window_int16 = ff_apply_window_int16_round_sse2;
-        }
     }
     if (EXTERNAL_SSSE3(cpu_flags)) {
-        c->ac3_max_msb_abs_int16 = ff_ac3_max_msb_abs_int16_ssse3;
-        if (cpu_flags & AV_CPU_FLAG_ATOM) {
-            c->apply_window_int16 = ff_apply_window_int16_ssse3_atom;
-        } else {
+        if (!(cpu_flags & AV_CPU_FLAG_ATOM))
             c->extract_exponents = ff_ac3_extract_exponents_ssse3;
-            c->apply_window_int16 = ff_apply_window_int16_ssse3;
-        }
     }
 }

From patchwork Tue Jan 12 18:14:28 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lynne
X-Patchwork-Id: 24930
Return-Path:
X-Original-To: patchwork@ffaux-bg.ffmpeg.org
Delivered-To: patchwork@ffaux-bg.ffmpeg.org
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id 82C56449F15 for ; Tue, 12 Jan 2021 20:14:35 +0200 (EET)
Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 69B7768ABC0; Tue, 12 Jan 2021 20:14:35 +0200 (EET)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from w4.tutanota.de (w4.tutanota.de [81.3.6.165]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 0747968AB7E for ; Tue, 12 Jan 2021 20:14:29 +0200 (EET)
Received: from w3.tutanota.de (unknown [192.168.1.164]) by w4.tutanota.de (Postfix) with ESMTP id A876710602E3 for ; Tue, 12 Jan 2021 18:14:28 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1610475268; s=s1; d=lynne.ee;
 h=From:From:To:To:Subject:Subject:Content-Description:Content-ID:Content-Type:Content-Type:Content-Transfer-Encoding:Cc:Date:Date:In-Reply-To:MIME-Version:MIME-Version:Message-ID:Message-ID:Reply-To:References:Sender;
 bh=8UQmUcH5KSWW+vj+bNzBYYdUpQ6CqlQCZs8I4YRiPig=;
 b=Hx3FDA6qX/vovxQENjrjM4UuMfOPEe51uI/tSy9DwMbKV6yt8rSLMFJFhDdg32cB
 DUEY+OkibaTVP8ZqQsGOk5sGi5p0cvnrr1+L8d6Fk08USk7H+BdM+Xn19Bq0+0rLrAx
 7G+u6dYR3Ka2/JrmkuDC11ZdqKKTr+IeN2ddriM0wqEA9IKWnRwEdRxsbA2vUfWLyRi
 r/1Oat+yrJrF25ZIqL5SZ2A5zO0jncfszaa129DI8shRFmCTge5ZtnvpZ0VWuQRcnAL
 y+AfiGKyslHwBNNoyAmxHd/FUpF0utXdsak0LDqL4lDQ0mGVpiufywTrk5FzZpJL6JG
 b96HMCnlbA==
Date: Tue, 12 Jan 2021 19:14:28 +0100 (CET)
From: Lynne
To: Ffmpeg Devel
Message-ID:
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v2 5/5] fft: remove 16-bit FFT and MDCT code
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

No longer used by anything.
Unfortunately the old FFT_FLOAT/FFT_FIXED_32 is left as-is. It's simply too much work for code meant to be all removed anyway.

Merged the 2 patches and fixed all tests.

Subject: [PATCH v2 5/5] fft: remove 16-bit FFT and MDCT code

No longer used by anything.
Unfortunately the old FFT_FLOAT/FFT_FIXED_32 is left as-is. It's simply too much work for code meant to be all removed anyway.
---
 libavcodec/Makefile                 |  11 +-
 libavcodec/arm/Makefile             |   9 +-
 libavcodec/arm/fft_fixed_init_arm.c |  50 ------
 libavcodec/arm/fft_fixed_neon.S     | 261 ----------------------------
 libavcodec/arm/mdct_fixed_neon.S    | 193 --------------------
 libavcodec/fft-internal.h           |  29 +---
 libavcodec/fft.h                    |   9 -
 libavcodec/fft_fixed.c              |  21 ---
 libavcodec/fft_template.c           |   4 -
 libavcodec/mdct_fixed.c             |  65 -------
 libavcodec/tests/.gitignore         |   1 -
 libavcodec/tests/fft-fixed.c        |  21 ---
 tests/fate/fft.mak                  |  30 +---
 13 files changed, 14 insertions(+), 690 deletions(-)
 delete mode 100644 libavcodec/arm/fft_fixed_init_arm.c
 delete mode 100644 libavcodec/arm/fft_fixed_neon.S
 delete mode 100644 libavcodec/arm/mdct_fixed_neon.S
 delete mode 100644 libavcodec/fft_fixed.c
 delete mode 100644 libavcodec/mdct_fixed.c
 delete mode 100644 libavcodec/tests/fft-fixed.c

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 0546e6f6c5..446e6e6b3b 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -83,10 +83,9 @@ OBJS-$(CONFIG_EXIF) += exif.o tiff_common.o
 OBJS-$(CONFIG_FAANDCT) += faandct.o
 OBJS-$(CONFIG_FAANIDCT) += faanidct.o
 OBJS-$(CONFIG_FDCTDSP) += fdctdsp.o jfdctfst.o jfdctint.o
-FFT-OBJS-$(CONFIG_HARDCODED_TABLES) += cos_tables.o cos_fixed_tables.o
-OBJS-$(CONFIG_FFT) += avfft.o fft_fixed.o fft_float.o \
-                      fft_fixed_32.o fft_init_table.o \
-                      $(FFT-OBJS-yes)
+FFT-OBJS-$(CONFIG_HARDCODED_TABLES) += cos_tables.o
+OBJS-$(CONFIG_FFT) += avfft.o fft_float.o fft_fixed_32.o \
+                      fft_init_table.o $(FFT-OBJS-yes)
 OBJS-$(CONFIG_FLACDSP) += flacdsp.o
 OBJS-$(CONFIG_FMTCONVERT) += fmtconvert.o
 OBJS-$(CONFIG_GOLOMB) += golomb.o
@@ -115,7 +114,7 @@ OBJS-$(CONFIG_LLVIDENCDSP) += lossless_videoencdsp.o
 OBJS-$(CONFIG_LPC) += lpc.o
 OBJS-$(CONFIG_LSP) += lsp.o
 OBJS-$(CONFIG_LZF) += lzf.o
-OBJS-$(CONFIG_MDCT) += mdct_fixed.o mdct_float.o mdct_fixed_32.o
+OBJS-$(CONFIG_MDCT) += mdct_float.o mdct_fixed_32.o
 OBJS-$(CONFIG_ME_CMP) += me_cmp.o
 OBJS-$(CONFIG_MEDIACODEC) += mediacodecdec_common.o mediacodec_surface.o mediacodec_wrapper.o mediacodec_sw_buffer.o
 OBJS-$(CONFIG_MPEG_ER) += mpeg_er.o
@@ -1217,7 +1216,7 @@ TESTPROGS = avpacket \

 TESTPROGS-$(CONFIG_CABAC) += cabac
 TESTPROGS-$(CONFIG_DCT) += avfft
-TESTPROGS-$(CONFIG_FFT) += fft fft-fixed fft-fixed32
+TESTPROGS-$(CONFIG_FFT) += fft fft-fixed32
 TESTPROGS-$(CONFIG_GOLOMB) += golomb
 TESTPROGS-$(CONFIG_IDCTDSP) += dct
 TESTPROGS-$(CONFIG_IIRFILTER) += iirfilter
diff --git a/libavcodec/arm/Makefile b/libavcodec/arm/Makefile
index c6be814153..c4ab93aeeb 100644
--- a/libavcodec/arm/Makefile
+++ b/libavcodec/arm/Makefile
@@ -5,8 +5,7 @@ OBJS-$(CONFIG_AC3DSP) += arm/ac3dsp_init_arm.o \
                          arm/ac3dsp_arm.o
 OBJS-$(CONFIG_AUDIODSP) += arm/audiodsp_init_arm.o
 OBJS-$(CONFIG_BLOCKDSP) += arm/blockdsp_init_arm.o
-OBJS-$(CONFIG_FFT) += arm/fft_init_arm.o \
-                      arm/fft_fixed_init_arm.o
+OBJS-$(CONFIG_FFT) += arm/fft_init_arm.o
 OBJS-$(CONFIG_FLACDSP) += arm/flacdsp_init_arm.o \
                           arm/flacdsp_arm.o
 OBJS-$(CONFIG_FMTCONVERT) += arm/fmtconvert_init_arm.o
@@ -108,8 +107,7 @@ NEON-OBJS-$(CONFIG_AUDIODSP) += arm/audiodsp_init_neon.o \
                                 arm/int_neon.o
 NEON-OBJS-$(CONFIG_BLOCKDSP) += arm/blockdsp_init_neon.o \
                                 arm/blockdsp_neon.o
-NEON-OBJS-$(CONFIG_FFT) += arm/fft_neon.o \
-                           arm/fft_fixed_neon.o
+NEON-OBJS-$(CONFIG_FFT) += arm/fft_neon.o
 NEON-OBJS-$(CONFIG_FMTCONVERT) += arm/fmtconvert_neon.o
 NEON-OBJS-$(CONFIG_G722DSP) += arm/g722dsp_neon.o
 NEON-OBJS-$(CONFIG_H264CHROMA) += arm/h264cmc_neon.o
@@ -123,8 +121,7 @@ NEON-OBJS-$(CONFIG_HPELDSP) += arm/hpeldsp_init_neon.o \
 NEON-OBJS-$(CONFIG_IDCTDSP) += arm/idctdsp_init_neon.o \
                                arm/idctdsp_neon.o \
                                arm/simple_idct_neon.o
-NEON-OBJS-$(CONFIG_MDCT) += arm/mdct_neon.o \
-                            arm/mdct_fixed_neon.o
+NEON-OBJS-$(CONFIG_MDCT) += arm/mdct_neon.o
 NEON-OBJS-$(CONFIG_MPEGVIDEO) += arm/mpegvideo_neon.o
 NEON-OBJS-$(CONFIG_PIXBLOCKDSP) += arm/pixblockdsp_neon.o
 NEON-OBJS-$(CONFIG_RDFT) += arm/rdft_neon.o
diff --git a/libavcodec/arm/fft_fixed_init_arm.c b/libavcodec/arm/fft_fixed_init_arm.c
deleted file mode 100644
index 11226d65ff..0000000000
--- a/libavcodec/arm/fft_fixed_init_arm.c
+++ /dev/null
@@ -1,50 +0,0 @@
-/*
- * Copyright (c) 2009 Mans Rullgard
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include "libavutil/attributes.h"
-#include "libavutil/cpu.h"
-#include "libavutil/arm/cpu.h"
-
-#define FFT_FLOAT 0
-#include "libavcodec/fft.h"
-
-void ff_fft_fixed_calc_neon(FFTContext *s, FFTComplex *z);
-void ff_mdct_fixed_calc_neon(FFTContext *s, FFTSample *o, const FFTSample *i);
-void ff_mdct_fixed_calcw_neon(FFTContext *s, FFTDouble *o, const FFTSample *i);
-
-av_cold void ff_fft_fixed_init_arm(FFTContext *s)
-{
-    int cpu_flags = av_get_cpu_flags();
-
-    if (have_neon(cpu_flags)) {
-        s->fft_permutation = FF_FFT_PERM_SWAP_LSBS;
-#if CONFIG_FFT
-        s->fft_calc = ff_fft_fixed_calc_neon;
-#endif
-
-#if CONFIG_MDCT
-        if (!s->inverse && s->nbits >= 3) {
-            s->mdct_permutation = FF_MDCT_PERM_INTERLEAVE;
-            s->mdct_calc = ff_mdct_fixed_calc_neon;
-            s->mdct_calcw = ff_mdct_fixed_calcw_neon;
-        }
-#endif
-    }
-}
diff --git a/libavcodec/arm/fft_fixed_neon.S b/libavcodec/arm/fft_fixed_neon.S
deleted file mode 100644
index 2651607544..0000000000
--- a/libavcodec/arm/fft_fixed_neon.S
+++ /dev/null
@@ -1,261 +0,0 @@
-/*
- * Copyright (c) 2011 Mans Rullgard
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include "libavutil/arm/asm.S"
-
-.macro bflies d0, d1, r0, r1
-        vrev64.32 \r0, \d1 @ t5, t6, t1, t2
-        vhsub.s16 \r1, \d1, \r0 @ t1-t5, t2-t6, t5-t1, t6-t2
-        vhadd.s16 \r0, \d1, \r0 @ t1+t5, t2+t6, t5+t1, t6+t2
-        vext.16 \r1, \r1, \r1, #1 @ t2-t6, t5-t1, t6-t2, t1-t5
-        vtrn.32 \r0, \r1 @ t1+t5, t2+t6, t2-t6, t5-t1
-                         @ t5, t6, t4, t3
-        vhsub.s16 \d1, \d0, \r0
-        vhadd.s16 \d0, \d0, \r0
-.endm
-
-.macro transform01 q0, q1, d3, c0, c1, r0, w0, w1
-        vrev32.16 \r0, \d3
-        vmull.s16 \w0, \d3, \c0
-        vmlal.s16 \w0, \r0, \c1
-        vshrn.s32 \d3, \w0, #15
-        bflies \q0, \q1, \w0, \w1
-.endm
-
-.macro transform2 d0, d1, d2, d3, q0, q1, c0, c1, c2, c3, \
-                  r0, r1, w0, w1
-        vrev32.16 \r0, \d1
-        vrev32.16 \r1, \d3
-        vmull.s16 \w0, \d1, \c0
-        vmlal.s16 \w0, \r0, \c1
-        vmull.s16 \w1, \d3, \c2
-        vmlal.s16 \w1, \r1, \c3
-        vshrn.s32 \d1, \w0, #15
-        vshrn.s32 \d3, \w1, #15
-        bflies \q0, \q1, \w0, \w1
-.endm
-
-.macro fft4 d0, d1, r0, r1
-        vhsub.s16 \r0, \d0, \d1 @ t3, t4, t8, t7
-        vhsub.s16 \r1, \d1, \d0
-        vhadd.s16 \d0, \d0, \d1 @ t1, t2, t6, t5
-        vmov.i64 \d1, #0xffff00000000
-        vbit \r0, \r1, \d1
-        vrev64.16 \r1, \r0 @ t7, t8, t4, t3
-        vtrn.32 \r0, \r1 @ t3, t4, t7, t8
-        vtrn.32 \d0, \r0 @ t1, t2, t3, t4, t6, t5, t8, t7
-        vhsub.s16 \d1, \d0, \r0 @ r2, i2, r3, i1
-        vhadd.s16 \d0, \d0, \r0 @ r0, i0, r1, i3
-.endm
-
-.macro fft8 d0, d1, d2, d3, q0, q1, c0, c1, r0, r1, w0, w1
-        fft4 \d0, \d1, \r0, \r1
-        vtrn.32 \d0, \d1 @ z0, z2, z1, z3
-        vhadd.s16 \r0, \d2, \d3 @ t1, t2, t3, t4
-        vhsub.s16 \d3, \d2, \d3 @ z5, z7
-        vmov \d2, \r0
-        transform01 \q0, \q1, \d3, \c0, \c1, \r0, \w0, \w1
-.endm
-
-function fft4_neon
-        vld1.16 {d0-d1}, [r0]
-        fft4 d0, d1, d2, d3
-        vst1.16 {d0-d1}, [r0]
-        bx lr
-endfunc
-
-function fft8_neon
-        vld1.16 {d0-d3}, [r0,:128]
-        movrel r1, coefs
-        vld1.16 {d30}, [r1,:64]
-        vdup.16 d31, d30[0]
-        fft8 d0, d1, d2, d3, q0, q1, d31, d30, d20, d21, q8, q9
-        vtrn.32 d0, d1
-        vtrn.32 d2, d3
-        vst1.16 {d0-d3}, [r0,:128]
-        bx lr
-endfunc
-
-function fft16_neon
-        vld1.16 {d0-d3}, [r0,:128]!
-        vld1.16 {d4-d7}, [r0,:128]
-        movrel r1, coefs
-        sub r0, r0, #32
-        vld1.16 {d28-d31},[r1,:128]
-        vdup.16 d31, d28[0]
-        fft8 d0, d1, d2, d3, q0, q1, d31, d28, d20, d21, q8, q9
-        vswp d5, d6
-        fft4 q2, q3, q8, q9
-        vswp d5, d6
-        vtrn.32 q0, q1 @ z0, z4, z2, z6, z1, z5, z3, z7
-        vtrn.32 q2, q3 @ z8, z12,z10,z14,z9, z13,z11,z15
-        vswp d1, d2
-        vdup.16 d31, d28[0]
-        transform01 q0, q2, d5, d31, d28, d20, q8, q9
-        vdup.16 d26, d29[0]
-        vdup.16 d27, d30[0]
-        transform2 d2, d6, d3, d7, q1, q3, d26, d30, d27, d29, \
-                   d20, d21, q8, q9
-        vtrn.32 q0, q1
-        vtrn.32 q2, q3
-        vst1.16 {d0-d3}, [r0,:128]!
-        vst1.16 {d4-d7}, [r0,:128]
-        bx lr
-endfunc
-
-function fft_pass_neon
-        push {r4,lr}
-        movrel lr, coefs + 24
-        vld1.16 {d30}, [lr,:64]
-        lsl r12, r2, #3
-        vmov d31, d30
-        add r3, r1, r2, lsl #2
-        mov lr, #-8
-        sub r3, r3, #2
-        mov r4, r0
-        vld1.16 {d27[]}, [r3,:16]
-        sub r3, r3, #6
-        vld1.16 {q0}, [r4,:128], r12
-        vld1.16 {q1}, [r4,:128], r12
-        vld1.16 {q2}, [r4,:128], r12
-        vld1.16 {q3}, [r4,:128], r12
-        vld1.16 {d28}, [r1,:64]!
-        vld1.16 {d29}, [r3,:64], lr
-        vswp d1, d2
-        vswp d5, d6
-        vtrn.32 d0, d1
-        vtrn.32 d4, d5
-        vdup.16 d25, d28[1]
-        vmul.s16 d27, d27, d31
-        transform01 q0, q2, d5, d25, d27, d20, q8, q9
-        b 2f
-1:
-        mov r4, r0
-        vdup.16 d26, d29[0]
-        vld1.16 {q0}, [r4,:128], r12
-        vld1.16 {q1}, [r4,:128], r12
-        vld1.16 {q2}, [r4,:128], r12
-        vld1.16 {q3}, [r4,:128], r12
-        vld1.16 {d28}, [r1,:64]!
-        vld1.16 {d29}, [r3,:64], lr
-        vswp d1, d2
-        vswp d5, d6
-        vtrn.32 d0, d1
-        vtrn.32 d4, d5
-        vdup.16 d24, d28[0]
-        vdup.16 d25, d28[1]
-        vdup.16 d27, d29[3]
-        vmul.s16 q13, q13, q15
-        transform2 d0, d4, d1, d5, q0, q2, d24, d26, d25, d27, \
-        d16, d17, q9, q10
-2:
-        vtrn.32 d2, d3
-        vtrn.32 d6, d7
-        vdup.16 d24, d28[2]
-        vdup.16 d26, d29[2]
-        vdup.16 d25, d28[3]
-        vdup.16 d27, d29[1]
-        vmul.s16 q13, q13, q15
-        transform2 d2, d6, d3, d7, q1, q3, d24, d26, d25, d27, \
-        d16, d17, q9, q10
-        vtrn.32 d0, d1
-        vtrn.32 d2, d3
-        vtrn.32 d4, d5
-        vtrn.32 d6, d7
-        vswp d1, d2
-        vswp d5, d6
-        mov r4, r0
-        vst1.16 {q0}, [r4,:128], r12
-        vst1.16 {q1}, [r4,:128], r12
-        vst1.16 {q2}, [r4,:128], r12
-        vst1.16 {q3}, [r4,:128], r12
-        add r0, r0, #16
-        subs r2, r2, #2
-        bgt 1b
-        pop {r4,pc}
-endfunc
-
-#define F_SQRT1_2 23170
-#define F_COS_16_1 30274
-#define F_COS_16_3 12540
-
-const coefs, align=4
-        .short F_SQRT1_2, -F_SQRT1_2, -F_SQRT1_2, F_SQRT1_2
-        .short F_COS_16_1,-F_COS_16_1,-F_COS_16_1, F_COS_16_1
-        .short F_COS_16_3,-F_COS_16_3,-F_COS_16_3, F_COS_16_3
-        .short 1, -1, -1, 1
-endconst
-
-.macro def_fft n, n2, n4
-function fft\n\()_neon
-        push {r4, lr}
-        mov r4, r0
-        bl fft\n2\()_neon
-        add r0, r4, #\n4*2*4
-        bl fft\n4\()_neon
-        add r0, r4, #\n4*3*4
-        bl fft\n4\()_neon
-        mov r0, r4
-        pop {r4, lr}
-        movrelx r1, X(ff_cos_\n\()_fixed)
-        mov r2, #\n4/2
-        b fft_pass_neon
-endfunc
-.endm
-
-        def_fft 32, 16, 8
-        def_fft 64, 32, 16
-        def_fft 128, 64, 32
-        def_fft 256, 128, 64
-        def_fft 512, 256, 128
-        def_fft 1024, 512, 256
-        def_fft 2048, 1024, 512
-        def_fft 4096, 2048, 1024
-        def_fft 8192, 4096, 2048
-        def_fft 16384, 8192, 4096
-        def_fft 32768, 16384, 8192
-        def_fft 65536, 32768, 16384
-
-function ff_fft_fixed_calc_neon, export=1
-        ldr r2, [r0]
-        sub r2, r2, #2
-        movrel r3, fft_fixed_tab_neon
-        ldr r3, [r3, r2, lsl #2]
-        mov r0, r1
-        bx r3
-endfunc
-
-const fft_fixed_tab_neon, relocate=1
-        .word fft4_neon
-        .word fft8_neon
-        .word fft16_neon
-        .word fft32_neon
-        .word fft64_neon
-        .word fft128_neon
-        .word fft256_neon
-        .word fft512_neon
-        .word fft1024_neon
-        .word fft2048_neon
-        .word fft4096_neon
-        .word fft8192_neon
-        .word fft16384_neon
-        .word fft32768_neon
-        .word fft65536_neon
-endconst
diff --git a/libavcodec/arm/mdct_fixed_neon.S b/libavcodec/arm/mdct_fixed_neon.S
deleted file mode 100644
index 365c5e7faf..0000000000
--- a/libavcodec/arm/mdct_fixed_neon.S
+++ /dev/null
@@ -1,193 +0,0 @@
-/*
- * Copyright (c) 2011 Mans Rullgard
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include "libavutil/arm/asm.S"
-
-.macro prerot dst, rt
-        lsr r3, r6, #2 @ n4
-        add \rt, r4, r6, lsr #1 @ revtab + n4
-        add r9, r3, r3, lsl #1 @ n3
-        add r8, r7, r6 @ tcos + n4
-        add r3, r2, r6, lsr #1 @ in + n4
-        add r9, r2, r9, lsl #1 @ in + n3
-        sub r8, r8, #16
-        sub r10, r3, #16
-        sub r11, r9, #16
-        mov r12, #-16
-1:
-        vld2.16 {d0,d1}, [r9, :128]!
-        vld2.16 {d2,d3}, [r11,:128], r12
-        vld2.16 {d4,d5}, [r3, :128]!
-        vld2.16 {d6,d7}, [r10,:128], r12
-        vld2.16 {d16,d17},[r7, :128]! @ cos, sin
-        vld2.16 {d18,d19},[r8, :128], r12
-        vrev64.16 q1, q1
-        vrev64.16 q3, q3
-        vrev64.16 q9, q9
-        vneg.s16 d0, d0
-        vneg.s16 d2, d2
-        vneg.s16 d16, d16
-        vneg.s16 d18, d18
-        vhsub.s16 d0, d0, d3 @ re
-        vhsub.s16 d4, d7, d4 @ im
-        vhsub.s16 d6, d6, d5
-        vhsub.s16 d2, d2, d1
-        vmull.s16 q10, d0, d16
-        vmlsl.s16 q10, d4, d17
-        vmull.s16 q11, d0, d17
-        vmlal.s16 q11, d4, d16
-        vmull.s16 q12, d6, d18
-        vmlsl.s16 q12, d2, d19
-        vmull.s16 q13, d6, d19
-        vmlal.s16 q13, d2, d18
-        vshrn.s32 d0, q10, #15
-        vshrn.s32 d1, q11, #15
-        vshrn.s32 d2, q12, #15
-        vshrn.s32 d3, q13, #15
-        vzip.16 d0, d1
-        vzip.16 d2, d3
-        ldrh lr, [r4], #2
-        ldrh r2, [\rt, #-2]!
-        add lr, \dst, lr, lsl #2
-        add r2, \dst, r2, lsl #2
-        vst1.32 {d0[0]}, [lr,:32]
-        vst1.32 {d2[0]}, [r2,:32]
-        ldrh lr, [r4], #2
-        ldrh r2, [\rt, #-2]!
-        add lr, \dst, lr, lsl #2
-        add r2, \dst, r2, lsl #2
-        vst1.32 {d0[1]}, [lr,:32]
-        vst1.32 {d2[1]}, [r2,:32]
-        ldrh lr, [r4], #2
-        ldrh r2, [\rt, #-2]!
-        add lr, \dst, lr, lsl #2
-        add r2, \dst, r2, lsl #2
-        vst1.32 {d1[0]}, [lr,:32]
-        vst1.32 {d3[0]}, [r2,:32]
-        ldrh lr, [r4], #2
-        ldrh r2, [\rt, #-2]!
-        add lr, \dst, lr, lsl #2
-        add r2, \dst, r2, lsl #2
-        vst1.32 {d1[1]}, [lr,:32]
-        vst1.32 {d3[1]}, [r2,:32]
-        subs r6, r6, #32
-        bgt 1b
-.endm
-
-function ff_mdct_fixed_calc_neon, export=1
-        push {r1,r4-r11,lr}
-
-        ldr r4, [r0, #8] @ revtab
-        ldr r6, [r0, #16] @ mdct_size; n
-        ldr r7, [r0, #24] @ tcos
-
-        prerot r1, r5
-
-        mov r4, r0
-        bl X(ff_fft_fixed_calc_neon)
-
-        pop {r5}
-        mov r12, #-16
-        ldr r6, [r4, #16] @ mdct_size; n
-        ldr r7, [r4, #24] @ tcos
-        add r5, r5, r6, lsr #1
-        add r7, r7, r6, lsr #1
-        sub r1, r5, #16
-        sub r2, r7, #16
-1:
-        vld2.16 {d4,d5}, [r7,:128]!
-        vld2.16 {d6,d7}, [r2,:128], r12
-        vld2.16 {d0,d1}, [r5,:128]
-        vld2.16 {d2,d3}, [r1,:128]
-        vrev64.16 q3, q3
-        vrev64.16 q1, q1
-        vneg.s16 q3, q3
-        vneg.s16 q2, q2
-        vmull.s16 q11, d2, d6
-        vmlal.s16 q11, d3, d7
-        vmull.s16 q8, d0, d5
-        vmlsl.s16 q8, d1, d4
-        vmull.s16 q9, d0, d4
-        vmlal.s16 q9, d1, d5
-        vmull.s16 q10, d2, d7
-        vmlsl.s16 q10, d3, d6
-        vshrn.s32 d0, q11, #15
-        vshrn.s32 d1, q8, #15
-        vshrn.s32 d2, q9, #15
-        vshrn.s32 d3, q10, #15
-        vrev64.16 q0, q0
-        vst2.16 {d2,d3}, [r5,:128]!
-        vst2.16 {d0,d1}, [r1,:128], r12
-        subs r6, r6, #32
-        bgt 1b
-
-        pop {r4-r11,pc}
-endfunc
-
-function ff_mdct_fixed_calcw_neon, export=1
-        push {r1,r4-r11,lr}
-
-        ldrd r4, r5, [r0, #8] @ revtab, tmp_buf
-        ldr r6, [r0, #16] @ mdct_size; n
-        ldr r7, [r0, #24] @ tcos
-
-        prerot r5, r1
-
-        mov r4, r0
-        mov r1, r5
-        bl X(ff_fft_fixed_calc_neon)
-
-        pop {r7}
-        mov r12, #-16
-        ldr r6, [r4, #16] @ mdct_size; n
-        ldr r9, [r4, #24] @ tcos
-        add r5, r5, r6, lsr #1
-        add r7, r7, r6
-        add r9, r9, r6, lsr #1
-        sub r3, r5, #16
-        sub r1, r7, #16
-        sub r2, r9, #16
-1:
-        vld2.16 {d4,d5}, [r9,:128]!
-        vld2.16 {d6,d7}, [r2,:128], r12
-        vld2.16 {d0,d1}, [r5,:128]!
-        vld2.16 {d2,d3}, [r3,:128], r12
-        vrev64.16 q3, q3
-        vrev64.16 q1, q1
-        vneg.s16 q3, q3
-        vneg.s16 q2, q2
-        vmull.s16 q8, d2, d6
-        vmlal.s16 q8, d3, d7
-        vmull.s16 q9, d0, d5
-        vmlsl.s16 q9, d1, d4
-        vmull.s16 q10, d0, d4
-        vmlal.s16 q10, d1, d5
-        vmull.s16 q11, d2, d7
-        vmlsl.s16 q11, d3, d6
-        vrev64.32 q8, q8
-        vrev64.32 q9, q9
-        vst2.32 {q10,q11},[r7,:128]!
-        vst2.32 {d16,d18},[r1,:128], r12
-        vst2.32 {d17,d19},[r1,:128], r12
-        subs r6, r6, #32
-        bgt 1b
-
-        pop {r4-r11,pc}
-endfunc
diff --git a/libavcodec/fft-internal.h b/libavcodec/fft-internal.h
index 0a8f7d05cf..3bd5a1123d 100644
--- a/libavcodec/fft-internal.h
+++ b/libavcodec/fft-internal.h
@@ -34,7 +34,7 @@
         (dim) = (are) * (bim) + (aim) * (bre); \
     } while (0)
-#else
+#else /* FFT_FLOAT */
 #define SCALE_FLOAT(a, bits) lrint((a) * (double)(1 << (bits)))
@@ -52,33 +52,6 @@
 #define FIX15(a) av_clip(SCALE_FLOAT(a, 31), -2147483647, 2147483647)
-#else /* FFT_FIXED_32 */
-
-#include "fft.h"
-#include "mathops.h"
-
-void ff_mdct_calcw_c(FFTContext *s, FFTDouble *output, const FFTSample *input);
-
-#define FIX15(a) av_clip(SCALE_FLOAT(a, 15), -32767, 32767)
-
-#define sqrthalf ((int16_t)((1<<15)*M_SQRT1_2))
-
-#define BF(x, y, a, b) do { \
-        x = (a - b) >> 1; \
-        y = (a + b) >> 1; \
-    } while (0)
-
-#define CMULS(dre, dim, are, aim, bre, bim, sh) do { \
-        (dre) = (MUL16(are, bre) - MUL16(aim, bim)) >> sh; \
-        (dim) = (MUL16(are, bim) + MUL16(aim, bre)) >> sh; \
-    } while (0)
-
-#define CMUL(dre, dim, are, aim, bre, bim) \
-    CMULS(dre, dim, are, aim, bre, bim, 15)
-
-#define CMULL(dre, dim, are, aim, bre, bim) \
-    CMULS(dre, dim, are, aim, bre, bim, 0)
-
 #endif /* FFT_FIXED_32 */
 #endif /* FFT_FLOAT */
diff --git a/libavcodec/fft.h b/libavcodec/fft.h
index 5f67b61f06..5ca2d18432 100644
--- a/libavcodec/fft.h
+++ b/libavcodec/fft.h
@@ -52,12 +52,6 @@ typedef float FFTDouble;
 typedef int32_t FFTSample;
-#else /* FFT_FIXED_32 */
-
-#define FFT_NAME(x) x ## _fixed
-
-typedef int16_t FFTSample;
-
 #endif /* FFT_FIXED_32 */
 typedef struct FFTComplex {
@@ -108,7 +102,6 @@ struct FFTContext {
     void (*imdct_calc)(struct FFTContext *s, FFTSample *output, const FFTSample *input);
     void (*imdct_half)(struct FFTContext *s, FFTSample *output, const FFTSample *input);
     void (*mdct_calc)(struct FFTContext *s, FFTSample *output, const FFTSample *input);
-    void (*mdct_calcw)(struct FFTContext *s, FFTDouble *output, const FFTSample *input);
     enum fft_permutation_type fft_permutation;
     enum mdct_permutation_type mdct_permutation;
     uint32_t *revtab32;
@@ -163,8 +156,6 @@ void ff_fft_init_arm(FFTContext *s);
 void ff_fft_init_mips(FFTContext *s);
 void ff_fft_init_ppc(FFTContext *s);
-void ff_fft_fixed_init_arm(FFTContext *s);
-
 void ff_fft_end(FFTContext *s);
 #define ff_mdct_init FFT_NAME(ff_mdct_init)
diff --git a/libavcodec/fft_fixed.c b/libavcodec/fft_fixed.c
deleted file mode 100644
index 3d3bd2fca6..0000000000
--- a/libavcodec/fft_fixed.c
+++ /dev/null
@@ -1,21 +0,0 @@
-/*
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#define FFT_FLOAT 0
-#define FFT_FIXED_32 0
-#include "fft_template.c"
diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c
index e807f4b255..2d05990ca9 100644
--- a/libavcodec/fft_template.c
+++ b/libavcodec/fft_template.c
@@ -236,11 +236,7 @@ av_cold int ff_fft_init(FFTContext *s, int nbits, int inverse)
     if (ARCH_ARM) ff_fft_init_arm(s);
     if (ARCH_PPC) ff_fft_init_ppc(s);
     if (ARCH_X86) ff_fft_init_x86(s);
-    if (CONFIG_MDCT) s->mdct_calcw = s->mdct_calc;
     if (HAVE_MIPSFPU) ff_fft_init_mips(s);
-#else
-    if (CONFIG_MDCT) s->mdct_calcw = ff_mdct_calcw_c;
-    if (ARCH_ARM) ff_fft_fixed_init_arm(s);
 #endif
     for(j=4; j<=nbits; j++) {
         ff_init_ff_cos_tabs(j);
diff --git a/libavcodec/mdct_fixed.c b/libavcodec/mdct_fixed.c
deleted file mode 100644
index aabf0c88f8..0000000000
--- a/libavcodec/mdct_fixed.c
+++ /dev/null
@@ -1,65 +0,0 @@
-/*
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#define FFT_FLOAT 0
-#define FFT_FIXED_32 0
-#include "mdct_template.c"
-
-/* same as ff_mdct_calcw_c with double-width unscaled output */
-void ff_mdct_calcw_c(FFTContext *s, FFTDouble *out, const FFTSample *input)
-{
-    int i, j, n, n8, n4, n2, n3;
-    FFTDouble re, im;
-    const uint16_t *revtab = s->revtab;
-    const FFTSample *tcos = s->tcos;
-    const FFTSample *tsin = s->tsin;
-    FFTComplex *x = s->tmp_buf;
-    FFTDComplex *o = (FFTDComplex *)out;
-
-    n = 1 << s->mdct_bits;
-    n2 = n >> 1;
-    n4 = n >> 2;
-    n8 = n >> 3;
-    n3 = 3 * n4;
-
-    /* pre rotation */
-    for(i=0;ifft_calc(s, x);
-
-    /* post rotation */
-    for(i=0;i