From patchwork Wed Feb 1 03:13:06 2017
X-Patchwork-Submitter: Rostislav Pehlivanov
X-Patchwork-Id: 2390
From: Rostislav Pehlivanov
To: ffmpeg-devel@ffmpeg.org
Cc: Rostislav Pehlivanov
Date: Wed, 1 Feb 2017 03:13:06 +0000
Message-Id: <20170201031309.99552-4-atomnuker@gmail.com>
In-Reply-To: <20170201031309.99552-1-atomnuker@gmail.com>
Subject: [FFmpeg-devel] [PATCH 3/6] imdct15: rename to mdct15 and add a forward transform

The stride parameter does not currently do anything. I can't figure out how
to implement striding, since the post-rotation starts from the middle of the
output and proceeds both up and down at the same time.
Signed-off-by: Rostislav Pehlivanov
---
 configure                    |   6 +-
 libavcodec/Makefile          |   2 +-
 libavcodec/aac.h             |   4 +-
 libavcodec/aacdec.c          |   2 +-
 libavcodec/aacdec_template.c |   4 +-
 libavcodec/mdct15.c          | 351 +++++++++++++++++++++++++++++++++++++++++++
 libavcodec/mdct15.h          |  70 +++++++++
 libavcodec/opus_celt.c       |  10 +-
 8 files changed, 435 insertions(+), 14 deletions(-)
 create mode 100644 libavcodec/mdct15.c
 create mode 100644 libavcodec/mdct15.h

diff --git a/configure b/configure
index 7154142006..76caeb8277 100755
--- a/configure
+++ b/configure
@@ -2106,7 +2106,7 @@ CONFIG_EXTRA="
     huffyuvencdsp
     idctdsp
     iirfilter
-    imdct15
+    mdct15
     intrax8
     iso_media
     ividsp
@@ -2348,7 +2348,7 @@ vc1dsp_select="h264chroma qpeldsp startcode"
 rdft_select="fft"
 
 # decoders / encoders
-aac_decoder_select="imdct15 mdct sinewin"
+aac_decoder_select="mdct15 mdct sinewin"
 aac_fixed_decoder_select="mdct sinewin"
 aac_encoder_select="audio_frame_queue iirfilter lpc mdct sinewin"
 aac_latm_decoder_select="aac_decoder aac_latm_parser"
@@ -2490,7 +2490,7 @@ nellymoser_encoder_select="audio_frame_queue mdct sinewin"
 nuv_decoder_select="idctdsp lzo"
 on2avc_decoder_select="mdct"
 opus_decoder_deps="swresample"
-opus_decoder_select="imdct15"
+opus_decoder_select="mdct15"
 png_decoder_select="zlib"
 png_encoder_select="llvidencdsp zlib"
 prores_decoder_select="blockdsp idctdsp"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 43a6add317..6c37ca513b 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -83,7 +83,7 @@ OBJS-$(CONFIG_HUFFYUVDSP) += huffyuvdsp.o
 OBJS-$(CONFIG_HUFFYUVENCDSP) += huffyuvencdsp.o
 OBJS-$(CONFIG_IDCTDSP) += idctdsp.o simple_idct.o jrevdct.o
 OBJS-$(CONFIG_IIRFILTER) += iirfilter.o
-OBJS-$(CONFIG_IMDCT15) += imdct15.o
+OBJS-$(CONFIG_MDCT15) += mdct15.o
 OBJS-$(CONFIG_INTRAX8) += intrax8.o intrax8dsp.o
 OBJS-$(CONFIG_IVIDSP) += ivi_dsp.o
 OBJS-$(CONFIG_JNI) += ffjni.o jni.o
diff --git a/libavcodec/aac.h b/libavcodec/aac.h
index b1f4aa74f0..97a2df6b86 100644
--- a/libavcodec/aac.h
+++ b/libavcodec/aac.h
@@ -36,7 +36,7 @@
 #include "libavutil/fixed_dsp.h"
 #include "avcodec.h"
 #if !USE_FIXED
-#include "imdct15.h"
+#include "mdct15.h"
 #endif
 #include "fft.h"
 #include "mpeg4audio.h"
@@ -327,7 +327,7 @@ struct AACContext {
 #if USE_FIXED
     AVFixedDSPContext *fdsp;
 #else
-    IMDCT15Context *mdct480;
+    MDCT15Context *mdct480;
     AVFloatDSPContext *fdsp;
 #endif /* USE_FIXED */
     int random_state;
diff --git a/libavcodec/aacdec.c b/libavcodec/aacdec.c
index ee9b4eb45f..1a10c121b9 100644
--- a/libavcodec/aacdec.c
+++ b/libavcodec/aacdec.c
@@ -42,7 +42,7 @@
 #include "internal.h"
 #include "get_bits.h"
 #include "fft.h"
-#include "imdct15.h"
+#include "mdct15.h"
 #include "lpc.h"
 #include "kbdwin.h"
 #include "sinewin.h"
diff --git a/libavcodec/aacdec_template.c b/libavcodec/aacdec_template.c
index 83e9fb55ba..0bfd633336 100644
--- a/libavcodec/aacdec_template.c
+++ b/libavcodec/aacdec_template.c
@@ -1185,7 +1185,7 @@ static av_cold int aac_decode_init(AVCodecContext *avctx)
     AAC_RENAME_32(ff_mdct_init)(&ac->mdct_small, 8, 1, 1.0 / RANGE15(128.0));
     AAC_RENAME_32(ff_mdct_init)(&ac->mdct_ltp, 11, 0, RANGE15(-2.0));
 #if !USE_FIXED
-    ret = ff_imdct15_init(&ac->mdct480, 5);
+    ret = ff_mdct15_init(&ac->mdct480, 1, 5, -1.0f);
     if (ret < 0)
         return ret;
 #endif
@@ -3192,7 +3192,7 @@ static av_cold int aac_decode_close(AVCodecContext *avctx)
     ff_mdct_end(&ac->mdct_ld);
     ff_mdct_end(&ac->mdct_ltp);
 #if !USE_FIXED
-    ff_imdct15_uninit(&ac->mdct480);
+    ff_mdct15_uninit(&ac->mdct480);
 #endif
     av_freep(&ac->fdsp);
     return 0;
diff --git a/libavcodec/mdct15.c b/libavcodec/mdct15.c
new file mode 100644
index 0000000000..4edbc69dbe
--- /dev/null
+++ b/libavcodec/mdct15.c
@@ -0,0 +1,351 @@
+/*
+ * Copyright (c) 2013-2014 Mozilla Corporation
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * Celt non-power of 2 MDCT and iMDCT
+ */
+
+#include <float.h>
+#include <math.h>
+#include <stddef.h>
+
+#include "config.h"
+
+#include "libavutil/attributes.h"
+#include "libavutil/common.h"
+
+#include "avfft.h"
+#include "mdct15.h"
+
+// complex c = a * b
+#define CMUL3(cre, cim, are, aim, bre, bim) \
+do {                                        \
+    cre = are * bre - aim * bim;            \
+    cim = are * bim + aim * bre;            \
+} while (0)
+
+#define CMUL(c, a, b) CMUL3((c).re, (c).im, (a).re, (a).im, (b).re, (b).im)
+
+av_cold void ff_mdct15_uninit(MDCT15Context **ps)
+{
+    MDCT15Context *s = *ps;
+
+    if (!s)
+        return;
+
+    ff_fft_end(&s->ptwo_fft);
+
+    av_freep(&s->pfa_prereindex);
+    av_freep(&s->pfa_postreindex);
+    av_freep(&s->twiddle_exptab);
+    av_freep(&s->tmp);
+
+    av_freep(ps);
+}
+
+static void mdct15(MDCT15Context *s, float *dst, const float *src, ptrdiff_t stride);
+
+static void imdct15_half(MDCT15Context *s, float *dst, const float *src,
+                         ptrdiff_t stride, float scale);
+
+static inline int init_pfa_reindex_tabs(MDCT15Context *s)
+{
+    int i, j;
+    const int b_ptwo = s->ptwo_fft.nbits; /* Bits for the power of two FFTs */
+    const int l_ptwo = 1 << b_ptwo; /* Total length for the power of two FFTs */
+    const int inv_1 = l_ptwo << ((4 - b_ptwo) & 3); /* 2^b_ptwo * ((2^b_ptwo)^-1 mod 15) */
+    const int inv_2 = 0xeeeeeeef & ((1U << b_ptwo) - 1); /* 15^-1 mod 2^b_ptwo */
+
+    s->pfa_prereindex = av_malloc(15 * l_ptwo * sizeof(*s->pfa_prereindex));
+    if (!s->pfa_prereindex)
+        return 1;
+
+    s->pfa_postreindex = av_malloc(15 * l_ptwo * sizeof(*s->pfa_postreindex));
+    if (!s->pfa_postreindex)
+        return 1;
+
+    /* Pre/Post-reindex */
+    for (i = 0; i < l_ptwo; i++) {
+        for (j = 0; j < 15; j++) {
+            const int q_pre = ((l_ptwo * j)/15 + i) >> b_ptwo;
+            const int q_post = (((j*inv_1)/15) + (i*inv_2)) >> b_ptwo;
+            const int k_pre = 15*i + (j - q_pre*15)*l_ptwo;
+            const int k_post = i*inv_2*15 + j*inv_1 - 15*q_post*l_ptwo;
+            s->pfa_prereindex[i*15 + j] = k_pre;
+            s->pfa_postreindex[k_post] = l_ptwo*j + i;
+        }
+    }
+
+    return 0;
+}
+
+av_cold int ff_mdct15_init(MDCT15Context **ps, int inverse, int N, double scale)
+{
+    MDCT15Context *s;
+    double alpha, theta;
+    int len2 = 15 * (1 << N);
+    int len = 2 * len2;
+    int i;
+
+    /* Tested and verified to work on everything in between */
+    if ((N < 2) || (N > 13))
+        return AVERROR(EINVAL);
+
+    s = av_mallocz(sizeof(*s));
+    if (!s)
+        return AVERROR(ENOMEM);
+
+    s->fft_n = N - 1;
+    s->len4 = len2 / 2;
+    s->len2 = len2;
+    s->inverse = inverse;
+    s->mdct = mdct15;
+    s->imdct_half = imdct15_half;
+
+    if (ff_fft_init(&s->ptwo_fft, N - 1, s->inverse) < 0)
+        goto fail;
+
+    if (init_pfa_reindex_tabs(s))
+        goto fail;
+
+    s->tmp = av_malloc_array(len, 2 * sizeof(*s->tmp));
+    if (!s->tmp)
+        goto fail;
+
+    s->twiddle_exptab = av_malloc_array(s->len4, sizeof(*s->twiddle_exptab));
+    if (!s->twiddle_exptab)
+        goto fail;
+
+    theta = 0.125f + (scale < 0 ? s->len4 : 0);
+    scale = sqrt(fabs(scale));
+    for (i = 0; i < s->len4; i++) {
+        alpha = 2 * M_PI * (i + theta) / len;
+        s->twiddle_exptab[i].re = cos(alpha) * scale;
+        s->twiddle_exptab[i].im = sin(alpha) * scale;
+    }
+
+    /* 15-point FFT exptab */
+    for (i = 0; i < 19; i++) {
+        if (i < 15) {
+            double theta = (2.0f * M_PI * i) / 15.0f;
+            if (!s->inverse)
+                theta *= -1;
+            s->exptab[i].re = cos(theta);
+            s->exptab[i].im = sin(theta);
+        } else { /* Wrap around to simplify fft15 */
+            s->exptab[i] = s->exptab[i - 15];
+        }
+    }
+
+    /* 5-point FFT exptab */
+    s->exptab[19].re = cos(2.0f * M_PI / 5.0f);
+    s->exptab[19].im = sin(2.0f * M_PI / 5.0f);
+    s->exptab[20].re = cos(1.0f * M_PI / 5.0f);
+    s->exptab[20].im = sin(1.0f * M_PI / 5.0f);
+
+    /* Invert the phase for an inverse transform, do nothing for a forward transform */
+    if (s->inverse) {
+        s->exptab[19].im *= -1;
+        s->exptab[20].im *= -1;
+    }
+
+    *ps = s;
+
+    return 0;
+
+fail:
+    ff_mdct15_uninit(&s);
+    return AVERROR(ENOMEM);
+}
+
+/* Stride is hardcoded to 3 */
+static inline void fft5(const FFTComplex exptab[2], FFTComplex *out,
+                        const FFTComplex *in)
+{
+    FFTComplex z0[4], t[6];
+
+    t[0].re = in[3].re + in[12].re;
+    t[0].im = in[3].im + in[12].im;
+    t[1].im = in[3].re - in[12].re;
+    t[1].re = in[3].im - in[12].im;
+    t[2].re = in[6].re + in[ 9].re;
+    t[2].im = in[6].im + in[ 9].im;
+    t[3].im = in[6].re - in[ 9].re;
+    t[3].re = in[6].im - in[ 9].im;
+
+    out[0].re = in[0].re + in[3].re + in[6].re + in[9].re + in[12].re;
+    out[0].im = in[0].im + in[3].im + in[6].im + in[9].im + in[12].im;
+
+    t[4].re = exptab[0].re * t[2].re - exptab[1].re * t[0].re;
+    t[4].im = exptab[0].re * t[2].im - exptab[1].re * t[0].im;
+    t[0].re = exptab[0].re * t[0].re - exptab[1].re * t[2].re;
+    t[0].im = exptab[0].re * t[0].im - exptab[1].re * t[2].im;
+    t[5].re = exptab[0].im * t[3].re - exptab[1].im * t[1].re;
+    t[5].im = exptab[0].im * t[3].im - exptab[1].im * t[1].im;
+    t[1].re = exptab[0].im * t[1].re + exptab[1].im * t[3].re;
+    t[1].im = exptab[0].im * t[1].im + exptab[1].im * t[3].im;
+
+    z0[0].re = t[0].re - t[1].re;
+    z0[0].im = t[0].im - t[1].im;
+    z0[1].re = t[4].re + t[5].re;
+    z0[1].im = t[4].im + t[5].im;
+
+    z0[2].re = t[4].re - t[5].re;
+    z0[2].im = t[4].im - t[5].im;
+    z0[3].re = t[0].re + t[1].re;
+    z0[3].im = t[0].im + t[1].im;
+
+    out[1].re = in[0].re + z0[3].re;
+    out[1].im = in[0].im + z0[0].im;
+    out[2].re = in[0].re + z0[2].re;
+    out[2].im = in[0].im + z0[1].im;
+    out[3].re = in[0].re + z0[1].re;
+    out[3].im = in[0].im + z0[2].im;
+    out[4].re = in[0].re + z0[0].re;
+    out[4].im = in[0].im + z0[3].im;
+}
+
+static av_noinline void fft15(const FFTComplex exptab[21], FFTComplex *out,
+                              const FFTComplex *in, size_t stride)
+{
+    int k;
+    FFTComplex tmp1[5], tmp2[5], tmp3[5];
+
+    fft5(exptab + 19, tmp1, in + 0);
+    fft5(exptab + 19, tmp2, in + 1);
+    fft5(exptab + 19, tmp3, in + 2);
+
+    for (k = 0; k < 5; k++) {
+        FFTComplex t[2];
+
+        CMUL(t[0], tmp2[k], exptab[k]);
+        CMUL(t[1], tmp3[k], exptab[2 * k]);
+        out[stride*k].re = tmp1[k].re + t[0].re + t[1].re;
+        out[stride*k].im = tmp1[k].im + t[0].im + t[1].im;
+
+        CMUL(t[0], tmp2[k], exptab[k + 5]);
+        CMUL(t[1], tmp3[k], exptab[2 * (k + 5)]);
+        out[stride*(k + 5)].re = tmp1[k].re + t[0].re + t[1].re;
+        out[stride*(k + 5)].im = tmp1[k].im + t[0].im + t[1].im;
+
+        CMUL(t[0], tmp2[k], exptab[k + 10]);
+        CMUL(t[1], tmp3[k], exptab[2 * k + 5]);
+        out[stride*(k + 10)].re = tmp1[k].re + t[0].re + t[1].re;
+        out[stride*(k + 10)].im = tmp1[k].im + t[0].im + t[1].im;
+    }
+}
+
+static void mdct15(MDCT15Context *s, float *dst, const float *src, ptrdiff_t stride)
+{
+    int i, j;
+    FFTComplex fft15in[15];
+
+    int len1 = s->len4 << 2;
+    int len2 = s->len4 << 1;
+    int len3 = s->len4 * 3;
+    int len4 = s->len4;
+    int len8 = s->len4 >> 1;
+
+    int l_ptwo = 1 << s->ptwo_fft.nbits;
+
+    FFTComplex *z = (FFTComplex *)dst;
+    /* Pre-rotated input; placed after s->tmp[0, len4), which the FFTs below
+     * write to, so it is not limited to a fixed-size stack array */
+    FFTComplex *x = s->tmp + len4;
+
+    /* pre rotation */
+    for (i = 0; i < len8; i++) {
+        float re, im;
+
+        re = -src[2*i+len3] - src[len3-1-2*i];
+        im = -src[len4+2*i] + src[len4-1-2*i];
+        j = i;
+        CMUL3(x[j].re, x[j].im, re, im, s->twiddle_exptab[i].re, -s->twiddle_exptab[i].im);
+
+        re = src[2*i] - src[len2-1-2*i];
+        im = -src[len2+2*i] - src[len1-1-2*i];
+        j = len8 + i;
+        CMUL3(x[j].re, x[j].im, re, im, s->twiddle_exptab[len8 + i].re, -s->twiddle_exptab[len8 + i].im);
+    }
+
+    for (i = 0; i < l_ptwo; i++) {
+        for (j = 0; j < 15; j++) {
+            const int k = s->pfa_prereindex[i*15 + j];
+            fft15in[j] = x[k];
+        }
+        fft15(s->exptab, s->tmp + s->ptwo_fft.revtab[i], fft15in, l_ptwo);
+    }
+
+    /* Then a 15xN FFT (where N is a power of two) */
+    for (i = 0; i < 15; i++)
+        s->ptwo_fft.fft_calc(&s->ptwo_fft, s->tmp + l_ptwo*i);
+
+    /* Reindex again, apply twiddles and output */
+    for (i = 0; i < len8; i++) {
+        float re0, im0, re1, im1;
+        const int i0 = len8 + i, i1 = len8 - i - 1;
+        const int s0 = s->pfa_postreindex[i0], s1 = s->pfa_postreindex[i1];
+
+        CMUL3(im1, re0, s->tmp[s1].re, s->tmp[s1].im, s->twiddle_exptab[i1].im, s->twiddle_exptab[i1].re);
+        CMUL3(im0, re1, s->tmp[s0].re, s->tmp[s0].im, s->twiddle_exptab[i0].im, s->twiddle_exptab[i0].re);
+        z[i1].re = re0;
+        z[i1].im = im0;
+        z[i0].re = re1;
+        z[i0].im = im1;
+    }
+}
+
+static void imdct15_half(MDCT15Context *s, float *dst, const float *src,
+                         ptrdiff_t stride, float scale)
+{
+    FFTComplex fft15in[15];
+    FFTComplex *z = (FFTComplex *)dst;
+    int i, j, len8 = s->len4 >> 1, l_ptwo = 1 << s->ptwo_fft.nbits;
+    const float *in1 = src, *in2 = src + (s->len2 - 1) * stride;
+
+    /* Reindex input, putting it into a buffer and doing an Nx15 FFT */
+    for (i = 0; i < l_ptwo; i++) {
+        for (j = 0; j < 15; j++) {
+            const int k = s->pfa_prereindex[i*15 + j];
+            FFTComplex tmp = { *(in2 - 2*k*stride), *(in1 + 2*k*stride) };
+            CMUL(fft15in[j], tmp, s->twiddle_exptab[k]);
+        }
+        fft15(s->exptab, s->tmp + s->ptwo_fft.revtab[i], fft15in, l_ptwo);
+    }
+
+    /* Then a 15xN FFT (where N is a power of two) */
+    for (i = 0; i < 15; i++)
+        s->ptwo_fft.fft_calc(&s->ptwo_fft, s->tmp + l_ptwo*i);
+
+    /* Reindex again, apply twiddles and output */
+    for (i = 0; i < len8; i++) {
+        float re0, im0, re1, im1;
+        const int i0 = len8 + i, i1 = len8 - i - 1;
+        const int s0 = s->pfa_postreindex[i0], s1 = s->pfa_postreindex[i1];
+
+        CMUL3(re0, im1, s->tmp[s1].im, s->tmp[s1].re, s->twiddle_exptab[i1].im, s->twiddle_exptab[i1].re);
+        CMUL3(re1, im0, s->tmp[s0].im, s->tmp[s0].re, s->twiddle_exptab[i0].im, s->twiddle_exptab[i0].re);
+        z[i1].re = scale * re0;
+        z[i1].im = scale * im0;
+        z[i0].re = scale * re1;
+        z[i0].im = scale * im1;
+    }
+}
diff --git a/libavcodec/mdct15.h b/libavcodec/mdct15.h
new file mode 100644
index 0000000000..c905f7a01e
--- /dev/null
+++ b/libavcodec/mdct15.h
@@ -0,0 +1,70 @@
+/*
+ * Copyright (c) 2016 Rostislav Pehlivanov
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_MDCT15_H
+#define AVCODEC_MDCT15_H
+
+#include <stddef.h>
+
+#include "fft.h"
+
+typedef struct MDCT15Context {
+    int fft_n;
+    int len2;
+    int len4;
+    int inverse;
+    int *pfa_prereindex;
+    int *pfa_postreindex;
+
+    FFTContext ptwo_fft;
+
+    FFTComplex *tmp;
+
+    FFTComplex *twiddle_exptab;
+
+    /* 0 - 18: fft15 twiddles, 19 - 20: fft5 twiddles */
+    FFTComplex exptab[21];
+
+    /**
+     * Calculate a full 2N -> N MDCT
+     */
+    void (*mdct)(struct MDCT15Context *s, float *dst, const float *src, ptrdiff_t stride);
+
+    /**
+     * Calculate the middle half of the iMDCT
+     */
+    void (*imdct_half)(struct MDCT15Context *s, float *dst, const float *src,
+                       ptrdiff_t src_stride, float scale);
+} MDCT15Context;
+
+/**
+ * Init an (i)MDCT of the length 2 * 15 * (2^N)
+ */
+int ff_mdct15_init(MDCT15Context **ps, int inverse, int N, double scale);
+
+/**
+ * Frees a context
+ */
+void ff_mdct15_uninit(MDCT15Context **ps);
+
+
+void ff_mdct15_init_aarch64(MDCT15Context *s);
+
+#endif /* AVCODEC_MDCT15_H */
diff --git a/libavcodec/opus_celt.c b/libavcodec/opus_celt.c
index 96fedb7a49..a0f018e664 100644
--- a/libavcodec/opus_celt.c
+++ b/libavcodec/opus_celt.c
@@ -29,7 +29,7 @@
 #include "libavutil/float_dsp.h"
 #include "libavutil/libm.h"
 
-#include "imdct15.h"
+#include "mdct15.h"
 #include "opus.h"
 #include "opustab.h"
 
@@ -63,7 +63,7 @@ typedef struct CeltFrame {
 struct CeltContext {
     // constant values that do not change during context lifetime
     AVCodecContext *avctx;
-    IMDCT15Context *imdct[4];
+    MDCT15Context *imdct[4];
     AVFloatDSPContext *dsp;
     int output_channels;
 
@@ -1596,7 +1596,7 @@ int ff_celt_decode_frame(CeltContext *s, OpusRangeCoder *rc,
     int silence = 0;
     int transient = 0;
     int anticollapse = 0;
-    IMDCT15Context *imdct;
+    MDCT15Context *imdct;
     float imdct_scale = 1.0;
 
     if (coded_channels != 1 && coded_channels != 2) {
@@ -1792,7 +1792,7 @@ void ff_celt_free(CeltContext **ps)
         return;
 
     for (i = 0; i < FF_ARRAY_ELEMS(s->imdct); i++)
-        ff_imdct15_uninit(&s->imdct[i]);
+        ff_mdct15_uninit(&s->imdct[i]);
 
     av_freep(&s->dsp);
     av_freep(ps);
@@ -1817,7 +1817,7 @@ int ff_celt_init(AVCodecContext *avctx, CeltContext **ps, int output_channels)
     s->output_channels = output_channels;
 
     for (i = 0; i < FF_ARRAY_ELEMS(s->imdct); i++) {
-        ret = ff_imdct15_init(&s->imdct[i], i + 3);
+        ret = ff_mdct15_init(&s->imdct[i], 1, i + 3, -1.0f);
         if (ret < 0)
             goto fail;
     }