From patchwork Fri Jul 28 15:15:08 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tyler Jones
X-Patchwork-Id: 4498
Date: Fri, 28 Jul 2017 09:15:08 -0600
From: Tyler Jones
To: ffmpeg-devel@ffmpeg.org
Message-ID: <20170728151508.GA2481@tdjones.localdomain>
User-Agent: Mutt/1.8.3 (2017-05-23)
Subject: [FFmpeg-devel] [PATCH V4 1/2] avcodec/vorbisenc: Add pre-echo detection
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

The encoder will attempt to determine the existence of transient signals by
applying a 4th order highpass filter to remove dominant low frequency
waveforms. Frames are then split up into blocks where the variance is
calculated and compared with blocks from the previous frame. A preecho is
only likely to be noticeable when relatively quiet audio is followed by a
loud transient signal.

Signed-off-by: Tyler Jones
---
V4: Use AVFloatDSPContext for variance calculation
    Correctly change quality factors to const
    Remove unnecessary malloc and free for VorbisPsyContext
V3: Use normal float notation
    Don't check before freeing NULL pointers
    Remove unnecessary includes
V2: Provide proper prefix for non-static function

 libavcodec/Makefile    |   2 +-
 libavcodec/vorbisenc.c |  27 +++++++--
 libavcodec/vorbispsy.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++++
 libavcodec/vorbispsy.h |  82 +++++++++++++++++++++++++++
 4 files changed, 253 insertions(+), 5 deletions(-)
 create mode 100644 libavcodec/vorbispsy.c
 create mode 100644 libavcodec/vorbispsy.h

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 357fa1a361..08acbc723e 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -611,7 +611,7 @@ OBJS-$(CONFIG_VMNC_DECODER)            += vmnc.o
 OBJS-$(CONFIG_VORBIS_DECODER)          += vorbisdec.o vorbisdsp.o vorbis.o \
                                           vorbis_data.o
 OBJS-$(CONFIG_VORBIS_ENCODER)          += vorbisenc.o vorbis.o \
-                                          vorbis_data.o
+                                          vorbis_data.o vorbispsy.o
 OBJS-$(CONFIG_VP3_DECODER)             += vp3.o
 OBJS-$(CONFIG_VP5_DECODER)             += vp5.o vp56.o vp56data.o vp56rac.o
 OBJS-$(CONFIG_VP6_DECODER)             += vp6.o vp56.o vp56data.o \
diff --git a/libavcodec/vorbisenc.c b/libavcodec/vorbisenc.c
index bf21a3b1ff..6da5f012c2 100644
--- a/libavcodec/vorbisenc.c
+++ b/libavcodec/vorbisenc.c
@@ -33,6 +33,7 @@
 #include "mathops.h"
 #include "vorbis.h"
 #include "vorbis_enc_data.h"
+#include "vorbispsy.h"
 
 #include "audio_frame_queue.h"
 #include "libavfilter/bufferqueue.h"
@@ -136,6 +137,7 @@ typedef struct vorbis_enc_context {
     int64_t next_pts;
 
     AVFloatDSPContext *fdsp;
+    VorbisPsyContext vpctx;
 } vorbis_enc_context;
 
 #define MAX_CHANNELS     2
@@ -272,11 +274,12 @@ static int create_vorbis_context(vorbis_enc_context *venc,
     vorbis_enc_floor   *fc;
     vorbis_enc_residue *rc;
     vorbis_enc_mapping *mc;
-    int i, book, ret;
+    int i, book, ret, blocks;
 
     venc->channels      = avctx->channels;
     venc->sample_rate   = avctx->sample_rate;
-    venc->log2_blocksize[0] = venc->log2_blocksize[1] = 11;
+    venc->log2_blocksize[0] = 8;
+    venc->log2_blocksize[1] = 11;
 
     venc->ncodebooks = FF_ARRAY_ELEMS(cvectors);
     venc->codebooks  = av_malloc(sizeof(vorbis_enc_codebook) * venc->ncodebooks);
@@ -464,6 +467,11 @@ static int create_vorbis_context(vorbis_enc_context *venc,
     if ((ret = dsp_init(avctx, venc)) < 0)
         return ret;
 
+    blocks = 1 << (venc->log2_blocksize[1] - venc->log2_blocksize[0]);
+    if ((ret = ff_psy_vorbis_init(&venc->vpctx, venc->sample_rate,
+                                  venc->channels, blocks, venc->fdsp)) < 0)
+        return ret;
+
     return 0;
 }
 
@@ -1078,15 +1086,17 @@ static void move_audio(vorbis_enc_context *venc, int sf_size)
         av_frame_free(&cur);
     }
     venc->have_saved = 1;
-    memcpy(venc->scratch, venc->samples, 2 * venc->channels * frame_size);
+    memcpy(venc->scratch, venc->samples, sizeof(float) * venc->channels * 2 * frame_size);
 }
 
 static int vorbis_encode_frame(AVCodecContext *avctx, AVPacket *avpkt,
                                const AVFrame *frame, int *got_packet_ptr)
 {
     vorbis_enc_context *venc = avctx->priv_data;
-    int i, ret, need_more;
+    int i, ret, need_more, ch;
+    int curr_win = 1;
     int frame_size = 1 << (venc->log2_blocksize[1] - 1);
+    int block_size = 1 << (venc->log2_blocksize[0] - 1);
     vorbis_enc_mode *mode;
     vorbis_enc_mapping *mapping;
     PutBitContext pb;
@@ -1121,6 +1131,14 @@ static int vorbis_encode_frame(AVCodecContext *avctx, AVPacket *avpkt,
 
     move_audio(venc, avctx->frame_size);
 
+    for (ch = 0; ch < venc->channels; ch++) {
+        float *scratch = venc->scratch + 2 * ch * frame_size + frame_size;
+
+        if (!ff_psy_vorbis_block_frame(&venc->vpctx, scratch, ch,
+                                       frame_size, block_size))
+            curr_win = 0;
+    }
+
     if (!apply_window_and_mdct(venc))
         return 0;
 
@@ -1252,6 +1270,7 @@ static av_cold int vorbis_encode_close(AVCodecContext *avctx)
     ff_mdct_end(&venc->mdct[1]);
     ff_af_queue_close(&venc->afq);
     ff_bufqueue_discard_all(&venc->bufqueue);
+    ff_psy_vorbis_close(&venc->vpctx);
 
     av_freep(&avctx->extradata);
 
diff --git a/libavcodec/vorbispsy.c b/libavcodec/vorbispsy.c
new file mode 100644
index 0000000000..ab2d41f62f
--- /dev/null
+++ b/libavcodec/vorbispsy.c
@@ -0,0 +1,147 @@
+/*
+ * Vorbis encoder psychoacoustic model
+ * Copyright (C) 2017 Tyler Jones
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/ffmath.h"
+
+#include "avcodec.h"
+#include "vorbispsy.h"
+
+/**
+ * Generate the coefficients for a highpass biquad filter
+ *
+ * @param filter Instance of biquad filter to be initialized
+ * @param Fs     Input's sampling frequency
+ * @param Fc     Critical frequency for samples to be passed
+ * @param Q      Quality factor
+ */
+static av_cold void biquad_filter_init(IIRFilter *filter, int Fs, int Fc, float Q)
+{
+    float k = tan(M_PI * Fc / Fs);
+    float normalize = 1 / (1 + k / Q + k * k);
+
+    filter->b[0] = normalize;
+    filter->b[1] = -2 * normalize;
+    filter->b[2] = normalize;
+
+    filter->a[0] = 1;
+    filter->a[1] = 2 * (k * k - 1) * normalize;
+    filter->a[2] = (1 - k / Q + k * k) * normalize;
+}
+
+/**
+ * Direct Form II implementation for a second order digital filter
+ *
+ * @param filter Filter to be applied to input samples
+ * @param in     Single input sample to be filtered
+ * @param delay  Array of IIR feedback values
+ * @return       Filtered sample
+ */
+static float apply_filter(IIRFilter *filter, float in, float *delay)
+{
+    float ret, w;
+
+    w   = filter->a[0] * in - filter->a[1] * delay[0] - filter->a[2] * delay[1];
+    ret = filter->b[0] * w  + filter->b[1] * delay[0] + filter->b[2] * delay[1];
+
+    delay[1] = delay[0];
+    delay[0] = w;
+
+    return ret;
+}
+
+/**
+ * Calculate the variance of a block of samples
+ *
+ * @param in     Array of input samples
+ * @param length Number of input samples being analyzed
+ * @return       The variance for the current block
+ */
+static float variance(const float *in, int length, AVFloatDSPContext *fdsp)
+{
+    int i;
+    float mean = 0.0f, square_sum = 0.0f;
+
+    for (i = 0; i < length; i++) {
+        mean += in[i];
+    }
+
+    square_sum = fdsp->scalarproduct_float(in, in, length);
+
+    mean /= length;
+    return (square_sum - length * mean * mean) / (length - 1);
+}
+
+av_cold int ff_psy_vorbis_init(VorbisPsyContext *vpctx, int sample_rate,
+                               int channels, int blocks, AVFloatDSPContext *fdsp)
+{
+    int crit_freq;
+    const float Q[2] = {.54, 1.31}; // Quality values for maximally flat cascaded filters
+
+    vpctx->filter_delay = av_mallocz_array(4 * channels, sizeof(vpctx->filter_delay[0]));
+    if (!vpctx->filter_delay)
+        return AVERROR(ENOMEM);
+
+    crit_freq = sample_rate / 4;
+    biquad_filter_init(&vpctx->filter[0], sample_rate, crit_freq, Q[0]);
+    biquad_filter_init(&vpctx->filter[1], sample_rate, crit_freq, Q[1]);
+
+    vpctx->variance = av_mallocz_array(channels * blocks, sizeof(vpctx->variance[0]));
+    if (!vpctx->variance)
+        return AVERROR(ENOMEM);
+
+    vpctx->preecho_thresh = 100.0f;
+    vpctx->fdsp = fdsp;
+
+    return 0;
+}
+
+int ff_psy_vorbis_block_frame(VorbisPsyContext *vpctx, float *audio,
+                              int ch, int frame_size, int block_size)
+{
+    int i, block_flag = 1;
+    int blocks = frame_size / block_size;
+    float last_var;
+    const float eps = 0.0001f;
+    float *var = vpctx->variance + ch * blocks;
+
+    for (i = 0; i < frame_size; i++) {
+        apply_filter(&vpctx->filter[0], audio[i], vpctx->filter_delay + 4 * ch);
+        apply_filter(&vpctx->filter[1], audio[i], vpctx->filter_delay + 4 * ch + 2);
+    }
+
+    for (i = 0; i < blocks; i++) {
+        last_var = var[i];
+        var[i] = variance(audio + i * block_size, block_size, vpctx->fdsp);
+
+        /* A small constant is added to the threshold in order to prevent false
+         * transients from being detected when quiet sounds follow near-silence */
+        if (var[i] > vpctx->preecho_thresh * last_var + eps)
+            block_flag = 0;
+    }
+
+    return block_flag;
+}
+
+av_cold void ff_psy_vorbis_close(VorbisPsyContext *vpctx)
+{
+    av_freep(&vpctx->filter_delay);
+    av_freep(&vpctx->variance);
+}
diff --git a/libavcodec/vorbispsy.h b/libavcodec/vorbispsy.h
new file mode 100644
index 0000000000..93a03fd8ca
--- /dev/null
+++ b/libavcodec/vorbispsy.h
@@ -0,0 +1,82 @@
+/*
+ * Vorbis encoder psychoacoustic model
+ * Copyright (C) 2017 Tyler Jones
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * Vorbis psychoacoustic model
+ */
+
+#ifndef AVCODEC_VORBISPSY_H
+#define AVCODEC_VORBISPSY_H
+
+#include "libavutil/attributes.h"
+#include "libavutil/float_dsp.h"
+
+/**
+ * Second order IIR Filter
+ */
+typedef struct IIRFilter {
+    float b[3]; ///< Normalized coefficients for numerator of transfer function
+    float a[3]; ///< Normalized coefficients for denominator of transfer function
+} IIRFilter;
+
+typedef struct VorbisPsyContext {
+    AVFloatDSPContext *fdsp;
+    IIRFilter filter[2];
+    float *filter_delay;  ///< Direct Form II delay registers for each channel
+    float *variance;      ///< Saved variances from previous sub-blocks for each channel
+    float preecho_thresh; ///< Threshold for determining presence of a preecho
+} VorbisPsyContext;
+
+/**
+ * Initializes the psychoacoustic model context
+ *
+ * @param vpctx       Uninitialized pointer to the model context
+ * @param sample_rate Input audio sample rate
+ * @param channels    Number of channels being analyzed
+ * @param blocks      Number of short blocks for every frame of input
+ * @param fdsp        Parent context's AVFloatDSPContext
+ * @return            0 on success, negative on failure
+ */
+av_cold int ff_psy_vorbis_init(VorbisPsyContext *vpctx, int sample_rate,
+                               int channels, int blocks, AVFloatDSPContext *fdsp);
+
+/**
+ * Suggest the type of block to use for encoding the current frame
+ *
+ * Each frame of input is passed through a highpass filter to remove dominant
+ * low-frequency waveforms and the variance of each short block of input is
+ * then calculated.  If the variance over this block is significantly more than
+ * blocks from the previous frame, a transient signal is likely present.
+ *
+ * @param audio      Pointer to the current channel's input samples
+ * @param ch         Current channel being analyzed
+ * @param frame_size Size of a full frame, i.e. the size of the long block
+ * @param block_size Size of the short block
+ * @return           The correct blockflag to use for encoding, 0 short and 1 long
+ */
+int ff_psy_vorbis_block_frame(VorbisPsyContext *vpctx, float *audio,
+                              int ch, int frame_size, int block_size);
+/**
+ * Closes and frees the memory used by the psychoacoustic model
+ */
+av_cold void ff_psy_vorbis_close(VorbisPsyContext *vpctx);
+#endif /* AVCODEC_VORBISPSY_H */