From patchwork Wed Jul 12 22:18:06 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tyler Jones
X-Patchwork-Id: 4312
Date: Wed, 12 Jul 2017 16:18:06 -0600
From: Tyler Jones
To: ffmpeg-devel@ffmpeg.org
Message-ID: <20170712221806.GA13505@tdjones879>
User-Agent: Mutt/1.5.24 (2015-08-30)
Subject: [FFmpeg-devel] [PATCH 1/2] avcodec/vorbisenc: Add pre-echo detection

The encoder will attempt to determine the existence of transient signals
by applying a 4th order highpass filter to remove dominant low frequency
waveforms. Frames are then split up into blocks where the variance is
calculated and compared with blocks from the previous frame. A preecho is
only likely to be noticeable when relatively quiet audio is followed by a
loud transient signal.

Signed-off-by: Tyler Jones
---
 libavcodec/Makefile    |   2 +-
 libavcodec/vorbisenc.c |  28 +++++++--
 libavcodec/vorbispsy.c | 153 +++++++++++++++++++++++++++++++++++++++++++++++++
 libavcodec/vorbispsy.h |  79 +++++++++++++++++++++++++
 4 files changed, 256 insertions(+), 6 deletions(-)
 create mode 100644 libavcodec/vorbispsy.c
 create mode 100644 libavcodec/vorbispsy.h

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index b440a00..2db6727 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -611,7 +611,7 @@ OBJS-$(CONFIG_VMNC_DECODER) += vmnc.o
 OBJS-$(CONFIG_VORBIS_DECODER) += vorbisdec.o vorbisdsp.o vorbis.o \
                                  vorbis_data.o
 OBJS-$(CONFIG_VORBIS_ENCODER) += vorbisenc.o vorbis.o \
-                                 vorbis_data.o
+                                 vorbis_data.o vorbispsy.o
 OBJS-$(CONFIG_VP3_DECODER) += vp3.o
 OBJS-$(CONFIG_VP5_DECODER) += vp5.o vp56.o vp56data.o vp56rac.o
 OBJS-$(CONFIG_VP6_DECODER) += vp6.o vp56.o vp56data.o \
diff --git a/libavcodec/vorbisenc.c b/libavcodec/vorbisenc.c
index bf21a3b..3482cf0 100644
--- a/libavcodec/vorbisenc.c
+++ b/libavcodec/vorbisenc.c
@@ -33,6 +33,7 @@
 #include "mathops.h"
 #include "vorbis.h"
 #include "vorbis_enc_data.h"
+#include "vorbispsy.h"
 
 #include "audio_frame_queue.h"
 #include "libavfilter/bufferqueue.h"
@@ -136,6 +137,7 @@ typedef struct vorbis_enc_context {
     int64_t next_pts;
 
     AVFloatDSPContext *fdsp;
+    VorbisPsyContext *vpctx;
 } vorbis_enc_context;
 
 #define MAX_CHANNELS 2
@@ -272,11 +274,12 @@ static int create_vorbis_context(vorbis_enc_context *venc,
     vorbis_enc_floor *fc;
     vorbis_enc_residue *rc;
     vorbis_enc_mapping *mc;
-    int i, book, ret;
+    int i, book, ret, blocks;
 
     venc->channels = avctx->channels;
     venc->sample_rate = avctx->sample_rate;
-    venc->log2_blocksize[0] = venc->log2_blocksize[1] = 11;
+    venc->log2_blocksize[0] = 8;
+    venc->log2_blocksize[1] = 11;
 
     venc->ncodebooks = FF_ARRAY_ELEMS(cvectors);
     venc->codebooks = av_malloc(sizeof(vorbis_enc_codebook) * venc->ncodebooks);
@@ -464,6 +467,12 @@ static int create_vorbis_context(vorbis_enc_context *venc,
     if ((ret = dsp_init(avctx, venc)) < 0)
         return ret;
 
+    blocks = 1 << (venc->log2_blocksize[1] - venc->log2_blocksize[0]);
+    venc->vpctx = av_mallocz(sizeof(VorbisPsyContext));
+    if (!venc->vpctx || (ret = psy_vorbis_init(venc->vpctx, venc->sample_rate,
+                                               venc->channels, blocks)) < 0)
+        return AVERROR(ENOMEM);
+
     return 0;
 }
 
@@ -1071,22 +1080,23 @@ static void move_audio(vorbis_enc_context *venc, int sf_size)
             float *save = venc->saved + ch * frame_size;
             const float *input = (float *) cur->extended_data[ch];
             const size_t len = cur->nb_samples * sizeof(float);
-
             memcpy(offset + sf*sf_size, input, len);
             memcpy(save + sf*sf_size, input, len); // Move samples for next frame
         }
         av_frame_free(&cur);
     }
     venc->have_saved = 1;
-    memcpy(venc->scratch, venc->samples, 2 * venc->channels * frame_size);
+    memcpy(venc->scratch, venc->samples, sizeof(float) * venc->channels * 2 * frame_size);
 }
 
 static int vorbis_encode_frame(AVCodecContext *avctx, AVPacket *avpkt,
                                const AVFrame *frame, int *got_packet_ptr)
 {
     vorbis_enc_context *venc = avctx->priv_data;
-    int i, ret, need_more;
+    int i, ret, need_more, ch;
+    int curr_win = 1;
     int frame_size = 1 << (venc->log2_blocksize[1] - 1);
+    int block_size = 1 << (venc->log2_blocksize[0] - 1);
     vorbis_enc_mode *mode;
     vorbis_enc_mapping *mapping;
     PutBitContext pb;
@@ -1121,6 +1131,13 @@ static int vorbis_encode_frame(AVCodecContext *avctx, AVPacket *avpkt,
 
     move_audio(venc, avctx->frame_size);
 
+    for (ch = 0; ch < venc->channels; ch++) {
+        float *scratch = venc->scratch + 2 * ch * frame_size + frame_size;
+
+        if (!psy_vorbis_block_frame(venc->vpctx, scratch, ch, frame_size, block_size))
+            curr_win = 0;
+    }
+
     if (!apply_window_and_mdct(venc))
         return 0;
 
@@ -1252,6 +1269,7 @@ static av_cold int vorbis_encode_close(AVCodecContext *avctx)
     ff_mdct_end(&venc->mdct[1]);
     ff_af_queue_close(&venc->afq);
     ff_bufqueue_discard_all(&venc->bufqueue);
+    psy_vorbis_close(venc->vpctx);
 
     av_freep(&avctx->extradata);
 
diff --git a/libavcodec/vorbispsy.c b/libavcodec/vorbispsy.c
new file mode 100644
index 0000000..a200ecc
--- /dev/null
+++ b/libavcodec/vorbispsy.c
@@ -0,0 +1,153 @@
+/*
+ * Vorbis encoder psychoacoustic model
+ * Copyright (C) 2017 Tyler Jones
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <math.h>
+
+#include "avcodec.h"
+#include "libavutil/attributes.h"
+#include "vorbispsy.h"
+
+/**
+ * Generate the coefficients for a highpass biquad filter
+ *
+ * @param filter Instance of biquad filter to be initialized
+ * @param Fs     Input's sampling frequency
+ * @param Fc     Critical frequency for samples to be passed
+ * @param Q      Quality factor
+ */
+static av_cold void biquad_filter_init(IIRFilter *filter, int Fs, int Fc, float Q)
+{
+    float k = tan(M_PI * Fc / Fs);
+    float normalize = 1 / (1 + k / Q + k * k);
+
+    filter->b[0] = normalize;
+    filter->b[1] = -2 * normalize;
+    filter->b[2] = normalize;
+
+    filter->a[0] = 1;
+    filter->a[1] = 2 * (k * k - 1) * normalize;
+    filter->a[2] = (1 - k / Q + k * k) * normalize;
+}
+
+/**
+ * Direct Form II implementation for a second order digital filter
+ *
+ * @param filter Filter to be applied to input samples
+ * @param in     Single input sample to be filtered
+ * @param delay  Array of IIR feedback values
+ * @return       Filtered sample
+ */
+static float apply_filter(IIRFilter *filter, float in, float *delay)
+{
+    float ret, w;
+
+    w = filter->a[0] * in - filter->a[1] * delay[0] - filter->a[2] * delay[1];
+    ret = filter->b[0] * w + filter->b[1] * delay[0] + filter->b[2] * delay[1];
+
+    delay[1] = delay[0];
+    delay[0] = w;
+
+    return ret;
+}
+
+/**
+ * Calculate the variance of a block of samples
+ *
+ * @param in     Array of input samples
+ * @param length Number of input samples being analyzed
+ * @return       The variance for the current block
+ */
+static float variance(const float *in, int length)
+{
+    int i;
+    float mean = 0.0f, square_sum = 0.0f;
+
+    for (i = 0; i < length; i++) {
+        mean += in[i];
+        square_sum += in[i] * in[i];
+    }
+
+    mean /= length;
+    return (square_sum - length * mean * mean) / (length - 1);
+}
+
+av_cold int psy_vorbis_init(VorbisPsyContext *vpctx, int sample_rate,
+                            int channels, int blocks)
+{
+    int crit_freq;
+    float Q[2] = {.54, 1.31}; // Quality values for maximally flat cascaded filters
+
+    vpctx->filter_delay = av_mallocz_array(4 * channels, sizeof(vpctx->filter_delay[0]));
+    if (!vpctx->filter_delay)
+        return AVERROR(ENOMEM);
+
+    crit_freq = sample_rate / 4;
+    biquad_filter_init(&vpctx->filter[0], sample_rate, crit_freq, Q[0]);
+    biquad_filter_init(&vpctx->filter[1], sample_rate, crit_freq, Q[1]);
+
+    vpctx->variance = av_mallocz_array(channels * blocks, sizeof(vpctx->variance[0]));
+    if (!vpctx->variance)
+        return AVERROR(ENOMEM);
+
+    vpctx->preecho_thresh = 100.0f;
+
+    return 0;
+}
+
+int psy_vorbis_block_frame(VorbisPsyContext *vpctx, float *audio,
+                           int ch, int frame_size, int block_size)
+{
+    int i, block_flag = 1;
+    int blocks = frame_size / block_size;
+    float last_var;
+    const float eps = 1e-4;
+    float *var = vpctx->variance + ch * blocks;
+
+    for (i = 0; i < frame_size; i++) {
+        apply_filter(&vpctx->filter[0], audio[i], vpctx->filter_delay + 4 * ch);
+        apply_filter(&vpctx->filter[1], audio[i], vpctx->filter_delay + 4 * ch + 2);
+    }
+
+    for (i = 0; i < blocks; i++) {
+        last_var = var[i];
+        var[i] = variance(audio + i * block_size, block_size);
+
+        /* A small constant is added to the threshold in order to prevent false
+         * transients from being detected when quiet sounds follow near-silence */
+        if (var[i] > vpctx->preecho_thresh * last_var + eps)
+            block_flag = 0;
+    }
+
+    return block_flag;
+}
+
+av_cold void psy_vorbis_close(VorbisPsyContext *vpctx)
+{
+    if (vpctx) {
+        if (vpctx->filter_delay)
+            av_freep(&vpctx->filter_delay);
+
+        if (vpctx->variance)
+            av_freep(&vpctx->variance);
+
+        av_freep(&vpctx);
+    }
+}
diff --git a/libavcodec/vorbispsy.h b/libavcodec/vorbispsy.h
new file mode 100644
index 0000000..dad4b3b
--- /dev/null
+++ b/libavcodec/vorbispsy.h
@@ -0,0 +1,79 @@
+/*
+ * Vorbis encoder psychoacoustic model
+ * Copyright (C) 2017 Tyler Jones
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * Vorbis psychoacoustic model
+ */
+
+#ifndef AVCODEC_VORBISPSY_H
+#define AVCODEC_VORBISPSY_H
+
+#include "libavutil/attributes.h"
+
+/**
+ * Second order IIR Filter
+ */
+typedef struct IIRFilter {
+    float b[3]; ///< Normalized coefficients for numerator of transfer function
+    float a[3]; ///< Normalized coefficients for denominator of transfer function
+} IIRFilter;
+
+typedef struct VorbisPsyContext {
+    IIRFilter filter[2];
+    float *filter_delay;  ///< Direct Form II delay registers for each channel
+    float *variance;      ///< Saved variances from previous sub-blocks for each channel
+    float preecho_thresh; ///< Threshold for determining presence of a preecho
+} VorbisPsyContext;
+
+/**
+ * Initializes the psychoacoustic model context
+ *
+ * @param vpctx       Uninitialized pointer to the model context
+ * @param sample_rate Input audio sample rate
+ * @param channels    Number of channels being analyzed
+ * @param blocks      Number of short blocks for every frame of input
+ * @return            0 on success, negative on failure
+ */
+av_cold int psy_vorbis_init(VorbisPsyContext *vpctx, int sample_rate,
+                            int channels, int blocks);
+
+/**
+ * Suggest the type of block to use for encoding the current frame
+ *
+ * Each frame of input is passed through a highpass filter to remove dominant
+ * low-frequency waveforms and the variance of each short block of input is
+ * then calculated. If the variance over this block is significantly more than
+ * blocks from the previous frame, a transient signal is likely present.
+ *
+ * @param audio      Pointer to the current channel's input samples
+ * @param ch         Current channel being analyzed
+ * @param frame_size Size of a full frame, i.e. the size of the long block
+ * @param block_size Size of the short block
+ * @return           The correct blockflag to use for encoding, 0 short and 1 long
+ */
+int psy_vorbis_block_frame(VorbisPsyContext *vpctx, float *audio,
+                           int ch, int frame_size, int block_size);
+/**
+ * Closes and frees the memory used by the psychoacoustic model
+ */
+av_cold void psy_vorbis_close(VorbisPsyContext *vpctx);
+#endif /* AVCODEC_VORBISPSY_H */