From patchwork Mon Mar 27 02:58:33 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tyler Jones
X-Patchwork-Id: 3111
Date: Sun, 26 Mar 2017 20:58:33 -0600
From: Tyler Jones
To: ffmpeg-devel@ffmpeg.org
Message-ID: <20170327025833.GA1669@tdjones879>
Subject: [FFmpeg-devel] [PATCH 2/2] avcodec/vorbisenc: Implement transient detection in Vorbis encoder
List-Id: FFmpeg development discussions and patches

The existing AAC psychoacoustic system is used to detect transients within
the Vorbis encoder. This is a first step toward using a more complex
psychoacoustic model in the Vorbis encoder. More specifically, it allows
the cancellation of the pre-echo effects that frequently occur with this
codec.

Signed-off-by: Tyler Jones
---
 libavcodec/psymodel.c  |  1 +
 libavcodec/vorbisenc.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

diff --git a/libavcodec/psymodel.c b/libavcodec/psymodel.c
index 2b5f111..38831ce 100644
--- a/libavcodec/psymodel.c
+++ b/libavcodec/psymodel.c
@@ -62,6 +62,7 @@ av_cold int ff_psy_init(FFPsyContext *ctx, AVCodecContext *avctx, int num_lens,
 
     switch (ctx->avctx->codec_id) {
     case AV_CODEC_ID_AAC:
+    case AV_CODEC_ID_VORBIS:
         ctx->model = &ff_aac_psy_model;
         break;
     }
diff --git a/libavcodec/vorbisenc.c b/libavcodec/vorbisenc.c
index 2974ca2..e4ec822 100644
--- a/libavcodec/vorbisenc.c
+++ b/libavcodec/vorbisenc.c
@@ -33,6 +33,8 @@
 #include "vorbis.h"
 #include "vorbis_enc_data.h"
 
+#include "psymodel.h"
+
 #define BITSTREAM_WRITER_LE
 #include "put_bits.h"
 
@@ -126,6 +128,9 @@ typedef struct vorbis_enc_context {
     vorbis_enc_mode *modes;
 
     int64_t next_pts;
+
+    FFPsyContext psy;
+    struct FFPsyPreprocessContext* psypp;
 } vorbis_enc_context;
 
 #define MAX_CHANNELS 2
@@ -1024,10 +1029,38 @@ static int vorbis_encode_frame(AVCodecContext *avctx, AVPacket *avpkt,
     vorbis_enc_context *venc = avctx->priv_data;
     float **audio = frame ? (float **)frame->extended_data : NULL;
     int samples = frame ? frame->nb_samples : 0;
+    float *samples2, *la, *overlap;
     vorbis_enc_mode *mode;
     vorbis_enc_mapping *mapping;
     PutBitContext pb;
     int i, ret;
+    int start_ch, ch, chans, cur_channel;
+    FFPsyWindowInfo windows[MAX_CHANNELS];
+    enum WindowSequence window_sequence[MAX_CHANNELS];
+
+    if (!avctx->frame_number)
+        return 0;
+
+    if (venc->psypp)
+        ff_psy_preprocess(venc->psypp, audio, venc->channels);
+
+    if (frame) {
+        start_ch = 0;
+        cur_channel = 0;
+        for (i = 0; i < venc->channels - 1; i++) {
+            FFPsyWindowInfo* wi = windows + start_ch;
+            chans = 2;
+            for (ch = 0; ch < 2; ch++) {
+                cur_channel = start_ch + ch;
+                overlap = &audio[cur_channel][0];
+                samples2 = overlap + 1024;
+                la = samples2 + (448+64);
+                wi[ch] = venc->psy.model->window(&venc->psy, samples2, la,
+                                                 cur_channel, window_sequence[0]);
+            }
+            start_ch += chans;
+        }
+    }
 
     if (!apply_window_and_mdct(venc, audio, samples))
         return 0;
@@ -1158,7 +1191,10 @@ static av_cold int vorbis_encode_close(AVCodecContext *avctx)
 
     ff_mdct_end(&venc->mdct[0]);
     ff_mdct_end(&venc->mdct[1]);
+    ff_psy_end(&venc->psy);
 
+    if (venc->psypp)
+        ff_psy_preprocess_end(venc->psypp);
     av_freep(&avctx->extradata);
 
     return 0 ;
@@ -1168,6 +1204,10 @@ static av_cold int vorbis_encode_init(AVCodecContext *avctx)
 {
     vorbis_enc_context *venc = avctx->priv_data;
     int ret;
+    const uint8_t *sizes[MAX_CHANNELS];
+    uint8_t grouping[MAX_CHANNELS];
+    int lengths[MAX_CHANNELS];
+    int samplerate_index;
 
     if (avctx->channels != 2) {
         av_log(avctx, AV_LOG_ERROR, "Current FFmpeg Vorbis encoder only supports 2 channels.\n");
@@ -1190,6 +1230,26 @@ static av_cold int vorbis_encode_init(AVCodecContext *avctx)
 
     avctx->frame_size = 1 << (venc->log2_blocksize[0] - 1);
 
+    for (samplerate_index = 0; samplerate_index < 16; samplerate_index++)
+        if (avctx->sample_rate == mpeg4audio_sample_rates[samplerate_index])
+            break;
+    if (samplerate_index == 16 ||
+        samplerate_index >= ff_vorbis_swb_size_1024_len ||
+        samplerate_index >= ff_vorbis_swb_size_128_len)
+        av_log(avctx, AV_LOG_ERROR, "Unsupported sample rate %d\n", avctx->sample_rate);
+
+    sizes[0] = ff_vorbis_swb_size_1024[samplerate_index];
+    sizes[1] = ff_vorbis_swb_size_128[samplerate_index];
+    lengths[0] = ff_vorbis_num_swb_1024[samplerate_index];
+    lengths[1] = ff_vorbis_num_swb_128[samplerate_index];
+    grouping[0] = 1;
+
+    if ((ret = ff_psy_init(&venc->psy, avctx, 2,
+                           sizes, lengths,
+                           1, grouping)) < 0)
+        goto error;
+    venc->psypp = ff_psy_preprocess_init(avctx);
+
     return 0;
 error:
     vorbis_encode_close(avctx);