From patchwork Sun Jul 18 14:08:59 2021
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 28967
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 18 Jul 2021 16:08:59 +0200
Message-Id: <20210718140859.6024-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [WIP] [RFC] [PATCH] avfilter: add audio psychoacoustic clipper

Signed-off-by: Paul B Mahol
---
 doc/filters.texi          |  15 +
 libavfilter/Makefile      |   1 +
 libavfilter/af_apsyclip.c | 645 ++++++++++++++++++++++++++++++++++++++
 libavfilter/allfilters.c  |   1 +
 4 files changed, 662 insertions(+)
 create mode 100644 libavfilter/af_apsyclip.c

diff --git a/doc/filters.texi b/doc/filters.texi
index 232d81ae3e..ee0e841049 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -2281,6 +2281,21 @@ Default value is 1.0.

 This filter supports the all above options as @ref{commands}.

+@section apsyclip
+Apply Psychoacoustic clipper to audio samples.
+
+The filter accepts the following options:
+
+@table @option
+@item level_in
+@item level_out
+@item clip_level
+@item diff_only
+@item adaptive
+@item iterations
+@item auto_level
+@end table
+
 @section apulsator

 Audio pulsator is something between an autopanner and a tremolo.
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 62ee3d7b67..1c943cbbc4 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -72,6 +72,7 @@ OBJS-$(CONFIG_APAD_FILTER) += af_apad.o
 OBJS-$(CONFIG_APERMS_FILTER) += f_perms.o
 OBJS-$(CONFIG_APHASER_FILTER) += af_aphaser.o generate_wave_table.o
 OBJS-$(CONFIG_APHASESHIFT_FILTER) += af_afreqshift.o
+OBJS-$(CONFIG_APSYCLIP_FILTER) += af_apsyclip.o
 OBJS-$(CONFIG_APULSATOR_FILTER) += af_apulsator.o
 OBJS-$(CONFIG_AREALTIME_FILTER) += f_realtime.o
 OBJS-$(CONFIG_ARESAMPLE_FILTER) += af_aresample.o
diff --git a/libavfilter/af_apsyclip.c b/libavfilter/af_apsyclip.c
new file mode 100644
index 0000000000..29ea0f2b26
--- /dev/null
+++ b/libavfilter/af_apsyclip.c
@@ -0,0 +1,645 @@
+/*
+ * Copyright (c) 2014 - 2021 Jason Jang
+ * Copyright (c) 2021 Paul B Mahol
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/opt.h"
+#include "libavutil/tx.h"
+#include "audio.h"
+#include "avfilter.h"
+#include "filters.h"
+#include "internal.h"
+
+typedef struct AudioPsyClipContext {
+    const AVClass *class;
+
+    double level_in;
+    double level_out;
+    double clip_level;
+    double adaptive_distortion;
+    int auto_level;
+    int diff_only;
+    int iterations;
+    char *protections_str;
+    double *protections;
+
+    int num_psy_bins;
+    int fft_size;
+    int overlap;
+
+    int spread_table_rows;
+    int *spread_table_index;
+    int (*spread_table_range)[2];
+    float *window, *inv_window, *spread_table, *margin_curve;
+
+    AVFrame *in_buffer;
+    AVFrame *out_buffer;
+    AVFrame *in_frame;
+    AVFrame *out_dist_frame;
+    AVFrame *windowed_frame;
+    AVFrame *clipping_delta;
+    AVFrame *spectrum_buf;
+    AVFrame *mask_curve;
+
+    AVTXContext *tx_ctx;
+    av_tx_fn tx_fn;
+    AVTXContext *itx_ctx;
+    av_tx_fn itx_fn;
+} AudioPsyClipContext;
+
+#define OFFSET(x) offsetof(AudioPsyClipContext, x)
+#define A AV_OPT_FLAG_AUDIO_PARAM
+#define F AV_OPT_FLAG_FILTERING_PARAM
+
+static const AVOption apsyclip_options[] = {
+    { "level_in", "set input level", OFFSET(level_in), AV_OPT_TYPE_DOUBLE, {.dbl=1}, .015625, 64, A|F },
+    { "level_out", "set output level", OFFSET(level_out), AV_OPT_TYPE_DOUBLE, {.dbl=1}, .015625, 64, A|F },
+    { "clip_level", "set clip level", OFFSET(clip_level), AV_OPT_TYPE_DOUBLE, {.dbl=1}, .015625, 1, A|F },
+    { "diff_only", "set hear difference", OFFSET(diff_only), AV_OPT_TYPE_BOOL, {.i64=0}, 0, 1, A|F },
+    { "adaptive", "set adaptive distortion", OFFSET(adaptive_distortion), AV_OPT_TYPE_DOUBLE, {.dbl=0.5}, 0, 1, A|F },
+    { "iterations", "set iterations", OFFSET(iterations), AV_OPT_TYPE_INT, {.i64=10}, 0, 20, A|F },
+    { "auto_level", "set auto level", OFFSET(auto_level), AV_OPT_TYPE_BOOL, {.i64=1}, 0, 1, A|F },
+    {NULL}
+};
+
+AVFILTER_DEFINE_CLASS(apsyclip);
+
+static int query_formats(AVFilterContext *ctx)
+{
+    AVFilterFormats *formats;
+    AVFilterChannelLayouts *layouts;
+    static const enum AVSampleFormat sample_fmts[] = {
+        AV_SAMPLE_FMT_FLTP,
+        AV_SAMPLE_FMT_NONE
+    };
+    int ret;
+
+    layouts = ff_all_channel_counts();
+    if (!layouts)
+        return AVERROR(ENOMEM);
+    ret = ff_set_common_channel_layouts(ctx, layouts);
+    if (ret < 0)
+        return ret;
+
+    formats = ff_make_format_list(sample_fmts);
+    if (!formats)
+        return AVERROR(ENOMEM);
+    ret = ff_set_common_formats(ctx, formats);
+    if (ret < 0)
+        return ret;
+
+    formats = ff_all_samplerates();
+    if (!formats)
+        return AVERROR(ENOMEM);
+    return ff_set_common_samplerates(ctx, formats);
+}
+
+static void generate_hann_window(float *window, float *inv_window, int size)
+{
+    for (int i = 0; i < size; i++) {
+        float value = 0.5f * (1.f - cosf(2.f * M_PI * i / size));
+
+        window[i] = value;
+        // 1/window to calculate unwindowed peak.
+        inv_window[i] = value > 0.01f ? 1.f / value : 0.f;
+    }
+}
+
+static void set_margin_curve(AudioPsyClipContext *s,
+                             const int (*points)[2], int num_points, int sample_rate)
+{
+    int j = 0;
+
+    s->margin_curve[0] = points[0][1];
+
+    for (int i = 0; i < num_points - 1; i++) {
+        while (j < s->fft_size / 2 + 1 && j * sample_rate / s->fft_size < points[i + 1][0]) {
+            // linearly interpolate between points
+            int binHz = j * sample_rate / s->fft_size;
+            s->margin_curve[j] = points[i][1] + (binHz - points[i][0]) * (points[i + 1][1] - points[i][1]) / (points[i + 1][0] - points[i][0]);
+            j++;
+        }
+    }
+    // handle bins after the last point
+    while (j < s->fft_size / 2 + 1) {
+        s->margin_curve[j] = points[num_points - 1][1];
+        j++;
+    }
+
+    // convert margin curve to linear amplitude scale
+    for (j = 0; j < s->fft_size / 2 + 1; j++)
+        s->margin_curve[j] = powf(10.f, s->margin_curve[j] / 20.f);
+}
+
+static void generate_spread_table(AudioPsyClipContext *s)
+{
+    // Calculate tent-shape function in log-log scale.
+
+    // As an optimization, only consider bins close to "bin"
+    // This reduces the number of multiplications needed in calculate_mask_curve
+    // The masking contribution at faraway bins is negligible
+
+    // Another optimization to save memory and speed up the calculation of the
+    // spread table is to calculate and store only 2 spread functions per
+    // octave, and reuse the same spread function for multiple bins.
+    int table_index = 0;
+    int bin = 0;
+    int increment = 1;
+
+    while (bin < s->num_psy_bins) {
+        float sum = 0;
+        int base_idx = table_index * s->num_psy_bins;
+        int start_bin = bin * 3 / 4;
+        int end_bin = FFMIN(s->num_psy_bins, ((bin + 1) * 4 + 2) / 3);
+        int next_bin;
+
+        for (int j = start_bin; j < end_bin; j++) {
+            // add 0.5 so i=0 doesn't get log(0)
+            float rel_idx_log = FFABS(logf((j + 0.5f) / (bin + 0.5f)));
+            float value;
+            if (j >= bin) {
+                // mask up
+                value = expf(-rel_idx_log * 40.f);
+            } else {
+                // mask down
+                value = expf(-rel_idx_log * 80.f);
+            }
+            // the spreading function is centred in the row
+            sum += value;
+            s->spread_table[base_idx + s->num_psy_bins / 2 + j - bin] = value;
+        }
+        // now normalize it
+        for (int j = start_bin; j < end_bin; j++) {
+            s->spread_table[base_idx + s->num_psy_bins / 2 + j - bin] /= sum;
+        }
+
+        s->spread_table_range[table_index][0] = start_bin - bin;
+        s->spread_table_range[table_index][1] = end_bin - bin;
+
+        if (bin <= 1) {
+            next_bin = bin + 1;
+        } else {
+            if ((bin & (bin - 1)) == 0) {
+                // power of 2
+                increment = bin / 2;
+            }
+
+            next_bin = bin + increment;
+        }
+
+        // set bins between "bin" and "next_bin" to use this table_index
+        for (int i = bin; i < next_bin; i++)
+            s->spread_table_index[i] = table_index;
+
+        bin = next_bin;
+        table_index++;
+    }
+}
+
+static int config_input(AVFilterLink *inlink)
+{
+    AVFilterContext *ctx = inlink->dst;
+    AudioPsyClipContext *s = ctx->priv;
+    static const int points[][2] = { {0,14}, {125,14}, {250,16}, {500,18}, {1000,20}, {2000,20}, {4000,20}, {8000,15}, {16000,5}, {20000,-10} };
+    static const int num_points = 10;
+    float scale;
+    int ret;
+
+    s->fft_size = inlink->sample_rate > 100000 ? 1024 : inlink->sample_rate > 50000 ? 512 : 256;
+    s->overlap = s->fft_size / 4;
+
+    // The psy masking calculation is O(n^2),
+    // so skip it for frequencies not covered by base sampling rates (i.e. 44k)
+    if (inlink->sample_rate <= 50000) {
+        s->num_psy_bins = s->fft_size / 2;
+    } else if (inlink->sample_rate <= 100000) {
+        s->num_psy_bins = s->fft_size / 4;
+    } else {
+        s->num_psy_bins = s->fft_size / 8;
+    }
+
+    s->window = av_calloc(s->fft_size, sizeof(*s->window));
+    s->inv_window = av_calloc(s->fft_size, sizeof(*s->inv_window));
+    if (!s->window || !s->inv_window)
+        return AVERROR(ENOMEM);
+
+    s->in_buffer = ff_get_audio_buffer(inlink, s->fft_size * 2);
+    s->out_buffer = ff_get_audio_buffer(inlink, s->fft_size * 2);
+    s->in_frame = ff_get_audio_buffer(inlink, s->fft_size * 2);
+    s->out_dist_frame = ff_get_audio_buffer(inlink, s->fft_size * 2);
+    s->windowed_frame = ff_get_audio_buffer(inlink, s->fft_size * 2);
+    s->clipping_delta = ff_get_audio_buffer(inlink, s->fft_size * 2);
+    s->spectrum_buf = ff_get_audio_buffer(inlink, s->fft_size * 2);
+    s->mask_curve = ff_get_audio_buffer(inlink, s->fft_size / 2 + 1);
+    if (!s->in_buffer || !s->out_buffer || !s->in_frame ||
+        !s->out_dist_frame || !s->windowed_frame ||
+        !s->clipping_delta || !s->spectrum_buf || !s->mask_curve)
+        return AVERROR(ENOMEM);
+
+    generate_hann_window(s->window, s->inv_window, s->fft_size);
+
+    s->margin_curve = av_calloc(s->fft_size / 2 + 1, sizeof(*s->margin_curve));
+    if (!s->margin_curve)
+        return AVERROR(ENOMEM);
+
+    s->spread_table_rows = av_log2(s->num_psy_bins) * 2;
+    s->spread_table = av_calloc(s->spread_table_rows * s->num_psy_bins, sizeof(*s->spread_table));
+    if (!s->spread_table)
+        return AVERROR(ENOMEM);
+
+    s->spread_table_range = av_calloc(s->spread_table_rows * 2, sizeof(*s->spread_table_range));
+    if (!s->spread_table_range)
+        return AVERROR(ENOMEM);
+
+    s->spread_table_index = av_calloc(s->num_psy_bins, sizeof(*s->spread_table_index));
+    if (!s->spread_table_index)
+        return AVERROR(ENOMEM);
+
+    set_margin_curve(s, points, num_points, inlink->sample_rate);
+
+    generate_spread_table(s);
+
+    ret = av_tx_init(&s->tx_ctx, &s->tx_fn, AV_TX_FLOAT_FFT, 0, s->fft_size, &scale, 0);
+    if (ret < 0)
+        return ret;
+
+    ret = av_tx_init(&s->itx_ctx, &s->itx_fn, AV_TX_FLOAT_FFT, 1, s->fft_size, &scale, 0);
+    if (ret < 0)
+        return ret;
+
+    return 0;
+}
+
+static void apply_window(AudioPsyClipContext *s,
+                         const float *in_frame, float *out_frame, const int add_to_out_frame)
+{
+    const float *window = s->window;
+
+    for (int i = 0; i < s->fft_size; i++) {
+        if (add_to_out_frame) {
+            out_frame[i] += in_frame[i] * window[i];
+        } else {
+            out_frame[i] = in_frame[i] * window[i];
+        }
+    }
+}
+
+static void calculate_mask_curve(AudioPsyClipContext *s,
+                                 const float *spectrum, float *mask_curve)
+{
+    for (int i = 0; i < s->fft_size / 2 + 1; i++)
+        mask_curve[i] = 0;
+
+    for (int i = 0; i < s->num_psy_bins; i++) {
+        float magnitude;
+        int table_idx;
+        int range[2];
+
+        if (i == 0) {
+            magnitude = FFABS(spectrum[0]);
+        } else if (i == s->fft_size / 2) {
+            magnitude = FFABS(spectrum[1]);
+        } else {
+            // although the negative frequencies are omitted because they are redundant,
+            // the magnitude of the positive frequencies are not doubled.
+            // Multiply the magnitude by 2 to simulate adding up the + and - frequencies.
+            magnitude = hypotf(spectrum[2 * i], spectrum[2 * i + 1]) * 2;
+        }
+
+        table_idx = s->spread_table_index[i];
+        range[0] = s->spread_table_range[table_idx][0];
+        range[1] = s->spread_table_range[table_idx][1];
+        int base_idx = table_idx * s->num_psy_bins;
+        int start_bin = FFMAX(0, i + range[0]);
+        int end_bin = FFMIN(s->num_psy_bins, i + range[1]);
+
+        for (int j = start_bin; j < end_bin; j++)
+            mask_curve[j] += s->spread_table[base_idx + s->num_psy_bins / 2 + j - i] * magnitude;
+    }
+
+    // for ultrasonic frequencies, skip the O(n^2) spread calculation and just copy the magnitude
+    for (int i = s->num_psy_bins; i < s->fft_size / 2 + 1; i++) {
+        float magnitude;
+        if (i == s->fft_size / 2) {
+            magnitude = FFABS(spectrum[1]);
+        } else {
+            // although the negative frequencies are omitted because they are redundant,
+            // the magnitude of the positive frequencies are not doubled.
+            // Multiply the magnitude by 2 to simulate adding up the + and - frequencies.
+            magnitude = hypotf(spectrum[2 * i], spectrum[2 * i + 1]) * 2;
+        }
+
+        mask_curve[i] = magnitude;
+    }
+
+    for (int i = 0; i < s->fft_size / 2 + 1; i++)
+        mask_curve[i] = mask_curve[i] / s->margin_curve[i];
+}
+
+static void clip_to_window(AudioPsyClipContext *s,
+                           const float *windowed_frame, float *clipping_delta, float delta_boost)
+{
+    const float *window = s->window;
+
+    for (int i = 0; i < s->fft_size; i++) {
+        float limit = s->clip_level * window[i];
+        float effective_value = windowed_frame[i] + clipping_delta[i];
+
+        if (effective_value > limit) {
+            clipping_delta[i] += (limit - effective_value) * delta_boost;
+        } else if (effective_value < -limit) {
+            clipping_delta[i] += (-limit - effective_value) * delta_boost;
+        }
+    }
+}
+
+static void limit_clip_spectrum(AudioPsyClipContext *s,
+                                float *clip_spectrum, const float *mask_curve)
+{
+    // bin 0
+    float relative_distortion_level = FFABS(clip_spectrum[0]) / mask_curve[0];
+
+    if (relative_distortion_level > 1.f)
+        clip_spectrum[0] /= relative_distortion_level;
+
+    // bin 1..N/2-1
+    for (int i = 1; i < s->fft_size / 2; i++) {
+        float real = clip_spectrum[i * 2];
+        float imag = clip_spectrum[i * 2 + 1];
+        // although the negative frequencies are omitted because they are redundant,
+        // the magnitude of the positive frequencies are not doubled.
+        // Multiply the magnitude by 2 to simulate adding up the + and - frequencies.
+        relative_distortion_level = hypotf(real, imag) * 2 / mask_curve[i];
+        if (relative_distortion_level > 1.0) {
+            clip_spectrum[i * 2] /= relative_distortion_level;
+            clip_spectrum[i * 2 + 1] /= relative_distortion_level;
+        }
+    }
+    // bin N/2
+    relative_distortion_level = FFABS(clip_spectrum[1]) / mask_curve[s->fft_size / 2];
+    if (relative_distortion_level > 1.f)
+        clip_spectrum[1] /= relative_distortion_level;
+}
+
+static void r2c(float *buffer, int size)
+{
+    for (int i = size - 1; i >= 0; i--)
+        buffer[2 * i] = buffer[i];
+
+    for (int i = size - 1; i >= 0; i--)
+        buffer[2 * i + 1] = 0.f;
+}
+
+static void c2r(float *buffer, int size)
+{
+    for (int i = 0; i < size; i++)
+        buffer[i] = buffer[2 * i];
+
+    for (int i = 0; i < size; i++)
+        buffer[i + size] = 0.f;
+}
+
+static void feed(AudioPsyClipContext *s,
+                 const float *in_samples, float *out_samples, int diff_only,
+                 float *in_frame, float *out_dist_frame,
+                 float *windowed_frame, float *clipping_delta,
+                 float *spectrum_buf, float *mask_curve)
+{
+    float orig_peak = 0;
+    float peak;
+
+    // shift in/out buffers
+    for (int i = 0; i < s->fft_size - s->overlap; i++) {
+        in_frame[i] = in_frame[i + s->overlap];
+        out_dist_frame[i] = out_dist_frame[i + s->overlap];
+    }
+
+    for (int i = 0; i < s->overlap; i++) {
+        in_frame[i + s->fft_size - s->overlap] = in_samples[i];
+        out_dist_frame[i + s->fft_size - s->overlap] = 0.f;
+    }
+
+    apply_window(s, in_frame, windowed_frame, 0);
+    r2c(windowed_frame, s->fft_size);
+    s->tx_fn(s->tx_ctx, spectrum_buf, windowed_frame, sizeof(float));
+    c2r(windowed_frame, s->fft_size);
+    calculate_mask_curve(s, spectrum_buf, mask_curve);
+
+    // It would be easier to calculate the peak from the unwindowed input.
+    // This is just for consistency with the clipped peak calculation
+    // because the inv_window zeros out samples on the edge of the window.
+    for (int i = 0; i < s->fft_size; i++)
+        orig_peak = FFMAX(orig_peak, FFABS(windowed_frame[i] * s->inv_window[i]));
+    orig_peak /= s->clip_level;
+    peak = orig_peak;
+
+    // clear clipping_delta
+    for (int i = 0; i < s->fft_size * 2; i++) {
+        clipping_delta[i] = 0.f;
+    }
+
+    // repeat clipping-filtering process a few times to control both the peaks and the spectrum
+    for (int i = 0; i < s->iterations; i++) {
+        float mask_curve_shift = 1.122f; // 1.122 is 1dB
+        // The last 1/3 of rounds have boosted delta to help reach the peak target faster
+        float delta_boost = 1.f;
+        if (i >= s->iterations - s->iterations / 3) {
+            // boosting the delta when large peaks are still present is dangerous
+            if (peak < 2.f)
+                delta_boost = 2.f;
+        }
+
+        clip_to_window(s, windowed_frame, clipping_delta, delta_boost);
+
+        r2c(clipping_delta, s->fft_size);
+        s->tx_fn(s->tx_ctx, spectrum_buf, clipping_delta, sizeof(float));
+
+        limit_clip_spectrum(s, spectrum_buf, mask_curve);
+
+        s->itx_fn(s->itx_ctx, clipping_delta, spectrum_buf, sizeof(float));
+        c2r(clipping_delta, s->fft_size);
+
+        for (int i = 0; i < s->fft_size; i++)
+            clipping_delta[i] /= s->fft_size;
+
+        peak = 0;
+        for (int i = 0; i < s->fft_size; i++)
+            peak = FFMAX(peak, FFABS((windowed_frame[i] + clipping_delta[i]) * s->inv_window[i]));
+        peak /= s->clip_level;
+
+        // Automatically adjust mask_curve as necessary to reach peak target
+        if (orig_peak > 1.f && peak > 1.f) {
+            float diff_achieved = orig_peak - peak;
+            if (i + 1 < s->iterations - s->iterations / 3 && diff_achieved > 0) {
+                float diff_needed = orig_peak - 1.f;
+                float diff_ratio = diff_needed / diff_achieved;
+                // If a good amount of peak reduction was already achieved,
+                // don't shift the mask_curve by the full peak value
+                // On the other hand, if only a little peak reduction was achieved,
+                // don't shift the mask_curve by the enormous diff_ratio.
+                diff_ratio = FFMIN(diff_ratio, peak);
+                mask_curve_shift = FFMAX(mask_curve_shift, diff_ratio);
+            } else {
+                // If the peak got higher than the input or we are in the last 1/3 rounds,
+                // go back to the heavy-handed peak heuristic.
+                mask_curve_shift = FFMAX(mask_curve_shift, peak);
+            }
+        }
+
+        mask_curve_shift = 1.f + (mask_curve_shift - 1.f) * s->adaptive_distortion;
+
+        // Be less strict in the next iteration.
+        // This helps with peak control.
+        for (int i = 0; i < s->fft_size / 2 + 1; i++)
+            mask_curve[i] *= mask_curve_shift;
+    }
+
+    // do overlap & add
+    apply_window(s, clipping_delta, out_dist_frame, 1);
+
+    for (int i = 0; i < s->overlap; i++) {
+        out_samples[i] = out_dist_frame[i] / 1.5f;
+        // 4 times overlap with squared hanning window results in 1.5 time increase in amplitude
+        if (!diff_only)
+            out_samples[i] += in_frame[i];
+    }
+}
+
+static int filter_frame(AVFilterLink *inlink, AVFrame *in)
+{
+    AVFilterContext *ctx = inlink->dst;
+    AVFilterLink *outlink = ctx->outputs[0];
+    AudioPsyClipContext *s = ctx->priv;
+    AVFrame *out;
+    int ret;
+
+    out = ff_get_audio_buffer(outlink, s->overlap);
+    if (!out) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    out->pts = in->pts;
+
+    for (int ch = 0; ch < inlink->channels; ch++) {
+        const float *src = (const float *)in->extended_data[ch];
+        float *in_buffer = (float *)s->in_buffer->extended_data[ch];
+        float *dst = (float *)out->extended_data[ch];
+
+        for (int n = 0; n < s->overlap; n++) {
+            in_buffer[n] = src[n] * s->level_in;
+        }
+
+        feed(s, in_buffer, dst, s->diff_only,
+             (float *)(s->in_frame->extended_data[ch]),
+             (float *)(s->out_dist_frame->extended_data[ch]),
+             (float *)(s->windowed_frame->extended_data[ch]),
+             (float *)(s->clipping_delta->extended_data[ch]),
+             (float *)(s->spectrum_buf->extended_data[ch]),
+             (float *)(s->mask_curve->extended_data[ch]));
+    }
+
+    ret = ff_filter_frame(outlink, out);
+    if (ret < 0)
+        goto fail;
+
+fail:
+    av_frame_free(&in);
+    return ret < 0 ? ret : 0;
+}
+
+static int activate(AVFilterContext *ctx)
+{
+    AVFilterLink *inlink = ctx->inputs[0];
+    AVFilterLink *outlink = ctx->outputs[0];
+    AudioPsyClipContext *s = ctx->priv;
+    AVFrame *in = NULL;
+    int ret = 0, status;
+    int64_t pts;
+
+    FF_FILTER_FORWARD_STATUS_BACK(outlink, inlink);
+
+    ret = ff_inlink_consume_samples(inlink, s->overlap, s->overlap, &in);
+    if (ret < 0)
+        return ret;
+
+    if (ret > 0) {
+        return filter_frame(inlink, in);
+    } else if (ff_inlink_acknowledge_status(inlink, &status, &pts)) {
+        ff_outlink_set_status(outlink, status, pts);
+        return 0;
+    } else {
+        if (ff_outlink_frame_wanted(outlink))
+            ff_inlink_request_frame(inlink);
+        return 0;
+    }
+}
+
+static av_cold void uninit(AVFilterContext *ctx)
+{
+    AudioPsyClipContext *s = ctx->priv;
+
+    av_freep(&s->window);
+    av_freep(&s->inv_window);
+    av_freep(&s->spread_table);
+    av_freep(&s->spread_table_range);
+    av_freep(&s->spread_table_index);
+    av_freep(&s->margin_curve);
+
+    av_frame_free(&s->in_buffer);
+    av_frame_free(&s->out_buffer);
+    av_frame_free(&s->in_frame);
+    av_frame_free(&s->out_dist_frame);
+    av_frame_free(&s->windowed_frame);
+    av_frame_free(&s->clipping_delta);
+    av_frame_free(&s->spectrum_buf);
+    av_frame_free(&s->mask_curve);
+
+    av_tx_uninit(&s->tx_ctx);
+    av_tx_uninit(&s->itx_ctx);
+}
+
+static const AVFilterPad inputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_AUDIO,
+        .config_props = config_input,
+    },
+    { NULL }
+};
+
+static const AVFilterPad outputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_AUDIO,
+    },
+    { NULL }
+};
+
+const AVFilter ff_af_apsyclip = {
+    .name = "apsyclip",
+    .description = NULL_IF_CONFIG_SMALL("Audio Psychoacoustic Clipper."),
+    .query_formats = query_formats,
+    .priv_size = sizeof(AudioPsyClipContext),
+    .priv_class = &apsyclip_class,
+    .uninit = uninit,
+    .inputs = inputs,
+    .outputs = outputs,
+    .activate = activate,
+};
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index c6afef835f..316e2b1dda 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -65,6 +65,7 @@ extern const AVFilter ff_af_apad;
 extern const AVFilter ff_af_aperms;
 extern const AVFilter ff_af_aphaser;
 extern const AVFilter ff_af_aphaseshift;
+extern const AVFilter ff_af_apsyclip;
 extern const AVFilter ff_af_apulsator;
 extern const AVFilter ff_af_arealtime;
 extern const AVFilter ff_af_aresample;