From patchwork Fri Nov  6 23:27:15 2020
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Sat,  7 Nov 2020 00:27:15 +0100
Message-Id: <20201106232715.10304-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH] avfilter: add speechnorm filter

Signed-off-by: Paul B Mahol
---
 doc/filters.texi            |  49 +++++
 libavfilter/Makefile        |   1 +
 libavfilter/af_speechnorm.c | 422 ++++++++++++++++++++++++++++++++++++
 libavfilter/allfilters.c    |   1 +
 4 files changed, 473 insertions(+)
 create mode 100644 libavfilter/af_speechnorm.c

diff --git a/doc/filters.texi b/doc/filters.texi
index 8380f6cac2..7db43dd34c 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -5276,6 +5276,55 @@ and also with custom gain:
 @end example
 @end itemize
 
+@section speechnorm
+Speech Normalizer.
+
+This filter expands or compresses each half-cycle of audio samples
+(a local set of samples all above or all below zero) depending on the threshold value,
+so that the audio reaches the target peak value under the conditions controlled by the options below.
+
+The filter accepts the following options:
+
+@table @option
+@item peak, p
+Set the expansion target peak value. This specifies the highest allowed absolute amplitude
+level for the normalized audio input. Default value is 0.95. Allowed range is from 0.0 to 1.0.
+
+@item expansion, e
+Set the maximum expansion factor. Allowed range is from 1.0 to 10.0. Default value is 2.0.
+This option controls the maximum expansion applied to a local half-cycle of samples. The maximum
+expansion is chosen so that the local peak value reaches the target peak value but never surpasses it,
+and so that the ratio between the new and the previous peak value does not exceed this option value.
+
+@item compression, c
+Set the maximum compression factor. Allowed range is from 1.0 to 10.0. Default value is 2.0.
+This option controls the maximum compression applied to a local half-cycle of samples. It is used
+only if the @option{threshold} option is set to a value greater than 0.0; in that case, when the
+local peak is lower than or equal to the value set by @option{threshold}, all samples belonging to
+that peak's half-cycle are compressed by the current compression factor.
+
+@item threshold, t
+Set the threshold value. Default value is 0.0. Allowed range is from 0.0 to 1.0.
+This option specifies which half-cycles of samples will be compressed and which will be expanded.
+Any half-cycle whose local peak value is below or equal to this option value will be
+compressed by the current compression factor; otherwise, if it is greater than the threshold value,
+it will be expanded with the expansion factor so that it can reach the target peak value, but never surpass it.
+
+@item raise, r
+Set the expansion raising amount per each half-cycle of samples. Default value is 0.001.
+Allowed range is from 0.0 to 1.0. This controls how fast the expansion factor is raised per
+each new half-cycle until it reaches the @option{expansion} value.
+
+@item fall, f
+Set the compression raising amount per each half-cycle of samples. Default value is 0.001.
+Allowed range is from 0.0 to 1.0. This controls how fast the compression factor is raised per
+each new half-cycle until it reaches the @option{compression} value.
+@end table
+
+@subsection Commands
+
+This filter supports all the above options as @ref{commands}.
+
 @section stereotools
 
 This filter has some handy utilities to manage stereo signals, for converting
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 0c2a5d1cf4..36f3d2d0e4 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -138,6 +138,7 @@ OBJS-$(CONFIG_SIDECHAINGATE_FILTER)          += af_agate.o
 OBJS-$(CONFIG_SILENCEDETECT_FILTER)          += af_silencedetect.o
 OBJS-$(CONFIG_SILENCEREMOVE_FILTER)          += af_silenceremove.o
 OBJS-$(CONFIG_SOFALIZER_FILTER)              += af_sofalizer.o
+OBJS-$(CONFIG_SPEECHNORM_FILTER)             += af_speechnorm.o
 OBJS-$(CONFIG_STEREOTOOLS_FILTER)            += af_stereotools.o
 OBJS-$(CONFIG_STEREOWIDEN_FILTER)            += af_stereowiden.o
 OBJS-$(CONFIG_SUPEREQUALIZER_FILTER)         += af_superequalizer.o
diff --git a/libavfilter/af_speechnorm.c b/libavfilter/af_speechnorm.c
new file mode 100644
index 0000000000..ef06c72fee
--- /dev/null
+++ b/libavfilter/af_speechnorm.c
@@ -0,0 +1,422 @@
+/*
+ * Speech Normalizer
+ * Copyright (c) 2020 Paul B Mahol
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * Speech Normalizer
+ */
+
+#include <float.h>
+
+#include "libavutil/avassert.h"
+#include "libavutil/opt.h"
+
+#define FF_BUFQUEUE_SIZE (1024)
+#include "bufferqueue.h"
+
+#include "audio.h"
+#include "avfilter.h"
+#include "filters.h"
+#include "internal.h"
+
+#define MAX_ITEMS 882000
+
+typedef struct PeriodItem {
+    int size;
+    int type;
+    double max_peak;
+} PeriodItem;
+
+typedef struct ChannelContext {
+    int state;
+    PeriodItem pi[MAX_ITEMS];
+    double gain_state;
+    int pi_start;
+    int pi_end;
+} ChannelContext;
+
+typedef struct SpeechNormalizerContext {
+    const AVClass *class;
+
+    double peak_value;
+    double max_expansion;
+    double max_compression;
+    double threshold_value;
+    double raise_amount;
+    double fall_amount;
+    int channels;
+
+    ChannelContext *cc;
+
+    int max_period;
+    int eof;
+    int64_t pts;
+
+    struct FFBufQueue queue;
+} SpeechNormalizerContext;
+
+#define OFFSET(x) offsetof(SpeechNormalizerContext, x)
+#define FLAGS AV_OPT_FLAG_AUDIO_PARAM|AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_RUNTIME_PARAM
+
+static const AVOption speechnorm_options[] = {
+    { "peak", "set the peak value", OFFSET(peak_value), AV_OPT_TYPE_DOUBLE, {.dbl=0.95}, 0.0, 1.0, FLAGS },
+    { "p",    "set the peak value", OFFSET(peak_value), AV_OPT_TYPE_DOUBLE, {.dbl=0.95}, 0.0, 1.0, FLAGS },
+    { "expansion", "set the max expansion factor", OFFSET(max_expansion), AV_OPT_TYPE_DOUBLE, {.dbl=2.0}, 1.0, 10.0, FLAGS },
+    { "e",         "set the max expansion factor", OFFSET(max_expansion), AV_OPT_TYPE_DOUBLE, {.dbl=2.0}, 1.0, 10.0, FLAGS },
+    { "compression", "set the max compression factor", OFFSET(max_compression), AV_OPT_TYPE_DOUBLE, {.dbl=2.0}, 1.0, 10.0, FLAGS },
+    { "c",           "set the max compression factor", OFFSET(max_compression), AV_OPT_TYPE_DOUBLE, {.dbl=2.0}, 1.0, 10.0, FLAGS },
+    { "threshold", "set the threshold value", OFFSET(threshold_value), AV_OPT_TYPE_DOUBLE, {.dbl=0}, 0.0, 1.0, FLAGS },
threshold value", OFFSET(threshold_value), AV_OPT_TYPE_DOUBLE, {.dbl=0}, 0.0, 1.0, FLAGS }, + { "t", "set the threshold value", OFFSET(threshold_value), AV_OPT_TYPE_DOUBLE, {.dbl=0}, 0.0, 1.0, FLAGS }, + { "raise", "set the expansion raising amount", OFFSET(raise_amount), AV_OPT_TYPE_DOUBLE, {.dbl=0.001}, 0.0, 1.0, FLAGS }, + { "r", "set the expansion raising amount", OFFSET(raise_amount), AV_OPT_TYPE_DOUBLE, {.dbl=0.001}, 0.0, 1.0, FLAGS }, + { "fall", "set the compression raising amount", OFFSET(fall_amount), AV_OPT_TYPE_DOUBLE, {.dbl=0.001}, 0.0, 1.0, FLAGS }, + { "f", "set the compression raising amount", OFFSET(fall_amount), AV_OPT_TYPE_DOUBLE, {.dbl=0.001}, 0.0, 1.0, FLAGS }, + { NULL } +}; + +AVFILTER_DEFINE_CLASS(speechnorm); + +static int query_formats(AVFilterContext *ctx) +{ + AVFilterFormats *formats; + AVFilterChannelLayouts *layouts; + static const enum AVSampleFormat sample_fmts[] = { + AV_SAMPLE_FMT_DBLP, + AV_SAMPLE_FMT_NONE + }; + int ret; + + layouts = ff_all_channel_counts(); + if (!layouts) + return AVERROR(ENOMEM); + ret = ff_set_common_channel_layouts(ctx, layouts); + if (ret < 0) + return ret; + + formats = ff_make_format_list(sample_fmts); + if (!formats) + return AVERROR(ENOMEM); + ret = ff_set_common_formats(ctx, formats); + if (ret < 0) + return ret; + + formats = ff_all_samplerates(); + if (!formats) + return AVERROR(ENOMEM); + return ff_set_common_samplerates(ctx, formats); +} + +static int config_input(AVFilterLink *inlink) +{ + AVFilterContext *ctx = inlink->dst; + SpeechNormalizerContext *s = ctx->priv; + + s->max_period = inlink->sample_rate / (2 * 20); + s->channels = inlink->channels; + + s->cc = av_calloc(inlink->channels, sizeof(*s->cc)); + if (!s->cc) + return AVERROR(ENOMEM); + + for (int ch = 0; ch < s->channels; ch++) { + ChannelContext *cc = &s->cc[ch]; + + cc->state = -1; + cc->gain_state = 1.; + } + + return 0; +} + +static int get_pi_samples(PeriodItem *pi, int start, int end, int mode) +{ + int sum; + + if (mode && pi[start].type == 0) + return 0; + + sum = pi[start].size; + av_assert0(sum >= 0); + while (start != end) { + start++; + if (start >= MAX_ITEMS) + start = 0; + if (mode && pi[start].type == 0) + break; + av_assert0(pi[start].size > 0); + sum += pi[start].size; + if (pi[start].type == 0) + break; + } + + return sum; +} + +static int consume_pi(PeriodItem *pi, int start, int end, int nb_samples) +{ + int sum; + + sum = pi[start].size; + av_assert0(pi[start].size > 0); + while (sum < nb_samples) { + av_assert0(pi[start].type == 1); + av_assert0(start != end); + start++; + if (start >= MAX_ITEMS) + start = 0; + av_assert0(pi[start].size > 0); + sum += pi[start].size; + } + + av_assert0(pi[start].size >= sum - nb_samples); + pi[start].size = sum - nb_samples; + av_assert0(pi[start].size >= 0); + if (pi[start].size == 0 && start != end) { + start++; + if (start >= MAX_ITEMS) + start = 0; + } + + return start; +} + +static int get_queued_samples(SpeechNormalizerContext *s) +{ + int sum = 0; + + for (int i = 0; i < s->queue.available; i++) { + AVFrame *frame = ff_bufqueue_peek(&s->queue, i); + sum += frame->nb_samples; + } + + return sum; +} + +static int filter_frame(AVFilterContext *ctx) +{ + SpeechNormalizerContext *s = ctx->priv; + AVFilterLink *outlink = ctx->outputs[0]; + AVFilterLink *inlink = ctx->inputs[0]; + int min_pi_nb_samples; + AVFrame *in = NULL; + int ret; + + for (int f = 0; f < ff_inlink_queued_frames(inlink); f++) { + ret = ff_inlink_consume_frame(inlink, &in); + if (ret < 0) + return ret; + if (ret == 0) + break; + + 
+        ff_bufqueue_add(ctx, &s->queue, in);
+
+        for (int ch = 0; ch < inlink->channels; ch++) {
+            ChannelContext *cc = &s->cc[ch];
+            const double *src = (const double *)in->extended_data[ch];
+            int n = 0;
+
+            if (cc->state < 0)
+                cc->state = src[0] >= 0.;
+
+            while (n < in->nb_samples) {
+                if (cc->state != (src[n] >= 0.) || cc->pi[cc->pi_end].size > s->max_period) {
+                    cc->state = src[n] >= 0.;
+                    av_assert0(cc->pi[cc->pi_end].size > 0);
+                    cc->pi[cc->pi_end].type = 1;
+                    cc->pi_end++;
+                    if (cc->pi_end >= MAX_ITEMS)
+                        cc->pi_end = 0;
+                    cc->pi[cc->pi_end].max_peak = DBL_MIN;
+                    cc->pi[cc->pi_end].type = 0;
+                    cc->pi[cc->pi_end].size = 0;
+                    av_assert0(cc->pi_end != cc->pi_start);
+                }
+
+                if (src[n] >= 0.) {
+                    while (src[n] >= 0.) {
+                        cc->pi[cc->pi_end].max_peak = FFMAX(cc->pi[cc->pi_end].max_peak, FFABS(src[n]));
+                        cc->pi[cc->pi_end].size++;
+                        n++;
+                        if (n >= in->nb_samples)
+                            break;
+                    }
+                } else {
+                    while (src[n] < 0.) {
+                        cc->pi[cc->pi_end].max_peak = FFMAX(cc->pi[cc->pi_end].max_peak, FFABS(src[n]));
+                        cc->pi[cc->pi_end].size++;
+                        n++;
+                        if (n >= in->nb_samples)
+                            break;
+                    }
+                }
+            }
+        }
+    }
+
+    if (s->queue.available > 0) {
+        in = ff_bufqueue_peek(&s->queue, 0);
+        if (!in)
+            return 1;
+    } else {
+        return 1;
+    }
+
+    min_pi_nb_samples = get_pi_samples(s->cc[0].pi, s->cc[0].pi_start, s->cc[0].pi_end, 1);
+    for (int ch = 1; ch < inlink->channels; ch++) {
+        ChannelContext *cc = &s->cc[ch];
+        min_pi_nb_samples = FFMIN(min_pi_nb_samples, get_pi_samples(cc->pi, cc->pi_start, cc->pi_end, 1));
+    }
+
+    if (min_pi_nb_samples >= in->nb_samples || s->eof) {
+        int nb_samples = get_queued_samples(s);
+
+        in = ff_bufqueue_get(&s->queue);
+
+        av_frame_make_writable(in);
+
+        nb_samples -= in->nb_samples;
+
+        for (int ch = 0; ch < inlink->channels; ch++) {
+            ChannelContext *cc = &s->cc[ch];
+            double *src = (double *)in->extended_data[ch];
+            int start = cc->pi_start;
+            int offset = 0;
+            double gain = 1.;
+
+            for (int n = 0; n < in->nb_samples; n++) {
+                if (n >= offset) {
+                    int type = cc->pi[start].max_peak > s->threshold_value;
+
+                    if (type)
+                        gain = FFMIN(s->max_expansion, s->peak_value / cc->pi[start].max_peak);
+                    else
+                        gain = 1. / s->max_compression;
+
+                    av_assert0(cc->pi[start].size > 0);
+                    offset += cc->pi[start++].size;
+                    if (start >= MAX_ITEMS)
+                        start = 0;
+
+                    if (type)
+                        cc->gain_state = FFMIN(gain, cc->gain_state + s->raise_amount);
+                    else
+                        cc->gain_state = FFMAX(gain, cc->gain_state - s->fall_amount);
+                }
+                src[n] *= cc->gain_state;
+            }
+        }
+
+        for (int ch = 0; ch < inlink->channels; ch++) {
+            ChannelContext *cc = &s->cc[ch];
+
+            cc->pi_start = consume_pi(cc->pi, cc->pi_start, cc->pi_end, in->nb_samples);
+        }
+
+        for (int ch = 0; ch < inlink->channels; ch++) {
+            ChannelContext *cc = &s->cc[ch];
+            int pi_nb_samples = get_pi_samples(cc->pi, cc->pi_start, cc->pi_end, 0);
+
+            if (nb_samples != pi_nb_samples) {
+                av_assert0(0);
+            }
+        }
+
+        s->pts = in->pts + in->nb_samples;
+
+        return ff_filter_frame(outlink, in);
+    }
+
+    return 1;
+}
+
+static int activate(AVFilterContext *ctx)
+{
+    AVFilterLink *inlink = ctx->inputs[0];
+    AVFilterLink *outlink = ctx->outputs[0];
+    SpeechNormalizerContext *s = ctx->priv;
+    int ret = 0, status;
+    int64_t pts;
+
+    FF_FILTER_FORWARD_STATUS_BACK(outlink, inlink);
+
+    ret = filter_frame(ctx);
+    if (ret <= 0)
+        return ret;
+
+    if (!s->eof && ff_inlink_acknowledge_status(inlink, &status, &pts)) {
+        if (status == AVERROR_EOF)
+            s->eof = 1;
+    }
+
+    if (s->eof && ff_inlink_queued_samples(inlink) == 0 &&
+        s->queue.available == 0) {
+        ff_outlink_set_status(outlink, AVERROR_EOF, s->pts);
+        return 0;
+    }
+
+    if (!s->eof)
+        FF_FILTER_FORWARD_WANTED(outlink, inlink);
+
+    if (s->eof && s->queue.available > 0) {
+        ff_filter_set_ready(ctx, 10);
+        return 0;
+    }
+
+    return FFERROR_NOT_READY;
+}
+
+static av_cold void uninit(AVFilterContext *ctx)
+{
+    SpeechNormalizerContext *s = ctx->priv;
+
+    ff_bufqueue_discard_all(&s->queue);
+    av_freep(&s->cc);
+}
+
+static const AVFilterPad avfilter_af_speechnorm_inputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_AUDIO,
+        .config_props = config_input,
+    },
+    { NULL }
+};
+
+static const AVFilterPad avfilter_af_speechnorm_outputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_AUDIO,
+    },
+    { NULL }
+};
+
+AVFilter ff_af_speechnorm = {
+    .name            = "speechnorm",
+    .description     = NULL_IF_CONFIG_SMALL("Speech Normalizer."),
+    .query_formats   = query_formats,
+    .priv_size       = sizeof(SpeechNormalizerContext),
+    .priv_class      = &speechnorm_class,
+    .activate        = activate,
+    .uninit          = uninit,
+    .inputs          = avfilter_af_speechnorm_inputs,
+    .outputs         = avfilter_af_speechnorm_outputs,
+    .process_command = ff_filter_process_command,
+};
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 7796959dc7..fde535d50c 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -132,6 +132,7 @@ extern AVFilter ff_af_sidechaingate;
 extern AVFilter ff_af_silencedetect;
 extern AVFilter ff_af_silenceremove;
 extern AVFilter ff_af_sofalizer;
+extern AVFilter ff_af_speechnorm;
 extern AVFilter ff_af_stereotools;
 extern AVFilter ff_af_stereowiden;
 extern AVFilter ff_af_superequalizer;
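
For readers who want to try the new filter, here is a minimal command-line usage sketch consistent with the documentation added above; the file names and option values are illustrative only and not part of the patch:

    ffmpeg -i voice.wav -af speechnorm=p=0.9:e=4:t=0.0001 normalized.wav

With these values, quiet half-cycles are expanded by at most a factor of 4 towards the 0.9 target peak, while half-cycles whose local peak stays at or below the 0.0001 threshold are compressed instead of expanded.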