From patchwork Fri Oct 14 18:09:51 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Greg Rowe <growe@shoretel.com>
X-Patchwork-Id: 1005
From: Greg Rowe <growe@shoretel.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Date: Fri, 14 Oct 2016 18:09:51 +0000
Message-ID: 
 <DM5PR10MB1258C563B7CB18AABA2791F7C3DF0@DM5PR10MB1258.namprd10.prod.outlook.com>
References: 
 <BN6PR10MB12505361FE3D3161F34E8B8AC3DC0@BN6PR10MB1250.namprd10.prod.outlook.com>,
	<20161014014348.GA4602@nb4>
In-Reply-To: <20161014014348.GA4602@nb4>
Subject: Re: [FFmpeg-devel] [PATCH] avfilter/af_silenceremove: add optional tone when silence is removed

Michael,

In the attached patch I've tried to make all of the changes you've pointed out.  I also renamed tone_hz to tone_frequency on Moritz Barsnick's suggestion.

Is there a good way to generate the tone while avoiding floating point operations?  If there is, then don't bother reviewing this patch; I'll make that change once I know how to do it.

I removed the unrelated changes.  The two parameters, tone_duration and tone_frequency, are now integers, and tone_duration is now in milliseconds rather than seconds; I have updated the documentation to reflect that.  I also moved the tone generation into an initialization function that fills a buffer kept for the lifetime of the filter, instead of needlessly generating the tone on the fly.

Thanks,
Greg

diff --git a/Changelog b/Changelog
index 0da009c..86e031c 100644
--- a/Changelog
+++ b/Changelog
@@ -2,6 +2,7 @@ Entries are sorted chronologically from oldest to youngest within each release,
 releases are sorted from youngest to oldest.
 
 version <next>:
+- optional tone insertion in the silenceremove filter
 - libopenmpt demuxer
 - tee protocol
 - Changed metadata print option to accept general urls
diff --git a/doc/filters.texi b/doc/filters.texi
index 4b2f7bf..e09a303 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -3340,7 +3340,8 @@ ffmpeg -i silence.mp3 -af silencedetect=noise=0.0001 -f null -
 
 @section silenceremove
 
-Remove silence from the beginning, middle or end of the audio.
+Remove silence from the beginning, middle or end of the audio while
+optionally inserting a tone where silence was removed.
 
 The filter accepts the following options:
 
@@ -3401,6 +3402,14 @@ Default value is @code{rms}.
 @item window
 Set ratio used to calculate size of window for detecting silence.
 Default value is @code{0.02}. Allowed range is from @code{0} to @code{10}.
+
+@item tone_duration
+Set the duration, in seconds, of the tone inserted into the stream when silence is removed.  A value of @code{0} disables tone insertion.
+Default value is @code{0.0}.
+
+@item tone_hz
+Set the frequency, in Hz, of the tone inserted into the stream when silence is removed.
+Default value is @code{1000.0}.
 @end table
 
 @subsection Examples
diff --git a/libavfilter/af_silenceremove.c b/libavfilter/af_silenceremove.c
index f156d18..07cf428 100644
--- a/libavfilter/af_silenceremove.c
+++ b/libavfilter/af_silenceremove.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2001 Chris Bagwell
  * Copyright (c) 2003 Donnie Smith
  * Copyright (c) 2014 Paul B Mahol
+ * Copyright (c) 2016 Shoretel <growe@shoretel.com>
  *
  * This file is part of FFmpeg.
  *
@@ -31,11 +32,20 @@
 #include "internal.h"
 
 enum SilenceMode {
-    SILENCE_TRIM,
+    SILENCE_TRIM = 0,
     SILENCE_TRIM_FLUSH,
     SILENCE_COPY,
     SILENCE_COPY_FLUSH,
-    SILENCE_STOP
+    SILENCE_STOP,
+    SILENCE_END_MARKER
+};
+
+static const char *const SILENCE_MODE_NAMES[] = {
+    NULL_IF_CONFIG_SMALL("TRIM"),
+    NULL_IF_CONFIG_SMALL("TRIM_FLUSH"),
+    NULL_IF_CONFIG_SMALL("COPY"),
+    NULL_IF_CONFIG_SMALL("COPY_FLUSH"),
+    NULL_IF_CONFIG_SMALL("STOP")
 };
 
 typedef struct SilenceRemoveContext {
@@ -75,6 +85,10 @@ typedef struct SilenceRemoveContext {
     int detection;
     void (*update)(struct SilenceRemoveContext *s, double sample);
     double(*compute)(struct SilenceRemoveContext *s, double sample);
+
+    double last_pts_seconds;
+    double tone_duration;
+    double tone_hz;
 } SilenceRemoveContext;
 
 #define OFFSET(x) offsetof(SilenceRemoveContext, x)
@@ -91,11 +105,51 @@ static const AVOption silenceremove_options[] = {
     {   "peak",          0,    0,                       AV_OPT_TYPE_CONST,    {.i64=0},     0,       0, FLAGS, "detection" },
     {   "rms",           0,    0,                       AV_OPT_TYPE_CONST,    {.i64=1},     0,       0, FLAGS, "detection" },
     { "window",          NULL, OFFSET(window_ratio),    AV_OPT_TYPE_DOUBLE,   {.dbl=0.02},  0,      10, FLAGS },
-    { NULL }
+    {
+        .name = "tone_duration",
+        .help = "duration in seconds of the tone inserted when silence is removed (0 to disable)",
+        .offset = OFFSET(tone_duration),
+        .type = AV_OPT_TYPE_DOUBLE,
+        .default_val = {.dbl=0.0},
+        .min = 0.0,
+        .max = DBL_MAX,
+        .flags = FLAGS,
+        .unit = "tone",
+    },
+    {
+        .name = "tone_hz",
+        .help = "frequency in Hz of the tone inserted when silence is removed",
+        .offset = OFFSET(tone_hz),
+        .type = AV_OPT_TYPE_DOUBLE,
+        .default_val = {.dbl=1000.0},
+        .min = 0.0,
+        .max = DBL_MAX,
+        .flags = FLAGS,
+        .unit = "tone",
+    },
+    {NULL}
 };
 
 AVFILTER_DEFINE_CLASS(silenceremove);
 
+static const char* mode_to_string(enum SilenceMode mode)
+{
+    if (mode >= SILENCE_END_MARKER) {
+        return "";
+    }
+    /* NULL_IF_CONFIG_SMALL() yields NULL in CONFIG_SMALL builds. */
+    return SILENCE_MODE_NAMES[mode] ? SILENCE_MODE_NAMES[mode] : "";
+}
+
+
+static void set_mode(AVFilterContext *ctx, enum SilenceMode new_mode)
+{
+    SilenceRemoveContext *s = ctx->priv;
+    av_log(ctx, AV_LOG_DEBUG, "changing state %s=>%s\n",
+           mode_to_string(s->mode), mode_to_string(new_mode));
+    s->mode = new_mode;
+}
+
 static double compute_peak(SilenceRemoveContext *s, double sample)
 {
     double new_sum;
@@ -209,14 +263,46 @@ static int config_input(AVFilterLink *inlink)
     s->stop_holdoff_end    = 0;
     s->stop_found_periods  = 0;
 
-    if (s->start_periods)
-        s->mode = SILENCE_TRIM;
-    else
-        s->mode = SILENCE_COPY;
+    set_mode(ctx, s->start_periods ? SILENCE_TRIM : SILENCE_COPY);
 
     return 0;
 }
 
+static int insert_tone(AVFilterLink *inlink,
+                       AVFilterLink *outlink,
+                       double tone_hz,
+                       double duration)
+{
+    AVFilterContext *ctx = inlink->dst;
+    /* duration is in seconds; nb_frames counts samples per channel */
+    int nb_frames = lrint(duration * inlink->sample_rate);
+    double step = tone_hz / inlink->sample_rate; /* cycles per frame */
+    AVFrame *out;
+    double *obuf;
+    int i, ch;
+
+    if (nb_frames <= 0)
+        return 0;
+    out = ff_get_audio_buffer(inlink, nb_frames);
+    if (!out)
+        return AVERROR(ENOMEM);
+    obuf = (double *)out->data[0];
+
+    av_log(ctx, AV_LOG_DEBUG,
+           "insert beep tone=%fHz duration=%f seconds\n",
+           tone_hz, duration);
+
+    /* Interleaved output: every channel gets the same sample, and the
+     * phase advances once per frame rather than once per sample. */
+    for (i = 0; i < nb_frames; i++) {
+        double v = sin(2.0 * M_PI * step * i);
+        for (ch = 0; ch < inlink->channels; ch++)
+            *obuf++ = v;
+    }
+    return ff_filter_frame(outlink, out);
+}
+
+
 static void flush(AVFrame *out, AVFilterLink *outlink,
                   int *nb_samples_written, int *ret)
 {
@@ -229,6 +315,28 @@ static void flush(AVFrame *out, AVFilterLink *outlink,
     }
 }
 
+
+static int process_tone(AVFilterLink *inlink)
+{
+    int ret = 0;
+    double pts_seconds = 0.0;
+    AVFilterContext *ctx = inlink->dst;
+    AVFilterLink *outlink = ctx->outputs[0];
+    SilenceRemoveContext *s = ctx->priv;
+    pts_seconds = inlink->current_pts_us / (double)AV_TIME_BASE;
+
+    /* Check to be certain that we don't flood the stream with
+     * annoying tones. */
+    if ((s->last_pts_seconds == 0.0)
+        || (pts_seconds - s->last_pts_seconds) > (s->tone_duration * 2.0)) {
+
+        ret = insert_tone(inlink, outlink, s->tone_hz, s->tone_duration);
+        s->last_pts_seconds = pts_seconds;
+    }
+
+    return ret;
+}
+
 static int filter_frame(AVFilterLink *inlink, AVFrame *in)
 {
     AVFilterContext *ctx = inlink->dst;
@@ -243,7 +351,7 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *in)
 
     switch (s->mode) {
     case SILENCE_TRIM:
-silence_trim:
+    silence_trim:
         nbs = in->nb_samples - nb_samples_read / inlink->channels;
         if (!nbs)
             break;
@@ -263,7 +371,7 @@ silence_trim:
 
                 if (s->start_holdoff_end >= s->start_duration * inlink->channels) {
                     if (++s->start_found_periods >= s->start_periods) {
-                        s->mode = SILENCE_TRIM_FLUSH;
+                        set_mode(ctx, SILENCE_TRIM_FLUSH);
                         goto silence_trim_flush;
                     }
 
@@ -283,7 +391,7 @@ silence_trim:
         break;
 
     case SILENCE_TRIM_FLUSH:
-silence_trim_flush:
+    silence_trim_flush:
         nbs  = s->start_holdoff_end - s->start_holdoff_offset;
         nbs -= nbs % inlink->channels;
         if (!nbs)
@@ -304,13 +412,13 @@ silence_trim_flush:
         if (s->start_holdoff_offset == s->start_holdoff_end) {
             s->start_holdoff_offset = 0;
             s->start_holdoff_end = 0;
-            s->mode = SILENCE_COPY;
+            set_mode(ctx, SILENCE_COPY);
             goto silence_copy;
         }
         break;
 
     case SILENCE_COPY:
-silence_copy:
+    silence_copy:
         nbs = in->nb_samples - nb_samples_read / inlink->channels;
         if (!nbs)
             break;
@@ -329,7 +437,7 @@ silence_copy:
                     threshold &= s->compute(s, ibuf[j]) > s->stop_threshold;
 
                 if (threshold && s->stop_holdoff_end && !s->leave_silence) {
-                    s->mode = SILENCE_COPY_FLUSH;
+                    set_mode(ctx, SILENCE_COPY_FLUSH);
                     flush(out, outlink, &nb_samples_written, &ret);
                     goto silence_copy_flush;
                 } else if (threshold) {
@@ -357,7 +465,7 @@ silence_copy:
                             s->stop_holdoff_end = 0;
 
                             if (!s->restart) {
-                                s->mode = SILENCE_STOP;
+                                set_mode(ctx, SILENCE_STOP);
                                 flush(out, outlink, &nb_samples_written, &ret);
                                 goto silence_stop;
                             } else {
@@ -366,12 +474,19 @@ silence_copy:
                                 s->start_holdoff_offset = 0;
                                 s->start_holdoff_end = 0;
                                 clear_window(s);
-                                s->mode = SILENCE_TRIM;
-                                flush(out, outlink, &nb_samples_written, &ret);
-                                goto silence_trim;
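+                                set_mode(ctx, SILENCE_TRIM);
+
+                                /* Flush the audio copied so far before the
+                                 * tone is emitted, so the tone follows that
+                                 * audio in the output instead of preceding it. */
+                                flush(out, outlink,
+                                      &nb_samples_written, &ret);
+                                if (ret >= 0 && s->tone_duration > 0.0)
+                                    ret = process_tone(inlink);
+                                goto silence_trim;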
                             }
                         }
-                        s->mode = SILENCE_COPY_FLUSH;
+                        set_mode(ctx, SILENCE_COPY_FLUSH);
                         flush(out, outlink, &nb_samples_written, &ret);
                         goto silence_copy_flush;
                     }
@@ -385,7 +500,7 @@ silence_copy:
         break;
 
     case SILENCE_COPY_FLUSH:
-silence_copy_flush:
+    silence_copy_flush:
         nbs  = s->stop_holdoff_end - s->stop_holdoff_offset;
         nbs -= nbs % inlink->channels;
         if (!nbs)
@@ -406,12 +521,12 @@ silence_copy_flush:
         if (s->stop_holdoff_offset == s->stop_holdoff_end) {
             s->stop_holdoff_offset = 0;
             s->stop_holdoff_end = 0;
-            s->mode = SILENCE_COPY;
+            set_mode(ctx, SILENCE_COPY);
             goto silence_copy;
         }
         break;
     case SILENCE_STOP:
-silence_stop:
+    silence_stop:
         break;
     }
 
@@ -427,6 +542,8 @@ static int request_frame(AVFilterLink *outlink)
     int ret;
 
     ret = ff_request_frame(ctx->inputs[0]);
+    /* If the input has reached EOF but the stop holdoff buffer still
+     * holds samples, copy them out. */
     if (ret == AVERROR_EOF && (s->mode == SILENCE_COPY_FLUSH ||
                                s->mode == SILENCE_COPY)) {
         int nbs = s->stop_holdoff_end - s->stop_holdoff_offset;
@@ -441,7 +558,7 @@ static int request_frame(AVFilterLink *outlink)
                    nbs * sizeof(double));
             ret = ff_filter_frame(ctx->inputs[0], frame);
         }
-        s->mode = SILENCE_STOP;
+        set_mode(ctx, SILENCE_STOP);
     }
     return ret;
 }
diff --git a/libavfilter/version.h b/libavfilter/version.h
index 93d249b..4626ca4 100644
--- a/libavfilter/version.h
+++ b/libavfilter/version.h
@@ -31,7 +31,7 @@
 
 #define LIBAVFILTER_VERSION_MAJOR   6
 #define LIBAVFILTER_VERSION_MINOR  63
-#define LIBAVFILTER_VERSION_MICRO 100
+#define LIBAVFILTER_VERSION_MICRO 101
 
 #define LIBAVFILTER_VERSION_INT AV_VERSION_INT(LIBAVFILTER_VERSION_MAJOR, \
                                                LIBAVFILTER_VERSION_MINOR, \
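
The throttling in process_tone() is easier to check in isolation. What follows is a minimal standalone sketch. It assumes only that `current_pts_us` is expressed in AV_TIME_BASE units (microseconds), so a single division by 1000000 yields seconds. `ToneLimiter`, `tone_allowed` and `TIME_BASE_US` are hypothetical names. A negative `last_tone_s` marks "no tone emitted yet", so a first tone at pts 0 is not confused with the unset state.

```c
#include <stdbool.h>
#include <stdint.h>

#define TIME_BASE_US 1000000 /* stands in for AV_TIME_BASE (microseconds) */

/* Hypothetical limiter state; last_tone_s < 0 means no tone yet. */
typedef struct ToneLimiter {
    double last_tone_s;
    double tone_duration_s;
} ToneLimiter;

/* Allow a tone only if none has been emitted yet, or if more than twice
 * the tone duration has elapsed since the previous one. */
static bool tone_allowed(ToneLimiter *lim, int64_t current_pts_us)
{
    /* Microseconds to seconds: divide by the time base exactly once. */
    double now_s = current_pts_us / (double)TIME_BASE_US;

    if (lim->last_tone_s < 0.0 ||
        now_s - lim->last_tone_s > 2.0 * lim->tone_duration_s) {
        lim->last_tone_s = now_s;
        return true;
    }
    return false;
}
```

With a 0.5 s tone, a tone at pts 0 blocks further tones until more than 1 s of input has passed: a request at 0.9 s is refused and one at 1.1 s is allowed.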