diff mbox

[FFmpeg-devel,v1,2/6] avfilter/af_silencedetect: document metadata

Message ID 20190930133647.6290-2-lance.lmwang@gmail.com
State Superseded
Headers show

Commit Message

Lance Wang Sept. 30, 2019, 1:36 p.m. UTC
From: Limin Wang <lance.lmwang@gmail.com>

Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
---
 doc/filters.texi               | 10 +++++++++-
 libavfilter/af_silencedetect.c |  1 -
 2 files changed, 9 insertions(+), 2 deletions(-)

Comments

Moritz Barsnick Oct. 1, 2019, 7:40 a.m. UTC | #1
On Mon, Sep 30, 2019 at 21:36:43 +0800, lance.lmwang@gmail.com wrote:
> -The printed times and duration are expressed in seconds.
> +The printed times and duration are expressed in seconds. The @code{lavfi.silence_start}
> +or @code{lavfi.silence_start.X} metadata key is set on the first frame whose timestamp
> +equals or exceeds the detection duration and it contains the timestamp of the first
> +frame of the silence.
> +
> +The @code{lavfi.silence_duration} or @code{lavfi.silence_duration.X} and
> +@code{lavfi.silence_end} or @code{lavfi.silence_end.X}metadata keys are set on the
> +first frame after the silence. Where @code{X} is the channel number and .X is used
> +if @option{mono} is enabled.

I don't understand. Literal ".X" (which you should write as
"@code{.X}", by the way)? And if not mono, it's ".0", ".1"? And when is
".N" completely omitted?

This is a bit unclear to me, without experimenting or looking at the
code.

Cheers,
Moritz
Lance Wang Oct. 7, 2019, 2:23 p.m. UTC | #2
On Tue, Oct 01, 2019 at 09:40:45AM +0200, Moritz Barsnick wrote:
> On Mon, Sep 30, 2019 at 21:36:43 +0800, lance.lmwang@gmail.com wrote:
> > -The printed times and duration are expressed in seconds.
> > +The printed times and duration are expressed in seconds. The @code{lavfi.silence_start}
> > +or @code{lavfi.silence_start.X} metadata key is set on the first frame whose timestamp
> > +equals or exceeds the detection duration and it contains the timestamp of the first
> > +frame of the silence.
> > +
> > +The @code{lavfi.silence_duration} or @code{lavfi.silence_duration.X} and
> > +@code{lavfi.silence_end} or @code{lavfi.silence_end.X}metadata keys are set on the
> > +first frame after the silence. Where @code{X} is the channel number and .X is used
> > +if @option{mono} is enabled.
> 
> I don't understand. Literal ".X" (which you should write as
> "@code{.X}", by the way)? And if not mono, it's ".0", ".1"? And when is
> ".N" completely omitted?
By the code,  if it's mono mode, it'll output lavfi.silence_duration metadata, if not mono,
it'll output lavfi.silence_duration.0, lavfi.silence_duration.1(0, 1 is the channel number).
Please advise how to describe this clearly.
> 
> This is a bit unclear to me, without experimenting or looking at the
> code.
> 
> Cheers,
> Moritz
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Moritz Barsnick Oct. 8, 2019, 12:21 p.m. UTC | #3
On Mon, Oct 07, 2019 at 22:23:21 +0800, Limin Wang wrote:
> On Tue, Oct 01, 2019 at 09:40:45AM +0200, Moritz Barsnick wrote:
> > On Mon, Sep 30, 2019 at 21:36:43 +0800, lance.lmwang@gmail.com wrote:
> > > -The printed times and duration are expressed in seconds.
> > > +The printed times and duration are expressed in seconds. The @code{lavfi.silence_start}
> > > +or @code{lavfi.silence_start.X} metadata key is set on the first frame whose timestamp
> > > +equals or exceeds the detection duration and it contains the timestamp of the first
> > > +frame of the silence.
> > > +
> > > +The @code{lavfi.silence_duration} or @code{lavfi.silence_duration.X} and
> > > +@code{lavfi.silence_end} or @code{lavfi.silence_end.X}metadata keys are set on the
> > > +first frame after the silence. Where @code{X} is the channel number and .X is used
> > > +if @option{mono} is enabled.
> >
> > I don't understand. Literal ".X" (which you should write as
> > "@code{.X}", by the way)? And if not mono, it's ".0", ".1"? And when is
> > ".N" completely omitted?
> By the code,  if it's mono mode, it'll output lavfi.silence_duration metadata, if not mono,
> it'll output lavfi.silence_duration.0, lavfi.silence_duration.1(0, 1 is the channel number).
> Please advise how to describe this clearly.

Well, I must say I was confused by the option "mono", because it does
the opposite in my opinion[*].

Anyway, since it's technically correct, I would rearrange or explain as
such:

The @code{lavfi.silence_duration} or @code{lavfi.silence_duration.X}
and @code{lavfi.silence_end} or @code{lavfi.silence_end.X} metadata
keys are set on the first frame after the silence. If @option{mono} is
enabled, and each channel is evaluated separately, the @code{.X}
suffixed keys are used, and @code{X} corresponds to the channel number.

[*] In "mono" mode, from its name, I would have expected all the
channels to be mixed down and interpreted as a monaural signal. But
this is actually what happens if "mono" mode is *not* selected,
apparently. In reality, in "mono" mode, each channel is interpreted
separately, but that doesn't make these mono, in my opinion. But
there's nothing we can change about that now. ;-)

Cheers,
Moritz
Lance Wang Oct. 8, 2019, 1:11 p.m. UTC | #4
On Tue, Oct 08, 2019 at 02:21:06PM +0200, Moritz Barsnick wrote:
> On Mon, Oct 07, 2019 at 22:23:21 +0800, Limin Wang wrote:
> > On Tue, Oct 01, 2019 at 09:40:45AM +0200, Moritz Barsnick wrote:
> > > On Mon, Sep 30, 2019 at 21:36:43 +0800, lance.lmwang@gmail.com wrote:
> > > > -The printed times and duration are expressed in seconds.
> > > > +The printed times and duration are expressed in seconds. The @code{lavfi.silence_start}
> > > > +or @code{lavfi.silence_start.X} metadata key is set on the first frame whose timestamp
> > > > +equals or exceeds the detection duration and it contains the timestamp of the first
> > > > +frame of the silence.
> > > > +
> > > > +The @code{lavfi.silence_duration} or @code{lavfi.silence_duration.X} and
> > > > +@code{lavfi.silence_end} or @code{lavfi.silence_end.X}metadata keys are set on the
> > > > +first frame after the silence. Where @code{X} is the channel number and .X is used
> > > > +if @option{mono} is enabled.
> > >
> > > I don't understand. Literal ".X" (which you should write as
> > > "@code{.X}", by the way)? And if not mono, it's ".0", ".1"? And when is
> > > ".N" completely omitted?
> > By the code,  if it's mono mode, it'll output lavfi.silence_duration metadata, if not mono,
> > it'll output lavfi.silence_duration.0, lavfi.silence_duration.1(0, 1 is the channel number).
> > Please advise how to describe this clearly.
> 
> Well, I must say I was confused by the option "mono", because it does
> the opposite in my opinion[*].
> 
> Anyway, since it's technically correct, I would rearrange or explain as
> such:
> 
> The @code{lavfi.silence_duration} or @code{lavfi.silence_duration.X}
> and @code{lavfi.silence_end} or @code{lavfi.silence_end.X} metadata
> keys are set on the first frame after the silence. If @option{mono} is
> enabled, and each channel is evaluated separately, the @code{.X}
> suffixed keys are used, and @code{X} corresponds to the channel number.

OK, I'll update it by your description, it's more native to understand.

> 
> [*] In "mono" mode, from its name, I would have expected all the
> channels to be mixed down and interpreted as a monaural signal. But
> this is actually what happens if "mono" mode is *not* selected,
> apparently. In reality, in "mono" mode, each channel is interpreted
> separately, but that doesn't make these mono, in my opinion. But
> there's nothing we can change about that now. ;-)

Yes, the mono mode make the metadata difficult to process also.

> 
> Cheers,
> Moritz
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
diff mbox

Patch

diff --git a/doc/filters.texi b/doc/filters.texi
index 333f502..697ec21 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -4563,7 +4563,15 @@  This filter logs a message when it detects that the input audio volume is less
 or equal to a noise tolerance value for a duration greater or equal to the
 minimum detected noise duration.
 
-The printed times and duration are expressed in seconds.
+The printed times and duration are expressed in seconds. The @code{lavfi.silence_start}
+or @code{lavfi.silence_start.X} metadata key is set on the first frame whose timestamp
+equals or exceeds the detection duration and it contains the timestamp of the first
+frame of the silence.
+
+The @code{lavfi.silence_duration} or @code{lavfi.silence_duration.X} and
+@code{lavfi.silence_end} or @code{lavfi.silence_end.X}metadata keys are set on the
+first frame after the silence. Where @code{X} is the channel number and .X is used
+if @option{mono} is enabled.
 
 The filter accepts the following options:
 
diff --git a/libavfilter/af_silencedetect.c b/libavfilter/af_silencedetect.c
index c31109f..193d0fe 100644
--- a/libavfilter/af_silencedetect.c
+++ b/libavfilter/af_silencedetect.c
@@ -187,7 +187,6 @@  static int filter_frame(AVFilterLink *inlink, AVFrame *insamples)
     s->frame_end = insamples->pts + av_rescale_q(insamples->nb_samples,
             (AVRational){ 1, s->last_sample_rate }, inlink->time_base);
 
-    // TODO: document metadata
     s->silencedetect(s, insamples, nb_samples, nb_samples_notify,
                      inlink->time_base);