
[FFmpeg-devel] avfilter/alimiter: Add "flush_buffer" option to flush the remaining valid data to the output

Message ID 20220327060800.3732289-1-wangcao@google.com
State New
Series [FFmpeg-devel] avfilter/alimiter: Add "flush_buffer" option to flush the remaining valid data to the output

Commit Message

Wang Cao March 27, 2022, 6:08 a.m. UTC
This commit adds some samples to the end of the audio stream. The
intention is to eventually add a "zero_delay" option so that the output
of alimiter is not delayed at the beginning due to the lookahead buffer.

Signed-off-by: Wang Cao <wangcao@google.com>
---
 doc/filters.texi          |  5 ++++
 libavfilter/af_alimiter.c | 57 ++++++++++++++++++++++++++++++++-------
 2 files changed, 53 insertions(+), 9 deletions(-)

If the intention is clear to you, would you prefer that we add the "zero_delay"
option in the same patch, or should it go in a separate patch? Thanks!

Comments

Marton Balint March 27, 2022, 9:40 p.m. UTC | #1
On Sat, 26 Mar 2022, Wang Cao wrote:

> The change in the commit will add some samples to the end of the audio
> stream. The intention is to add a "zero_delay" option eventually to not
> have the delay in the begining the output from alimiter due to
> lookahead.

I was very much surprised to see that the alimiter filter actually delays 
the audio - as in, extra samples are inserted at the beginning and some 
samples are cut at the end. This trashes A-V sync, so it is a bug IMHO.

So unless somebody has a valid use case for the legacy mode of operation, 
I'd simply change it to be "zero delay" without any additional user 
option, in a single patch.

Regards,
Marton
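The delay Marton describes can be made concrete with a toy model (plain C with hypothetical names, not FFmpeg code): with a lookahead of D samples, the legacy behavior shifts every audio sample D/sample_rate seconds later than the video expects.

```c
#include <assert.h>

/* Toy model of the legacy behaviour (hypothetical, not FFmpeg code):
 * with a lookahead of `lookahead` samples, silence is inserted at the
 * beginning, so input sample i appears at output position i + lookahead. */
static long legacy_out_pos(long i, long lookahead)
{
    return i + lookahead;
}

/* The resulting audio-video offset in milliseconds. */
static double av_offset_ms(long lookahead, int sample_rate)
{
    return 1000.0 * lookahead / sample_rate;
}
```

An 80 ms lookahead at 44100 Hz (3528 samples), for example, would desynchronize audio and video by a clearly noticeable 80 ms.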
Paul B Mahol April 4, 2022, 3:49 p.m. UTC | #2
On Sun, Mar 27, 2022 at 11:41 PM Marton Balint <cus@passwd.hu> wrote:

> [...]
> So unless somebody has some valid usecase for the legacy way of operation
> I'd just simply change it to be "zero delay" without any additional user
> option, in a single patch.


This patch does this in a very complicated way, and it really should be
optional.

Something like what af_ladspa does is more reasonable.




>
> Regards,
> Marton
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Marton Balint April 4, 2022, 10:28 p.m. UTC | #3
On Mon, 4 Apr 2022, Paul B Mahol wrote:

> [...]
>
>
> This is done by this patch in very complicated way and also it really
> should be optional.

But why does it make sense to keep the current (IMHO buggy) operational 
mode, which adds silence at the beginning and trims the end? I understand 
that the original implementation worked like this, but libavfilter has 
packet timestamps and N:M filtering, so there is absolutely no reason to 
use a 1:1 implementation and live with its limitations.

Regards,
Marton
Wang Cao April 5, 2022, 6:56 p.m. UTC | #4
On Mon, Apr 4, 2022 at 3:28 PM Marton Balint <cus@passwd.hu> wrote:

> [...]
> But why does it make sense to keep the current (IMHO buggy) operational
> mode which adds silence in the beginning and trims the end?
Hello Paul and Marton, thank you so much for taking the time to review my
patch. I totally understand that my patch may seem a little complicated,
but I can show with a FATE test that if we set the alimiter to behave as a
passthrough filter, the output frames reported by "framecrc" are identical
with my patch applied. The existing behavior will not work for gapless
audio processing.

The complete patch to fix this issue is at
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20220330210314.2055201-1-wangcao@google.com/

Regarding Paul's concern, I personally don't have any preference about
whether to gate the change behind an extra option. With respect to the
implementation, the patch is the best I can think of for preserving as
much information as possible from the input frames. I also understand it
may break the concept that "filter_frame" outputs one frame at a time.
For alimiter with my patch, depending on the size of the lookahead
buffer, it may take a few input frames before one output frame can be
generated. This is inevitable in order to compensate for the delay of the
lookahead buffer.

Thanks again for reviewing my patch, and I'm looking forward to hearing
from you :)
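The frame accounting described above can be sketched as a toy model (plain C, hypothetical names, not the actual patch): with a constant frame size, the number of complete output frames available after n input frames is n minus the number of whole frames the lookahead occupies.

```c
#include <assert.h>

/* Toy model (not the actual patch): output frame k can only be produced
 * once enough input has arrived to cover both the lookahead and the
 * first k frames' worth of samples. */
static int frames_ready(int n_in, int frame_size, int lookahead)
{
    /* number of whole frames the lookahead delay occupies (ceiling) */
    int latency_frames = (lookahead + frame_size - 1) / frame_size;
    int ready = n_in - latency_frames;
    return ready > 0 ? ready : 0;
}
```

With a 3528-sample lookahead and 1024-sample frames, nothing comes out for the first four input frames; from the fifth frame on, output keeps pace with input, one frame behind by the lookahead amount.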
Paul B Mahol April 6, 2022, 11:49 a.m. UTC | #5
On Tue, Apr 5, 2022 at 8:57 PM Wang Cao <wangcao-at-google.com@ffmpeg.org>
wrote:

> [...]
> For alimiter with my patch, depending on the size of the lookahead buffer,
> it may take a few frames before one output frame can be generated. This is
> inevitable to compensate for the delay of the lookahead buffer.

Better (because it is no longer 1 frame x nb_samples in, 1 frame x
nb_samples out) to replace the .filter_frame/.request_frame callbacks
with .activate logic.

And make this output delay compensation filtering optional.

In this process, make sure that output frame PTS timestamps are unchanged
from the input ones, by keeping references to the needed frames in a
filter queue.

Look at how speechnorm/dynaudnorm does it.
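The bookkeeping suggested here can be sketched in plain C (hypothetical names, not the libavfilter API): remember each input frame's (pts, nb_samples) in a small FIFO, and stamp each output frame with the entry popped from it, so timestamps pass through unchanged even though the samples themselves are delayed by the lookahead.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_QUEUE 64

/* FIFO of input frame timing, so outputs can reuse input timestamps
 * (a sketch; the real filter would use libavfilter's own queues). */
typedef struct {
    int64_t pts[MAX_QUEUE];
    int     nb_samples[MAX_QUEUE];
    int     head, tail;
} PtsQueue;

static void pq_push(PtsQueue *q, int64_t pts, int nb_samples)
{
    q->pts[q->tail % MAX_QUEUE]        = pts;
    q->nb_samples[q->tail % MAX_QUEUE] = nb_samples;
    q->tail++;
}

/* Pop the pts (and sample count) to stamp on the next output frame. */
static int64_t pq_pop(PtsQueue *q, int *nb_samples)
{
    int i = q->head % MAX_QUEUE;
    q->head++;
    *nb_samples = q->nb_samples[i];
    return q->pts[i];
}
```

Each output frame then carries exactly the pts and nb_samples of the input frame it replaces, which is what keeps A-V sync intact.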


Paul B Mahol April 7, 2022, 7:46 a.m. UTC | #6
On Wed, Apr 6, 2022 at 1:49 PM Paul B Mahol <onemda@gmail.com> wrote:

> [...]
> Better than (because its no more 1 frame X nb_samples in, 1 frame X
> nb_samples out) to replace .filter_frame/.request_frame with .activate
> logic.
>
> Look how speechnorm/dynaudnorm does it.


Alternatively, use the current logic in the ladspa filter (and also add
an option with the same name).

This would need less code, and it is probably the better approach, as no
extra latency is introduced.

The 1:1 mapping between the number of samples per input and output frame
then no longer holds, but I think that is not very important any more.



Wang Cao April 7, 2022, 9:56 p.m. UTC | #7
On Thu, Apr 7, 2022 at 12:44 AM Paul B Mahol <onemda@gmail.com> wrote:

> [...]
> Alternatively, use current logic in ladspa filter, (also add option with
> same name).
>
> This would need less code, and probably better approach, as there is no
> extra latency introduced.
Thank you for replying with your valuable feedback! I have checked
af_ladspa, and the "request_frame" function in af_ladspa looks similar to
what I'm doing. The difference comes from the fact that I need an
internal frame buffer to keep track of output frames. Essentially, I add
a frame to the internal buffer whenever an input frame comes in. The
frames in this buffer will be the future output frames. We start writing
these output frame data buffers only once the internal lookahead buffer
has been filled. When there are no more input frames, we flush all the
remaining data in the internal frame buffers and lookahead buffers. Can
you advise on my approach here? Thank you!

I can port my current implementation of "filter_frame" and
"request_frame" to the "activate" approach you suggested, as in
speechnorm/dynaudnorm.
Paul B Mahol April 8, 2022, 6:42 p.m. UTC | #8
On Thu, Apr 7, 2022 at 11:56 PM Wang Cao <wangcao-at-google.com@ffmpeg.org>
wrote:

> [...]
> Essentially I add a frame to the internal buffer when an input frame
> comes in. The frames in this buffer will be the future output frames. We
> start writing these output frame data buffers only when the internal
> lookahead buffer has been filled.
> [...]
> Can you advise on my approach here?

No need to wait for all buffers to fill up, only the lookahead buffer.

Just trim the initial samples amounting to the lookahead size, and then
start outputting samples as soon as you have enough for the current
frame's number of samples.

So there should be no need for extra buffers to keep audio samples, just
buffers to keep the input pts and number of samples of previous input
frames, like in the ladspa filter.
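The trimming scheme described here can be sketched as a minimal model (plain C, hypothetical names, independent of libavfilter): discard the first `delay` samples the limiter emits, and track how many samples remain buffered so they can be flushed at EOF, leaving the output exactly as long as the input.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal model of lookahead trimming (hypothetical names): the limiter
 * emits one sample per input sample, but the first `delay` emitted
 * samples are lookahead filler and must be discarded; at EOF, the
 * samples still sitting in the lookahead buffer must be flushed. */
typedef struct {
    size_t delay;    /* lookahead size in samples           */
    size_t to_trim;  /* leading samples still to be dropped */
} DelayComp;

static void delay_comp_init(DelayComp *dc, size_t delay)
{
    dc->delay   = delay;
    dc->to_trim = delay;
}

/* `in` samples arrive: returns how many samples to emit now, after
 * discarding `*skip` samples of the limiter's output. */
static size_t delay_comp_output(DelayComp *dc, size_t in, size_t *skip)
{
    size_t drop = in < dc->to_trim ? in : dc->to_trim;
    dc->to_trim -= drop;
    *skip = drop;
    return in - drop;
}

/* Samples still buffered at EOF that must be flushed so that output
 * length equals input length. */
static size_t delay_comp_flush(const DelayComp *dc)
{
    return dc->delay - dc->to_trim;
}
```

With a 3528-sample lookahead and 1024-sample input frames, the first three frames produce nothing, the fourth produces a partial 568-sample output, and 3528 samples remain to be flushed at EOF.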


Wang Cao April 8, 2022, 8:41 p.m. UTC | #9
On Fri, Apr 8, 2022 at 11:40 AM Paul B Mahol <onemda@gmail.com> wrote:

> [...]
> No need to wait for all buffers to fill up, only lookahead buffer.
>
> Just trim initial samples that is size of lookahead, and than start
> outputing samples just once you get whatever modulo of current frame
> number of samples.
Thank you for the suggestion! From my understanding, we have two ways
to achieve the "zero_delay" functionality here.

Option 1: as you mentioned, we can trim the initial samples of
lookahead size. The lookahead buffer can span multiple frames. For
example, when the attack is 0.08 seconds, the sample rate is 44100 Hz
and the frame size is 1024, the lookahead buffer is about 3 frames
long, so the filter needs to see at least 3 input audio frames before
it can output one audio frame. We also need to make assumptions about
the size of the audio frame (meaning the number of audio samples per
frame) when flushing. The frame size is conventionally 1024, but I
think it is better to make as few assumptions as possible so that the
filter can be used as flexibly as possible.

Option 2: this is what I proposed before. We basically map input
frames 1:1 to output frames, and we keep everything about each output
frame the same as its input except for the audio signal data itself,
which is changed by whatever processing alimiter does. I think it is
safer to make the filter work only on the signal itself. That helps
other users of this filter avoid worrying about any unexpected change
to the frame. The downside is that the filter internally needs to
store some empty frames, which are written out as the lookahead
buffer fills.

I don't see any performance difference between these two options, but
option 2 looks better to me because it works solely on the signal
without any changes to the frame metadata.
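For concreteness, the sample arithmetic behind option 1 can be
sketched as follows (hypothetical names throughout, not the FFmpeg
API): with attack = 0.08 s at 44100 Hz the lookahead is
0.08 * 44100 = 3528 samples, a bit under 3.5 frames of 1024, and the
filter discards exactly that many samples from the start of its
output.

```c
#include <assert.h>

/* Hypothetical helper for option 1: count down the lookahead samples
 * that must be trimmed from the beginning of the limiter output so
 * the stream is not delayed.  This is a sketch, not FFmpeg code. */
typedef struct {
    int to_trim; /* lookahead samples still to be discarded */
} TrimState;

static void trim_init(TrimState *ts, double attack_sec, int sample_rate)
{
    ts->to_trim = (int)(attack_sec * sample_rate); /* 0.08*44100 = 3528 */
}

/* Given `avail` freshly processed samples, return how many of them to
 * drop; the caller outputs the remaining `avail - drop` samples. */
static int trim_apply(TrimState *ts, int avail)
{
    int drop = ts->to_trim < avail ? ts->to_trim : avail;
    ts->to_trim -= drop;
    return drop;
}
```

With a 1024-sample frame size the first three frames are dropped
entirely plus 456 samples of the fourth, after which output sample
counts match input; at EOF the same 3528 samples still sit in the
delay line and must be flushed so the total output length equals the
input length.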
Paul B Mahol April 9, 2022, 12:37 p.m. UTC | #10
On Fri, Apr 8, 2022 at 10:41 PM Wang Cao <wangcao-at-google.com@ffmpeg.org>
wrote:

> On Fri, Apr 8, 2022 at 11:40 AM Paul B Mahol <onemda@gmail.com> wrote:
>
> > On Thu, Apr 7, 2022 at 11:56 PM Wang Cao <
> wangcao-at-google.com@ffmpeg.org
> > >
> > wrote:
> >
> > > On Thu, Apr 7, 2022 at 12:44 AM Paul B Mahol <onemda@gmail.com> wrote:
> > >
> > > > On Wed, Apr 6, 2022 at 1:49 PM Paul B Mahol <onemda@gmail.com>
> wrote:
> > > >
> > > > >
> > > > >
> > > > > On Tue, Apr 5, 2022 at 8:57 PM Wang Cao <
> > > > wangcao-at-google.com@ffmpeg.org>
> > > > > wrote:
> > > > >
> > > > >> On Mon, Apr 4, 2022 at 3:28 PM Marton Balint <cus@passwd.hu>
> wrote:
> > > > >>
> > > > >> >
> > > > >> >
> > > > >> > On Mon, 4 Apr 2022, Paul B Mahol wrote:
> > > > >> >
> > > > >> > > On Sun, Mar 27, 2022 at 11:41 PM Marton Balint <cus@passwd.hu
> >
> > > > wrote:
> > > > >> > >
> > > > >> > >>
> > > > >> > >>
> > > > >> > >> On Sat, 26 Mar 2022, Wang Cao wrote:
> > > > >> > >>
> > > > >> > >>> The change in the commit will add some samples to the end of
> > the
> > > > >> audio
> > > > >> > >>> stream. The intention is to add a "zero_delay" option
> > eventually
> > > > to
> > > > >> not
> > > > >> > >>> have the delay in the begining the output from alimiter due
> to
> > > > >> > >>> lookahead.
> > > > >> > >>
> > > > >> > >> I was very much suprised to see that the alimiter filter
> > actually
> > > > >> delays
> > > > >> > >> the audio - as in extra samples are inserted in the beginning
> > and
> > > > >> some
> > > > >> > >> samples are cut in the end. This trashes A-V sync, so it is a
> > bug
> > > > >> IMHO.
> > > > >> > >>
> > > > >> > >> So unless somebody has some valid usecase for the legacy way
> of
> > > > >> > operation
> > > > >> > >> I'd just simply change it to be "zero delay" without any
> > > additional
> > > > >> user
> > > > >> > >> option, in a single patch.
> > > > >> > >>
> > > > >> > >
> > > > >> > >
> > > > >> > > This is done by this patch in very complicated way and also it
> > > > really
> > > > >> > > should be optional.
> > > > >> >
> > > > >> > But why does it make sense to keep the current (IMHO buggy)
> > > > operational
> > > > >> > mode which adds silence in the beginning and trims the end? I
> > > > understand
> > > > >> > that the original implementation worked like this, but
> libavfilter
> > > has
> > > > >> > packet timestamps and N:M filtering so there is absolutely no
> > reason
> > > > to
> > > > >> > use an 1:1 implementation and live with its limitations.
> > > > >> >
> > > > >> Hello Paul and Marton, thank you so much for taking time to review
> > my
> > > > >> patch.
> > > > >> I totally understand that my patch may seem a little bit
> complicated
> > > > but I
> > > > >> can
> > > > >> show with a FATE test that if we set the alimiter to behave as a
> > > > >> passthrough filter,
> > > > >> the output frames will be the same from "framecrc" with my patch.
> > The
> > > > >> existing
> > > > >> behavior will not work for all gapless audio processing.
> > > > >>
> > > > >> The complete patch to fix this issue is at
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://patchwork.ffmpeg.org/project/ffmpeg/patch/20220330210314.2055201-1-wangcao@google.com/
> > > > >>
> > > > >> Regarding Paul's concern, I personally don't have any preference
> > > whether
> > > > >> to
> > > > >> put
> > > > >> the patch as an extra option or not. With respect to the
> > > implementation,
> > > > >> the patch
> > > > >> is the best I can think of by preserving as much information as
> > > possible
> > > > >> from input
> > > > >> frames. I also understand it may break concept that "filter_frame"
> > > > outputs
> > > > >> one frame
> > > > >> at a time. For alimiter with my patch, depending on the size of
> the
> > > > >> lookahead buffer,
> > > > >> it may take a few frames before one output frame can be generated.
> > > This
> > > > is
> > > > >> inevitable
> > > > >> to compensate for the delay of the lookahead buffer.
> > > > >>
> > > > >> Thanks again for reviewing my patch and I'm looking forward to
> > hearing
> > > > >> from
> > > > >> you :)
> > > > >>
> > > > >
> > > > > Better than (because its no more 1 frame X nb_samples in, 1 frame X
> > > > > nb_samples out) to replace .filter_frame/.request_frame with
> > .activate
> > > > > logic.
> > > > >
> > > > > And make this output delay compensation filtering optional.
> > > > >
> > > > > In this process make sure that output PTS frame timestamps are
> > > unchanged
> > > > > from input one, by keeping reference of needed frames in filter
> > queue.
> > > > >
> > > > > Look how speechnorm/dynaudnorm does it.
> > > > >
> > > >
> > > >
> > > > Alternatively, use current logic in ladspa filter, (also add option
> > with
> > > > same name).
> > > >
> > > > This would need less code, and probably better approach, as there is
> no
> > > > extra latency introduced.
> > > >
> > > > Than mapping 1:1 between same number of samples per frame is not hold
> > any
> > > > more, but I think that is not much important any more.
> > > >
> > > Thank you for replying to me with your valuable feedback! I have
> checked
> > > af_ladspa
> > > and the "request_frame" function in af_ladspa looks similar to what I'm
> > > doing. The
> > > difference comes from the fact that I need an internal frame buffer to
> > keep
> > > track of
> > > output frames. Essentially I add a frame to the internal buffer when an
> > > input frame
> > > comes in. The frames in this buffer will be the future output frames.
> We
> > > start writing
> > > these output frame data buffers only when the internal lookahead buffer
> > has
> > > been filled.
> > > When there is no more input frames, we flushing all the remaining data
> in
> > > the internal
> > > frame buffers and lookahead buffers. Can you advise on my approach
> here?
> > > Thank you!
> > >
> > > I can put my current implementation of "filter_frame" and
> "request_frame"
> > > into the "activate" approach as you suggested with
> speechnorm/dynaudnorm.
> > >
> >
> > No need to wait for all buffers to fill up, only lookahead buffer.
> >
> > Just trim initial samples that is size of lookahead, and than start
> > outputing samples
> > just once you get whatever modulo of current frame number of samples.
> >
> > So there should not be need for extra buffers to keep audio samples.
> > Just buffers to keep input pts and number of samples of previous input
> > frames, like in ladspa filter.
> >
> Thank you for the suggestion! From my understanding, we have two ways to
> achieve
> "zero_delay" functionality here.
>
> Option 1: as you mentioned, we can trim the initial samples of lookahead
> size.
> The size of the lookahead buffer can be multiple frames. For example when
> the
> attack is 0.08 second, sample rate is 44100 and frame size is 1024, the
> lookahead
> buffer size is about 3 frames so the filter needs to see at least 3 input
> audio frames
> before it can output one audio frame. We also need to make assumptions
> about the
> size of the audio frame (meaning the number of audio samples per frame)
> when flushing.
> The frame is probably 1024 conventionally but I think it's better to make
> less assumption
> as possible to allow the filter to be used as flexible as possible.
>
> Option 2: this is what I proposed before. We basically map the same number
> of input
> frames to the output and we also make sure everything about the frame the
> same as
> the input except for the audio signal data itself, which will be changed by
> whatever
> processing alimiter has to do with that. I think it is safer to make the
> filter only work on
> the signal itself. It can help other people who use this filter without
> worrying about
> any unexpected change on the frame. The downside is that the filter
> internally needs to
> store some empty frames, which will be written as the lookahead buffer is
> filled.
>
> I don't see any performance difference between these two options but option
> 2 looks
> better to me because it works solely on the signals without any changes on
> the frame
>

option 1 does not add any extra delay to the processing chain, while
option 2 adds extra delay.

Just look at the latest version of the af_ladspa.c filter code.
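The af_ladspa-style bookkeeping referred to here can be sketched like
this (hypothetical names, not the actual af_ladspa code): instead of
buffering whole frames, the filter only queues each input frame's pts
and sample count, and stamps output frames from the head of that
queue, so timestamps pass through unchanged.

```c
#include <assert.h>

/* Sketch of ladspa-style timestamp bookkeeping: a small FIFO of
 * (pts, nb_samples) records, one per input frame.  Hypothetical code,
 * for illustration only. */
#define QCAP 64

typedef struct {
    long long pts[QCAP];
    int nb_samples[QCAP];
    int head, count;
} PtsQueue;

static void ptsq_push(PtsQueue *q, long long pts, int nb_samples)
{
    int i = (q->head + q->count) % QCAP;
    q->pts[i] = pts;
    q->nb_samples[i] = nb_samples;
    q->count++;
}

/* Pop the timestamp for the next output frame; returns 0 if empty. */
static int ptsq_pop(PtsQueue *q, long long *pts, int *nb_samples)
{
    if (q->count == 0)
        return 0;
    *pts = q->pts[q->head];
    *nb_samples = q->nb_samples[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return 1;
}
```

This keeps no extra copies of audio samples: only the metadata needed
to stamp the outputs is retained.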



> metadata.
> --
>
> Wang Cao |  Software Engineer |  wangcao@google.com |  650-203-7807
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Wang Cao April 11, 2022, 8:59 p.m. UTC | #11
On Sat, Apr 9, 2022 at 5:35 AM Paul B Mahol <onemda@gmail.com> wrote:

> On Fri, Apr 8, 2022 at 10:41 PM Wang Cao <wangcao-at-google.com@ffmpeg.org
> >
> wrote:
>
> > On Fri, Apr 8, 2022 at 11:40 AM Paul B Mahol <onemda@gmail.com> wrote:
> >
> > > On Thu, Apr 7, 2022 at 11:56 PM Wang Cao <
> > wangcao-at-google.com@ffmpeg.org
> > > >
> > > wrote:
> > >
> > > > On Thu, Apr 7, 2022 at 12:44 AM Paul B Mahol <onemda@gmail.com>
> wrote:
> > > >
> > > > > On Wed, Apr 6, 2022 at 1:49 PM Paul B Mahol <onemda@gmail.com>
> > wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 5, 2022 at 8:57 PM Wang Cao <
> > > > > wangcao-at-google.com@ffmpeg.org>
> > > > > > wrote:
> > > > > >
> > > > > >> On Mon, Apr 4, 2022 at 3:28 PM Marton Balint <cus@passwd.hu>
> > wrote:
> > > > > >>
> > > > > >> >
> > > > > >> >
> > > > > >> > On Mon, 4 Apr 2022, Paul B Mahol wrote:
> > > > > >> >
> > > > > >> > > On Sun, Mar 27, 2022 at 11:41 PM Marton Balint <
> cus@passwd.hu
> > >
> > > > > wrote:
> > > > > >> > >
> > > > > >> > >>
> > > > > >> > >>
> > > > > >> > >> On Sat, 26 Mar 2022, Wang Cao wrote:
> > > > > >> > >>
> > > > > >> > >>> The change in the commit will add some samples to the end
> of
> > > the
> > > > > >> audio
> > > > > >> > >>> stream. The intention is to add a "zero_delay" option
> > > eventually
> > > > > to
> > > > > >> not
> > > > > >> > >>> have the delay in the begining the output from alimiter
> due
> > to
> > > > > >> > >>> lookahead.
> > > > > >> > >>
> > > > > >> > >> I was very much suprised to see that the alimiter filter
> > > actually
> > > > > >> delays
> > > > > >> > >> the audio - as in extra samples are inserted in the
> beginning
> > > and
> > > > > >> some
> > > > > >> > >> samples are cut in the end. This trashes A-V sync, so it
> is a
> > > bug
> > > > > >> IMHO.
> > > > > >> > >>
> > > > > >> > >> So unless somebody has some valid usecase for the legacy
> way
> > of
> > > > > >> > operation
> > > > > >> > >> I'd just simply change it to be "zero delay" without any
> > > > additional
> > > > > >> user
> > > > > >> > >> option, in a single patch.
> > > > > >> > >>
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > This is done by this patch in very complicated way and also
> it
> > > > > really
> > > > > >> > > should be optional.
> > > > > >> >
> > > > > >> > But why does it make sense to keep the current (IMHO buggy)
> > > > > operational
> > > > > >> > mode which adds silence in the beginning and trims the end? I
> > > > > understand
> > > > > >> > that the original implementation worked like this, but
> > libavfilter
> > > > has
> > > > > >> > packet timestamps and N:M filtering so there is absolutely no
> > > reason
> > > > > to
> > > > > >> > use an 1:1 implementation and live with its limitations.
> > > > > >> >
> > > > > >> Hello Paul and Marton, thank you so much for taking time to
> review
> > > my
> > > > > >> patch.
> > > > > >> I totally understand that my patch may seem a little bit
> > complicated
> > > > > but I
> > > > > >> can
> > > > > >> show with a FATE test that if we set the alimiter to behave as a
> > > > > >> passthrough filter,
> > > > > >> the output frames will be the same from "framecrc" with my
> patch.
> > > The
> > > > > >> existing
> > > > > >> behavior will not work for all gapless audio processing.
> > > > > >>
> > > > > >> The complete patch to fix this issue is at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://patchwork.ffmpeg.org/project/ffmpeg/patch/20220330210314.2055201-1-wangcao@google.com/
> > > > > >>
> > > > > >> Regarding Paul's concern, I personally don't have any preference
> > > > whether
> > > > > >> to
> > > > > >> put
> > > > > >> the patch as an extra option or not. With respect to the
> > > > implementation,
> > > > > >> the patch
> > > > > >> is the best I can think of by preserving as much information as
> > > > possible
> > > > > >> from input
> > > > > >> frames. I also understand it may break concept that
> "filter_frame"
> > > > > outputs
> > > > > >> one frame
> > > > > >> at a time. For alimiter with my patch, depending on the size of
> > the
> > > > > >> lookahead buffer,
> > > > > >> it may take a few frames before one output frame can be
> generated.
> > > > This
> > > > > is
> > > > > >> inevitable
> > > > > >> to compensate for the delay of the lookahead buffer.
> > > > > >>
> > > > > >> Thanks again for reviewing my patch and I'm looking forward to
> > > hearing
> > > > > >> from
> > > > > >> you :)
> > > > > >>
> > > > > >
> > > > > > Better than (because its no more 1 frame X nb_samples in, 1
> frame X
> > > > > > nb_samples out) to replace .filter_frame/.request_frame with
> > > .activate
> > > > > > logic.
> > > > > >
> > > > > > And make this output delay compensation filtering optional.
> > > > > >
> > > > > > In this process make sure that output PTS frame timestamps are
> > > > unchanged
> > > > > > from input one, by keeping reference of needed frames in filter
> > > queue.
> > > > > >
> > > > > > Look how speechnorm/dynaudnorm does it.
> > > > > >
> > > > >
> > > > >
> > > > > Alternatively, use current logic in ladspa filter, (also add option
> > > with
> > > > > same name).
> > > > >
> > > > > This would need less code, and probably better approach, as there
> is
> > no
> > > > > extra latency introduced.
> > > > >
> > > > > Than mapping 1:1 between same number of samples per frame is not
> hold
> > > any
> > > > > more, but I think that is not much important any more.
> > > > >
> > > > Thank you for replying to me with your valuable feedback! I have
> > checked
> > > > af_ladspa
> > > > and the "request_frame" function in af_ladspa looks similar to what
> I'm
> > > > doing. The
> > > > difference comes from the fact that I need an internal frame buffer
> to
> > > keep
> > > > track of
> > > > output frames. Essentially I add a frame to the internal buffer when
> an
> > > > input frame
> > > > comes in. The frames in this buffer will be the future output frames.
> > We
> > > > start writing
> > > > these output frame data buffers only when the internal lookahead
> buffer
> > > has
> > > > been filled.
> > > > When there is no more input frames, we flushing all the remaining
> data
> > in
> > > > the internal
> > > > frame buffers and lookahead buffers. Can you advise on my approach
> > here?
> > > > Thank you!
> > > >
> > > > I can put my current implementation of "filter_frame" and
> > "request_frame"
> > > > into the "activate" approach as you suggested with
> > speechnorm/dynaudnorm.
> > > >
> > >
> > > No need to wait for all buffers to fill up, only lookahead buffer.
> > >
> > > Just trim initial samples that is size of lookahead, and than start
> > > outputing samples
> > > just once you get whatever modulo of current frame number of samples.
> > >
> > > So there should not be need for extra buffers to keep audio samples.
> > > Just buffers to keep input pts and number of samples of previous input
> > > frames, like in ladspa filter.
> > >
> > Thank you for the suggestion! From my understanding, we have two ways to
> > achieve
> > "zero_delay" functionality here.
> >
> > Option 1: as you mentioned, we can trim the initial samples of lookahead
> > size.
> > The size of the lookahead buffer can be multiple frames. For example when
> > the
> > attack is 0.08 second, sample rate is 44100 and frame size is 1024, the
> > lookahead
> > buffer size is about 3 frames so the filter needs to see at least 3 input
> > audio frames
> > before it can output one audio frame. We also need to make assumptions
> > about the
> > size of the audio frame (meaning the number of audio samples per frame)
> > when flushing.
> > The frame is probably 1024 conventionally but I think it's better to make
> > less assumption
> > as possible to allow the filter to be used as flexible as possible.
> >
> > Option 2: this is what I proposed before. We basically map the same
> number
> > of input
> > frames to the output and we also make sure everything about the frame the
> > same as
> > the input except for the audio signal data itself, which will be changed
> by
> > whatever
> > processing alimiter has to do with that. I think it is safer to make the
> > filter only work on
> > the signal itself. It can help other people who use this filter without
> > worrying about
> > any unexpected change on the frame. The downside is that the filter
> > internally needs to
> > store some empty frames, which will be written as the lookahead buffer is
> > filled.
> >
> > I don't see any performance difference between these two options but
> option
> > 2 looks
> > better to me because it works solely on the signals without any changes
> on
> > the frame
> >
>
> option 1 does not add extra delay in processing chain at all, and option 2
> adds extra delay.
>
> Just look at latest version of af_ladspa.c filter code.
>
Hi Paul, I have checked the af_ladspa filter. Option 1 certainly
doesn't introduce any delay. Option 2 will introduce at most one
audio frame of delay in the processing chain: it needs to fill the
lookahead buffer plus some additional samples before it can push the
data in the lookahead buffer out as one output audio frame. The
processing delay is probably not a big concern.

The reason I proposed option 2 is the sync of metadata for the output
frames:

It appears to me that the only metadata we need to worry about in
af_ladspa is the PTS and the number of samples. However, AVFrame has
other fields that also carry per-frame metadata:
1. AVDictionary *metadata
2. void *opaque (this seems to be client-specific)
3. AVFrameSideData **side_data (this is not handled by
av_frame_copy_props).
If we go for option 1, these fields of the input frames will likely
not be mapped to the corresponding output frames. I believe this is
unexpected behavior for users who rely on these fields. I understand
other filters may not consider this important, but conceptually I
believe it is better to make the filter focus on changing only what
it is supposed to, which is the audio signal itself.

Looking forward to hearing your opinion on this, thanks!
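Option 2's metadata argument amounts to a 1:1 mapping: output frame i
inherits every property of input frame i, and only the sample data
differs. A toy sketch of that idea (FrameProps is a made-up stand-in
for the AVFrame fields listed above; in real FFmpeg code this would
presumably be av_frame_copy_props() on a kept frame reference):

```c
#include <assert.h>
#include <string.h>

/* Toy illustration of option 2: copy the input frame's properties
 * verbatim onto the matching output frame, so only the sample data
 * changes.  FrameProps is a hypothetical stand-in for AVFrame's
 * pts/opaque/metadata fields, not a real FFmpeg type. */
typedef struct {
    long long pts;
    void *opaque;      /* client-specific pointer, passed through   */
    char metadata[32]; /* stand-in for AVDictionary *metadata        */
} FrameProps;

/* Output frame i gets a byte-for-byte copy of input frame i's props. */
static void copy_props(FrameProps *dst, const FrameProps *src)
{
    memcpy(dst, src, sizeof(*dst));
}
```

The cost, as noted above, is that the filter must hold on to one
frame's worth of properties per pending output until the lookahead
buffer has filled.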
Paul B Mahol April 12, 2022, 7:42 p.m. UTC | #12
On Mon, Apr 11, 2022 at 10:59 PM Wang Cao <wangcao-at-google.com@ffmpeg.org>
wrote:

> On Sat, Apr 9, 2022 at 5:35 AM Paul B Mahol <onemda@gmail.com> wrote:
>
> > On Fri, Apr 8, 2022 at 10:41 PM Wang Cao <
> wangcao-at-google.com@ffmpeg.org
> > >
> > wrote:
> >
> > > On Fri, Apr 8, 2022 at 11:40 AM Paul B Mahol <onemda@gmail.com> wrote:
> > >
> > > > On Thu, Apr 7, 2022 at 11:56 PM Wang Cao <
> > > wangcao-at-google.com@ffmpeg.org
> > > > >
> > > > wrote:
> > > >
> > > > > On Thu, Apr 7, 2022 at 12:44 AM Paul B Mahol <onemda@gmail.com>
> > wrote:
> > > > >
> > > > > > On Wed, Apr 6, 2022 at 1:49 PM Paul B Mahol <onemda@gmail.com>
> > > wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 5, 2022 at 8:57 PM Wang Cao <
> > > > > > wangcao-at-google.com@ffmpeg.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> On Mon, Apr 4, 2022 at 3:28 PM Marton Balint <cus@passwd.hu>
> > > wrote:
> > > > > > >>
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Mon, 4 Apr 2022, Paul B Mahol wrote:
> > > > > > >> >
> > > > > > >> > > On Sun, Mar 27, 2022 at 11:41 PM Marton Balint <
> > cus@passwd.hu
> > > >
> > > > > > wrote:
> > > > > > >> > >
> > > > > > >> > >>
> > > > > > >> > >>
> > > > > > >> > >> On Sat, 26 Mar 2022, Wang Cao wrote:
> > > > > > >> > >>
> > > > > > >> > >>> The change in the commit will add some samples to the
> end
> > of
> > > > the
> > > > > > >> audio
> > > > > > >> > >>> stream. The intention is to add a "zero_delay" option
> > > > eventually
> > > > > > to
> > > > > > >> not
> > > > > > >> > >>> have the delay in the begining the output from alimiter
> > due
> > > to
> > > > > > >> > >>> lookahead.
> > > > > > >> > >>
> > > > > > >> > >> I was very much suprised to see that the alimiter filter
> > > > actually
> > > > > > >> delays
> > > > > > >> > >> the audio - as in extra samples are inserted in the
> > beginning
> > > > and
> > > > > > >> some
> > > > > > >> > >> samples are cut in the end. This trashes A-V sync, so it
> > is a
> > > > bug
> > > > > > >> IMHO.
> > > > > > >> > >>
> > > > > > >> > >> So unless somebody has some valid usecase for the legacy
> > way
> > > of
> > > > > > >> > operation
> > > > > > >> > >> I'd just simply change it to be "zero delay" without any
> > > > > additional
> > > > > > >> user
> > > > > > >> > >> option, in a single patch.
> > > > > > >> > >>
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > This is done by this patch in very complicated way and
> also
> > it
> > > > > > really
> > > > > > >> > > should be optional.
> > > > > > >> >
> > > > > > >> > But why does it make sense to keep the current (IMHO buggy)
> > > > > > operational
> > > > > > >> > mode which adds silence in the beginning and trims the end?
> I
> > > > > > understand
> > > > > > >> > that the original implementation worked like this, but
> > > libavfilter
> > > > > has
> > > > > > >> > packet timestamps and N:M filtering so there is absolutely
> no
> > > > reason
> > > > > > to
> > > > > > >> > use an 1:1 implementation and live with its limitations.
> > > > > > >> >
> > > > > > >> Hello Paul and Marton, thank you so much for taking time to
> > review
> > > > my
> > > > > > >> patch.
> > > > > > >> I totally understand that my patch may seem a little bit
> > > complicated
> > > > > > but I
> > > > > > >> can
> > > > > > >> show with a FATE test that if we set the alimiter to behave
> as a
> > > > > > >> passthrough filter,
> > > > > > >> the output frames will be the same from "framecrc" with my
> > patch.
> > > > The
> > > > > > >> existing
> > > > > > >> behavior will not work for all gapless audio processing.
> > > > > > >>
> > > > > > >> The complete patch to fix this issue is at
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://patchwork.ffmpeg.org/project/ffmpeg/patch/20220330210314.2055201-1-wangcao@google.com/
> > > > > > >>
> > > > > > >> Regarding Paul's concern, I personally don't have any
> preference
> > > > > whether
> > > > > > >> to
> > > > > > >> put
> > > > > > >> the patch as an extra option or not. With respect to the
> > > > > implementation,
> > > > > > >> the patch
> > > > > > >> is the best I can think of by preserving as much information
> as
> > > > > possible
> > > > > > >> from input
> > > > > > >> frames. I also understand it may break concept that
> > "filter_frame"
> > > > > > outputs
> > > > > > >> one frame
> > > > > > >> at a time. For alimiter with my patch, depending on the size
> of
> > > the
> > > > > > >> lookahead buffer,
> > > > > > >> it may take a few frames before one output frame can be
> > generated.
> > > > > This
> > > > > > is
> > > > > > >> inevitable
> > > > > > >> to compensate for the delay of the lookahead buffer.
> > > > > > >>
> > > > > > >> Thanks again for reviewing my patch and I'm looking forward to
> > > > hearing
> > > > > > >> from
> > > > > > >> you :)
> > > > > > >>
> > > > > > >
> > > > > > > Better than (because its no more 1 frame X nb_samples in, 1
> > frame X
> > > > > > > nb_samples out) to replace .filter_frame/.request_frame with
> > > > .activate
> > > > > > > logic.
> > > > > > >
> > > > > > > And make this output delay compensation filtering optional.
> > > > > > >
> > > > > > > In this process make sure that output PTS frame timestamps are
> > > > > unchanged
> > > > > > > from input one, by keeping reference of needed frames in filter
> > > > queue.
> > > > > > >
> > > > > > > Look how speechnorm/dynaudnorm does it.
> > > > > > >
> > > > > >
> > > > > >
> > > > > > Alternatively, use current logic in ladspa filter, (also add
> option
> > > > with
> > > > > > same name).
> > > > > >
> > > > > > This would need less code, and probably better approach, as there
> > is
> > > no
> > > > > > extra latency introduced.
> > > > > >
> > > > > > Than mapping 1:1 between same number of samples per frame is not
> > hold
> > > > any
> > > > > > more, but I think that is not much important any more.
> > > > > >
> > > > > Thank you for replying to me with your valuable feedback! I have
> > > checked
> > > > > af_ladspa
> > > > > and the "request_frame" function in af_ladspa looks similar to what
> > I'm
> > > > > doing. The
> > > > > difference comes from the fact that I need an internal frame buffer
> > to
> > > > keep
> > > > > track of
> > > > > output frames. Essentially I add a frame to the internal buffer
> when
> > an
> > > > > input frame
> > > > > comes in. The frames in this buffer will be the future output
> frames.
> > > We
> > > > > start writing
> > > > > these output frame data buffers only when the internal lookahead
> > buffer
> > > > has
> > > > > been filled.
> > > > > When there is no more input frames, we flushing all the remaining
> > data
> > > in
> > > > > the internal
> > > > > frame buffers and lookahead buffers. Can you advise on my approach
> > > here?
> > > > > Thank you!
> > > > >
> > > > > I can put my current implementation of "filter_frame" and
> > > "request_frame"
> > > > > into the "activate" approach as you suggested with
> > > speechnorm/dynaudnorm.
> > > > >
> > > >
> > > > No need to wait for all buffers to fill up, only lookahead buffer.
> > > >
> > > > Just trim initial samples that is size of lookahead, and than start
> > > > outputing samples
> > > > just once you get whatever modulo of current frame number of samples.
> > > >
> > > > So there should not be need for extra buffers to keep audio samples.
> > > > Just buffers to keep input pts and number of samples of previous
> input
> > > > frames, like in ladspa filter.
> > > >
> > > Thank you for the suggestion! From my understanding, we have two ways
> to
> > > achieve
> > > "zero_delay" functionality here.
> > >
> > > Option 1: as you mentioned, we can trim the initial samples of
> lookahead
> > > size.
> > > The size of the lookahead buffer can be multiple frames. For example
> when
> > > the
> > > attack is 0.08 second, sample rate is 44100 and frame size is 1024, the
> > > lookahead
> > > buffer size is about 3 frames so the filter needs to see at least 3
> input
> > > audio frames
> > > before it can output one audio frame. We also need to make assumptions
> > > about the
> > > size of the audio frame (meaning the number of audio samples per frame)
> > > when flushing.
> > > The frame is probably 1024 conventionally but I think it's better to
> make
> > > less assumption
> > > as possible to allow the filter to be used as flexible as possible.
> > >
> > > Option 2: this is what I proposed before. We basically map the same
> > number
> > > of input
> > > frames to the output and we also make sure everything about the frame
> the
> > > same as
> > > the input except for the audio signal data itself, which will be
> changed
> > by
> > > whatever
> > > processing alimiter has to do with that. I think it is safer to make
> the
> > > filter only work on
> > > the signal itself. It can help other people who use this filter without
> > > worrying about
> > > any unexpected change on the frame. The downside is that the filter
> > > internally needs to
> > > store some empty frames, which will be written as the lookahead buffer
> is
> > > filled.
> > >
> > > I don't see any performance difference between these two options but
> > option
> > > 2 looks
> > > better to me because it works solely on the signals without any changes
> > on
> > > the frame
> > >
> >
> > option 1 does not add extra delay in processing chain at all, and option
> 2
> > adds extra delay.
> >
> > Just look at latest version of af_ladspa.c filter code.
> >
> Hi Paul, I have checked af_ladspa filter. Option 1 certainly doesn't
> introduce any delay.
> Option 2 will at most introduce one audio frame of delay in the processing
> chain. Option
> 2 needs to fill the lookahead buffer and some frames of samples to push the
> data in the
> lookahead buffer out for one output audio frame. The processing delay is
> probably not a
> big concern.
>
> The reason I proposed option 2 is about the sync of metadata for the output
> frames:
>
> It appears to me that the only metadata we need to worry about is PTS and
> the number of
> samples in af_ladspa. However, AVFrame has other fields that also seem to
> specify some metadata
> for the frame:
> 1. AVDictionary *metadata
> 2. void *opaque (this seems to be client-specific)
> 3. AVFrameSideData** side_data (this is not handled by
> av_frame_copy_props).
> If we go for option 1, it is likely these fields in the input frame will
> not be mapped to the corresponding
> output frames. I believe this is also an unexpected behavior for other
> users who rely on these fields. I
> understand other filters may not consider this as important but
> conceptually I believe it is better to make
> the filter focus on changing what it is supposed to, which is the audio
> signal itself.
>
> Looking forward to hearing your opinion on this, thanks!
>


AVFrame metadata is mostly useful when a frame, after passing through
a filter that adds metadata, is no longer changed by filters later in
the graph. Thus using something like astats=metadata=1,alimiter is
not very useful or logical.

The reason I am lightly insisting on introducing no extra
delay/latency, and prefer option 1 for the alimiter filter, is mostly
that in the audio domain latency is a relevant and important subject
for most audio processing, especially for limiters. So it is very
important to keep it at a minimum if possible.


> --
>
> Wang Cao |  Software Engineer |  wangcao@google.com |  650-203-7807
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Wang Cao April 13, 2022, 4:11 a.m. UTC | #13
On Tue, Apr 12, 2022 at 12:40 PM Paul B Mahol <onemda@gmail.com> wrote:

> On Mon, Apr 11, 2022 at 10:59 PM Wang Cao <
> wangcao-at-google.com@ffmpeg.org>
> wrote:
>
> > On Sat, Apr 9, 2022 at 5:35 AM Paul B Mahol <onemda@gmail.com> wrote:
> >
> > > On Fri, Apr 8, 2022 at 10:41 PM Wang Cao <
> > wangcao-at-google.com@ffmpeg.org
> > > >
> > > wrote:
> > >
> > > > On Fri, Apr 8, 2022 at 11:40 AM Paul B Mahol <onemda@gmail.com>
> wrote:
> > > >
> > > > > On Thu, Apr 7, 2022 at 11:56 PM Wang Cao <
> > > > wangcao-at-google.com@ffmpeg.org
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > On Thu, Apr 7, 2022 at 12:44 AM Paul B Mahol <onemda@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > On Wed, Apr 6, 2022 at 1:49 PM Paul B Mahol <onemda@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Apr 5, 2022 at 8:57 PM Wang Cao <
> > > > > > > wangcao-at-google.com@ffmpeg.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> On Mon, Apr 4, 2022 at 3:28 PM Marton Balint <cus@passwd.hu
> >
> > > > wrote:
> > > > > > > >>
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > On Mon, 4 Apr 2022, Paul B Mahol wrote:
> > > > > > > >> >
> > > > > > > >> > > On Sun, Mar 27, 2022 at 11:41 PM Marton Balint <
> > > cus@passwd.hu
> > > > >
> > > > > > > wrote:
> > > > > > > >> > >
> > > > > > > >> > >>
> > > > > > > >> > >>
> > > > > > > >> > >> On Sat, 26 Mar 2022, Wang Cao wrote:
> > > > > > > >> > >>
> > > > > > > >> > >>> The change in the commit will add some samples to the
> > end
> > > of
> > > > > the
> > > > > > > >> audio
> > > > > > > >> > >>> stream. The intention is to add a "zero_delay" option
> > > > > eventually
> > > > > > > to
> > > > > > > >> not
> > > > > > > >> > >>> have the delay in the begining the output from
> alimiter
> > > due
> > > > to
> > > > > > > >> > >>> lookahead.
> > > > > > > >> > >>
> > > > > > > >> > >> I was very much suprised to see that the alimiter
> filter
> > > > > actually
> > > > > > > >> delays
> > > > > > > >> > >> the audio - as in extra samples are inserted in the
> > > beginning
> > > > > and
> > > > > > > >> some
> > > > > > > >> > >> samples are cut in the end. This trashes A-V sync, so
> it
> > > is a
> > > > > bug
> > > > > > > >> IMHO.
> > > > > > > >> > >>
> > > > > > > >> > >> So unless somebody has some valid usecase for the
> legacy
> > > way
> > > > of
> > > > > > > >> > operation
> > > > > > > >> > >> I'd just simply change it to be "zero delay" without
> any
> > > > > > additional
> > > > > > > >> user
> > > > > > > >> > >> option, in a single patch.
> > > > > > > >> > >>
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > This is done by this patch in very complicated way and
> > also
> > > it
> > > > > > > really
> > > > > > > >> > > should be optional.
> > > > > > > >> >
> > > > > > > >> > But why does it make sense to keep the current (IMHO
> buggy)
> > > > > > > operational
> > > > > > > >> > mode which adds silence in the beginning and trims the
> end?
> > I
> > > > > > > understand
> > > > > > > >> > that the original implementation worked like this, but
> > > > libavfilter
> > > > > > has
> > > > > > > >> > packet timestamps and N:M filtering so there is absolutely
> > no
> > > > > reason
> > > > > > > to
> > > > > > > >> > use an 1:1 implementation and live with its limitations.
> > > > > > > >> >
> > > > > > > >> Hello Paul and Marton, thank you so much for taking time to
> > > review
> > > > > my
> > > > > > > >> patch.
> > > > > > > >> I totally understand that my patch may seem a little bit
> > > > complicated
> > > > > > > but I
> > > > > > > >> can
> > > > > > > >> show with a FATE test that if we set the alimiter to behave
> > as a
> > > > > > > >> passthrough filter,
> > > > > > > >> the output frames will be the same from "framecrc" with my
> > > patch.
> > > > > The
> > > > > > > >> existing
> > > > > > > >> behavior will not work for all gapless audio processing.
> > > > > > > >>
> > > > > > > >> The complete patch to fix this issue is at
> > > > > > > >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://patchwork.ffmpeg.org/project/ffmpeg/patch/20220330210314.2055201-1-wangcao@google.com/
> > > > > > > >>
> > > > > > > >> Regarding Paul's concern, I personally don't have any
> > preference
> > > > > > whether
> > > > > > > >> to
> > > > > > > >> put
> > > > > > > >> the patch as an extra option or not. With respect to the
> > > > > > implementation,
> > > > > > > >> the patch
> > > > > > > >> is the best I can think of by preserving as much information
> > as
> > > > > > possible
> > > > > > > >> from input
> > > > > > > >> frames. I also understand it may break concept that
> > > "filter_frame"
> > > > > > > outputs
> > > > > > > >> one frame
> > > > > > > >> at a time. For alimiter with my patch, depending on the size
> > of
> > > > the
> > > > > > > >> lookahead buffer,
> > > > > > > >> it may take a few frames before one output frame can be
> > > generated.
> > > > > > This
> > > > > > > is
> > > > > > > >> inevitable
> > > > > > > >> to compensate for the delay of the lookahead buffer.
> > > > > > > >>
> > > > > > > >> Thanks again for reviewing my patch and I'm looking forward
> to
> > > > > hearing
> > > > > > > >> from
> > > > > > > >> you :)
> > > > > > > >>
> > > > > > > >
> > > > > > > > Better than (because its no more 1 frame X nb_samples in, 1
> > > frame X
> > > > > > > > nb_samples out) to replace .filter_frame/.request_frame with
> > > > > .activate
> > > > > > > > logic.
> > > > > > > >
> > > > > > > > And make this output delay compensation filtering optional.
> > > > > > > >
> > > > > > > > In this process make sure that output PTS frame timestamps
> are
> > > > > > unchanged
> > > > > > > > from input one, by keeping reference of needed frames in
> filter
> > > > > queue.
> > > > > > > >
> > > > > > > > Look how speechnorm/dynaudnorm does it.
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Alternatively, use current logic in ladspa filter, (also add
> > option
> > > > > with
> > > > > > > same name).
> > > > > > >
> > > > > > > This would need less code, and probably better approach, as
> there
> > > is
> > > > no
> > > > > > > extra latency introduced.
> > > > > > >
> > > > > > > Than mapping 1:1 between same number of samples per frame is
> not
> > > hold
> > > > > any
> > > > > > > more, but I think that is not much important any more.
> > > > > > >
> > > > > > Thank you for replying to me with your valuable feedback! I have
> > > > checked
> > > > > > af_ladspa
> > > > > > and the "request_frame" function in af_ladspa looks similar to
> what
> > > I'm
> > > > > > doing. The
> > > > > > difference comes from the fact that I need an internal frame
> buffer
> > > to
> > > > > keep
> > > > > > track of
> > > > > > output frames. Essentially I add a frame to the internal buffer
> > when
> > > an
> > > > > > input frame
> > > > > > comes in. The frames in this buffer will be the future output
> > frames.
> > > > We
> > > > > > start writing
> > > > > > these output frame data buffers only when the internal lookahead
> > > buffer
> > > > > has
> > > > > > been filled.
> > > > > > When there is no more input frames, we flushing all the remaining
> > > data
> > > > in
> > > > > > the internal
> > > > > > frame buffers and lookahead buffers. Can you advise on my
> approach
> > > > here?
> > > > > > Thank you!
> > > > > >
> > > > > > I can put my current implementation of "filter_frame" and
> > > > "request_frame"
> > > > > > into the "activate" approach as you suggested with
> > > > speechnorm/dynaudnorm.
> > > > > >
> > > > >
> > > > > No need to wait for all buffers to fill up, only lookahead buffer.
> > > > >
> > > > > Just trim initial samples that is size of lookahead, and than start
> > > > > outputing samples
> > > > > just once you get whatever modulo of current frame number of
> samples.
> > > > >
> > > > > So there should not be need for extra buffers to keep audio
> samples.
> > > > > Just buffers to keep input pts and number of samples of previous
> > input
> > > > > frames, like in ladspa filter.
> > > > >
> > > > Thank you for the suggestion! From my understanding, we have two ways
> > to
> > > > achieve
> > > > "zero_delay" functionality here.
> > > >
> > > > Option 1: as you mentioned, we can trim the initial samples of
> > lookahead
> > > > size.
> > > > The size of the lookahead buffer can be multiple frames. For example
> > when
> > > > the
> > > > attack is 0.08 second, sample rate is 44100 and frame size is 1024,
> the
> > > > lookahead
> > > > buffer size is about 3 frames so the filter needs to see at least 3
> > input
> > > > audio frames
> > > > before it can output one audio frame. We also need to make
> assumptions
> > > > about the
> > > > size of the audio frame (meaning the number of audio samples per
> frame)
> > > > when flushing.
> > > > The frame is probably 1024 conventionally but I think it's better to
> > make
> > > > less assumption
> > > > as possible to allow the filter to be used as flexible as possible.
> > > >
> > > > Option 2: this is what I proposed before. We basically map the same
> > > number
> > > > of input
> > > > frames to the output and we also make sure everything about the frame
> > the
> > > > same as
> > > > the input except for the audio signal data itself, which will be
> > changed
> > > by
> > > > whatever
> > > > processing alimiter has to do with that. I think it is safer to make
> > the
> > > > filter only work on
> > > > the signal itself. It can help other people who use this filter
> without
> > > > worrying about
> > > > any unexpected change on the frame. The downside is that the filter
> > > > internally needs to
> > > > store some empty frames, which will be written as the lookahead
> buffer
> > is
> > > > filled.
> > > >
> > > > I don't see any performance difference between these two options but
> > > option
> > > > 2 looks
> > > > better to me because it works solely on the signals without any
> changes
> > > on
> > > > the frame
> > > >
> > >
> > > option 1 does not add extra delay in processing chain at all, and
> option
> > 2
> > > adds extra delay.
> > >
> > > Just look at latest version of af_ladspa.c filter code.
> > >
> > Hi Paul, I have checked af_ladspa filter. Option 1 certainly doesn't
> > introduce any delay.
> > Option 2 will at most introduce one audio frame of delay in the
> processing
> > chain. Option
> > 2 needs to fill the lookahead buffer and some frames of samples to push
> the
> > data in the
> > lookahead buffer out for one output audio frame. The processing delay is
> > probably not a
> > big concern.
> >
> > The reason I proposed option 2 is about the sync of metadata for the
> output
> > frames:
> >
> > It appears to me that the only metadata we need to worry about is PTS and
> > the number of
> > samples in af_ladspa. However, AVFrame has other fields that also seem to
> > specify some metadata
> > for the frame:
> > 1. AVDictionary *metadata
> > 2. void *opaque (this seems to be client-specific)
> > 3. AVFrameSideData** side_data (this is not handled by
> > av_frame_copy_props).
> > If we go for option 1, it is likely these fields in the input frame will
> > not be mapped to the corresponding
> > output frames. I believe this is also an unexpected behavior for other
> > users who rely on these fields. I
> > understand other filters may not consider this as important but
> > conceptually I believe it is better to make
> > the filter focus on changing what it is supposed to, which is the audio
> > signal itself.
> >
> > Looking forward to hearing your opinion on this, thanks!
> >
>
>
> AVFrame metadata is mostly useful in cases where an input frame, after
> passing through a filter that adds metadata, no longer changes the output
> frames later in the filter graph.
> Thus using something like astats=metadata=1,alimiter is not very useful or
> logical.
>
> The reason why I am lightly insisting on no extra delay/latency being
> introduced, and prefer option 1 for the alimiter filter, is mostly that in
> the audio domain latency is a relevant and important subject for most audio
> processing, especially for limiters.
> So it is very important to keep it to a minimum if possible.
>
Thank you so much for providing the context. Now that I understand how the
metadata may be used, I know what to do. Thanks!
Wang Cao April 15, 2022, 6:51 p.m. UTC | #14
On Tue, Apr 12, 2022 at 12:40 PM Paul B Mahol <onemda@gmail.com> wrote:

> [...]
>
> AVFrame metadata is mostly useful in cases where an input frame, after
> passing through a filter that adds metadata, no longer changes the output
> frames later in the filter graph.
> Thus using something like astats=metadata=1,alimiter is not very useful or
> logical.
>
> The reason why I am lightly insisting on no extra delay/latency being
> introduced, and prefer option 1 for the alimiter filter, is mostly that in
> the audio domain latency is a relevant and important subject for most audio
> processing, especially for limiters.
> So it is very important to keep it to a minimum if possible.
>
Hi Paul, I have sent out the patch incorporating your suggestions. Thank you!
diff mbox series

Patch

diff --git a/doc/filters.texi b/doc/filters.texi
index d70ac3e237..bb8f7c1a0b 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -1978,6 +1978,11 @@  in release time while 1 produces higher release times.
 @item level
 Auto level output signal. Default is enabled.
 This normalizes audio back to 0dB if enabled.
+
+@item flush_buffer
+Flush the internal buffer so that all input audio samples fed to the limiter
+appear in the output. Due to the lookahead buffer, the total number of output
+samples will be larger than the number of input samples.
 @end table
 
 Depending on picked setting it is recommended to upsample input 2x or 4x times
diff --git a/libavfilter/af_alimiter.c b/libavfilter/af_alimiter.c
index 133f98f165..ba0a1361ac 100644
--- a/libavfilter/af_alimiter.c
+++ b/libavfilter/af_alimiter.c
@@ -55,6 +55,9 @@  typedef struct AudioLimiterContext {
     int *nextpos;
     double *nextdelta;
 
+    int flush_buffer;
+    int total_samples_to_flush;
+
     double delta;
     int nextiter;
     int nextlen;
@@ -65,14 +68,15 @@  typedef struct AudioLimiterContext {
 #define AF AV_OPT_FLAG_AUDIO_PARAM | AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_RUNTIME_PARAM
 
 static const AVOption alimiter_options[] = {
-    { "level_in",  "set input level",  OFFSET(level_in),     AV_OPT_TYPE_DOUBLE, {.dbl=1},.015625,   64, AF },
-    { "level_out", "set output level", OFFSET(level_out),    AV_OPT_TYPE_DOUBLE, {.dbl=1},.015625,   64, AF },
-    { "limit",     "set limit",        OFFSET(limit),        AV_OPT_TYPE_DOUBLE, {.dbl=1}, 0.0625,    1, AF },
-    { "attack",    "set attack",       OFFSET(attack),       AV_OPT_TYPE_DOUBLE, {.dbl=5},    0.1,   80, AF },
-    { "release",   "set release",      OFFSET(release),      AV_OPT_TYPE_DOUBLE, {.dbl=50},     1, 8000, AF },
-    { "asc",       "enable asc",       OFFSET(auto_release), AV_OPT_TYPE_BOOL,   {.i64=0},      0,    1, AF },
-    { "asc_level", "set asc level",    OFFSET(asc_coeff),    AV_OPT_TYPE_DOUBLE, {.dbl=0.5},    0,    1, AF },
-    { "level",     "auto level",       OFFSET(auto_level),   AV_OPT_TYPE_BOOL,   {.i64=1},      0,    1, AF },
+    { "level_in",  "set input level",                              OFFSET(level_in),     AV_OPT_TYPE_DOUBLE, {.dbl=1},.015625,   64, AF },
+    { "level_out", "set output level",                             OFFSET(level_out),    AV_OPT_TYPE_DOUBLE, {.dbl=1},.015625,   64, AF },
+    { "limit",     "set limit",                                    OFFSET(limit),        AV_OPT_TYPE_DOUBLE, {.dbl=1}, 0.0625,    1, AF },
+    { "attack",    "set attack",                                   OFFSET(attack),       AV_OPT_TYPE_DOUBLE, {.dbl=5},    0.1,   80, AF },
+    { "release",   "set release",                                  OFFSET(release),      AV_OPT_TYPE_DOUBLE, {.dbl=50},     1, 8000, AF },
+    { "asc",       "enable asc",                                   OFFSET(auto_release), AV_OPT_TYPE_BOOL,   {.i64=0},      0,    1, AF },
+    { "asc_level", "set asc level",                                OFFSET(asc_coeff),    AV_OPT_TYPE_DOUBLE, {.dbl=0.5},    0,    1, AF },
+    { "level",     "auto level",                                   OFFSET(auto_level),   AV_OPT_TYPE_BOOL,   {.i64=1},      0,    1, AF },
+    { "flush_buffer","flush the samples in the lookahead buffer",  OFFSET(flush_buffer),   AV_OPT_TYPE_BOOL, {.i64=0},      0,    1, AF },
     { NULL }
 };
 
@@ -275,6 +279,39 @@  static int filter_frame(AVFilterLink *inlink, AVFrame *in)
     return ff_filter_frame(outlink, out);
 }
 
+
+static int request_frame(AVFilterLink *outlink) {
+  AVFilterContext *ctx = outlink->src;
+  AudioLimiterContext *s = ctx->priv;
+  int ret;
+  AVFilterLink *inlink;
+  AVFrame *silence_frame;
+
+  ret = ff_request_frame(ctx->inputs[0]);
+
+  if (!s->flush_buffer) {
+      return ret;
+  }
+
+  if (ret != AVERROR_EOF) {
+    // Not necessarily an error, just not EOF.
+    return ret;
+  }
+
+  // We reach here when input filters have finished producing data (i.e. EOF),
+  // but because of the attack param, s->buffer still has meaningful
+  // audio content that needs flushing. The amount of remaining audio to flush
+  // is the same as the amount of lookahead that was trimmed from the beginning.
+  inlink = ctx->inputs[0];
+  // Push a silence frame to flush the remaining valid audio in s->buffer.
+  silence_frame = ff_get_audio_buffer(inlink, s->total_samples_to_flush);
+  ret = filter_frame(inlink, silence_frame);
+  if (ret < 0) {
+    return ret;
+  }
+  return AVERROR_EOF;
+}
+
 static int config_input(AVFilterLink *inlink)
 {
     AVFilterContext *ctx = inlink->dst;
@@ -292,7 +329,8 @@  static int config_input(AVFilterLink *inlink)
         return AVERROR(ENOMEM);
 
     memset(s->nextpos, -1, obuffer_size * sizeof(*s->nextpos));
-    s->buffer_size = inlink->sample_rate * s->attack * inlink->ch_layout.nb_channels;
+    s->total_samples_to_flush = inlink->sample_rate * s->attack;
+    s->buffer_size = s->total_samples_to_flush * inlink->ch_layout.nb_channels;
     s->buffer_size -= s->buffer_size % inlink->ch_layout.nb_channels;
 
     if (s->buffer_size <= 0) {
@@ -325,6 +363,7 @@  static const AVFilterPad alimiter_outputs[] = {
     {
         .name = "default",
         .type = AVMEDIA_TYPE_AUDIO,
+        .request_frame = request_frame,
     },
 };