
[FFmpeg-devel] avcodec: Vorbis decode: don't use a flag to determine if frames have been output

Message ID 20220908082505.953-1-jyrkive@nekonyansoft.com
State New
Series [FFmpeg-devel] avcodec: Vorbis decode: don't use a flag to determine if frames have been output

Checks

Context                 Check    Description
andriy/commit_msg_x86   warning  Please wrap lines in the body of the commit message between 60 and 72 characters.
andriy/make_x86         success  Make finished
andriy/make_fate_x86    success  Make fate finished

Commit Message

Jyrki Vesterinen Sept. 8, 2022, 8:25 a.m. UTC
From: Jyrki Vesterinen <jyrkive@nekonyansoft.com>

If a developer using FFmpeg libraries seeks into an earlier position and calls
avcodec_flush_buffers() afterwards as recommended, the Vorbis decoder will drop
the next frame, since buffer flushing clears the first_frame flag. As a result,
the audio samples the calling code receives may be ahead of the requested seek
position, which is unacceptable in some use cases such as playing a looping
sound effect.

This commit removes the first_frame flag entirely and instead uses the
presentation timestamp to determine if it's the first frame.
---
 libavcodec/vorbisdec.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
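A minimal C model of the pre-patch behaviour described above may help. It performs no Vorbis decoding; it only mirrors the `first_frame` check in `vorbis_decode_frame()` and its reset in `vorbis_decode_flush()` as they appear in the diff. The names `toy_vorbis_ctx`, `toy_decode_packet()`, `toy_flush()` and `toy_decode_n()` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the part of vorbis_context_s that matters here. */
typedef struct {
    unsigned char first_frame; /* 0 until a packet has been decoded */
} toy_vorbis_ctx;

/* Mirrors the pre-patch check in vorbis_decode_frame(): the first packet
 * after init or flush is decoded but its output is discarded, because
 * Vorbis blocks are overlap-added and that packet has no predecessor.
 * Returns true if a frame would be handed to the caller. */
static bool toy_decode_packet(toy_vorbis_ctx *vc)
{
    if (!vc->first_frame) {
        vc->first_frame = 1;
        return false;          /* corresponds to *got_frame_ptr = 0 */
    }
    return true;
}

/* Mirrors the pre-patch vorbis_decode_flush(), which clears the flag. */
static void toy_flush(toy_vorbis_ctx *vc)
{
    vc->first_frame = 0;
}

/* Feed n packets and count how many frames come out. */
static int toy_decode_n(toy_vorbis_ctx *vc, int n)
{
    int frames = 0;
    for (int i = 0; i < n; i++)
        frames += toy_decode_packet(vc);
    return frames;
}
```

Starting from a zeroed `toy_vorbis_ctx`, `toy_decode_n(&vc, 4)` returns 3; after `toy_flush(&vc)` it returns 3 again. Every flush (for example after a backward seek) therefore costs one more frame of output, which is the drop the commit message reports.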

Comments

Paul B Mahol Sept. 8, 2022, 8:40 a.m. UTC | #1
On Thu, Sep 8, 2022 at 10:26 AM <jyrkive@nekonyansoft.com> wrote:

> From: Jyrki Vesterinen <jyrkive@nekonyansoft.com>
>
> If a developer using FFmpeg libraries seeks into an earlier position and calls
> avcodec_flush_buffers() afterwards as recommended, the Vorbis decoder will drop
> the next frame, since buffer flushing clears the first_frame flag. As a result,
> the audio samples the calling code receives may be ahead of the requested seek
> position, which is unacceptable in some use cases such as playing a looping
> sound effect.
>
> This commit removes the first_frame flag entirely and instead uses the
> presentation timestamp to determine if it's the first frame.
>

The proper solution is to fetch the initial/first pts and use that instead
of the fragile pts < 0 check.


> ---
>  libavcodec/vorbisdec.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/libavcodec/vorbisdec.c b/libavcodec/vorbisdec.c
> index 4d03947c49..d4b030d7b9 100644
> --- a/libavcodec/vorbisdec.c
> +++ b/libavcodec/vorbisdec.c
> @@ -130,7 +130,6 @@ typedef struct vorbis_context_s {
>      AVFloatDSPContext *fdsp;
>
>      FFTContext mdct[2];
> -    uint8_t       first_frame;
>      uint32_t      version;
>      uint8_t       audio_channels;
>      uint32_t      audio_samplerate;
> @@ -1845,8 +1844,7 @@ static int vorbis_decode_frame(AVCodecContext *avctx, AVFrame *frame,
>      if ((len = vorbis_parse_audio_packet(vc, channel_ptrs)) <= 0)
>          return len;
>
> -    if (!vc->first_frame) {
> -        vc->first_frame = 1;
> +    if (frame->pts < 0) {
>          *got_frame_ptr = 0;
>          av_frame_unref(frame);
>          return buf_size;
> @@ -1881,7 +1879,6 @@ static av_cold void vorbis_decode_flush(AVCodecContext *avctx)
>                               sizeof(*vc->saved));
>      }
>      vc->previous_window = -1;
> -    vc->first_frame = 0;
>  }
>
>  const FFCodec ff_vorbis_decoder = {
> --
> 2.37.2.windows.2
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Jyrki Vesterinen Sept. 8, 2022, 11:36 a.m. UTC | #2
Thanks, Paul. I'm not very familiar with the FFmpeg codebase. This new patch attempts to implement your suggestion. Works fine in my tests, at least.
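Paul's suggestion (remember the first pts and compare against it instead of testing `pts < 0`) could look roughly like the sketch below. It is an illustration under stated assumptions, not Jyrki's follow-up patch, which is not shown here: `toy_pts_ctx` and `toy_is_priming_frame()` are hypothetical, and the remembered pts is deliberately kept across flushes so that only a seek back to the very start drops the priming frame again. It still keys decoder behaviour on timestamps, so the objections raised later in the thread apply to it too; for instance, if no frame carries a timestamp, every frame compares equal to the remembered sentinel and is discarded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoder state for the "remember the first pts" idea. */
typedef struct {
    bool    have_initial_pts; /* set once the first frame has been seen */
    int64_t initial_pts;      /* pts of that first frame */
} toy_pts_ctx;

/* Returns true if the frame with this pts should be treated as the
 * stream's priming frame and discarded. The state is intentionally not
 * reset on flush, so only a seek back to the start drops a frame again. */
static bool toy_is_priming_frame(toy_pts_ctx *c, int64_t pts)
{
    if (!c->have_initial_pts) {
        c->have_initial_pts = true;
        c->initial_pts      = pts;
    }
    return pts == c->initial_pts;
}
```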
Hendrik Leppkes Oct. 17, 2022, 8:15 a.m. UTC | #3
On Thu, Sep 8, 2022 at 10:26 AM <jyrkive@nekonyansoft.com> wrote:
>
> From: Jyrki Vesterinen <jyrkive@nekonyansoft.com>
>
> If a developer using FFmpeg libraries seeks into an earlier position and calls
> avcodec_flush_buffers() afterwards as recommended, the Vorbis decoder will drop
> the next frame, since buffer flushing clears the first_frame flag. As a result,
> the audio samples the calling code receives may be ahead of the requested seek
> position, which is unacceptable in some use cases such as playing a looping
> sound effect.
>
> This commit removes the first_frame flag entirely and instead uses the
> presentation timestamp to determine if it's the first frame.
> ---
>  libavcodec/vorbisdec.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/libavcodec/vorbisdec.c b/libavcodec/vorbisdec.c
> index 4d03947c49..d4b030d7b9 100644
> --- a/libavcodec/vorbisdec.c
> +++ b/libavcodec/vorbisdec.c
> @@ -130,7 +130,6 @@ typedef struct vorbis_context_s {
>      AVFloatDSPContext *fdsp;
>
>      FFTContext mdct[2];
> -    uint8_t       first_frame;
>      uint32_t      version;
>      uint8_t       audio_channels;
>      uint32_t      audio_samplerate;
> @@ -1845,8 +1844,7 @@ static int vorbis_decode_frame(AVCodecContext *avctx, AVFrame *frame,
>      if ((len = vorbis_parse_audio_packet(vc, channel_ptrs)) <= 0)
>          return len;
>
> -    if (!vc->first_frame) {
> -        vc->first_frame = 1;
> +    if (frame->pts < 0) {
>          *got_frame_ptr = 0;
>          av_frame_unref(frame);
>          return buf_size;
> @@ -1881,7 +1879,6 @@ static av_cold void vorbis_decode_flush(AVCodecContext *avctx)
>                               sizeof(*vc->saved));
>      }
>      vc->previous_window = -1;
> -    vc->first_frame = 0;
>  }
>
>  const FFCodec ff_vorbis_decoder = {
> --
> 2.37.2.windows.2
>

This change seems to be rather fragile and faulty, causing vorbis
decoding to fail in various scenarios for a bunch of downstream
projects.

- A user may not set pts at all, resulting in all frames being dropped
(pure audio files don't necessarily need timestamps)
- A seek could happen before any frame is ever decoded, resulting in
wrong drops, potentially in the middle of playback if the user seeks
backwards after opening in the middle.

In general, using timestamps to control decoder behavior is often just
wrong, as timestamps are not reliable, and most importantly, not tied
to the bitstream at all.

Can we revert this and re-think the approach?

- Hendrik
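Hendrik's first point follows from how FFmpeg marks a missing timestamp: `AV_NOPTS_VALUE` is defined in libavutil/avutil.h as `((int64_t)UINT64_C(0x8000000000000000))`, i.e. `INT64_MIN`, so the patched test `frame->pts < 0` is true for every frame of a stream without timestamps. The sketch assumes the frame's pts is simply carried over from the packet; it redefines the sentinel locally as `TOY_NOPTS_VALUE` so it builds without FFmpeg headers, and `count_dropped()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as FFmpeg's AV_NOPTS_VALUE; redefined so no headers are needed. */
#define TOY_NOPTS_VALUE INT64_MIN

/* The condition the patch introduces in vorbis_decode_frame(). */
static bool patched_drops_frame(int64_t pts)
{
    return pts < 0;
}

/* Count how many of n frames the patched check would discard. */
static int count_dropped(const int64_t *pts, int n)
{
    int dropped = 0;
    for (int i = 0; i < n; i++)
        dropped += patched_drops_frame(pts[i]);
    return dropped;
}
```

With four frames all at `TOY_NOPTS_VALUE`, `count_dropped()` returns 4, so nothing is ever output; with the timestamps {-1024, 0, 1024, 2048} only the first frame is dropped.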
Paul B Mahol Oct. 17, 2022, 8:18 a.m. UTC | #4
On 10/17/22, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
> This change seems to be rather fragile and faulty, causing vorbis
> decoding to fail in various scenarios for a bunch of downstream
> projects.
>
> - A user may not set pts at all, resulting in all frames being dropped
> (pure audio files don't necessarily need timestamps)
> - A seek could happen before any frame is ever decoded, resulting in
> wrong drops, potentially in the middle of playback if the user seeks
> backwards after opening in the middle.
>
> In general, using timestamps to control decoder behavior is often just
> wrong, as timestamps are not reliable, and most importantly, not tied
> to the bitstream at all.
>
> Can we revert this and re-think the approach?

Are you saying that the previous solution was better than the current one?

By your own words it's even worse than the current state.

>
> - Hendrik
Hendrik Leppkes Oct. 17, 2022, 8:23 a.m. UTC | #5
On Mon, Oct 17, 2022 at 10:18 AM Paul B Mahol <onemda@gmail.com> wrote:
>
> On 10/17/22, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
> > This change seems to be rather fragile and faulty, causing vorbis
> > decoding to fail in various scenarios for a bunch of downstream
> > projects.
> >
> > - A user may not set pts at all, resulting in all frames being dropped
> > (pure audio files don't necessarily need timestamps)
> > - A seek could happen before any frame is ever decoded, resulting in
> > wrong drops, potentially in the middle of playback if the user seeks
> > backwards after opening in the middle.
> >
> > In general, using timestamps to control decoder behavior is often just
> > wrong, as timestamps are not reliable, and most importantly, not tied
> > to the bitstream at all.
> >
> > Can we revert this and re-think the approach?
>
> Are you saying that the previous solution was better than the current one?
>
> By your own words it's even worse than the current state.
>

At least the old solution consistently just dropped one frame after a
flush, not in the middle of playback, or dropping every single frame
because the user did not specify timestamps, breaking playback
entirely.

We already have mechanisms to properly drop padding data from the
front of a stream in generic code, that should ideally be used, and
not a decoder-specific hack.

- Hendrik
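Hendrik does not name the generic mechanism. One candidate is the `AV_PKT_DATA_SKIP_SAMPLES` packet side data, whose payload is documented in libavcodec/packet.h as a u32le count of samples to skip from the start, a u32le count to skip from the end, then one byte each for the start and end skip reasons. The sketch below only parses that layout and trims a sample count from the front of successive frames; it does not call FFmpeg, is not libavcodec's own implementation, ignores the end-skip field, and all `toy_*` names are hypothetical. What it illustrates is that the amount dropped is a sample count carried with the stream, so it does not depend on timestamps being present or sane.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded form of the 10-byte AV_PKT_DATA_SKIP_SAMPLES payload. */
typedef struct {
    uint32_t skip_start;   /* samples to skip from the start of the packet */
    uint32_t skip_end;     /* samples to skip from the end (unused below) */
    uint8_t  reason_start;
    uint8_t  reason_end;
} toy_skip_info;

static uint32_t toy_rd_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parses the payload; returns 0 on success, -1 if it is too short. */
static int toy_parse_skip_samples(const uint8_t *data, size_t size,
                                  toy_skip_info *out)
{
    if (size < 10)
        return -1;
    out->skip_start   = toy_rd_u32le(data);
    out->skip_end     = toy_rd_u32le(data + 4);
    out->reason_start = data[8];
    out->reason_end   = data[9];
    return 0;
}

/* Front trimmer whose remaining count carries across frame boundaries. */
typedef struct {
    uint32_t skip_remaining;
} toy_trimmer;

/* Returns how many of nb_samples survive; *first_kept receives the index
 * of the first surviving sample (nb_samples if the whole frame is dropped). */
static uint32_t toy_trim_front(toy_trimmer *t, uint32_t nb_samples,
                               uint32_t *first_kept)
{
    if (t->skip_remaining >= nb_samples) {
        t->skip_remaining -= nb_samples;
        *first_kept = nb_samples;
        return 0;
    }
    *first_kept = t->skip_remaining;
    nb_samples -= t->skip_remaining;
    t->skip_remaining = 0;
    return nb_samples;
}
```

With a start skip of 1600 samples and 1024-sample frames, the first frame is dropped entirely, the second keeps its last 448 samples (from offset 576), and later frames pass through untouched.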

Patch

diff --git a/libavcodec/vorbisdec.c b/libavcodec/vorbisdec.c
index 4d03947c49..d4b030d7b9 100644
--- a/libavcodec/vorbisdec.c
+++ b/libavcodec/vorbisdec.c
@@ -130,7 +130,6 @@  typedef struct vorbis_context_s {
     AVFloatDSPContext *fdsp;
 
     FFTContext mdct[2];
-    uint8_t       first_frame;
     uint32_t      version;
     uint8_t       audio_channels;
     uint32_t      audio_samplerate;
@@ -1845,8 +1844,7 @@  static int vorbis_decode_frame(AVCodecContext *avctx, AVFrame *frame,
     if ((len = vorbis_parse_audio_packet(vc, channel_ptrs)) <= 0)
         return len;
 
-    if (!vc->first_frame) {
-        vc->first_frame = 1;
+    if (frame->pts < 0) {
         *got_frame_ptr = 0;
         av_frame_unref(frame);
         return buf_size;
@@ -1881,7 +1879,6 @@  static av_cold void vorbis_decode_flush(AVCodecContext *avctx)
                              sizeof(*vc->saved));
     }
     vc->previous_window = -1;
-    vc->first_frame = 0;
 }
 
 const FFCodec ff_vorbis_decoder = {