Message ID | 20220908082505.953-1-jyrkive@nekonyansoft.com |
---|---|
State | New |
Series | [FFmpeg-devel] avcodec: Vorbis decode: don't use a flag to determine if frames have been output |
Context | Check | Description |
---|---|---|
andriy/commit_msg_x86 | warning | Please wrap lines in the body of the commit message between 60 and 72 characters. |
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
On Thu, Sep 8, 2022 at 10:26 AM <jyrkive@nekonyansoft.com> wrote:

> From: Jyrki Vesterinen <jyrkive@nekonyansoft.com>
>
> If a developer using FFmpeg libraries seeks into an earlier position and calls
> avcodec_flush_buffers() afterwards as recommended, the Vorbis decoder will drop
> the next frame, since buffer flushing clears the first_frame flag. As a result,
> the audio samples the calling code receives may be ahead of the requested seek
> position, which is unacceptable in some use cases such as playing a looping
> sound effect.
>
> This commit removes the first_frame flag entirely and instead uses the
> presentation timestamp to determine if it's the first frame.
>

The proper solution is to fetch the initial/first pts and use that instead of
relying on the fragile pts < 0 check.

> ---
>  libavcodec/vorbisdec.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/libavcodec/vorbisdec.c b/libavcodec/vorbisdec.c
> index 4d03947c49..d4b030d7b9 100644
> --- a/libavcodec/vorbisdec.c
> +++ b/libavcodec/vorbisdec.c
> @@ -130,7 +130,6 @@ typedef struct vorbis_context_s {
>      AVFloatDSPContext *fdsp;
>
>      FFTContext mdct[2];
> -    uint8_t first_frame;
>      uint32_t version;
>      uint8_t audio_channels;
>      uint32_t audio_samplerate;
> @@ -1845,8 +1844,7 @@ static int vorbis_decode_frame(AVCodecContext *avctx, AVFrame *frame,
>      if ((len = vorbis_parse_audio_packet(vc, channel_ptrs)) <= 0)
>          return len;
>
> -    if (!vc->first_frame) {
> -        vc->first_frame = 1;
> +    if (frame->pts < 0) {
>          *got_frame_ptr = 0;
>          av_frame_unref(frame);
>          return buf_size;
> @@ -1881,7 +1879,6 @@ static av_cold void vorbis_decode_flush(AVCodecContext *avctx)
>                               sizeof(*vc->saved));
>      }
>      vc->previous_window = -1;
> -    vc->first_frame = 0;
>  }
>
>  const FFCodec ff_vorbis_decoder = {
> --
> 2.37.2.windows.2
Thanks, Paul. I'm not very familiar with the FFmpeg codebase. This new patch attempts to implement your suggestion. Works fine in my tests, at least.
On Thu, Sep 8, 2022 at 10:26 AM <jyrkive@nekonyansoft.com> wrote:
>
> From: Jyrki Vesterinen <jyrkive@nekonyansoft.com>
>
> If a developer using FFmpeg libraries seeks into an earlier position and calls
> avcodec_flush_buffers() afterwards as recommended, the Vorbis decoder will drop
> the next frame, since buffer flushing clears the first_frame flag. As a result,
> the audio samples the calling code receives may be ahead of the requested seek
> position, which is unacceptable in some use cases such as playing a looping
> sound effect.
>
> This commit removes the first_frame flag entirely and instead uses the
> presentation timestamp to determine if it's the first frame.
> ---
> [...]

This change seems to be rather fragile and faulty, causing Vorbis decoding to
fail in various scenarios for a bunch of downstream projects.

- A user may not set pts at all, resulting in all frames being dropped
  (pure audio files don't necessarily need timestamps).
- A seek could happen before any frame is ever decoded, resulting in
  wrong drops, potentially in the middle of playback if the user seeks
  backwards after opening in the middle.

In general, using timestamps to control decoder behavior is often just
wrong, as timestamps are not reliable and, most importantly, not tied
to the bitstream at all.

Can we revert this and re-think the approach?

- Hendrik
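The first failure mode above can be reproduced without FFmpeg at all. The following is a minimal sketch, assuming the frame's pts is simply whatever the caller put on the packet; `toy_dropped`, `toy_count_output` and `TOY_NOPTS_VALUE` are hypothetical names, and the constant is redefined locally with the same value as `AV_NOPTS_VALUE` (`INT64_MIN`) so the sketch compiles without libavutil.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value FFmpeg uses for AV_NOPTS_VALUE; redefined here (hypothetical
 * name) so the sketch does not depend on libavutil. */
#define TOY_NOPTS_VALUE INT64_MIN

/* The condition introduced by the patch: a frame is discarded whenever
 * its pts is negative. */
static bool toy_dropped(int64_t pts)
{
    return pts < 0;
}

/* Count how many of the n frames survive the check. */
static size_t toy_count_output(const int64_t *pts, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
        if (!toy_dropped(pts[i]))
            kept++;
    return kept;
}
```

Because the "no timestamp" sentinel is itself negative, a caller that never sets pts gets no output from `toy_count_output()`, while a caller that starts decoding at a positive pts never has a frame dropped, regardless of what state the decoder is actually in.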
On 10/17/22, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
> On Thu, Sep 8, 2022 at 10:26 AM <jyrkive@nekonyansoft.com> wrote:
>> [...]
>
> This change seems to be rather fragile and faulty, causing vorbis
> decoding to fail in various scenarios for a bunch of downstream
> projects.
>
> - A user may not set pts at all, resulting in all frames being dropped
> (pure audio files don't necessarily need timestamps)
> - A seek could happen before any frame is ever decoded, resulting in
> wrong drops, potentially in the middle of playback if the user seeks
> backwards after opening in the middle.
>
> In general, using timestamps to control decoder behavior is often just
> wrong, as timestamps are not reliable, and most importantly, not tied
> to the bitstream at all.
>
> Can we revert this and re-think the approach?

Are you saying that the previous solution was better than the current one?

By your own words it's even worse than the current state.

>
> - Hendrik
On Mon, Oct 17, 2022 at 10:18 AM Paul B Mahol <onemda@gmail.com> wrote:
>
> On 10/17/22, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
> > On Thu, Sep 8, 2022 at 10:26 AM <jyrkive@nekonyansoft.com> wrote:
> >> [...]
> >
> > This change seems to be rather fragile and faulty, causing vorbis
> > decoding to fail in various scenarios for a bunch of downstream
> > projects.
> >
> > - A user may not set pts at all, resulting in all frames being dropped
> > (pure audio files don't necessarily need timestamps)
> > - A seek could happen before any frame is ever decoded, resulting in
> > wrong drops, potentially in the middle of playback if the user seeks
> > backwards after opening in the middle.
> >
> > In general, using timestamps to control decoder behavior is often just
> > wrong, as timestamps are not reliable, and most importantly, not tied
> > to the bitstream at all.
> >
> > Can we revert this and re-think the approach?
>
> Are you saying that the previous solution was better than the current one?
>
> By your own words it's even worse than the current state.
>

At least the old solution consistently dropped just one frame after a flush.
It did not drop frames in the middle of playback, or drop every single frame
because the user did not specify timestamps, breaking playback entirely.

We already have mechanisms in generic code to properly drop padding data from
the front of a stream; ideally those should be used rather than a
decoder-specific hack.

- Hendrik
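The message does not name the generic mechanism it has in mind. One candidate (an assumption, not something stated in the thread) is skip-samples metadata such as the `AV_PKT_DATA_SKIP_SAMPLES` packet side data, where a count of leading samples to discard travels with the stream and is applied outside the individual decoder. The sketch below only illustrates the trimming step itself; `toy_skip_leading` is a hypothetical helper and is not FFmpeg code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: discard `skip` leading samples from a mono float
 * buffer in place and return the number of samples that remain. */
static size_t toy_skip_leading(float *samples, size_t nb_samples, size_t skip)
{
    if (skip >= nb_samples)
        return 0;                       /* the whole frame is padding */

    /* Regions overlap, so memmove() rather than memcpy(). */
    memmove(samples, samples + skip,
            (nb_samples - skip) * sizeof(*samples));
    return nb_samples - skip;
}
```

The design point is that the skip count comes from stream metadata rather than from decoder state or timestamps, so it would stay correct across flushes and for callers that never set pts.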
diff --git a/libavcodec/vorbisdec.c b/libavcodec/vorbisdec.c
index 4d03947c49..d4b030d7b9 100644
--- a/libavcodec/vorbisdec.c
+++ b/libavcodec/vorbisdec.c
@@ -130,7 +130,6 @@ typedef struct vorbis_context_s {
     AVFloatDSPContext *fdsp;
 
     FFTContext mdct[2];
-    uint8_t first_frame;
     uint32_t version;
     uint8_t audio_channels;
     uint32_t audio_samplerate;
@@ -1845,8 +1844,7 @@ static int vorbis_decode_frame(AVCodecContext *avctx, AVFrame *frame,
     if ((len = vorbis_parse_audio_packet(vc, channel_ptrs)) <= 0)
         return len;
 
-    if (!vc->first_frame) {
-        vc->first_frame = 1;
+    if (frame->pts < 0) {
         *got_frame_ptr = 0;
         av_frame_unref(frame);
         return buf_size;
@@ -1881,7 +1879,6 @@ static av_cold void vorbis_decode_flush(AVCodecContext *avctx)
                              sizeof(*vc->saved));
     }
     vc->previous_window = -1;
-    vc->first_frame = 0;
 }
 
 const FFCodec ff_vorbis_decoder = {
From: Jyrki Vesterinen <jyrkive@nekonyansoft.com>

If a developer using FFmpeg libraries seeks into an earlier position and calls
avcodec_flush_buffers() afterwards as recommended, the Vorbis decoder will drop
the next frame, since buffer flushing clears the first_frame flag. As a result,
the audio samples the calling code receives may be ahead of the requested seek
position, which is unacceptable in some use cases such as playing a looping
sound effect.

This commit removes the first_frame flag entirely and instead uses the
presentation timestamp to determine if it's the first frame.
---
 libavcodec/vorbisdec.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
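The behaviour the commit message complains about can be modelled in a few lines. This is a toy sketch of the pre-patch flag logic only, assuming one packet in yields at most one frame out; `ToyDecoder`, `toy_decode`, `toy_flush` and `toy_decode_n` are hypothetical names, not the real decoder.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the decoder context; only the flag matters. */
typedef struct ToyDecoder {
    bool first_frame;   /* false until one packet has been consumed */
} ToyDecoder;

/* Mirrors the pre-patch check: the first packet after init or after a
 * flush sets the flag and produces no output. */
static bool toy_decode(ToyDecoder *d)
{
    if (!d->first_frame) {
        d->first_frame = true;
        return false;           /* corresponds to *got_frame_ptr = 0 */
    }
    return true;                /* corresponds to *got_frame_ptr = 1 */
}

/* Mirrors what the flush callback did to the flag before the patch. */
static void toy_flush(ToyDecoder *d)
{
    d->first_frame = false;
}

/* Feed n packets and count how many of them produce a frame. */
static size_t toy_decode_n(ToyDecoder *d, size_t n)
{
    size_t produced = 0;

    for (size_t i = 0; i < n; i++)
        if (toy_decode(d))
            produced++;
    return produced;
}
```

In this model every flush costs exactly one frame: four packets yield three frames after init, and four more packets after `toy_flush()` again yield three. That is the "audio ahead of the requested seek position" effect for a caller that seeks and flushes on every loop iteration.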