diff mbox

[FFmpeg-devel] mov: fix decode of fragments that overlap in time

Message ID 20170925171058.1148-1-jstebbins@jetheaddev.com
State Superseded
Headers show

Commit Message

John Stebbins Sept. 25, 2017, 5:10 p.m. UTC
When keyframe intervals of dash segments are not perfectly aligned,
fragments in the stream can overlap in time. Append new "trun" index
entries to the end of the index instead of sorting by timestamp.
Sorting by timestamp causes packets to be read out of decode order and
results in decode errors.
---
 libavformat/mov.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Carl Eugen Hoyos Sept. 25, 2017, 5:12 p.m. UTC | #1
2017-09-25 19:10 GMT+02:00 John Stebbins <jstebbins@jetheaddev.com>:
> When keyframe intervals of dash segments are not perfectly aligned,
> fragments in the stream can overlap in time. Append new "trun" index
> entries to the end of the index instead of sorting by timestamp.
> Sorting by timestamp causes packets to be read out of decode order and
> results in decode errors.

Could you provide sample(s) that need this?

Thank you, Carl Eugen
wm4 Sept. 25, 2017, 5:31 p.m. UTC | #2
On Mon, 25 Sep 2017 10:10:58 -0700
John Stebbins <jstebbins@jetheaddev.com> wrote:

> When keyframe intervals of dash segments are not perfectly aligned,
> fragments in the stream can overlap in time. Append new "trun" index
> entries to the end of the index instead of sorting by timestamp.
> Sorting by timestamp causes packets to be read out of decode order and
> results in decode errors.
> ---

Does that mean the demuxed packets will have timestamps going
backwards or funny issues like this?
John Stebbins Sept. 25, 2017, 5:48 p.m. UTC | #3
On 09/25/2017 10:12 AM, Carl Eugen Hoyos wrote:
> 2017-09-25 19:10 GMT+02:00 John Stebbins <jstebbins@jetheaddev.com>:
>> When keyframe intervals of dash segments are not perfectly aligned,
>> fragments in the stream can overlap in time. Append new "trun" index
>> entries to the end of the index instead of sorting by timestamp.
>> Sorting by timestamp causes packets to be read out of decode order and
>> results in decode errors.
> Could you provide sample(s) that need this?
>

ftp://uploads.ffmpeg.org seems to be non-functional.  But you can get a small sample here:

https://john.stebbins.name/owncloud/index.php/s/MzTU7ARSwKnlX96

Also, it's relatively easy to create samples with x264 and MP4Box.
John Stebbins Sept. 25, 2017, 5:52 p.m. UTC | #4
On 09/25/2017 10:31 AM, wm4 wrote:
> On Mon, 25 Sep 2017 10:10:58 -0700
> John Stebbins <jstebbins@jetheaddev.com> wrote:
>
>> When keyframe intervals of dash segments are not perfectly aligned,
>> fragments in the stream can overlap in time. Append new "trun" index
>> entries to the end of the index instead of sorting by timestamp.
>> Sorting by timestamp causes packets to be read out of decode order and
>> results in decode errors.
>> ---
> Does that mean the demuxed packets will have timestamps going
> backwards or funny issues like this?
>

Yes, the timestamps do go backwards in this scenario.  It's up to the player to either drop frames at the end of the
previous fragment, or decode and then not display the overlapping frames at the start of the fragment.  In a streaming
scenario, you likely have already displayed the end of the previous fragment, so would have to do the latter.
Carl Eugen Hoyos Sept. 25, 2017, 8:12 p.m. UTC | #5
2017-09-25 19:10 GMT+02:00 John Stebbins <jstebbins@jetheaddev.com>:
> When keyframe intervals of dash segments are not perfectly aligned,
> fragments in the stream can overlap in time. Append new "trun" index
> entries to the end of the index instead of sorting by timestamp.
> Sorting by timestamp causes packets to be read out of decode order and
> results in decode errors.
> ---
>  libavformat/mov.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libavformat/mov.c b/libavformat/mov.c
> index 2519707345..b2bc7c2c3d 100644
> --- a/libavformat/mov.c
> +++ b/libavformat/mov.c
> @@ -4339,8 +4339,8 @@ static int mov_read_trun(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>                                    MOV_FRAG_SAMPLE_FLAG_DEPENDS_YES));
>          if (keyframe)
>              distance = 0;
> -        ctts_index = av_add_index_entry(st, offset, dts, sample_size, distance,
> -                                        keyframe ? AVINDEX_KEYFRAME : 0);
> +        ctts_index = add_index_entry(st, offset, dts, sample_size, distance,
> +                                     keyframe ? AVINDEX_KEYFRAME : 0);

I can confirm that this fixes playback with FFplay but it shows
many warnings (Non-monotonous DTS) with ffmpeg: Is this
unavoidable?

Thank you, Carl Eugen
John Stebbins Sept. 25, 2017, 9:12 p.m. UTC | #6
On 09/25/2017 01:12 PM, Carl Eugen Hoyos wrote:
> 2017-09-25 19:10 GMT+02:00 John Stebbins <jstebbins@jetheaddev.com>:
>> When keyframe intervals of dash segments are not perfectly aligned,
>> fragments in the stream can overlap in time. Append new "trun" index
>> entries to the end of the index instead of sorting by timestamp.
>> Sorting by timestamp causes packets to be read out of decode order and
>> results in decode errors.
>> ---
>>  libavformat/mov.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/libavformat/mov.c b/libavformat/mov.c
>> index 2519707345..b2bc7c2c3d 100644
>> --- a/libavformat/mov.c
>> +++ b/libavformat/mov.c
>> @@ -4339,8 +4339,8 @@ static int mov_read_trun(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>>                                    MOV_FRAG_SAMPLE_FLAG_DEPENDS_YES));
>>          if (keyframe)
>>              distance = 0;
>> -        ctts_index = av_add_index_entry(st, offset, dts, sample_size, distance,
>> -                                        keyframe ? AVINDEX_KEYFRAME : 0);
>> +        ctts_index = add_index_entry(st, offset, dts, sample_size, distance,
>> +                                     keyframe ? AVINDEX_KEYFRAME : 0);
> I can confirm that this fixes playback with FFplay but it shows
> many warnings (Non-monotonous DTS) with ffmpeg: Is this
> unavoidable?
>
>

Fixing the non-monotonous DTS in ffmpeg would require more consideration. The overlapping frames need to be dropped
somewhere after they are decoded and before they are presented. 

I've given this some thought, but other ideas are certainly welcome. The demuxer is probably the most appropriate place
for marking the frames as needing to be dropped since it has the "knowledge" about why there are overlapping
timestamps.  I.e. you only want to drop frames in this particular scenario and not generally when timestamps go
backwards.  I don't think there is currently any facility for marking frames to be dropped after decoding, but it seems
like AVPacketSideData would be the appropriate mechanism for such marking.  FYI, such a facility for marking frames to
drop would also be useful for frame accurate playback of MP4 edit lists.
Carl Eugen Hoyos Sept. 25, 2017, 9:27 p.m. UTC | #7
2017-09-25 23:12 GMT+02:00 John Stebbins <stebbins@jetheaddev.com>:
> FYI, such a facility for marking frames to drop would also
> be useful for frame accurate playback of MP4 edit lists.

Do you have samples with edit lists that do not play
correctly with current FFmpeg?

(Others will hopefully comment on this issue, I certainly don't
disagree with your analysis.)

Thank you, Carl Eugen
John Stebbins Sept. 25, 2017, 10:09 p.m. UTC | #8
On 09/25/2017 02:27 PM, Carl Eugen Hoyos wrote:
> 2017-09-25 23:12 GMT+02:00 John Stebbins <stebbins@jetheaddev.com>:
>> FYI, such a facility for marking frames to drop would also
>> be useful for frame accurate playback of MP4 edit lists.
> Do you have samples with edit lists that do not play
> correctly with current FFmpeg?
>
> (Others will hopefully comment on this issue, I certainly don't
> disagree with your analysis.)
>
>

I do not. Although I think I could get someone to construct one for me if needed. I don't use Apple products, but my
understanding is that Apple video editing tools can be used to generate such files.

I've only given the improved edit list handling in FFmpeg a brief look.  But edit lists suffer from a very similar
problem in that an edit can start on a non-IDR frame and thus you have to decode frames prior to the start of the edit
in order to get frame accurate playback.  I don't recall seeing anything that can handle this scenario in FFmpeg.
Michael Niedermayer Sept. 27, 2017, 12:12 a.m. UTC | #9
Hi

On Mon, Sep 25, 2017 at 02:12:26PM -0700, John Stebbins wrote:
> 
> 
> On 09/25/2017 01:12 PM, Carl Eugen Hoyos wrote:
> > 2017-09-25 19:10 GMT+02:00 John Stebbins <jstebbins@jetheaddev.com>:
> >> When keyframe intervals of dash segments are not perfectly aligned,
> >> fragments in the stream can overlap in time. Append new "trun" index
> >> entries to the end of the index instead of sorting by timestamp.
> >> Sorting by timestamp causes packets to be read out of decode order and
> >> results in decode errors.
> >> ---
> >>  libavformat/mov.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/libavformat/mov.c b/libavformat/mov.c
> >> index 2519707345..b2bc7c2c3d 100644
> >> --- a/libavformat/mov.c
> >> +++ b/libavformat/mov.c
> >> @@ -4339,8 +4339,8 @@ static int mov_read_trun(MOVContext *c, AVIOContext *pb, MOVAtom atom)
> >>                                    MOV_FRAG_SAMPLE_FLAG_DEPENDS_YES));
> >>          if (keyframe)
> >>              distance = 0;
> >> -        ctts_index = av_add_index_entry(st, offset, dts, sample_size, distance,
> >> -                                        keyframe ? AVINDEX_KEYFRAME : 0);
> >> +        ctts_index = add_index_entry(st, offset, dts, sample_size, distance,
> >> +                                     keyframe ? AVINDEX_KEYFRAME : 0);
> > I can confirm that this fixes playback with FFplay but it shows
> > many warnings (Non-monotonous DTS) with ffmpeg: Is this
> > unavoidable?
> >
> >
> 
> Fixing the non-monotonous DTS in ffmpeg would require more consideration. The overlapping frames need to be dropped
> somewhere after they are decoded and before they are presented. 
> 

> I've given this some thought, but other ideas are certainly welcome. The demuxer is probably the most appropriate place
> for marking the frames as needing to be dropped since it has the "knowledge" about why there are overlapping
> timestamps.  I.e. you only want to drop frames in this particular scenario and not generally when timestamps go
> backwards.  I don't think there is currently any facility for marking frames to be dropped after decoding, but it seems
> like AVPacketSideData would be the appropriate mechanism for such marking.  FYI, such a facility for marking frames to
> drop would also be useful for frame accurate playback of MP4 edit lists.

This sounds like:

/**
 * Flag is used to discard packets which are required to maintain valid
 * decoder state but are not required for output and should be dropped
 * after decoding.
 **/
#define AV_PKT_FLAG_DISCARD   0x0004


[...]
John Stebbins Sept. 29, 2017, 3:56 a.m. UTC | #10
On 09/26/2017 05:12 PM, Michael Niedermayer wrote:
> Hi
>
> On Mon, Sep 25, 2017 at 02:12:26PM -0700, John Stebbins wrote:
>>
>> On 09/25/2017 01:12 PM, Carl Eugen Hoyos wrote:
>>> 2017-09-25 19:10 GMT+02:00 John Stebbins <jstebbins@jetheaddev.com>:
>>>> When keyframe intervals of dash segments are not perfectly aligned,
>>>> fragments in the stream can overlap in time. Append new "trun" index
>>>> entries to the end of the index instead of sorting by timestamp.
>>>> Sorting by timestamp causes packets to be read out of decode order and
>>>> results in decode errors.
>>>> ---
>>>>   libavformat/mov.c | 4 ++--
>>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/libavformat/mov.c b/libavformat/mov.c
>>>> index 2519707345..b2bc7c2c3d 100644
>>>> --- a/libavformat/mov.c
>>>> +++ b/libavformat/mov.c
>>>> @@ -4339,8 +4339,8 @@ static int mov_read_trun(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>>>>                                     MOV_FRAG_SAMPLE_FLAG_DEPENDS_YES));
>>>>           if (keyframe)
>>>>               distance = 0;
>>>> -        ctts_index = av_add_index_entry(st, offset, dts, sample_size, distance,
>>>> -                                        keyframe ? AVINDEX_KEYFRAME : 0);
>>>> +        ctts_index = add_index_entry(st, offset, dts, sample_size, distance,
>>>> +                                     keyframe ? AVINDEX_KEYFRAME : 0);
>>> I can confirm that this fixes playback with FFplay but it shows
>>> many warnings (Non-monotonous DTS) with ffmpeg: Is this
>>> unavoidable?
>>>
>>>
>> Fixing the non-monotonous DTS in ffmpeg would require more consideration. The overlapping frames need to be dropped
>> somewhere after they are decoded and before they are presented.
>>
>> I've given this some thought, but other ideas are certainly welcome. The demuxer is probably the most appropriate place
>> for marking the frames as needing to be dropped since it has the "knowledge" about why there are overlapping
>> timestamps.  I.e. you only want to drop frames in this particular scenario and not generally when timestamps go
>> backwards.  I don't think there is currently any facility for marking frames to be dropped after decoding, but it seems
>> like AVPacketSideData would be the appropriate mechanism for such marking.  FYI, such a facility for marking frames to
>> drop would also be useful for frame accurate playback of MP4 edit lists.
> This sounds like:
>
> /**
>   * Flag is used to discard packets which are required to maintain valid
>   * decoder state but are not required for output and should be dropped
>   * after decoding.
>   **/
> #define AV_PKT_FLAG_DISCARD   0x0004
>
>
>

Oh, that's awesome. Thanks for pointing that out. There's even a sample 
flag that can be set in the index which is the next thing I was 
"worried" about.  I'll follow up with a patch to fix the non-monotonous 
DTS.
diff mbox

Patch

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 2519707345..b2bc7c2c3d 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -4339,8 +4339,8 @@  static int mov_read_trun(MOVContext *c, AVIOContext *pb, MOVAtom atom)
                                   MOV_FRAG_SAMPLE_FLAG_DEPENDS_YES));
         if (keyframe)
             distance = 0;
-        ctts_index = av_add_index_entry(st, offset, dts, sample_size, distance,
-                                        keyframe ? AVINDEX_KEYFRAME : 0);
+        ctts_index = add_index_entry(st, offset, dts, sample_size, distance,
+                                     keyframe ? AVINDEX_KEYFRAME : 0);
         if (ctts_index >= 0 && old_nb_index_entries < st->nb_index_entries) {
             unsigned int size_needed = st->nb_index_entries * sizeof(*sc->ctts_data);
             unsigned int request_size = size_needed > sc->ctts_allocated_size ?