diff mbox series

[FFmpeg-devel] avformat/movenc: Fix tfdt out of sync

Message ID 20210718102232.1382376-1-sehuww@mail.scut.edu.cn
State New
Headers show
Series [FFmpeg-devel] avformat/movenc: Fix tfdt out of sync | expand

Checks

Context Check Description
andriy/x86_make success Make finished
andriy/x86_make_fate success Make fate finished
andriy/PPC64_make success Make finished
andriy/PPC64_make_fate success Make fate finished

Commit Message

Hu Weiwen July 18, 2021, 10:22 a.m. UTC
Fix an edge case when auto flushing an empty fragment, and the previous fragment
has inaccurate duration, the track->frag_start would be out of sync from input
dts, and all subsequent tfdt box will have out of sync base_media_decode_time.
This error can accumulate to quite large over long-running streaming.

This can be easily reproduced by remux a variable frame rate source with dash
muxer and set streaming to 1.

When flushing a fragment, we may guess the duration if we have not got the next
package. Then when we get the first packet of the next fragment, we intend to
set the trk->cluster[0].dts to match the previously guessed duration.  However,
at this time trk->track_duration may already be updated from
mov_write_single_packet() when auto flushing the empty fragment (e.g. if
FF_MOV_FLAG_FRAG_EVERY_FRAME is set), but no moof was written to reflect this
change. So the trk->cluster[0].dts do NOT actually match the previously written
duration. Then, when flushing the next non-empty fragment, we will get a wrong
"duration", and track->frag_start will be out of sync. And it cannot get back on
track again.

This error is expected to accumulate over time. trk->cluster[0].dts will only
get too large. it will not get too small because we have check_pkt(), so
track->frag_start will fall more and more behind from input dts.

To fix this, we can directly remove the track->frag_start. It can always be
replaced by "track->cluster[0].dts - track->start_dts".

Signed-off-by: Hu Weiwen <sehuww@mail.scut.edu.cn>
---
 libavformat/movenc.c | 24 ++++--------------------
 libavformat/movenc.h |  1 -
 2 files changed, 4 insertions(+), 21 deletions(-)

Comments

Martin Storsjö July 19, 2021, 8:33 p.m. UTC | #1
Hi,

Thanks for the patch! I'll try to look into it in a while (I'm a bit 
swamped and short on time at the moment), hopefully within a couple days 
or so.

On Sun, 18 Jul 2021, Hu Weiwen wrote:

> Fix an edge case when auto flushing an empty fragment, and the previous fragment
> has inaccurate duration, the track->frag_start would be out of sync from input
> dts, and all subsequent tfdt box will have out of sync base_media_decode_time.
> This error can accumulate to quite large over long-running streaming.
>
> This can be easily reproduced by remux a variable frame rate source with dash
> muxer and set streaming to 1.

Can you provide a specific sample and a command line, and point out what 
values in the output file that are mismatched - so I can zoom in on the 
issue quicker once I have time to sit down and look at it?

// Martin
Hu Weiwen July 20, 2021, 5:18 a.m. UTC | #2
Hi Martin,

OK, here I provide some codes to demo this bug. Many of these are copied from libavformat/tests/movenc.c

#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/opt.h>
#include <libavutil/intreadwrite.h>

static const uint8_t h264_extradata[] = {
    0x01, 0x4d, 0x40, 0x1e, 0xff, 0xe1, 0x00, 0x02, 0x67, 0x4d, 0x01, 0x00, 0x02, 0x68, 0xef
};

AVPacket *pkt;
AVFormatContext *ctx;

void mux_packet(int64_t ts) {
    uint8_t pktdata[8] = { 0 };

    av_packet_unref(pkt);
    pkt->stream_index = 0;
    pkt->pts = pkt->dts = ts;
    AV_WB32(pktdata + 4, pkt->pts);
    pkt->data = pktdata;
    pkt->size = 8;

    av_write_frame(ctx, pkt);
}

int main() {
    avformat_alloc_output_context2(&ctx, NULL, NULL, "bug.mp4");
    // Setting this flag is important
    av_opt_set(ctx, "movflags", "+frag_every_frame", AV_OPT_SEARCH_CHILDREN);
    AVStream *st = avformat_new_stream(ctx, NULL);
    st->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
    st->codecpar->codec_id = AV_CODEC_ID_H264;
    st->codecpar->width = 640;
    st->codecpar->height = 480;
    st->time_base.num = 1;
    st->time_base.den = 30;
    st->codecpar->extradata_size = sizeof(h264_extradata);
    st->codecpar->extradata = av_mallocz(st->codecpar->extradata_size + AV_INPUT_BUFFER_PADDING_SIZE);
    memcpy(st->codecpar->extradata, h264_extradata, sizeof(h264_extradata));
    avio_open(&ctx->pb, "bug.mp4", AVIO_FLAG_WRITE);
    avformat_write_header(ctx, NULL);

    pkt = av_packet_alloc();
    mux_packet(0);
    mux_packet(1);
    mux_packet(2);
    av_write_frame(ctx, NULL); // Manually flush a frag
    mux_packet(10);            // Automatically flush an empty frag, because frag_every_frame is set
    mux_packet(11);
    mux_packet(12);

    av_write_trailer(ctx);
    avformat_free_context(ctx);
    av_packet_free(&pkt);
}

After compile and run this code, run this command to inspect the dts (which comes from the out of sync tfdt)

ffprobe -show_packets bug.mp4 | grep dts=

The output is:

dts=0
dts=1
dts=2
dts=2
dts=3
dts=4

With this patch applied, the output is:

dts=0
dts=1
dts=2
dts=10
dts=11
dts=12

Regards,
Hu Weiwen

On Mon, Jul 19, 2021 at 11:33:59PM +0300, Martin Storsjö wrote:
> Hi,
> 
> Thanks for the patch! I'll try to look into it in a while (I'm a bit swamped
> and short on time at the moment), hopefully within a couple days or so.
> 
> On Sun, 18 Jul 2021, Hu Weiwen wrote:
> 
> > Fix an edge case when auto flushing an empty fragment, and the previous fragment
> > has inaccurate duration, the track->frag_start would be out of sync from input
> > dts, and all subsequent tfdt box will have out of sync base_media_decode_time.
> > This error can accumulate to quite large over long-running streaming.
> > 
> > This can be easily reproduced by remux a variable frame rate source with dash
> > muxer and set streaming to 1.
> 
> Can you provide a specific sample and a command line, and point out what
> values in the output file that are mismatched - so I can zoom in on the
> issue quicker once I have time to sit down and look at it?
> 
> // Martin
Martin Storsjö Aug. 5, 2021, 12:30 p.m. UTC | #3
Hi,

On Tue, 20 Jul 2021, 胡玮文 wrote:

> After compile and run this code, run this command to inspect the dts (which comes from the out of sync tfdt)
>
> ffprobe -show_packets bug.mp4 | grep dts=
>
> The output is:
>
> dts=0
> dts=1
> dts=2
> dts=2
> dts=3
> dts=4
>
> With this patch applied, the output is:
>
> dts=0
> dts=1
> dts=2
> dts=10
> dts=11
> dts=12

Thanks for the repro case, and sorry for the delay in looking at it.

I do see the issue, but I disagree with your suggested solution. While 
your patch does create the correct, intended value in tfdt, you will 
instead create a file where the dts calculated from adding up previous 
sample durations differ from what's written in tfdt. So depending on 
whether the demuxer just accumulates durations or reads tfdt, it will 
produce a different result.

I guess it can be argued that if a demuxer reads a fragmented file, then 
tfdt should be more authoritative than duration and any other reader is 
buggy, but nevertheless, the code as is is designed to make sure that tfdt 
is consistent with the sum of durations.

It seems it's possible to fix the same issue differently though, by not 
adjusting track_duration and end_pts when autoflushing, if there's no 
samples in the track that are going to be flushed. That way, we retain the 
existing intended logic of the muxer, while avoiding diverging.

The result of your repro example, with my movenc modification, produces 
this dts sequence:

dts=0
dts=1
dts=2
dts=2
dts=11
dts=12

This is, of course less nice than what we had before, but after flushing 
the fragment containing tfdt=2, duration=0, the only consistent choice we 
have is to start the next fragment at tfdt/dts=2.


However, I'm open to add an option to ignore the end of the previous 
fragment and make the new fragment start at the exact desired timestamp.

// Martin
diff mbox series

Patch

diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index c85efe87489..48736acab65 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -4482,8 +4482,7 @@  static int mov_write_tfxd_tag(AVIOContext *pb, MOVTrack *track)
     avio_write(pb, uuid, sizeof(uuid));
     avio_w8(pb, 1);
     avio_wb24(pb, 0);
-    avio_wb64(pb, track->start_dts + track->frag_start +
-                  track->cluster[0].cts);
+    avio_wb64(pb, track->cluster[0].dts + track->cluster[0].cts);
     avio_wb64(pb, track->end_pts -
                   (track->cluster[0].dts + track->cluster[0].cts));
 
@@ -4562,8 +4561,7 @@  static int mov_add_tfra_entries(AVIOContext *pb, MOVMuxContext *mov, int tracks,
         info->size     = size;
         // Try to recreate the original pts for the first packet
         // from the fields we have stored
-        info->time     = track->start_dts + track->frag_start +
-                         track->cluster[0].cts;
+        info->time     = track->cluster[0].dts + track->cluster[0].cts;
         info->duration = track->end_pts -
                          (track->cluster[0].dts + track->cluster[0].cts);
         // If the pts is less than zero, we will have trimmed
@@ -4601,7 +4599,7 @@  static int mov_write_tfdt_tag(AVIOContext *pb, MOVTrack *track)
     ffio_wfourcc(pb, "tfdt");
     avio_w8(pb, 1); /* version */
     avio_wb24(pb, 0);
-    avio_wb64(pb, track->frag_start);
+    avio_wb64(pb, track->cluster[0].dts - track->start_dts);
     return update_size(pb, pos);
 }
 
@@ -4678,8 +4676,7 @@  static int mov_write_sidx_tag(AVIOContext *pb,
 
     if (track->entry) {
         entries = 1;
-        presentation_time = track->start_dts + track->frag_start +
-                            track->cluster[0].cts;
+        presentation_time = track->cluster[0].dts + track->cluster[0].cts;
         duration = track->end_pts -
                    (track->cluster[0].dts + track->cluster[0].cts);
         starts_with_SAP = track->cluster[0].flags & MOV_SYNC_SAMPLE;
@@ -5358,10 +5355,6 @@  static int mov_flush_fragment(AVFormatContext *s, int force)
         mov->moov_written = 1;
         mov->mdat_size = 0;
         for (i = 0; i < mov->nb_streams; i++) {
-            if (mov->tracks[i].entry)
-                mov->tracks[i].frag_start += mov->tracks[i].start_dts +
-                                             mov->tracks[i].track_duration -
-                                             mov->tracks[i].cluster[0].dts;
             mov->tracks[i].entry = 0;
             mov->tracks[i].end_reliable = 0;
         }
@@ -5415,11 +5408,7 @@  static int mov_flush_fragment(AVFormatContext *s, int force)
         MOVTrack *track = &mov->tracks[i];
         int buf_size, write_moof = 1, moof_tracks = -1;
         uint8_t *buf;
-        int64_t duration = 0;
 
-        if (track->entry)
-            duration = track->start_dts + track->track_duration -
-                       track->cluster[0].dts;
         if (mov->flags & FF_MOV_FLAG_SEPARATE_MOOF) {
             if (!track->mdat_buf)
                 continue;
@@ -5439,8 +5428,6 @@  static int mov_flush_fragment(AVFormatContext *s, int force)
             ffio_wfourcc(s->pb, "mdat");
         }
 
-        if (track->entry)
-            track->frag_start += duration;
         track->entry = 0;
         track->entries_flushed = 0;
         track->end_reliable = 0;
@@ -5759,7 +5746,6 @@  int ff_mov_write_packet(AVFormatContext *s, AVPacket *pkt)
             /* New fragment, but discontinuous from previous fragments.
              * Pretend the duration sum of the earlier fragments is
              * pkt->dts - trk->start_dts. */
-            trk->frag_start = pkt->dts - trk->start_dts;
             trk->end_pts = AV_NOPTS_VALUE;
             trk->frag_discont = 0;
         }
@@ -5781,12 +5767,10 @@  int ff_mov_write_packet(AVFormatContext *s, AVPacket *pkt)
                 /* Pretend the whole stream started at pts=0, with earlier fragments
                  * already written. If the stream started at pts=0, the duration sum
                  * of earlier fragments would have been pkt->pts. */
-                trk->frag_start = pkt->pts;
                 trk->start_dts  = pkt->dts - pkt->pts;
             } else {
                 /* Pretend the whole stream started at dts=0, with earlier fragments
                  * already written, with a duration summing up to pkt->dts. */
-                trk->frag_start = pkt->dts;
                 trk->start_dts  = 0;
             }
             trk->frag_discont = 0;
diff --git a/libavformat/movenc.h b/libavformat/movenc.h
index af1ea0bce64..daeaad1cc68 100644
--- a/libavformat/movenc.h
+++ b/libavformat/movenc.h
@@ -138,7 +138,6 @@  typedef struct MOVTrack {
 
     AVIOContext *mdat_buf;
     int64_t     data_offset;
-    int64_t     frag_start;
     int         frag_discont;
     int         entries_flushed;