[FFmpeg-devel] lavf/movenc: Replace dts by pts to calculate duration

Submitted by manuelyuan on Nov. 6, 2019, 2:36 a.m.

Details

Message ID 6543c657.3c3b.16e3e91c7b6.Coremail.manuelyuan@163.com
State New

Commit Message

manuelyuan Nov. 6, 2019, 2:36 a.m.
From: Mengyang Yuan <manuelyuan@163.com>

When the input video has a variable frame rate and we do not want to
duplicate or drop frames, the output video duration calculated from DTS is
incorrect. This patch calculates it from PTS instead.
Many UGC videos have variable frame rates, which appear as PTS jumps. After
transcoding with ffmpeg -vsync 0 or -vsync 2, the output video duration
becomes longer. Reading x264/encoder/encoder.c, I found that, in order to
predict B-frames, x264 needs enough reference frames to be available when
DTS = 0, so it subtracts the cache time from the DTS of these reference
frames. However, the cache time includes the PTS jumps, which results in
abnormally small DTS values.

Signed-off-by: Mengyang Yuan <manuelyuan@163.com>
---
 libavformat/movenc.c | 23 ++++++++++++++---------
 libavformat/movenc.h |  2 ++
 2 files changed, 16 insertions(+), 9 deletions(-)
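
The effect described in the commit message can be illustrated with a small, self-contained sketch. It is not code from the patch: `SketchPacket`, `duration_from_dts`, `duration_from_pts` and the timestamps in `sketch_pkts` are hypothetical, chosen only to mimic a PTS jump combined with an abnormally small first DTS. The sketch also assumes that the track's start DTS and start PTS are simply taken from the first packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the timing fields of a packet; not FFmpeg API. */
typedef struct SketchPacket {
    int64_t pts;
    int64_t dts;
    int64_t duration;
} SketchPacket;

/* Old rule: dts of the last packet minus the first dts, plus the last
 * packet's duration. Packets are expected in decode order. */
static int64_t duration_from_dts(const SketchPacket *pkts, size_t n)
{
    assert(pkts != NULL || n == 0);
    if (n == 0)
        return 0;
    return pkts[n - 1].dts - pkts[0].dts + pkts[n - 1].duration;
}

/* New rule: running maximum of (pts - start_pts + duration), where
 * start_pts is assumed to be the pts of the first packet. */
static int64_t duration_from_pts(const SketchPacket *pkts, size_t n)
{
    int64_t duration = 0;

    assert(pkts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int64_t end = pkts[i].pts - pkts[0].pts + pkts[i].duration;
        if (end > duration)
            duration = end;
    }
    return duration;
}

/* Invented timestamps (in milliseconds, decode order): the presentation
 * times jump from 40 to 1040, and the first dts is pushed far below zero. */
static const SketchPacket sketch_pkts[] = {
    { .pts =    0, .dts = -1000, .duration = 40 },
    { .pts = 1040, .dts =     0, .duration = 40 },
    { .pts =   40, .dts =    40, .duration = 40 },
    { .pts = 1080, .dts =  1040, .duration = 40 },
};
```

With these invented numbers the frames are presented from 0 to 1120, which is what the PTS-based rule yields, while the DTS-based rule yields 2080 because the span between the first and last DTS includes the negative offset.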

Comments

Michael Niedermayer Nov. 6, 2019, 4:47 p.m.
On Wed, Nov 06, 2019 at 10:36:11AM +0800, manuelyuan wrote:
> From: Mengyang Yuan <manuelyuan@163.com>
> 
> When the input video has a variable frame rate and we do not want to
> duplicate or drop frames, the output video duration calculated from DTS is
> incorrect. This patch calculates it from PTS instead.
> Many UGC videos have variable frame rates, which appear as PTS jumps. After
> transcoding with ffmpeg -vsync 0 or -vsync 2, the output video duration
> becomes longer. Reading x264/encoder/encoder.c, I found that, in order to
> predict B-frames, x264 needs enough reference frames to be available when
> DTS = 0, so it subtracts the cache time from the DTS of these reference
> frames. However, the cache time includes the PTS jumps, which results in
> abnormally small DTS values.
> 
> Signed-off-by: Mengyang Yuan <manuelyuan@163.com>
> ---
>  libavformat/movenc.c | 23 ++++++++++++++---------
>  libavformat/movenc.h |  2 ++
>  2 files changed, 16 insertions(+), 9 deletions(-)

this breaks make fate / changes checksums
if the changes are intended, the references would need to be updated
with the patch doing the change

make: *** [fate-lavf-mp4] Error 1
make: *** [fate-lavf-mov] Error 1
make: *** [fate-binsub-movtextenc] Error 1
make: *** [fate-movenc] Error 1

thanks

[...]
manuelyuan Nov. 7, 2019, 9:21 a.m.
Thanks for your reply!
My changes break make fate, but this is unavoidable. I will update the
corresponding references so that make fate succeeds.




Patch

diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index 715bec1c2f..206aa48d8c 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -1015,7 +1015,7 @@  static int get_cluster_duration(MOVTrack *track, int cluster_idx)
         return 0;
 
     if (cluster_idx + 1 == track->entry)
-        next_dts = track->track_duration + track->start_dts;
+        next_dts = track->end_dts;
     else
         next_dts = track->cluster[cluster_idx + 1].dts;
 
@@ -5149,8 +5149,7 @@  static int mov_flush_fragment(AVFormatContext *s, int force)
         mov->mdat_size = 0;
         for (i = 0; i < mov->nb_streams; i++) {
             if (mov->tracks[i].entry)
-                mov->tracks[i].frag_start += mov->tracks[i].start_dts +
-                                             mov->tracks[i].track_duration -
+                mov->tracks[i].frag_start += mov->tracks[i].end_dts -
                                              mov->tracks[i].cluster[0].dts;
             mov->tracks[i].entry = 0;
             mov->tracks[i].end_reliable = 0;
@@ -5208,7 +5207,7 @@  static int mov_flush_fragment(AVFormatContext *s, int force)
         int64_t duration = 0;
 
         if (track->entry)
-            duration = track->start_dts + track->track_duration -
+            duration = track->end_dts -
                        track->cluster[0].dts;
         if (mov->flags & FF_MOV_FLAG_SEPARATE_MOOF) {
             if (!track->mdat_buf)
@@ -5281,7 +5280,7 @@  static int check_pkt(AVFormatContext *s, AVPacket *pkt)
         ref = trk->cluster[trk->entry - 1].dts;
     } else if (   trk->start_dts != AV_NOPTS_VALUE
                && !trk->frag_discont) {
-        ref = trk->start_dts + trk->track_duration;
+        ref = trk->end_dts;
     } else
         ref = pkt->dts; // Skip tests for the first packet
 
@@ -5494,7 +5493,7 @@  int ff_mov_write_packet(AVFormatContext *s, AVPacket *pkt)
              * of the last packet of the previous fragment based on track_duration,
              * which might not exactly match our dts. Therefore adjust the dts
              * of this packet to be what the previous packets duration implies. */
-            trk->cluster[trk->entry].dts = trk->start_dts + trk->track_duration;
+            trk->cluster[trk->entry].dts = trk->end_dts;
             /* We also may have written the pts and the corresponding duration
              * in sidx/tfrf/tfxd tags; make sure the sidx pts and duration match up with
              * the next fragment. This means the cts of the first sample must
@@ -5546,13 +5545,17 @@  int ff_mov_write_packet(AVFormatContext *s, AVPacket *pkt)
                    "this case.\n",
                    pkt->stream_index, pkt->dts);
     }
-    trk->track_duration = pkt->dts - trk->start_dts + pkt->duration;
-    trk->last_sample_is_subtitle_end = 0;
-
     if (pkt->pts == AV_NOPTS_VALUE) {
         av_log(s, AV_LOG_WARNING, "pts has no value\n");
         pkt->pts = pkt->dts;
     }
+    if (trk->start_pts == AV_NOPTS_VALUE) {
+        trk->start_pts = pkt->pts;
+    }
+    trk->track_duration = FFMAX(pkt->pts - trk->start_pts + pkt->duration, trk->track_duration);
+    trk->end_dts = pkt->dts + pkt->duration;
+    trk->last_sample_is_subtitle_end = 0;
+
     if (pkt->dts != pkt->pts)
         trk->flags |= MOV_TRACK_CTTS;
     trk->cluster[trk->entry].cts   = pkt->pts - pkt->dts;
@@ -6295,7 +6298,9 @@  static int mov_init(AVFormatContext *s)
          * this is updated. */
         track->hint_track = -1;
         track->start_dts  = AV_NOPTS_VALUE;
+        track->start_pts  = AV_NOPTS_VALUE;
         track->start_cts  = AV_NOPTS_VALUE;
+        track->end_dts    = AV_NOPTS_VALUE;
         track->end_pts    = AV_NOPTS_VALUE;
         track->dts_shift  = AV_NOPTS_VALUE;
         if (st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
diff --git a/libavformat/movenc.h b/libavformat/movenc.h
index 68d6f23a5a..ddad2631d7 100644
--- a/libavformat/movenc.h
+++ b/libavformat/movenc.h
@@ -116,7 +116,9 @@  typedef struct MOVTrack {
     uint32_t    tref_tag;
     int         tref_id; ///< trackID of the referenced track
     int64_t     start_dts;
+    int64_t     start_pts;
     int64_t     start_cts;
+    int64_t     end_dts;
     int64_t     end_pts;
     int         end_reliable;
     int64_t     dts_shift;
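
The per-packet bookkeeping that the last ff_mov_write_packet hunk introduces can be sketched in isolation as follows. This is a simplified model, not the muxer code: `SketchTrack`, `sketch_track_init`, `sketch_track_add_packet` and `SKETCH_NOPTS` are hypothetical stand-ins for the MOVTrack fields and AV_NOPTS_VALUE, and the sketch assumes track_duration starts at 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel playing the role of AV_NOPTS_VALUE. */
#define SKETCH_NOPTS INT64_MIN

/* Hypothetical subset of the per-track state touched by the patch. */
typedef struct SketchTrack {
    int64_t start_pts;      /* pts of the first packet seen */
    int64_t end_dts;        /* dts + duration of the most recent packet */
    int64_t track_duration; /* largest pts - start_pts + duration so far */
} SketchTrack;

static void sketch_track_init(SketchTrack *trk)
{
    trk->start_pts      = SKETCH_NOPTS;
    trk->end_dts        = SKETCH_NOPTS;
    trk->track_duration = 0; /* assumption: zero before the first packet */
}

/* Mirrors the order of operations in the patched hunk: fall back to dts
 * when pts is unset, latch start_pts once, keep the duration as a running
 * maximum, and track the end of the last packet in dts terms separately. */
static void sketch_track_add_packet(SketchTrack *trk, int64_t pts,
                                    int64_t dts, int64_t duration)
{
    int64_t candidate;

    assert(dts != SKETCH_NOPTS);
    if (pts == SKETCH_NOPTS)
        pts = dts;
    if (trk->start_pts == SKETCH_NOPTS)
        trk->start_pts = pts;

    candidate = pts - trk->start_pts + duration;
    if (candidate > trk->track_duration)
        trk->track_duration = candidate;

    trk->end_dts = dts + duration;
}
```

The running maximum matters because packets arrive in decode order: a B-frame with an earlier pts than the preceding packet must not shrink the duration. Keeping end_dts as its own field is what lets the other hunks replace start_dts + track_duration, since that sum no longer describes where the last packet ends on the DTS timeline.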