diff mbox series

[FFmpeg-devel,v2,5/7] avcodec/mediacodecenc: remove the strategy to create DTS

Message ID tencent_F315AB514E5570E32D6E9AC0407A9ADF9F08@qq.com
State New
Headers show
Series [FFmpeg-devel,v2,1/7] avcodec/mediacodecenc: make each encoder has its own option | expand

Checks

Context Check Description
andriy/configure_x86 warning Failed to apply patch
yinshiyou/configure_loongarch64 warning Failed to apply patch

Commit Message

Zhao Zhili Dec. 7, 2022, 9:31 a.m. UTC
From: Zhao Zhili <zhilizhao@tencent.com>

Use input PTS as DTS has multiple problems:
1. If there is no reordering, it's better to just use the output
PTS as DTS, since encoder may change the timestamp value (do it
on purpose or rounding error).

2. If there is reordering, input PTS should be shift a few frames
as DTS to satisfy the requirement of PTS >= DTS. I can't find a
reliable way to determine how many frames to be shift. For example,
we don't known if the encoder use hierarchical B frames. The
max_num_reorder_frames can be get from VUI, but VUI is optional.

3. Encoder dropping frames makes the case worse. Android has an
BITRATE_MODE_CBR_FD option to allow it explicitly.
---
 libavcodec/mediacodecenc.c | 28 +---------------------------
 1 file changed, 1 insertion(+), 27 deletions(-)

Comments

Tomas Härdin Dec. 12, 2022, 3:28 p.m. UTC | #1
ons 2022-12-07 klockan 17:31 +0800 skrev Zhao Zhili:
> From: Zhao Zhili <zhilizhao@tencent.com>
> 
> Use input PTS as DTS has multiple problems:
> 1. If there is no reordering, it's better to just use the output
> PTS as DTS, since encoder may change the timestamp value (do it
> on purpose or rounding error).
> 
> 2. If there is reordering, input PTS should be shift a few frames
> as DTS to satisfy the requirement of PTS >= DTS. I can't find a
> reliable way to determine how many frames to be shift. For example,
> we don't known if the encoder use hierarchical B frames. The
> max_num_reorder_frames can be get from VUI, but VUI is optional.
> 
> 3. Encoder dropping frames makes the case worse. Android has an
> BITRATE_MODE_CBR_FD option to allow it explicitly.

Don't we already have code to parse this stuff from h.264 streams?

/Tomas
Zhao Zhili Dec. 13, 2022, 2:55 a.m. UTC | #2
> On Dec 12, 2022, at 23:28, Tomas Härdin <git@haerdin.se> wrote:
> 
> ons 2022-12-07 klockan 17:31 +0800 skrev Zhao Zhili:
>> From: Zhao Zhili <zhilizhao@tencent.com>
>> 
>> Use input PTS as DTS has multiple problems:
>> 1. If there is no reordering, it's better to just use the output
>> PTS as DTS, since encoder may change the timestamp value (do it
>> on purpose or rounding error).
>> 
>> 2. If there is reordering, input PTS should be shift a few frames
>> as DTS to satisfy the requirement of PTS >= DTS. I can't find a
>> reliable way to determine how many frames to be shift. For example,
>> we don't known if the encoder use hierarchical B frames. The
>> max_num_reorder_frames can be get from VUI, but VUI is optional.
>> 
>> 3. Encoder dropping frames makes the case worse. Android has an
>> BITRATE_MODE_CBR_FD option to allow it explicitly.
> 
> Don't we already have code to parse this stuff from h.264 streams?

We have code to parse max_num_reorder_frames, if the stream contains
the field.

We have some code to generate/guess DTS but doesn’t work reliably for
H.264/H.265. I guess it’s hard, if not impossible. We don’t know the
prediction structures. Use encoder’s input PTS as DTS works for some
transport protocols like RTP, and MP4 with negative_cts_offsets,
otherwise it’s broken due to PTS >= DTS.

If we can and decided to implement such algorithm, it should go in
a higher level, so remux raw stream works better.

> 
> /Tomas
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Tomas Härdin Dec. 14, 2022, 5:31 p.m. UTC | #3
tis 2022-12-13 klockan 10:55 +0800 skrev zhilizhao(赵志立):
> 
> 
> > On Dec 12, 2022, at 23:28, Tomas Härdin <git@haerdin.se> wrote:
> > 
> > ons 2022-12-07 klockan 17:31 +0800 skrev Zhao Zhili:
> > > From: Zhao Zhili <zhilizhao@tencent.com>
> > > 
> > > Use input PTS as DTS has multiple problems:
> > > 1. If there is no reordering, it's better to just use the output
> > > PTS as DTS, since encoder may change the timestamp value (do it
> > > on purpose or rounding error).
> > > 
> > > 2. If there is reordering, input PTS should be shift a few frames
> > > as DTS to satisfy the requirement of PTS >= DTS. I can't find a
> > > reliable way to determine how many frames to be shift. For
> > > example,
> > > we don't known if the encoder use hierarchical B frames. The
> > > max_num_reorder_frames can be get from VUI, but VUI is optional.
> > > 
> > > 3. Encoder dropping frames makes the case worse. Android has an
> > > BITRATE_MODE_CBR_FD option to allow it explicitly.
> > 
> > Don't we already have code to parse this stuff from h.264 streams?
> 
> We have code to parse max_num_reorder_frames, if the stream contains
> the field.
> 
> We have some code to generate/guess DTS but doesn’t work reliably for
> H.264/H.265. I guess it’s hard, if not impossible. We don’t know the
> prediction structures. Use encoder’s input PTS as DTS works for some
> transport protocols like RTP, and MP4 with negative_cts_offsets,
> otherwise it’s broken due to PTS >= DTS.

Right, H.264 and H.265 support rather arbitrary reference structures. I
guess a pessimistic estimate is good enough

/Tomas
Anton Khirnov Jan. 4, 2023, 10:16 a.m. UTC | #4
Does this mean the encoder will produce packets with dts=AV_NOPTS_VALUE?
Zhao Zhili Jan. 4, 2023, 11:31 a.m. UTC | #5
> On Jan 4, 2023, at 18:16, Anton Khirnov <anton@khirnov.net> wrote:
> 
> Does this mean the encoder will produce packets with dts=AV_NOPTS_VALUE?

MediaCodec should not encode B frames by default, so dts = pts by default.
B frames can be enabled explicitly, in that case dts is AV_NOPTS_VALUE.

Android system’s MP4 muxer works very hard to recreate dts to workaround
the limitation of MediaCodec API. It create ‘valid’ but almost useless
files with a lot of jitters.

> 
> -- 
> Anton Khirnov
>
Tomas Härdin Jan. 4, 2023, 1:59 p.m. UTC | #6
ons 2023-01-04 klockan 19:31 +0800 skrev zhilizhao(赵志立):
> 
> > On Jan 4, 2023, at 18:16, Anton Khirnov <anton@khirnov.net> wrote:
> > 
> > Does this mean the encoder will produce packets with
> > dts=AV_NOPTS_VALUE?
> 
> MediaCodec should not encode B frames by default, so dts = pts by
> default.
> B frames can be enabled explicitly, in that case dts is
> AV_NOPTS_VALUE.
> 
> Android system’s MP4 muxer works very hard to recreate dts to
> workaround
> the limitation of MediaCodec API. It create ‘valid’ but almost
> useless
> files with a lot of jitters.

Maybe we should disallow B-frames entirely until the MediaCodec API is
fixed so as to not add to this mess?

/Tomas
Zhao Zhili Jan. 4, 2023, 2:46 p.m. UTC | #7
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Tomas Härdin
> Sent: 2023年1月4日 22:00
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [Internet]Re: [PATCH v2 5/7] avcodec/mediacodecenc: remove the strategy to create DTS
> 
> ons 2023-01-04 klockan 19:31 +0800 skrev zhilizhao(赵志立):
> >
> > > On Jan 4, 2023, at 18:16, Anton Khirnov <anton@khirnov.net> wrote:
> > >
> > > Does this mean the encoder will produce packets with
> > > dts=AV_NOPTS_VALUE?
> >
> > MediaCodec should not encode B frames by default, so dts = pts by
> > default.
> > B frames can be enabled explicitly, in that case dts is
> > AV_NOPTS_VALUE.
> >
> > Android system’s MP4 muxer works very hard to recreate dts to
> > workaround
> > the limitation of MediaCodec API. It create ‘valid’ but almost
> > useless
> > files with a lot of jitters.
> 
> Maybe we should disallow B-frames entirely until the MediaCodec API is
> fixed so as to not add to this mess?

There are some valid usecases. For example, RTP works fine without DTS.
With some restriction or precondition, use can create DTS too. So I'd like
to keep the feature.

> 
> /Tomas
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Anton Khirnov Jan. 4, 2023, 3:15 p.m. UTC | #8
Quoting Zhao Zhili (2023-01-04 15:46:52)
> 
> > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Tomas Härdin
> > Sent: 2023年1月4日 22:00
> > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> > Subject: Re: [FFmpeg-devel] [Internet]Re: [PATCH v2 5/7] avcodec/mediacodecenc: remove the strategy to create DTS
> > 
> > ons 2023-01-04 klockan 19:31 +0800 skrev zhilizhao(赵志立):
> > >
> > > > On Jan 4, 2023, at 18:16, Anton Khirnov <anton@khirnov.net> wrote:
> > > >
> > > > Does this mean the encoder will produce packets with
> > > > dts=AV_NOPTS_VALUE?
> > >
> > > MediaCodec should not encode B frames by default, so dts = pts by
> > > default.
> > > B frames can be enabled explicitly, in that case dts is
> > > AV_NOPTS_VALUE.
> > >
> > > Android system’s MP4 muxer works very hard to recreate dts to
> > > workaround
> > > the limitation of MediaCodec API. It create ‘valid’ but almost
> > > useless
> > > files with a lot of jitters.
> > 
> > Maybe we should disallow B-frames entirely until the MediaCodec API is
> > fixed so as to not add to this mess?
> 
> There are some valid usecases. For example, RTP works fine without DTS.
> With some restriction or precondition, use can create DTS too. So I'd like
> to keep the feature.

It is currently an API guarantee that all encoders return valid DTS
values, so this encoder is behaving in an invalid way.

If you want to keep B-frame functionality in this half-working state,
I'd say it should be hidden behind something like -strict experimental.
Zhao Zhili Jan. 4, 2023, 4:12 p.m. UTC | #9
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Anton Khirnov
> Sent: 2023年1月4日 23:16
> To: 'FFmpeg development discussions and patches' <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [Internet]Re: [PATCH v2 5/7] avcodec/mediacodecenc: remove the strategy to create DTS
> 
> Quoting Zhao Zhili (2023-01-04 15:46:52)
> >
> > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Tomas Härdin
> > > Sent: 2023年1月4日 22:00
> > > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> > > Subject: Re: [FFmpeg-devel] [Internet]Re: [PATCH v2 5/7] avcodec/mediacodecenc: remove the strategy to create DTS
> > >
> > > ons 2023-01-04 klockan 19:31 +0800 skrev zhilizhao(赵志立):
> > > >
> > > > > On Jan 4, 2023, at 18:16, Anton Khirnov <anton@khirnov.net> wrote:
> > > > >
> > > > > Does this mean the encoder will produce packets with
> > > > > dts=AV_NOPTS_VALUE?
> > > >
> > > > MediaCodec should not encode B frames by default, so dts = pts by
> > > > default.
> > > > B frames can be enabled explicitly, in that case dts is
> > > > AV_NOPTS_VALUE.
> > > >
> > > > Android system’s MP4 muxer works very hard to recreate dts to
> > > > workaround
> > > > the limitation of MediaCodec API. It create ‘valid’ but almost
> > > > useless
> > > > files with a lot of jitters.
> > >
> > > Maybe we should disallow B-frames entirely until the MediaCodec API is
> > > fixed so as to not add to this mess?
> >
> > There are some valid usecases. For example, RTP works fine without DTS.
> > With some restriction or precondition, use can create DTS too. So I'd like
> > to keep the feature.
> 
> It is currently an API guarantee that all encoders return valid DTS
> values, so this encoder is behaving in an invalid way.

It's an reasonable requirement, but could you elaborate on where is the doc
explicitly says that?

> 
> If you want to keep B-frame functionality in this half-working state,
> I'd say it should be hidden behind something like -strict experimental.

Will do.

> 
> --
> Anton Khirnov
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Anton Khirnov Jan. 5, 2023, 8:07 a.m. UTC | #10
Quoting Zhao Zhili (2023-01-04 17:12:56)
> > It is currently an API guarantee that all encoders return valid DTS
> > values, so this encoder is behaving in an invalid way.
> 
> It's an reasonable requirement, but could you elaborate on where is the doc
> explicitly says that?

I am not aware of this being explicitly documented, but then our
documentation is sadly far from complete.

But this was one of the major reasons for adding avcodec_encode_video2()
back in 2012 (later replaced by avcodec_send_frame()+receive_packet()).
Also see 8de1ee9f725aa3c550f425bd3120bcd95d5b2ea8.
diff mbox series

Patch

diff --git a/libavcodec/mediacodecenc.c b/libavcodec/mediacodecenc.c
index 7f2ae88285..8e28a50e0d 100644
--- a/libavcodec/mediacodecenc.c
+++ b/libavcodec/mediacodecenc.c
@@ -64,18 +64,6 @@  typedef struct MediaCodecEncContext {
 
     uint8_t *extradata;
     int extradata_size;
-
-    // Since MediaCodec doesn't output DTS, use a timestamp queue to save pts
-    // of AVFrame and generate DTS for AVPacket.
-    //
-    // This doesn't work when use Surface as input, in that case frames can be
-    // sent to encoder without our notice. One exception is frames come from
-    // our MediaCodec decoder wrapper, since we can control it's render by
-    // av_mediacodec_release_buffer.
-    int64_t timestamps[32];
-    int ts_head;
-    int ts_tail;
-
     int eof_sent;
 
     AVFrame *frame;
@@ -368,11 +356,6 @@  static int mediacodec_receive(AVCodecContext *avctx,
     }
     memcpy(pkt->data + extradata_size, out_buf + out_info.offset, out_info.size);
     pkt->pts = av_rescale_q(out_info.presentationTimeUs, AV_TIME_BASE_Q, avctx->time_base);
-    if (s->ts_tail != s->ts_head) {
-        pkt->dts = s->timestamps[s->ts_tail];
-        s->ts_tail = (s->ts_tail + 1) % FF_ARRAY_ELEMS(s->timestamps);
-    }
-
     if (out_info.flags & ff_AMediaCodec_getBufferFlagKeyFrame(codec))
         pkt->flags |= AV_PKT_FLAG_KEY;
     ret = 0;
@@ -437,14 +420,8 @@  static int mediacodec_send(AVCodecContext *avctx,
             return ff_AMediaCodec_signalEndOfInputStream(codec);
         }
 
-
-        if (frame->data[3]) {
-            pts = av_rescale_q(frame->pts, avctx->time_base, AV_TIME_BASE_Q);
-            s->timestamps[s->ts_head] = frame->pts;
-            s->ts_head = (s->ts_head + 1) % FF_ARRAY_ELEMS(s->timestamps);
-
+        if (frame->data[3])
             av_mediacodec_release_buffer((AVMediaCodecBuffer *)frame->data[3], 1);
-        }
         return 0;
     }
 
@@ -463,9 +440,6 @@  static int mediacodec_send(AVCodecContext *avctx,
         copy_frame_to_buffer(avctx, frame, input_buf, input_size);
 
         pts = av_rescale_q(frame->pts, avctx->time_base, AV_TIME_BASE_Q);
-
-        s->timestamps[s->ts_head] = frame->pts;
-        s->ts_head = (s->ts_head + 1) % FF_ARRAY_ELEMS(s->timestamps);
     } else {
         flags |= ff_AMediaCodec_getBufferFlagEndOfStream(codec);
         s->eof_sent = 1;