
[FFmpeg-devel,3/4,v2] ffmpeg: move A/V non-streamcopy initialization to a later point

Message ID 20200913213314.247143-1-jeebjp@gmail.com
State Superseded

Commit Message

Jan Ekström Sept. 13, 2020, 9:33 p.m. UTC
- For video, this means a single initialization point in do_video_out.
- For audio we unfortunately need to do it in two places just
  before the buffer sink is utilized (if av_buffersink_get_samples
  would still work according to its specification after a call to
  avfilter_graph_request_oldest was made, we could at least remove
  the one in transcode_step).

Other adjustments to make things work:
- As the AVFrame PTS adjustment to encoder time base needs the encoder
  to be initialized, it is now moved to do_{video,audio}_out,
  right after the encoder has been initialized. Due to this,
  the additional sync_ipts parameter in do_video_out is removed as it
  is no longer necessary.
---
 fftools/ffmpeg.c | 112 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 77 insertions(+), 35 deletions(-)
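
To illustrate why the PTS adjustment depends on an initialized encoder,
here is a small stand-alone sketch (not the patch's code; the concrete
time base values and the build line are made-up examples): the rescale
target is the encoder's time base, which only exists once the encoder
has been opened.

/*
 * Stand-alone sketch of the kind of conversion
 * adjust_frame_pts_to_encoder_tb performs; the time bases below are
 * illustrative assumptions, not taken from ffmpeg.c.
 * Build sketch: gcc pts_rescale.c -lavutil
 */
#include <stdio.h>
#include <inttypes.h>
#include <libavutil/rational.h>
#include <libavutil/mathematics.h>

int main(void)
{
    AVRational filter_tb = { 1, 48000 }; /* e.g. an audio buffersink time base */
    AVRational enc_tb    = { 1, 90000 }; /* hypothetical encoder time base     */
    int64_t frame_pts    = 1536;         /* one audio frame at 48 kHz          */

    /* Without an opened encoder there is no enc_tb to rescale into,
     * which is why the adjustment now happens after initialization. */
    int64_t enc_pts = av_rescale_q(frame_pts, filter_tb, enc_tb);

    printf("filter pts %"PRId64" -> encoder pts %"PRId64"\n", frame_pts, enc_pts);
    return 0;
}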

Comments

Jan Ekström Sept. 14, 2020, 8:10 p.m. UTC | #1
On Mon, Sep 14, 2020 at 12:33 AM Jan Ekström <jeebjp@gmail.com> wrote:
>
> - For video, this means a single initialization point in do_video_out.
> - For audio we unfortunately need to do it in two places just
>   before the buffer sink is utilized (if av_buffersink_get_samples
>   would still work according to its specification after a call to
>   avfilter_graph_request_oldest was made, we could at least remove
>   the one in transcode_step).
>
> Other adjustments to make things work:
> - As the AVFrame PTS adjustment to encoder time base needs the encoder
>   to be initialized, it is now moved to do_{video,audio}_out,
>   right after the encoder has been initialized. Due to this,
>   the additional sync_ipts parameter in do_video_out is removed as it
>   is no longer necessary.
> ---

Diff between v1 and v2 follows:

- Basically, fixed issues in the code comments I had written.

Jan

diff --git a/fftools/ffmpeg.c b/fftools/ffmpeg.c
index 5425ba245d..8874da9268 100644
--- a/fftools/ffmpeg.c
+++ b/fftools/ffmpeg.c
@@ -4664,14 +4656,15 @@ static int transcode_step(void)
          * transcode_from_filter just down the line) peeks. Peeking already
          * puts one frame "ready to be given out", which means that any
          * update in filter buffer sink configuration afterwards will not
-         * help us.
+         * help us. And yes, even if it would be utilized,
+         * av_buffersink_get_samples is affected, as it internally utilizes
+         * the same early exit for peeked frames.
          *
-         * And yes, even av_buffersink_get_samples is affected,
-         * As it internally utilizes the same early exit for peeked frames.
-         * In other words, if either av_buffersink_get_samples with
-         * avfilter_graph_request_oldest will start playing ball, or we add
-         * our own audio buffering to handle frame size mismatches, both of
-         * these early exits can be gotten rid of.
+         * In other words, if avfilter_graph_request_oldest would not make
+         * further filter chain configuration or usage of
+         * av_buffersink_get_samples useless (by just causing the return
+         * of the peeked AVFrame as-is), we could get rid of this additional
+         * early encoder initialization.
          */
         if (av_buffersink_get_type(ost->filter->filter) == AVMEDIA_TYPE_AUDIO)
             init_output_stream_wrapper(ost, NULL, 1);
Michael Niedermayer Sept. 15, 2020, 9:24 p.m. UTC | #2
On Mon, Sep 14, 2020 at 12:33:14AM +0300, Jan Ekström wrote:
> - For video, this means a single initialization point in do_video_out.
> - For audio we unfortunately need to do it in two places just
>   before the buffer sink is utilized (if av_buffersink_get_samples
>   would still work according to its specification after a call to
>   avfilter_graph_request_oldest was made, we could at least remove
>   the one in transcode_step).
> 
> Other adjustments to make things work:
> - As the AVFrame PTS adjustment to encoder time base needs the encoder
>   to be initialized, it is now moved to do_{video,audio}_out,
>   right after the encoder has been initialized. Due to this,
>   the additional sync_ipts parameter in do_video_out is removed as it
>   is no longer necessary.
> ---
>  fftools/ffmpeg.c | 112 ++++++++++++++++++++++++++++++++---------------
>  1 file changed, 77 insertions(+), 35 deletions(-)

breaks this:
./ffmpeg -ss 30.0 -i ~/tickets/1745/1745-Sample.mkv -f vob -c:a copy  -bitexact -t 1 -f framecrc -  
(sample file is linked in the ticket https://trac.ffmpeg.org/ticket/1745)

(Too many packets buffered for output stream 0:1. Conversion failed!)

thx

[...]
Jan Ekström Sept. 16, 2020, 10:20 a.m. UTC | #3
On Wed, Sep 16, 2020 at 12:24 AM Michael Niedermayer
<michael@niedermayer.cc> wrote:
>
> On Mon, Sep 14, 2020 at 12:33:14AM +0300, Jan Ekström wrote:
> > - For video, this means a single initialization point in do_video_out.
> > - For audio we unfortunately need to do it in two places just
> >   before the buffer sink is utilized (if av_buffersink_get_samples
> >   would still work according to its specification after a call to
> >   avfilter_graph_request_oldest was made, we could at least remove
> >   the one in transcode_step).
> >
> > Other adjustments to make things work:
> > - As the AVFrame PTS adjustment to encoder time base needs the encoder
> >   to be initialized, it is now moved to do_{video,audio}_out,
> >   right after the encoder has been initialized. Due to this,
> >   the additional sync_ipts parameter in do_video_out is removed as it
> >   is no longer necessary.
> > ---
> >  fftools/ffmpeg.c | 112 ++++++++++++++++++++++++++++++++---------------
> >  1 file changed, 77 insertions(+), 35 deletions(-)
>
> breaks this:
> ./ffmpeg -ss 30.0 -i ~/tickets/1745/1745-Sample.mkv -f vob -c:a copy  -bitexact -t 1 -f framecrc -
> (sample file is linked in the ticket https://trac.ffmpeg.org/ticket/1745)
>
> (Too many packets buffered for output stream 0:1. Conversion failed!)
>
> thx

With an initial look with -debug_ts -v verbose -max_muxing_queue_size
10000, it appears that audio packets start at about -5.5 seconds, and
video is getting skipped until an exact zero point is hit.

So either the offset is incorrect, or we should be dropping the audio
packets as well until the zero point is found.

Jan
Jan Ekström Sept. 20, 2020, 9:59 p.m. UTC | #4
On Wed, Sep 16, 2020 at 1:20 PM Jan Ekström <jeebjp@gmail.com> wrote:
>
> On Wed, Sep 16, 2020 at 12:24 AM Michael Niedermayer
> <michael@niedermayer.cc> wrote:
> >
> > On Mon, Sep 14, 2020 at 12:33:14AM +0300, Jan Ekström wrote:
> > > - For video, this means a single initialization point in do_video_out.
> > > - For audio we unfortunately need to do it in two places just
> > >   before the buffer sink is utilized (if av_buffersink_get_samples
> > >   would still work according to its specification after a call to
> > >   avfilter_graph_request_oldest was made, we could at least remove
> > >   the one in transcode_step).
> > >
> > > Other adjustments to make things work:
> > > - As the AVFrame PTS adjustment to encoder time base needs the encoder
> > >   to be initialized, it is now moved to do_{video,audio}_out,
> > >   right after the encoder has been initialized. Due to this,
> > >   the additional sync_ipts parameter in do_video_out is removed as it
> > >   is no longer necessary.
> > > ---
> > >  fftools/ffmpeg.c | 112 ++++++++++++++++++++++++++++++++---------------
> > >  1 file changed, 77 insertions(+), 35 deletions(-)
> >
> > breaks this:
> > ./ffmpeg -ss 30.0 -i ~/tickets/1745/1745-Sample.mkv -f vob -c:a copy  -bitexact -t 1 -f framecrc -
> > (sample file is linked in the ticket https://trac.ffmpeg.org/ticket/1745)
> >
> > (Too many packets buffered for output stream 0:1. Conversion failed!)
> >
> > thx
>
> With an initial look with -debug_ts -v verbose -max_muxing_queue_size
> 10000 , it appears that audio packets start at about -5.5 seconds, and
> video is getting skipped until an exact zero point is hit.
>
> So either the offset is incorrect, or we should also be dropping the
> audio packets as well until zero point is found.
>
> Jan

So, with a further look, this stems from a difference in how the stream
copy and non-stream-copy cases are handled:
- For stream copy, by default any packets received after seeking are
thrown into the muxer, even if they land way before the actual
seek point. There is `-copypriorss 0` to control that behavior.
- For re-encoding use cases, we wait until we hit the wanted seek point.

In this specific example, seeking gets us to 5 or so seconds before
the wanted point.

Before:
- Audio stream packets would get thrown to the muxer, as both streams get
initialized at a similar point. Any buffering/sync issues are left to
lavf/the muxer.
- Video stream packets would get thrown out until the seek point was hit.
- So you have ~5.5 seconds of audio only, and then video.

Now:
- Audio stream packets are passed into ffmpeg.c's muxing buffer, as
the video stream has not yet been initialized, since the first video
frame has not been decoded yet.
- Video stream packets still get thrown out until the seek point is hit.
- Since each audio packet contains (1536/48000) seconds of audio, quite
a few have to be buffered (187 according to my testing) for the mux to
succeed. This is way larger than the default muxing queue.
- Thus, the remux part fails (see the sketch after this list for the
arithmetic).
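
To make those numbers concrete, a quick stand-alone check (the
128-packet default for -max_muxing_queue_size is my assumption here,
not something stated above):

/* Back-of-the-envelope check of the packet counts above.  The
 * 128-packet default queue size is an assumption for illustration. */
#include <stdio.h>

int main(void)
{
    double packet_duration     = 1536.0 / 48000.0; /* seconds of audio per packet */
    int    packets_buffered    = 187;              /* observed in testing above   */
    int    assumed_queue_limit = 128;              /* assumed default queue size  */

    printf("one packet = %.3f s, %d packets = %.2f s of audio\n",
           packet_duration, packets_buffered,
           packets_buffered * packet_duration);
    printf("%d > %d, so the default muxing queue overflows\n",
           packets_buffered, assumed_queue_limit);
    return 0;
}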

Alternatives:
- Alternative A: To keep the current behavior, we would have to start
decoding those frames, but drop them later in the logic. That
way the output stream may be initialized, but no video output would
happen.
- Alternative B: We normalize the behavior according to how stream
copy works by default. Video also gets output if the packets are
copied.
- Alternative C: We normalize the behavior according to how the
re-encoding logic works by default. `-copypriorss 0` would effectively
become the default.

I think in a way I would prefer Alternative C, as I'm not sure how
many people expect to get packets from way before their requested seek
point. Of course, in a perfect world we would do such things with
indexing a la ffms2, where we know exactly how long and how structured
an input is - and to which packet to seek and how much to decode or
drop.

Jan

Patch

diff --git a/fftools/ffmpeg.c b/fftools/ffmpeg.c
index 54802a8ec3..8658dc0ca4 100644
--- a/fftools/ffmpeg.c
+++ b/fftools/ffmpeg.c
@@ -938,6 +938,28 @@  early_exit:
     return float_pts;
 }
 
+static int init_output_stream(OutputStream *ost, char *error, int error_len);
+
+static int init_output_stream_wrapper(OutputStream *ost, unsigned int fatal)
+{
+    int ret = AVERROR_BUG;
+    char error[1024] = {0};
+
+    if (ost->initialized)
+        return 0;
+
+    ret = init_output_stream(ost, error, sizeof(error));
+    if (ret < 0) {
+        av_log(NULL, AV_LOG_ERROR, "Error initializing output stream %d:%d -- %s\n",
+               ost->file_index, ost->index, error);
+
+        if (fatal)
+            exit_program(1);
+    }
+
+    return ret;
+}
+
 static void do_audio_out(OutputFile *of, OutputStream *ost,
                          AVFrame *frame)
 {
@@ -949,6 +971,8 @@  static void do_audio_out(OutputFile *of, OutputStream *ost,
     pkt.data = NULL;
     pkt.size = 0;
 
+    adjust_frame_pts_to_encoder_tb(of, ost, frame);
+
     if (!check_recording_time(ost))
         return;
 
@@ -1083,8 +1107,7 @@  static void do_subtitle_out(OutputFile *of,
 
 static void do_video_out(OutputFile *of,
                          OutputStream *ost,
-                         AVFrame *next_picture,
-                         double sync_ipts)
+                         AVFrame *next_picture)
 {
     int ret, format_video_sync;
     AVPacket pkt;
@@ -1094,10 +1117,14 @@  static void do_video_out(OutputFile *of,
     int nb_frames, nb0_frames, i;
     double delta, delta0;
     double duration = 0;
+    double sync_ipts = AV_NOPTS_VALUE;
     int frame_size = 0;
     InputStream *ist = NULL;
     AVFilterContext *filter = ost->filter->filter;
 
+    init_output_stream_wrapper(ost, 1);
+    sync_ipts = adjust_frame_pts_to_encoder_tb(of, ost, next_picture);
+
     if (ost->source_index >= 0)
         ist = input_streams[ost->source_index];
 
@@ -1431,28 +1458,6 @@  static void do_video_stats(OutputStream *ost, int frame_size)
     }
 }
 
-static int init_output_stream(OutputStream *ost, char *error, int error_len);
-
-static int init_output_stream_wrapper(OutputStream *ost, unsigned int fatal)
-{
-    int ret = AVERROR_BUG;
-    char error[1024] = {0};
-
-    if (ost->initialized)
-        return 0;
-
-    ret = init_output_stream(ost, error, sizeof(error));
-    if (ret < 0) {
-        av_log(NULL, AV_LOG_ERROR, "Error initializing output stream %d:%d -- %s\n",
-               ost->file_index, ost->index, error);
-
-        if (fatal)
-            exit_program(1);
-    }
-
-    return ret;
-}
-
 static void finish_output_stream(OutputStream *ost)
 {
     OutputFile *of = output_files[ost->file_index];
@@ -1489,7 +1494,17 @@  static int reap_filters(int flush)
             continue;
         filter = ost->filter->filter;
 
-        init_output_stream_wrapper(ost, 1);
+        /*
+         * Unlike video, with audio the audio frame size matters.
+         * Currently we are fully reliant on the lavfi filter chain to
+         * do the buffering deed for us, and thus the frame size parameter
+         * needs to be set accordingly. Where does one get the required
+         * frame size? From the initialized AVCodecContext of an audio
+         * encoder. Thus, if we have gotten to an audio stream, initialize
+         * the encoder earlier than receiving the first AVFrame.
+         */
+        if (av_buffersink_get_type(filter) == AVMEDIA_TYPE_AUDIO)
+            init_output_stream_wrapper(ost, 1);
 
         if (!ost->filtered_frame && !(ost->filtered_frame = av_frame_alloc())) {
             return AVERROR(ENOMEM);
@@ -1497,7 +1512,6 @@  static int reap_filters(int flush)
         filtered_frame = ost->filtered_frame;
 
         while (1) {
-            double float_pts = AV_NOPTS_VALUE; // this is identical to filtered_frame.pts but with higher precision
             ret = av_buffersink_get_frame_flags(filter, filtered_frame,
                                                AV_BUFFERSINK_FLAG_NO_REQUEST);
             if (ret < 0) {
@@ -1506,7 +1520,7 @@  static int reap_filters(int flush)
                            "Error in av_buffersink_get_frame_flags(): %s\n", av_err2str(ret));
                 } else if (flush && ret == AVERROR_EOF) {
                     if (av_buffersink_get_type(filter) == AVMEDIA_TYPE_VIDEO)
-                        do_video_out(of, ost, NULL, AV_NOPTS_VALUE);
+                        do_video_out(of, ost, NULL);
                 }
                 break;
             }
@@ -1515,15 +1529,12 @@  static int reap_filters(int flush)
                 continue;
             }
 
-            float_pts = adjust_frame_pts_to_encoder_tb(of, ost,
-                                                       filtered_frame);
-
             switch (av_buffersink_get_type(filter)) {
             case AVMEDIA_TYPE_VIDEO:
                 if (!ost->frame_aspect_ratio.num)
                     enc->sample_aspect_ratio = filtered_frame->sample_aspect_ratio;
 
-                do_video_out(of, ost, filtered_frame, float_pts);
+                do_video_out(of, ost, filtered_frame);
                 break;
             case AVMEDIA_TYPE_AUDIO:
                 if (!(enc->codec->capabilities & AV_CODEC_CAP_PARAM_CHANGE) &&
@@ -3688,10 +3699,19 @@  static int transcode_init(void)
             goto dump_format;
         }
 
-    /* open each encoder */
+    /*
+     * initialize stream copy and subtitle/data streams.
+     * Encoded AVFrame based streams will get initialized as follows:
+     * - when the first AVFrame is received in do_video_out
+     * - just before the first AVFrame is received in either transcode_step
+     *   or reap_filters due to us requiring the filter chain buffer sink
+     *   to be configured with the correct audio frame size, which is only
+     *   known after the encoder is initialized.
+     */
     for (i = 0; i < nb_output_streams; i++) {
-        // skip streams fed from filtergraphs until we have a frame for them
-        if (output_streams[i]->filter)
+        if (!output_streams[i]->stream_copy &&
+            (output_streams[i]->enc_ctx->codec_type == AVMEDIA_TYPE_VIDEO ||
+             output_streams[i]->enc_ctx->codec_type == AVMEDIA_TYPE_AUDIO))
             continue;
 
         ret = init_output_stream_wrapper(output_streams[i], 0);
@@ -4605,7 +4625,29 @@  static int transcode_step(void)
     }
 
     if (ost->filter && ost->filter->graph->graph) {
-        init_output_stream_wrapper(ost, 1);
+        /*
+         * Similar case to the early audio initialization in reap_filters.
+         * Audio is special in ffmpeg.c currently as we depend on lavfi's
+         * audio frame buffering/creation to get the output audio frame size
+         * in samples correct. The audio frame size for the filter chain is
+         * configured during the output stream initialization.
+         *
+         * Apparently avfilter_graph_request_oldest (called in
+         * transcode_from_filter just down the line) peeks. Peeking already
+         * puts one frame "ready to be given out", which means that any
+         * update in filter buffer sink configuration afterwards will not
+         * help us. And yes, even if it would be utilized,
+         * av_buffersink_get_samples is affected, as it internally utilizes
+         * the same early exit for peeked frames.
+         *
+         * In other words, if avfilter_graph_request_oldest would not make
+         * further filter chain configuration or usage of
+         * av_buffersink_get_samples useless (by just causing the return
+         * of the peeked AVFrame as-is), we could get rid of this additional
+         * early encoder initialization.
+         */
+        if (av_buffersink_get_type(ost->filter->filter) == AVMEDIA_TYPE_AUDIO)
+            init_output_stream_wrapper(ost, 1);
 
         if ((ret = transcode_from_filter(ost->filter->graph, &ist)) < 0)
             return ret;