diff mbox series

[FFmpeg-devel,3/5] lavf: replace FFERROR_REDO with AVERROR(EAGAIN)

Message ID 20221108112550.8375-3-anton@khirnov.net
State New
Headers show
Series [FFmpeg-devel,1/5] lavf/rtpdec_asf: set AVFMT_FLAG_NONBLOCK | expand

Checks

Context Check Description
yinshiyou/configure_loongarch64 warning Failed to run configure
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Anton Khirnov Nov. 8, 2022, 11:25 a.m. UTC
There is no meaningful distinction between them, both mean "the demuxer
did some work, but did not produce a packet - should be called again".
---
 libavformat/demux.c   | 12 ++++--------
 libavformat/demux.h   |  6 ------
 libavformat/flvdec.c  | 14 +++++++-------
 libavformat/lxfdec.c  |  2 +-
 libavformat/mlvdec.c  |  2 +-
 libavformat/mpeg.c    |  2 +-
 libavformat/smacker.c |  2 +-
 7 files changed, 15 insertions(+), 25 deletions(-)

Comments

Nicolas George Nov. 8, 2022, 12:54 p.m. UTC | #1
Anton Khirnov (12022-11-08):
> There is no meaningful distinction between them, both mean "the demuxer
> did some work, but did not produce a packet - should be called again".

NAK, there a difference in semantics: AVEROR(EAGAIN) is for when data is
not available for external reasons, typically network blocking,
AVERROR_REDO is for when data is available and the demuxer will read it
as soon as it is restarted, just to avoid having a loop in the demuxer.
Anton Khirnov Nov. 8, 2022, 2:07 p.m. UTC | #2
Quoting Nicolas George (2022-11-08 13:54:53)
> Anton Khirnov (12022-11-08):
> > There is no meaningful distinction between them, both mean "the demuxer
> > did some work, but did not produce a packet - should be called again".
> 
> NAK, there a difference in semantics: AVEROR(EAGAIN) is for when data is
> not available for external reasons, typically network blocking,

This is false - there are zero demuxers returning EAGAIN due to network
blocking. Furthermore, lavf architecture fundamentally does not allow
generic non-blocking operations, so the most any demuxer can do is
return at a few specific points. And since demuxers are supposed to be
independent of the underlying IO protocol, they fundamentally cannot
know anything about network state.

So there really is no meaningful difference between REDO and EAGAIN.
Both are just an opportunity for the caller to do something else before
trying again. If we believe giving the caller this option is valuable,
then it's valuable in both these cases and it makes no sense to
distinguish between them.
Nicolas George Nov. 8, 2022, 2:15 p.m. UTC | #3
Anton Khirnov (12022-11-08):
> This is false - there are zero demuxers returning EAGAIN due to network
> blocking.

There are devices who return EAGAIN due to device blocking.

> So there really is no meaningful difference between REDO and EAGAIN.

There is a difference: REDO is internal, EAGAIN is public.
Anton Khirnov Nov. 8, 2022, 4:39 p.m. UTC | #4
Quoting Nicolas George (2022-11-08 15:15:39)
> Anton Khirnov (12022-11-08):
> > This is false - there are zero demuxers returning EAGAIN due to network
> > blocking.
> 
> There are devices who return EAGAIN due to device blocking.

Sure, and that's about it. And as I said before - there is no way to
tell when a device will be ready, so this is of very limited usefulness.

> > So there really is no meaningful difference between REDO and EAGAIN.
> 
> There is a difference: REDO is internal, EAGAIN is public.

That is not a meaningful difference. A meaningful difference would be
one that has actionable consequences.
Nicolas George Nov. 9, 2022, 8:21 a.m. UTC | #5
Anton Khirnov (12022-11-08):
> Sure, and that's about it. And as I said before - there is no way to
> tell when a device will be ready, so this is of very limited usefulness.

This is not true for many devices. Sorry to contradict you once again,
but you would know it if you had any experience working with devices.

> That is not a meaningful difference. A meaningful difference would be
> one that has actionable consequences.

Well, it has actionable consequences: if you treat EAGAIN like REDO you
introduce a busy wait, and if you treat REDO like EAGAIN you introduce a
significant slowness. Now that you make me mention it, I remember it was
precisely the reason I introduced REDO: to fix a slowness in resyncs.
Paul B Mahol Nov. 9, 2022, 8:44 a.m. UTC | #6
On 11/9/22, Nicolas George <george@nsup.org> wrote:
> Anton Khirnov (12022-11-08):
>> Sure, and that's about it. And as I said before - there is no way to
>> tell when a device will be ready, so this is of very limited usefulness.
>
> This is not true for many devices. Sorry to contradict you once again,
> but you would know it if you had any experience working with devices.
>
>> That is not a meaningful difference. A meaningful difference would be
>> one that has actionable consequences.
>
> Well, it has actionable consequences: if you treat EAGAIN like REDO you
> introduce a busy wait, and if you treat REDO like EAGAIN you introduce a
> significant slowness. Now that you make me mention it, I remember it was
> precisely the reason I introduced REDO: to fix a slowness in resyncs.
>


I'm not interested in REDO / EAGAIN differences.
Can this bug be fixed at all?

> --
>   Nicolas George
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Nicolas George Nov. 9, 2022, 9:21 a.m. UTC | #7
Paul B Mahol (12022-11-09):
> I'm not interested in REDO / EAGAIN differences.

They delete this thread.

> Can this bug be fixed at all?

This bug has been fixed for seven years. I would rather we do not unfix
it.
Paul B Mahol Nov. 9, 2022, 9:23 a.m. UTC | #8
On 11/9/22, Nicolas George <george@nsup.org> wrote:
> Paul B Mahol (12022-11-09):
>> I'm not interested in REDO / EAGAIN differences.
>
> They delete this thread.
>
>> Can this bug be fixed at all?
>
> This bug has been fixed for seven years. I would rather we do not unfix
> it.

The bug is found by Jan IIRC, and fixed by not using EAGAIN return
values in demuxer IIRC.

It would be better that bug is explained in commit log too.

>
> --
>   Nicolas George
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Nicolas George Nov. 9, 2022, 9:29 a.m. UTC | #9
Paul B Mahol (12022-11-09):
> The bug is found by Jan IIRC, and fixed by not using EAGAIN return
> values in demuxer IIRC.

Then I do not know what bug you are referring to. Maybe try honing your
communication skills.
Anton Khirnov Nov. 9, 2022, 9:31 a.m. UTC | #10
Quoting Nicolas George (2022-11-09 09:21:34)
> Anton Khirnov (12022-11-08):
> > Sure, and that's about it. And as I said before - there is no way to
> > tell when a device will be ready, so this is of very limited usefulness.
> 
> This is not true for many devices. Sorry to contradict you once again,
> but you would know it if you had any experience working with devices.

We could really do with fewer of these personal attacks.

> > That is not a meaningful difference. A meaningful difference would be
> > one that has actionable consequences.
> 
> Well, it has actionable consequences: if you treat EAGAIN like REDO you
> introduce a busy wait,

In most devices it's a sleep rather than a busy wait. And in those where
it isn't, it should be.

Furthermore, since the caller has no way of knowing how long to wait,
there is little they can do other than sleeping for a random period and
hoping for the best.

> and if you treat REDO like EAGAIN you introduce a
> significant slowness. Now that you make me mention it, I remember it was
> precisely the reason I introduced REDO: to fix a slowness in resyncs.

I highly doubt that returning control back to the caller will cause any
slowdown in and of itself, it's more about what the caller will do in
response. If they choose to sleep for a random amount of time, then
maybe they should stop doing that (which is exactly what this patchset
does).
Nicolas George Nov. 9, 2022, 9:38 a.m. UTC | #11
Anton Khirnov (12022-11-09):
> > Well, it has actionable consequences: if you treat EAGAIN like REDO you
> > introduce a busy wait,
> In most devices it's a sleep rather than a busy wait. And in those where
> it isn't, it should be.

I do not know how your sentence connects to mine.

> Furthermore, since the caller has no way of knowing how long to wait,
> there is little they can do other than sleeping for a random period and
> hoping for the best.

This is why I have wanted to fix the design of demuxers and demuxer-like
components for years, but it is a tremendous work. In the meantime, we
just do with "av_usleep(1000)" and it is terrible.

> I highly doubt that returning control back to the caller will cause any
> slowdown in and of itself, it's more about what the caller will do in
> response. If they choose to sleep for a random amount of time, then
> maybe they should stop doing that (which is exactly what this patchset
> does).

Sleeping is the only correct reaction to EAGAIN.
Anton Khirnov Nov. 9, 2022, 10:17 a.m. UTC | #12
Quoting Nicolas George (2022-11-09 10:38:06)
> Anton Khirnov (12022-11-09):
> > > Well, it has actionable consequences: if you treat EAGAIN like REDO you
> > > introduce a busy wait,
> > In most devices it's a sleep rather than a busy wait. And in those where
> > it isn't, it should be.
> 
> I do not know how your sentence connects to mine.

If your concern with busy-waiting is pointless energy consumption, then
the correct thing to do is change all busy-waiting devices to sleep
internally if AVFMT_FLAG_NONBLOCK is not specified. I just checked, and
almost all of them actually do exactly this, the only exception is
avfoundation.

> 
> > Furthermore, since the caller has no way of knowing how long to wait,
> > there is little they can do other than sleeping for a random period and
> > hoping for the best.
> 
> This is why I have wanted to fix the design of demuxers and demuxer-like
> components for years, but it is a tremendous work. In the meantime, we
> just do with "av_usleep(1000)" and it is terrible.
> 
> > I highly doubt that returning control back to the caller will cause any
> > slowdown in and of itself, it's more about what the caller will do in
> > response. If they choose to sleep for a random amount of time, then
> > maybe they should stop doing that (which is exactly what this patchset
> > does).
> 
> Sleeping is the only correct reaction to EAGAIN.

First, this would be an incredibly strong claim - given how many
possible usage patterns there are - even if we did have support for
user-side polling. But since we do not, then it's even more dubious,
because the user does not know how long to sleep for. The only truly
correct way to work with arbitrary demuxers currently is run it in
blocking mode in a separate thread (which is exactly what ffmpeg.c does
now).

Furthermore this claim is not supported by development history. mpegts
will currently return EAGAIN on failed resyncs, specifically to give the
caller the opportunity to decide what to do next. RTSP does something
similar. This has nothing to do with waiting for the network, because
neither of these two demuxers know about the state of the network. And
there is no essential difference between what mpegts does with EAGAIN
and e.g. flvdec with REDO.

The point of this set is giving control to the caller. If the caller
wants blocking operations, they unset AVFMT_FLAG_NONBLOCK and don't have
to deal with EAGAINs. If they want the limited support for context
switching that lavf can provide, they set AVFMT_FLAG_NONBLOCK and decide
what to do.
Nicolas George Nov. 9, 2022, 10:41 a.m. UTC | #13
Anton Khirnov (12022-11-09):
> If your concern with busy-waiting is pointless energy consumption, then
> the correct thing to do is change all busy-waiting devices to sleep
> internally if AVFMT_FLAG_NONBLOCK is not specified. I just checked, and
> almost all of them actually do exactly this, the only exception is
> avfoundation.

Yes, devices currently do the right thing as much as our framework
permits.

Breaking that is one of the two ways merging EAGAIN and REDO can break
our code.

> First, this would be an incredibly strong claim - given how many
> possible usage patterns there are

Mostly true nonetheless.

>							The only truly
> correct way to work with arbitrary demuxers currently is run it in
> blocking mode in a separate thread (which is exactly what ffmpeg.c does
> now).

It is not a correct way, only slightly less broken.

> Furthermore this claim is not supported by development history. mpegts
> will currently return EAGAIN on failed resyncs, specifically to give the
> caller the opportunity to decide what to do next.

IIRC, this is precisely what REDO was introduced to fix. If the bug was
introduced again or incompletely fixed, this it is an issue. Indeed, the
two uses of EAGAIN in libavformat/mpegts.c seems highly dubious.

But as I explained, REDO and EAGAIN have a very different semantic with
regard to when the caller needs to retry and cannot be merged.

Now, if you want me to take from my time to explain it again with more
details and more clarity for your benefit… I suggest you tone done the
condescension towards me.
Anton Khirnov Nov. 9, 2022, 11:53 a.m. UTC | #14
Quoting Nicolas George (2022-11-09 11:41:32)
> Anton Khirnov (12022-11-09):
> > If your concern with busy-waiting is pointless energy consumption, then
> > the correct thing to do is change all busy-waiting devices to sleep
> > internally if AVFMT_FLAG_NONBLOCK is not specified. I just checked, and
> > almost all of them actually do exactly this, the only exception is
> > avfoundation.
> 
> Yes, devices currently do the right thing as much as our framework
> permits.
> 
> Breaking that is one of the two ways merging EAGAIN and REDO can break
> our code.

I have no idea what you mean by this. What is being broken and how?

> > Furthermore this claim is not supported by development history. mpegts
> > will currently return EAGAIN on failed resyncs, specifically to give the
> > caller the opportunity to decide what to do next.
> 
> IIRC, this is precisely what REDO was introduced to fix. If the bug was
> introduced again or incompletely fixed, this it is an issue. Indeed, the
> two uses of EAGAIN in libavformat/mpegts.c seems highly dubious.
> 
> But as I explained, REDO and EAGAIN have a very different semantic with
> regard to when the caller needs to retry and cannot be merged.

Maybe more context will clarify the motivation here. What became this
patchset started as an attempt to make all uses of EAGAIN/REDO in
lavf/lavd consistent. While doing that I gradually came to the
conclusion that the distinction is arbitrary - e.g. multiple demuxers
deliberately return EAGAIN as context switch method. Now we could
declare this practice invalid, but as this has been done in multiple
demuxers independently, and has been in the tree for over 10 years, so
this should not be done lightly.

The question then is - what should be the effective rule for deciding
whether a demuxer returns EAGAIN or REDO in any given case?
Opinions on this are very much welcome.
Nicolas George Nov. 9, 2022, 12:42 p.m. UTC | #15
Anton Khirnov (12022-11-09):
> I have no idea what you mean by this. What is being broken and how?

EAGAIN and REDO are different. If you make them the same, you break one
of them. Or possibly both if you are being really stupid, but at least
one.

> Maybe more context will clarify the motivation here. What became this
> patchset started as an attempt to make all uses of EAGAIN/REDO in
> lavf/lavd consistent.

And as I explained, it is not possible.

>			While doing that I gradually came to the
> conclusion that the distinction is arbitrary - e.g. multiple demuxers
> deliberately return EAGAIN as context switch method. Now we could
> declare this practice invalid, but as this has been done in multiple
> demuxers independently, and has been in the tree for over 10 years, so
> this should not be done lightly.

I introduced REDO precisely to fix demuxers who abused EAGAIN like that.

> The question then is - what should be the effective rule for deciding
> whether a demuxer returns EAGAIN or REDO in any given case?
> Opinions on this are very much welcome.

I do not have an opinion, I have knowledge.

The semantic of EAGAIN has always been, since it stabilized, that the
operation cannot be completed immediately, it requires an external
event, like a network packet or an IRQ on a device. Therefore, the only
correct reaction to EAGAIN is to delay retrying. Ideally delay until an
event is received; if that cannot be achieved a small arbitrary delay is
still infinitely better than busy wait.

This is standard Unix system and network programming, knowing this in
and out is IMO an absolute prerequisite for tinkering with FFmpeg's
network code.

The semantic of AVERROR(EAGAIN) is ours to decide, but it would be
really really stupid from us to decide for something that is not aligned
with the wider semantic of EAGAIN. And aligning with EAGAIN is what we
did, with the caveat that most of it only works with devices and
elementary protocols. Except for the bugs at hand.

Now, as you observed, some demuxers have abused AVERROR(EAGAIN) to let
the framework re-call them and continue their work, for example looking
for a resync marker.

This is an abuse, this is invalid. If AVERROR(EAGAIN) is handled
correctly, using it to look for a resync marker causes a delay to be
inserted. The delay is very small, negligible in most cases, but it
should not be there. And if the resync marker is hard to find, the delay
will add up.

If I remember correctly, this is precisely the issue AVERROR_REDO fixes:
a damaged sample that would be very slow to demux without consuming CPU
because of the delays introduced by the correct handling of
AVERROR(EAGAIN).

There was another way to fix the issue: have the demuxers loop
themselves looking for a resync. But it would have required more
reworking of the code.

Last detail: we do not have the API to return to the user in the middle
of the search for a resync marker. If this is something you want for
some reason, you will need to introduce something new. But if so, please
first explain your use case.
diff mbox series

Patch

diff --git a/libavformat/demux.c b/libavformat/demux.c
index 5cd9522367..8c81a58274 100644
--- a/libavformat/demux.c
+++ b/libavformat/demux.c
@@ -572,14 +572,10 @@  FF_ENABLE_DEPRECATION_WARNINGS
         if (err < 0) {
             av_packet_unref(pkt);
 
-            /* Some demuxers return FFERROR_REDO when they consume
-               data and discard it (ignored streams, junk, extradata).
-               We must re-call the demuxer to get the real packet.
-
-               Treat EAGAIN the same as FFERROR_REDO, unless the user
-               requested non-blocking behavior. */
-            if (err == FFERROR_REDO ||
-                (err == AVERROR(EAGAIN) && !(s->flags & AVFMT_FLAG_NONBLOCK)))
+            /* Some demuxers return AVERROR(EAGAIN) when they consume
+               data and discard it (ignored streams, junk, extradata). Call the
+               demuxer again unless the user requested non-blocking behavior. */
+            if (err == AVERROR(EAGAIN) && !(s->flags & AVFMT_FLAG_NONBLOCK))
                 continue;
             if (!pktl || err == AVERROR(EAGAIN))
                 return err;
diff --git a/libavformat/demux.h b/libavformat/demux.h
index 1f57e062f6..a008c3dba1 100644
--- a/libavformat/demux.h
+++ b/libavformat/demux.h
@@ -55,12 +55,6 @@  typedef struct FFStreamInfo {
     int     fps_last_dts_idx;
 } FFStreamInfo;
 
-/**
- * Returned by demuxers to indicate that data was consumed but discarded
- * (ignored streams or junk data). The framework will re-call the demuxer.
- */
-#define FFERROR_REDO FFERRTAG('R','E','D','O')
-
 #define RELATIVE_TS_BASE (INT64_MAX - (1LL << 48))
 
 static av_always_inline int is_relative(int64_t ts)
diff --git a/libavformat/flvdec.c b/libavformat/flvdec.c
index d83edff727..d320a431db 100644
--- a/libavformat/flvdec.c
+++ b/libavformat/flvdec.c
@@ -1058,7 +1058,7 @@  retry:
     }
 
     if (size == 0) {
-        ret = FFERROR_REDO;
+        ret = AVERROR(EAGAIN);
         goto leave;
     }
 
@@ -1110,13 +1110,13 @@  skip:
             av_log(s, AV_LOG_ERROR, "Unable to seek to the next packet\n");
             return AVERROR_INVALIDDATA;
         }
-        ret = FFERROR_REDO;
+        ret = AVERROR(EAGAIN);
         goto leave;
     }
 
     /* skip empty data packets */
     if (!size) {
-        ret = FFERROR_REDO;
+        ret = AVERROR(EAGAIN);
         goto leave;
     }
 
@@ -1160,7 +1160,7 @@  skip:
         (st->discard >= AVDISCARD_BIDIR && ((flags & FLV_VIDEO_FRAMETYPE_MASK) == FLV_FRAME_DISP_INTER && stream_type == FLV_STREAM_TYPE_VIDEO)) ||
          st->discard >= AVDISCARD_ALL) {
         avio_seek(s->pb, next, SEEK_SET);
-        ret = FFERROR_REDO;
+        ret = AVERROR(EAGAIN);
         goto leave;
     }
 
@@ -1273,7 +1273,7 @@  retry_duration:
             if (st->codecpar->extradata) {
                 if ((ret = flv_queue_extradata(flv, s->pb, stream_type, size)) < 0)
                     return ret;
-                ret = FFERROR_REDO;
+                ret = AVERROR(EAGAIN);
                 goto leave;
             }
             if ((ret = flv_get_extradata(s, st, size)) < 0)
@@ -1284,14 +1284,14 @@  retry_duration:
             if (st->codecpar->codec_id == AV_CODEC_ID_AAC && t && !strcmp(t->value, "Omnia A/XE"))
                 st->codecpar->extradata_size = 2;
 
-            ret = FFERROR_REDO;
+            ret = AVERROR(EAGAIN);
             goto leave;
         }
     }
 
     /* skip empty data packets */
     if (!size) {
-        ret = FFERROR_REDO;
+        ret = AVERROR(EAGAIN);
         goto leave;
     }
 
diff --git a/libavformat/lxfdec.c b/libavformat/lxfdec.c
index 8003ae98b7..bd28dfe270 100644
--- a/libavformat/lxfdec.c
+++ b/libavformat/lxfdec.c
@@ -307,7 +307,7 @@  static int lxf_read_packet(AVFormatContext *s, AVPacket *pkt)
     if (stream > 1) {
         av_log(s, AV_LOG_WARNING,
                "got packet with illegal stream index %"PRIu32"\n", stream);
-        return FFERROR_REDO;
+        return AVERROR(EAGAIN);
     }
 
     if (stream == 1 && s->nb_streams < 2) {
diff --git a/libavformat/mlvdec.c b/libavformat/mlvdec.c
index db3b77bb9b..d510787fcd 100644
--- a/libavformat/mlvdec.c
+++ b/libavformat/mlvdec.c
@@ -422,7 +422,7 @@  static int read_packet(AVFormatContext *avctx, AVPacket *pkt)
 
     pb = mlv->pb[sti->index_entries[index].size];
     if (!pb) {
-        ret = FFERROR_REDO;
+        ret = AVERROR(EAGAIN);
         goto next_packet;
     }
     avio_seek(pb, sti->index_entries[index].pos, SEEK_SET);
diff --git a/libavformat/mpeg.c b/libavformat/mpeg.c
index 864b08d8f8..f566ff74bd 100644
--- a/libavformat/mpeg.c
+++ b/libavformat/mpeg.c
@@ -251,7 +251,7 @@  redo:
         if (avio_feof(s->pb))
             return AVERROR_EOF;
         // FIXME we should remember header_state
-        return FFERROR_REDO;
+        return AVERROR(EAGAIN);
     }
 
     if (startcode == PACK_START_CODE)
diff --git a/libavformat/smacker.c b/libavformat/smacker.c
index 1d54e8e917..052b6d28d4 100644
--- a/libavformat/smacker.c
+++ b/libavformat/smacker.c
@@ -332,7 +332,7 @@  static int smacker_read_packet(AVFormatContext *s, AVPacket *pkt)
     }
 
     if (s->streams[smk->videoindex]->discard >= AVDISCARD_ALL) {
-        ret = FFERROR_REDO;
+        ret = AVERROR(EAGAIN);
         goto next_frame;
     }
     if (smk->frame_size >= INT_MAX/2) {