Message ID | 20221108112550.8375-3-anton@khirnov.net |
---|---|
State | New |
Headers | show |
Series | [FFmpeg-devel,1/5] lavf/rtpdec_asf: set AVFMT_FLAG_NONBLOCK | expand |
Context | Check | Description |
---|---|---|
yinshiyou/configure_loongarch64 | warning | Failed to run configure |
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
Anton Khirnov (12022-11-08): > There is no meaningful distinction between them, both mean "the demuxer > did some work, but did not produce a packet - should be called again". NAK, there a difference in semantics: AVEROR(EAGAIN) is for when data is not available for external reasons, typically network blocking, AVERROR_REDO is for when data is available and the demuxer will read it as soon as it is restarted, just to avoid having a loop in the demuxer.
Quoting Nicolas George (2022-11-08 13:54:53) > Anton Khirnov (12022-11-08): > > There is no meaningful distinction between them, both mean "the demuxer > > did some work, but did not produce a packet - should be called again". > > NAK, there a difference in semantics: AVEROR(EAGAIN) is for when data is > not available for external reasons, typically network blocking, This is false - there are zero demuxers returning EAGAIN due to network blocking. Furthermore, lavf architecture fundamentally does not allow generic non-blocking operations, so the most any demuxer can do is return at a few specific points. And since demuxers are supposed to be independent of the underlying IO protocol, they fundamentally cannot know anything about network state. So there really is no meaningful difference between REDO and EAGAIN. Both are just an opportunity for the caller to do something else before trying again. If we believe giving the caller this option is valuable, then it's valuable in both these cases and it makes no sense to distinguish between them.
Anton Khirnov (12022-11-08): > This is false - there are zero demuxers returning EAGAIN due to network > blocking. There are devices who return EAGAIN due to device blocking. > So there really is no meaningful difference between REDO and EAGAIN. There is a difference: REDO is internal, EAGAIN is public.
Quoting Nicolas George (2022-11-08 15:15:39) > Anton Khirnov (12022-11-08): > > This is false - there are zero demuxers returning EAGAIN due to network > > blocking. > > There are devices who return EAGAIN due to device blocking. Sure, and that's about it. And as I said before - there is no way to tell when a device will be ready, so this is of very limited usefulness. > > So there really is no meaningful difference between REDO and EAGAIN. > > There is a difference: REDO is internal, EAGAIN is public. That is not a meaningful difference. A meaningful difference would be one that has actionable consequences.
Anton Khirnov (12022-11-08): > Sure, and that's about it. And as I said before - there is no way to > tell when a device will be ready, so this is of very limited usefulness. This is not true for many devices. Sorry to contradict you once again, but you would know it if you had any experience working with devices. > That is not a meaningful difference. A meaningful difference would be > one that has actionable consequences. Well, it has actionable consequences: if you treat EAGAIN like REDO you introduce a busy wait, and if you treat REDO like EAGAIN you introduce a significant slowness. Now that you make me mention it, I remember it was precisely the reason I introduced REDO: to fix a slowness in resyncs.
On 11/9/22, Nicolas George <george@nsup.org> wrote: > Anton Khirnov (12022-11-08): >> Sure, and that's about it. And as I said before - there is no way to >> tell when a device will be ready, so this is of very limited usefulness. > > This is not true for many devices. Sorry to contradict you once again, > but you would know it if you had any experience working with devices. > >> That is not a meaningful difference. A meaningful difference would be >> one that has actionable consequences. > > Well, it has actionable consequences: if you treat EAGAIN like REDO you > introduce a busy wait, and if you treat REDO like EAGAIN you introduce a > significant slowness. Now that you make me mention it, I remember it was > precisely the reason I introduced REDO: to fix a slowness in resyncs. > I'm not interested in REDO / EAGAIN differences. Can this bug be fixed at all? > -- > Nicolas George > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". >
Paul B Mahol (12022-11-09): > I'm not interested in REDO / EAGAIN differences. They delete this thread. > Can this bug be fixed at all? This bug has been fixed for seven years. I would rather we do not unfix it.
On 11/9/22, Nicolas George <george@nsup.org> wrote: > Paul B Mahol (12022-11-09): >> I'm not interested in REDO / EAGAIN differences. > > They delete this thread. > >> Can this bug be fixed at all? > > This bug has been fixed for seven years. I would rather we do not unfix > it. The bug is found by Jan IIRC, and fixed by not using EAGAIN return values in demuxer IIRC. It would be better that bug is explained in commit log too. > > -- > Nicolas George > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". >
Paul B Mahol (12022-11-09): > The bug is found by Jan IIRC, and fixed by not using EAGAIN return > values in demuxer IIRC. Then I do not know what bug you are referring to. Maybe try honing your communication skills.
Quoting Nicolas George (2022-11-09 09:21:34) > Anton Khirnov (12022-11-08): > > Sure, and that's about it. And as I said before - there is no way to > > tell when a device will be ready, so this is of very limited usefulness. > > This is not true for many devices. Sorry to contradict you once again, > but you would know it if you had any experience working with devices. We could really do with fewer of these personal attacks. > > That is not a meaningful difference. A meaningful difference would be > > one that has actionable consequences. > > Well, it has actionable consequences: if you treat EAGAIN like REDO you > introduce a busy wait, In most devices it's a sleep rather than a busy wait. And in those where it isn't, it should be. Furthermore, since the caller has no way of knowing how long to wait, there is little they can do other than sleeping for a random period and hoping for the best. > and if you treat REDO like EAGAIN you introduce a > significant slowness. Now that you make me mention it, I remember it was > precisely the reason I introduced REDO: to fix a slowness in resyncs. I highly doubt that returning control back to the caller will cause any slowdown in and of itself, it's more about what the caller will do in response. If they choose to sleep for a random amount of time, then maybe they should stop doing that (which is exactly what this patchset does).
Anton Khirnov (12022-11-09): > > Well, it has actionable consequences: if you treat EAGAIN like REDO you > > introduce a busy wait, > In most devices it's a sleep rather than a busy wait. And in those where > it isn't, it should be. I do not know how your sentence connects to mine. > Furthermore, since the caller has no way of knowing how long to wait, > there is little they can do other than sleeping for a random period and > hoping for the best. This is why I have wanted to fix the design of demuxers and demuxer-like components for years, but it is a tremendous work. In the meantime, we just do with "av_usleep(1000)" and it is terrible. > I highly doubt that returning control back to the caller will cause any > slowdown in and of itself, it's more about what the caller will do in > response. If they choose to sleep for a random amount of time, then > maybe they should stop doing that (which is exactly what this patchset > does). Sleeping is the only correct reaction to EAGAIN.
Quoting Nicolas George (2022-11-09 10:38:06) > Anton Khirnov (12022-11-09): > > > Well, it has actionable consequences: if you treat EAGAIN like REDO you > > > introduce a busy wait, > > In most devices it's a sleep rather than a busy wait. And in those where > > it isn't, it should be. > > I do not know how your sentence connects to mine. If your concern with busy-waiting is pointless energy consumption, then the correct thing to do is change all busy-waiting devices to sleep internally if AVFMT_FLAG_NONBLOCK is not specified. I just checked, and almost all of them actually do exactly this, the only exception is avfoundation. > > > Furthermore, since the caller has no way of knowing how long to wait, > > there is little they can do other than sleeping for a random period and > > hoping for the best. > > This is why I have wanted to fix the design of demuxers and demuxer-like > components for years, but it is a tremendous work. In the meantime, we > just do with "av_usleep(1000)" and it is terrible. > > > I highly doubt that returning control back to the caller will cause any > > slowdown in and of itself, it's more about what the caller will do in > > response. If they choose to sleep for a random amount of time, then > > maybe they should stop doing that (which is exactly what this patchset > > does). > > Sleeping is the only correct reaction to EAGAIN. First, this would be an incredibly strong claim - given how many possible usage patterns there are - even if we did have support for user-side polling. But since we do not, then it's even more dubious, because the user does not know how long to sleep for. The only truly correct way to work with arbitrary demuxers currently is run it in blocking mode in a separate thread (which is exactly what ffmpeg.c does now). Furthermore this claim is not supported by development history. mpegts will currently return EAGAIN on failed resyncs, specifically to give the caller the opportunity to decide what to do next. RTSP does something similar. This has nothing to do with waiting for the network, because neither of these two demuxers know about the state of the network. And there is no essential difference between what mpegts does with EAGAIN and e.g. flvdec with REDO. The point of this set is giving control to the caller. If the caller wants blocking operations, they unset AVFMT_FLAG_NONBLOCK and don't have to deal with EAGAINs. If they want the limited support for context switching that lavf can provide, they set AVFMT_FLAG_NONBLOCK and decide what to do.
Anton Khirnov (12022-11-09): > If your concern with busy-waiting is pointless energy consumption, then > the correct thing to do is change all busy-waiting devices to sleep > internally if AVFMT_FLAG_NONBLOCK is not specified. I just checked, and > almost all of them actually do exactly this, the only exception is > avfoundation. Yes, devices currently do the right thing as much as our framework permits. Breaking that is one of the two ways merging EAGAIN and REDO can break our code. > First, this would be an incredibly strong claim - given how many > possible usage patterns there are Mostly true nonetheless. > The only truly > correct way to work with arbitrary demuxers currently is run it in > blocking mode in a separate thread (which is exactly what ffmpeg.c does > now). It is not a correct way, only slightly less broken. > Furthermore this claim is not supported by development history. mpegts > will currently return EAGAIN on failed resyncs, specifically to give the > caller the opportunity to decide what to do next. IIRC, this is precisely what REDO was introduced to fix. If the bug was introduced again or incompletely fixed, this it is an issue. Indeed, the two uses of EAGAIN in libavformat/mpegts.c seems highly dubious. But as I explained, REDO and EAGAIN have a very different semantic with regard to when the caller needs to retry and cannot be merged. Now, if you want me to take from my time to explain it again with more details and more clarity for your benefit… I suggest you tone done the condescension towards me.
Quoting Nicolas George (2022-11-09 11:41:32) > Anton Khirnov (12022-11-09): > > If your concern with busy-waiting is pointless energy consumption, then > > the correct thing to do is change all busy-waiting devices to sleep > > internally if AVFMT_FLAG_NONBLOCK is not specified. I just checked, and > > almost all of them actually do exactly this, the only exception is > > avfoundation. > > Yes, devices currently do the right thing as much as our framework > permits. > > Breaking that is one of the two ways merging EAGAIN and REDO can break > our code. I have no idea what you mean by this. What is being broken and how? > > Furthermore this claim is not supported by development history. mpegts > > will currently return EAGAIN on failed resyncs, specifically to give the > > caller the opportunity to decide what to do next. > > IIRC, this is precisely what REDO was introduced to fix. If the bug was > introduced again or incompletely fixed, this it is an issue. Indeed, the > two uses of EAGAIN in libavformat/mpegts.c seems highly dubious. > > But as I explained, REDO and EAGAIN have a very different semantic with > regard to when the caller needs to retry and cannot be merged. Maybe more context will clarify the motivation here. What became this patchset started as an attempt to make all uses of EAGAIN/REDO in lavf/lavd consistent. While doing that I gradually came to the conclusion that the distinction is arbitrary - e.g. multiple demuxers deliberately return EAGAIN as context switch method. Now we could declare this practice invalid, but as this has been done in multiple demuxers independently, and has been in the tree for over 10 years, so this should not be done lightly. The question then is - what should be the effective rule for deciding whether a demuxer returns EAGAIN or REDO in any given case? Opinions on this are very much welcome.
Anton Khirnov (12022-11-09): > I have no idea what you mean by this. What is being broken and how? EAGAIN and REDO are different. If you make them the same, you break one of them. Or possibly both if you are being really stupid, but at least one. > Maybe more context will clarify the motivation here. What became this > patchset started as an attempt to make all uses of EAGAIN/REDO in > lavf/lavd consistent. And as I explained, it is not possible. > While doing that I gradually came to the > conclusion that the distinction is arbitrary - e.g. multiple demuxers > deliberately return EAGAIN as context switch method. Now we could > declare this practice invalid, but as this has been done in multiple > demuxers independently, and has been in the tree for over 10 years, so > this should not be done lightly. I introduced REDO precisely to fix demuxers who abused EAGAIN like that. > The question then is - what should be the effective rule for deciding > whether a demuxer returns EAGAIN or REDO in any given case? > Opinions on this are very much welcome. I do not have an opinion, I have knowledge. The semantic of EAGAIN has always been, since it stabilized, that the operation cannot be completed immediately, it requires an external event, like a network packet or an IRQ on a device. Therefore, the only correct reaction to EAGAIN is to delay retrying. Ideally delay until an event is received; if that cannot be achieved a small arbitrary delay is still infinitely better than busy wait. This is standard Unix system and network programming, knowing this in and out is IMO an absolute prerequisite for tinkering with FFmpeg's network code. The semantic of AVERROR(EAGAIN) is ours to decide, but it would be really really stupid from us to decide for something that is not aligned with the wider semantic of EAGAIN. And aligning with EAGAIN is what we did, with the caveat that most of it only works with devices and elementary protocols. Except for the bugs at hand. Now, as you observed, some demuxers have abused AVERROR(EAGAIN) to let the framework re-call them and continue their work, for example looking for a resync marker. This is an abuse, this is invalid. If AVERROR(EAGAIN) is handled correctly, using it to look for a resync marker causes a delay to be inserted. The delay is very small, negligible in most cases, but it should not be there. And if the resync marker is hard to find, the delay will add up. If I remember correctly, this is precisely the issue AVERROR_REDO fixes: a damaged sample that would be very slow to demux without consuming CPU because of the delays introduced by the correct handling of AVERROR(EAGAIN). There was another way to fix the issue: have the demuxers loop themselves looking for a resync. But it would have required more reworking of the code. Last detail: we do not have the API to return to the user in the middle of the search for a resync marker. If this is something you want for some reason, you will need to introduce something new. But if so, please first explain your use case.
diff --git a/libavformat/demux.c b/libavformat/demux.c index 5cd9522367..8c81a58274 100644 --- a/libavformat/demux.c +++ b/libavformat/demux.c @@ -572,14 +572,10 @@ FF_ENABLE_DEPRECATION_WARNINGS if (err < 0) { av_packet_unref(pkt); - /* Some demuxers return FFERROR_REDO when they consume - data and discard it (ignored streams, junk, extradata). - We must re-call the demuxer to get the real packet. - - Treat EAGAIN the same as FFERROR_REDO, unless the user - requested non-blocking behavior. */ - if (err == FFERROR_REDO || - (err == AVERROR(EAGAIN) && !(s->flags & AVFMT_FLAG_NONBLOCK))) + /* Some demuxers return AVERROR(EAGAIN) when they consume + data and discard it (ignored streams, junk, extradata). Call the + demuxer again unless the user requested non-blocking behavior. */ + if (err == AVERROR(EAGAIN) && !(s->flags & AVFMT_FLAG_NONBLOCK)) continue; if (!pktl || err == AVERROR(EAGAIN)) return err; diff --git a/libavformat/demux.h b/libavformat/demux.h index 1f57e062f6..a008c3dba1 100644 --- a/libavformat/demux.h +++ b/libavformat/demux.h @@ -55,12 +55,6 @@ typedef struct FFStreamInfo { int fps_last_dts_idx; } FFStreamInfo; -/** - * Returned by demuxers to indicate that data was consumed but discarded - * (ignored streams or junk data). The framework will re-call the demuxer. - */ -#define FFERROR_REDO FFERRTAG('R','E','D','O') - #define RELATIVE_TS_BASE (INT64_MAX - (1LL << 48)) static av_always_inline int is_relative(int64_t ts) diff --git a/libavformat/flvdec.c b/libavformat/flvdec.c index d83edff727..d320a431db 100644 --- a/libavformat/flvdec.c +++ b/libavformat/flvdec.c @@ -1058,7 +1058,7 @@ retry: } if (size == 0) { - ret = FFERROR_REDO; + ret = AVERROR(EAGAIN); goto leave; } @@ -1110,13 +1110,13 @@ skip: av_log(s, AV_LOG_ERROR, "Unable to seek to the next packet\n"); return AVERROR_INVALIDDATA; } - ret = FFERROR_REDO; + ret = AVERROR(EAGAIN); goto leave; } /* skip empty data packets */ if (!size) { - ret = FFERROR_REDO; + ret = AVERROR(EAGAIN); goto leave; } @@ -1160,7 +1160,7 @@ skip: (st->discard >= AVDISCARD_BIDIR && ((flags & FLV_VIDEO_FRAMETYPE_MASK) == FLV_FRAME_DISP_INTER && stream_type == FLV_STREAM_TYPE_VIDEO)) || st->discard >= AVDISCARD_ALL) { avio_seek(s->pb, next, SEEK_SET); - ret = FFERROR_REDO; + ret = AVERROR(EAGAIN); goto leave; } @@ -1273,7 +1273,7 @@ retry_duration: if (st->codecpar->extradata) { if ((ret = flv_queue_extradata(flv, s->pb, stream_type, size)) < 0) return ret; - ret = FFERROR_REDO; + ret = AVERROR(EAGAIN); goto leave; } if ((ret = flv_get_extradata(s, st, size)) < 0) @@ -1284,14 +1284,14 @@ retry_duration: if (st->codecpar->codec_id == AV_CODEC_ID_AAC && t && !strcmp(t->value, "Omnia A/XE")) st->codecpar->extradata_size = 2; - ret = FFERROR_REDO; + ret = AVERROR(EAGAIN); goto leave; } } /* skip empty data packets */ if (!size) { - ret = FFERROR_REDO; + ret = AVERROR(EAGAIN); goto leave; } diff --git a/libavformat/lxfdec.c b/libavformat/lxfdec.c index 8003ae98b7..bd28dfe270 100644 --- a/libavformat/lxfdec.c +++ b/libavformat/lxfdec.c @@ -307,7 +307,7 @@ static int lxf_read_packet(AVFormatContext *s, AVPacket *pkt) if (stream > 1) { av_log(s, AV_LOG_WARNING, "got packet with illegal stream index %"PRIu32"\n", stream); - return FFERROR_REDO; + return AVERROR(EAGAIN); } if (stream == 1 && s->nb_streams < 2) { diff --git a/libavformat/mlvdec.c b/libavformat/mlvdec.c index db3b77bb9b..d510787fcd 100644 --- a/libavformat/mlvdec.c +++ b/libavformat/mlvdec.c @@ -422,7 +422,7 @@ static int read_packet(AVFormatContext *avctx, AVPacket *pkt) pb = mlv->pb[sti->index_entries[index].size]; if (!pb) { - ret = FFERROR_REDO; + ret = AVERROR(EAGAIN); goto next_packet; } avio_seek(pb, sti->index_entries[index].pos, SEEK_SET); diff --git a/libavformat/mpeg.c b/libavformat/mpeg.c index 864b08d8f8..f566ff74bd 100644 --- a/libavformat/mpeg.c +++ b/libavformat/mpeg.c @@ -251,7 +251,7 @@ redo: if (avio_feof(s->pb)) return AVERROR_EOF; // FIXME we should remember header_state - return FFERROR_REDO; + return AVERROR(EAGAIN); } if (startcode == PACK_START_CODE) diff --git a/libavformat/smacker.c b/libavformat/smacker.c index 1d54e8e917..052b6d28d4 100644 --- a/libavformat/smacker.c +++ b/libavformat/smacker.c @@ -332,7 +332,7 @@ static int smacker_read_packet(AVFormatContext *s, AVPacket *pkt) } if (s->streams[smk->videoindex]->discard >= AVDISCARD_ALL) { - ret = FFERROR_REDO; + ret = AVERROR(EAGAIN); goto next_frame; } if (smk->frame_size >= INT_MAX/2) {