diff mbox series

[FFmpeg-devel,v2] avformat/mpegts: set data broadcast streams as such

Message ID 20220411105031.7924-1-jeebjp@gmail.com
State New
Headers show
Series [FFmpeg-devel,v2] avformat/mpegts: set data broadcast streams as such | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 fail Make fate failed
andriy/make_x86 success Make finished
andriy/make_fate_x86 fail Make fate failed
andriy/make_armv7_RPi4 success Make finished
andriy/make_fate_armv7_RPi4 fail Make fate failed

Commit Message

Jan Ekström April 11, 2022, 10:50 a.m. UTC
From: Jan Ekström <jan.ekstrom@24i.com>

Additionally, they should not be probed, as this is essentially
various types of binary data.

Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
---
 libavformat/mpegts.c                        | 53 +++++++++++++++++++++
 tests/fate/mpegts.mak                       |  3 ++
 tests/ref/fate/mpegts-probe-sdt-data-stream | 14 ++++++
 3 files changed, 70 insertions(+)
 create mode 100644 tests/ref/fate/mpegts-probe-sdt-data-stream

Comments

Jan Ekström April 11, 2022, 1:31 p.m. UTC | #1
On Mon, Apr 11, 2022 at 1:50 PM Jan Ekström <jeebjp@gmail.com> wrote:
>
> From: Jan Ekström <jan.ekstrom@24i.com>
>
> Additionally, they should not be probed, as this is essentially
> various types of binary data.
>
> Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>

Changes from v1:
- checking the length of descriptor to at least be 3 bytes before
reading them with getX(). The only error these functions can throw is
INVALIDDATA in case of the buffer not being large enough so this
should make sure that these reads cannot fail (VS checking each
separately for < 0).
- switching data type to the type returned by getX (int).

Jan
Jan Ekström April 14, 2022, 6:59 a.m. UTC | #2
On Mon, Apr 11, 2022 at 1:50 PM Jan Ekström <jeebjp@gmail.com> wrote:
>
> From: Jan Ekström <jan.ekstrom@24i.com>
>
> Additionally, they should not be probed, as this is essentially
> various types of binary data.
>
> Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
> ---

Ping.

Basically this checks if we have an unknown stream with a private
stream type still at the end of the per-stream loop in PMT parsing,
and then cancels the stop of parsing that usually occurs as a PMT is
hit. Instead the logic will continue parsing further. When an SDT is
then found and a PMT for that program has already been received, it
will then stop header reading at that point.

I do agree that DVB/ETSI deciding to put this descriptor in the SDT is
unfortunate, but actually setting the stream to a non-unknown codec
actually leads to reduced probing time, as unknown streams get probed
within the stream, and such data streams are often of very low data
rate (and generally contain no A/V/S data which could be probed).

Jan
Marton Balint April 19, 2022, midnight UTC | #3
On Thu, 14 Apr 2022, Jan Ekström wrote:

> On Mon, Apr 11, 2022 at 1:50 PM Jan Ekström <jeebjp@gmail.com> wrote:
>>
>> From: Jan Ekström <jan.ekstrom@24i.com>
>>
>> Additionally, they should not be probed, as this is essentially
>> various types of binary data.
>>
>> Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
>> ---
>
> Ping.
>
> Basically this checks if we have an unknown stream with a private
> stream type still at the end of the per-stream loop in PMT parsing,
> and then cancels the stop of parsing that usually occurs as a PMT is
> hit. Instead the logic will continue parsing further. When an SDT is
> then found and a PMT for that program has already been received, it
> will then stop header reading at that point.

But why does it matter when the initial parsing is stopped? I mean it 
stops at the first PMT right now, nobody expects it to find all the 
programs and all the streams or all the stream codecs/parameters.

I am saying, that ideally, the ts->stop_parse magic should not be needed 
to be changed to fix your issue and when an SDT is detected with the 
broadcast descriptor that should stop any existing data stream parsing. Do 
you know why it does not work like that?

Thanks,
Marton
Jan Ekström April 19, 2022, 10:13 a.m. UTC | #4
On Tue, Apr 19, 2022 at 3:00 AM Marton Balint <cus@passwd.hu> wrote:
>
>
>
> On Thu, 14 Apr 2022, Jan Ekström wrote:
>
> > On Mon, Apr 11, 2022 at 1:50 PM Jan Ekström <jeebjp@gmail.com> wrote:
> >>
> >> From: Jan Ekström <jan.ekstrom@24i.com>
> >>
> >> Additionally, they should not be probed, as this is essentially
> >> various types of binary data.
> >>
> >> Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
> >> ---
> >
> > Ping.
> >
> > Basically this checks if we have an unknown stream with a private
> > stream type still at the end of the per-stream loop in PMT parsing,
> > and then cancels the stop of parsing that usually occurs as a PMT is
> > hit. Instead the logic will continue parsing further. When an SDT is
> > then found and a PMT for that program has already been received, it
> > will then stop header reading at that point.
>
> But why does it matter when the initial parsing is stopped? I mean it
> stops at the first PMT right now, nobody expects it to find all the
> programs and all the streams or all the stream codecs/parameters.
>
> I am saying, that ideally, the ts->stop_parse magic should not be needed
> to be changed to fix your issue and when an SDT is detected with the
> broadcast descriptor that should stop any existing data stream parsing. Do
> you know why it does not work like that?
>

If the codec is unknown after header parsing, the general parsing
logic is utilized to probe which codec is possibly in that unknown
stream, and thus more data is read from that stream, which can cause
further delays.

If the codec is known as data, it leads to no such result.

Basically, the idea is to figure out whether a stream is a data stream
or not during header parsing, if possible.

Jan
Jan Ekström April 19, 2022, 2:08 p.m. UTC | #5
On Tue, Apr 19, 2022 at 1:13 PM Jan Ekström <jeebjp@gmail.com> wrote:
>
> On Tue, Apr 19, 2022 at 3:00 AM Marton Balint <cus@passwd.hu> wrote:
> >
> >
> >
> > On Thu, 14 Apr 2022, Jan Ekström wrote:
> >
> > > On Mon, Apr 11, 2022 at 1:50 PM Jan Ekström <jeebjp@gmail.com> wrote:
> > >>
> > >> From: Jan Ekström <jan.ekstrom@24i.com>
> > >>
> > >> Additionally, they should not be probed, as this is essentially
> > >> various types of binary data.
> > >>
> > >> Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
> > >> ---
> > >
> > > Ping.
> > >
> > > Basically this checks if we have an unknown stream with a private
> > > stream type still at the end of the per-stream loop in PMT parsing,
> > > and then cancels the stop of parsing that usually occurs as a PMT is
> > > hit. Instead the logic will continue parsing further. When an SDT is
> > > then found and a PMT for that program has already been received, it
> > > will then stop header reading at that point.
> >
> > But why does it matter when the initial parsing is stopped? I mean it
> > stops at the first PMT right now, nobody expects it to find all the
> > programs and all the streams or all the stream codecs/parameters.
> >
> > I am saying, that ideally, the ts->stop_parse magic should not be needed
> > to be changed to fix your issue and when an SDT is detected with the
> > broadcast descriptor that should stop any existing data stream parsing. Do
> > you know why it does not work like that?
> >
>
> If the codec is unknown after header parsing, the general parsing
> logic is utilized to probe which codec is possibly in that unknown
> stream, and thus more data is read from that stream, which can cause
> further delays.
>
> If the codec is known as data, it leads to no such result.
>
> Basically, the idea is to figure out whether a stream is a data stream
> or not during header parsing, if possible.
>

Just double-checked and the difference is whether
max_stream_analyze_duration gets utilized in
libavformat/demux.c::avformat_find_stream_info .

If a stream is marked as unknown, it will get checked for longer. If
it is marked as a known "codec" it gets through quicker.

You can check an example locally with multicat:

There is a longer sample from an example stream under:
https://megumin.fushizen.eu/samples/2022-02-04-radio_with_data_stream/
with the files
13699753.{ts,aux}

You can then
`multicat -U ./13699753.ts "239.255.1.1:6001@127.0.0.1/ifaddr=127.0.0.1"`
and that should bind & connect and start pushing out UDP multicast via
localhost, using the aux file as timing.

Then you can with f.ex. ffprobe do
`/usr/bin/time -v ffprobe -v verbose -localaddr 127.0.0.1
udp://239.255.1.1:6001`
and compare the probe times with and without SDT being parsed during
the header read :) .

Jan
Marton Balint April 19, 2022, 8:06 p.m. UTC | #6
On Tue, 19 Apr 2022, Jan Ekström wrote:

> On Tue, Apr 19, 2022 at 1:13 PM Jan Ekström <jeebjp@gmail.com> wrote:
>>
>> On Tue, Apr 19, 2022 at 3:00 AM Marton Balint <cus@passwd.hu> wrote:
>> >
>> >
>> >
>> > On Thu, 14 Apr 2022, Jan Ekström wrote:
>> >
>> > > On Mon, Apr 11, 2022 at 1:50 PM Jan Ekström <jeebjp@gmail.com> wrote:
>> > >>
>> > >> From: Jan Ekström <jan.ekstrom@24i.com>
>> > >>
>> > >> Additionally, they should not be probed, as this is essentially
>> > >> various types of binary data.
>> > >>
>> > >> Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
>> > >> ---
>> > >
>> > > Ping.
>> > >
>> > > Basically this checks if we have an unknown stream with a private
>> > > stream type still at the end of the per-stream loop in PMT parsing,
>> > > and then cancels the stop of parsing that usually occurs as a PMT is
>> > > hit. Instead the logic will continue parsing further. When an SDT is
>> > > then found and a PMT for that program has already been received, it
>> > > will then stop header reading at that point.
>> >
>> > But why does it matter when the initial parsing is stopped? I mean it
>> > stops at the first PMT right now, nobody expects it to find all the
>> > programs and all the streams or all the stream codecs/parameters.
>> >
>> > I am saying, that ideally, the ts->stop_parse magic should not be needed
>> > to be changed to fix your issue and when an SDT is detected with the
>> > broadcast descriptor that should stop any existing data stream parsing. Do
>> > you know why it does not work like that?
>> >
>>
>> If the codec is unknown after header parsing, the general parsing
>> logic is utilized to probe which codec is possibly in that unknown
>> stream, and thus more data is read from that stream, which can cause
>> further delays.
>>
>> If the codec is known as data, it leads to no such result.
>>
>> Basically, the idea is to figure out whether a stream is a data stream
>> or not during header parsing, if possible.
>>
>
> Just double-checked and the difference is whether
> max_stream_analyze_duration gets utilized in
> libavformat/demux.c::avformat_find_stream_info .
>
> If a stream is marked as unknown, it will get checked for longer. If
> it is marked as a known "codec" it gets through quicker.

The part of the patch which parses the SDT and sets the codec is fine. 
But the fact that you have to set the codec during mpegts_read_header 
to make it stop parsing definitely looks like some bug in some code 
somewhere. It should be enough to set the codec sometime later in 
mpegts_read_packet() (which is called during avformat_find_stream_info())

Or to make it even more strange: comment out handle_packets() in 
mpegts_read_header. And it will work just fine. So if nothing is parsed 
in mpegts_read_header then it works. If something is parsed, then it does 
not. Kind of unexpected...

Regards,
Marton

>
> There is a longer sample from an example stream under:
> https://megumin.fushizen.eu/samples/2022-02-04-radio_with_data_stream/
> with the files
> 13699753.{ts,aux}
>
> You can then
> `multicat -U ./13699753.ts "239.255.1.1:6001@127.0.0.1/ifaddr=127.0.0.1"`
> and that should bind & connect and start pushing out UDP multicast via
> localhost, using the aux file as timing.
>
> Then you can with f.ex. ffprobe do
> `/usr/bin/time -v ffprobe -v verbose -localaddr 127.0.0.1
> udp://239.255.1.1:6001`
> and compare the probe times with and without SDT being parsed during
> the header read :) .
>
> Jan
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Jan Ekström April 25, 2022, 10:36 a.m. UTC | #7
On Tue, Apr 19, 2022 at 11:06 PM Marton Balint <cus@passwd.hu> wrote:
>
>
>
> On Tue, 19 Apr 2022, Jan Ekström wrote:
>
> > On Tue, Apr 19, 2022 at 1:13 PM Jan Ekström <jeebjp@gmail.com> wrote:
> >>
> >> On Tue, Apr 19, 2022 at 3:00 AM Marton Balint <cus@passwd.hu> wrote:
> >> >
> >> >
> >> >
> >> > On Thu, 14 Apr 2022, Jan Ekström wrote:
> >> >
> >> > > On Mon, Apr 11, 2022 at 1:50 PM Jan Ekström <jeebjp@gmail.com> wrote:
> >> > >>
> >> > >> From: Jan Ekström <jan.ekstrom@24i.com>
> >> > >>
> >> > >> Additionally, they should not be probed, as this is essentially
> >> > >> various types of binary data.
> >> > >>
> >> > >> Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
> >> > >> ---
> >> > >
> >> > > Ping.
> >> > >
> >> > > Basically this checks if we have an unknown stream with a private
> >> > > stream type still at the end of the per-stream loop in PMT parsing,
> >> > > and then cancels the stop of parsing that usually occurs as a PMT is
> >> > > hit. Instead the logic will continue parsing further. When an SDT is
> >> > > then found and a PMT for that program has already been received, it
> >> > > will then stop header reading at that point.
> >> >
> >> > But why does it matter when the initial parsing is stopped? I mean it
> >> > stops at the first PMT right now, nobody expects it to find all the
> >> > programs and all the streams or all the stream codecs/parameters.
> >> >
> >> > I am saying, that ideally, the ts->stop_parse magic should not be needed
> >> > to be changed to fix your issue and when an SDT is detected with the
> >> > broadcast descriptor that should stop any existing data stream parsing. Do
> >> > you know why it does not work like that?
> >> >
> >>
> >> If the codec is unknown after header parsing, the general parsing
> >> logic is utilized to probe which codec is possibly in that unknown
> >> stream, and thus more data is read from that stream, which can cause
> >> further delays.
> >>
> >> If the codec is known as data, it leads to no such result.
> >>
> >> Basically, the idea is to figure out whether a stream is a data stream
> >> or not during header parsing, if possible.
> >>
> >
> > Just double-checked and the difference is whether
> > max_stream_analyze_duration gets utilized in
> > libavformat/demux.c::avformat_find_stream_info .
> >
> > If a stream is marked as unknown, it will get checked for longer. If
> > it is marked as a known "codec" it gets through quicker.
>
> The part of the patch which parses the SDT and sets the codec is fine.
> But the fact that you have to set the codec during mpegts_read_header
> to make it stop parsing definitely looks like some bug in some code
> somewhere. It should be enough to set the codec sometime later in
> mpegts_read_packet() (which is called during avformat_find_stream_info())
>
> Or to make it even more strange: comment out handle_packets() in
> mpegts_read_header. And it will work just fine. So if nothing is parsed
> in mpegts_read_header then it works. If something is parsed, then it does
> not. Kind of unexpected...
>
> Regards,
> Marton

Hi,

So I poked at this again and did the changes you requested. The result
is that if the "continue parsing until SDT if you have these sorts of
streams in-band" logic is removed, it leads to the FATE test ending up
with the result of "codec_name=unknown". Which makes the test rather
useless as it doesn't actually show that the stream is noted as a data
stream. Additionally, I thought the logic made sense since as much as
I dislike having information outside of PMT being utilized for PMT
purposes, that is effectively what this SDT descriptor is utilized
for. Given I specifically limited this logic to unknown streams within
a specific stream identifier range, the effect of this change would
have been far-from-global as well.

I have a hunch that information is copied from the lavf-internal codec
context to codecpar at one point, and then no further synchronization
is attempted.

As for the results being different without any packets being actually
demuxed, codec_info_nb_frames is possibly a variable which has some
relevance to it, although most checks against it seem to be of the > 1
type, instead of > 0.

Jan
Jan Ekström April 25, 2022, 11:33 a.m. UTC | #8
On Mon, Apr 25, 2022 at 1:36 PM Jan Ekström <jeebjp@gmail.com> wrote:
>
> On Tue, Apr 19, 2022 at 11:06 PM Marton Balint <cus@passwd.hu> wrote:
> >
> >
> >
> > On Tue, 19 Apr 2022, Jan Ekström wrote:
> >
> > > On Tue, Apr 19, 2022 at 1:13 PM Jan Ekström <jeebjp@gmail.com> wrote:
> > >>
> > >> On Tue, Apr 19, 2022 at 3:00 AM Marton Balint <cus@passwd.hu> wrote:
> > >> >
> > >> >
> > >> >
> > >> > On Thu, 14 Apr 2022, Jan Ekström wrote:
> > >> >
> > >> > > On Mon, Apr 11, 2022 at 1:50 PM Jan Ekström <jeebjp@gmail.com> wrote:
> > >> > >>
> > >> > >> From: Jan Ekström <jan.ekstrom@24i.com>
> > >> > >>
> > >> > >> Additionally, they should not be probed, as this is essentially
> > >> > >> various types of binary data.
> > >> > >>
> > >> > >> Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
> > >> > >> ---
> > >> > >
> > >> > > Ping.
> > >> > >
> > >> > > Basically this checks if we have an unknown stream with a private
> > >> > > stream type still at the end of the per-stream loop in PMT parsing,
> > >> > > and then cancels the stop of parsing that usually occurs as a PMT is
> > >> > > hit. Instead the logic will continue parsing further. When an SDT is
> > >> > > then found and a PMT for that program has already been received, it
> > >> > > will then stop header reading at that point.
> > >> >
> > >> > But why does it matter when the initial parsing is stopped? I mean it
> > >> > stops at the first PMT right now, nobody expects it to find all the
> > >> > programs and all the streams or all the stream codecs/parameters.
> > >> >
> > >> > I am saying, that ideally, the ts->stop_parse magic should not be needed
> > >> > to be changed to fix your issue and when an SDT is detected with the
> > >> > broadcast descriptor that should stop any existing data stream parsing. Do
> > >> > you know why it does not work like that?
> > >> >
> > >>
> > >> If the codec is unknown after header parsing, the general parsing
> > >> logic is utilized to probe which codec is possibly in that unknown
> > >> stream, and thus more data is read from that stream, which can cause
> > >> further delays.
> > >>
> > >> If the codec is known as data, it leads to no such result.
> > >>
> > >> Basically, the idea is to figure out whether a stream is a data stream
> > >> or not during header parsing, if possible.
> > >>
> > >
> > > Just double-checked and the difference is whether
> > > max_stream_analyze_duration gets utilized in
> > > libavformat/demux.c::avformat_find_stream_info .
> > >
> > > If a stream is marked as unknown, it will get checked for longer. If
> > > it is marked as a known "codec" it gets through quicker.
> >
> > The part of the patch which parses the SDT and sets the codec is fine.
> > But the fact that you have to set the codec during mpegts_read_header
> > to make it stop parsing definitely looks like some bug in some code
> > somewhere. It should be enough to set the codec sometime later in
> > mpegts_read_packet() (which is called during avformat_find_stream_info())
> >
> > Or to make it even more strange: comment out handle_packets() in
> > mpegts_read_header. And it will work just fine. So if nothing is parsed
> > in mpegts_read_header then it works. If something is parsed, then it does
> > not. Kind of unexpected...
> >
> > Regards,
> > Marton
>
> Hi,
>
> So I poked at this again and did the changes you requested. The result
> is that if the "continue parsing until SDT if you have these sorts of
> streams in-band" logic is removed, it leads to the FATE test ending up
> with the result of "codec_name=unknown". Which makes the test rather
> useless as it doesn't actually show that the stream is noted as a data
> stream. Additionally, I thought the logic made sense since as much as
> I dislike having information outside of PMT being utilized for PMT
> purposes, that is effectively what this SDT descriptor is utilized
> for. Given I specifically limited this logic to unknown streams within
> a specific stream identifier range, the effect of this change would
> have been far-from-global as well.
>
> I have a hunch that information is copied from the lavf-internal codec
> context to codecpar at one point, and then no further synchronization
> is attempted.

So, when checking the following diff:

diff --git a/libavformat/demux.c b/libavformat/demux.c
index ad7b5dbf83..6e483acb10 100644
--- a/libavformat/demux.c
+++ b/libavformat/demux.c
@@ -2909,6 +2909,11 @@ int avformat_find_stream_info(AVFormatContext
*ic, AVDictionary **options)
         FFStream *const sti = ffstream(st);
         const char *errmsg;

+        av_log(ic, AV_LOG_VERBOSE,
+               "stream %u (%sinitialized), codec context codec: %s,
codecpar codec: %s\n",
+               i, sti->avctx_inited ? "" : "not ",
+               avcodec_get_name(sti->avctx->codec_id),
avcodec_get_name(st->codecpar->codec_id));
+
         /* if no packet was ever seen, update context now for
has_codec_parameters */
         if (!sti->avctx_inited) {
             if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO &&


..the result was that:
- codec context was already noted as initialized
- the codecpar codec was bin_data
- the codec context one was still none.

I hacked it to do the update in case of sti->need_context_update in
the if visible in the diff, and it led to the update of the end
result.

Thus it seems like read_frame_internal - or some other function should
probably check whether a context update was requested by the demuxer.
That way it might be possible to stop the processing earlier as well.

Jan
Jan Ekström April 25, 2022, 12:19 p.m. UTC | #9
On Mon, Apr 25, 2022 at 2:33 PM Jan Ekström <jeebjp@gmail.com> wrote:
>
> On Mon, Apr 25, 2022 at 1:36 PM Jan Ekström <jeebjp@gmail.com> wrote:
> >
> > On Tue, Apr 19, 2022 at 11:06 PM Marton Balint <cus@passwd.hu> wrote:
> > >
> > >
> > >
> > > On Tue, 19 Apr 2022, Jan Ekström wrote:
> > >
> > > > On Tue, Apr 19, 2022 at 1:13 PM Jan Ekström <jeebjp@gmail.com> wrote:
> > > >>
> > > >> On Tue, Apr 19, 2022 at 3:00 AM Marton Balint <cus@passwd.hu> wrote:
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Thu, 14 Apr 2022, Jan Ekström wrote:
> > > >> >
> > > >> > > On Mon, Apr 11, 2022 at 1:50 PM Jan Ekström <jeebjp@gmail.com> wrote:
> > > >> > >>
> > > >> > >> From: Jan Ekström <jan.ekstrom@24i.com>
> > > >> > >>
> > > >> > >> Additionally, they should not be probed, as this is essentially
> > > >> > >> various types of binary data.
> > > >> > >>
> > > >> > >> Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
> > > >> > >> ---
> > > >> > >
> > > >> > > Ping.
> > > >> > >
> > > >> > > Basically this checks if we have an unknown stream with a private
> > > >> > > stream type still at the end of the per-stream loop in PMT parsing,
> > > >> > > and then cancels the stop of parsing that usually occurs as a PMT is
> > > >> > > hit. Instead the logic will continue parsing further. When an SDT is
> > > >> > > then found and a PMT for that program has already been received, it
> > > >> > > will then stop header reading at that point.
> > > >> >
> > > >> > But why does it matter when the initial parsing is stopped? I mean it
> > > >> > stops at the first PMT right now, nobody expects it to find all the
> > > >> > programs and all the streams or all the stream codecs/parameters.
> > > >> >
> > > >> > I am saying, that ideally, the ts->stop_parse magic should not be needed
> > > >> > to be changed to fix your issue and when an SDT is detected with the
> > > >> > broadcast descriptor that should stop any existing data stream parsing. Do
> > > >> > you know why it does not work like that?
> > > >> >
> > > >>
> > > >> If the codec is unknown after header parsing, the general parsing
> > > >> logic is utilized to probe which codec is possibly in that unknown
> > > >> stream, and thus more data is read from that stream, which can cause
> > > >> further delays.
> > > >>
> > > >> If the codec is known as data, it leads to no such result.
> > > >>
> > > >> Basically, the idea is to figure out whether a stream is a data stream
> > > >> or not during header parsing, if possible.
> > > >>
> > > >
> > > > Just double-checked and the difference is whether
> > > > max_stream_analyze_duration gets utilized in
> > > > libavformat/demux.c::avformat_find_stream_info .
> > > >
> > > > If a stream is marked as unknown, it will get checked for longer. If
> > > > it is marked as a known "codec" it gets through quicker.
> > >
> > > The part of the patch which parses the SDT and sets the codec is fine.
> > > But the fact that you have to set the codec during mpegts_read_header
> > > to make it stop parsing definitely looks like some bug in some code
> > > somewhere. It should be enough to set the codec sometime later in
> > > mpegts_read_packet() (which is called during avformat_find_stream_info())
> > >
> > > Or to make it even more strange: comment out handle_packets() in
> > > mpegts_read_header. And it will work just fine. So if nothing is parsed
> > > in mpegts_read_header then it works. If something is parsed, then it does
> > > not. Kind of unexpected...
> > >
> > > Regards,
> > > Marton
> >
> > Hi,
> >
> > So I poked at this again and did the changes you requested. The result
> > is that if the "continue parsing until SDT if you have these sorts of
> > streams in-band" logic is removed, it leads to the FATE test ending up
> > with the result of "codec_name=unknown". Which makes the test rather
> > useless as it doesn't actually show that the stream is noted as a data
> > stream. Additionally, I thought the logic made sense since as much as
> > I dislike having information outside of PMT being utilized for PMT
> > purposes, that is effectively what this SDT descriptor is utilized
> > for. Given I specifically limited this logic to unknown streams within
> > a specific stream identifier range, the effect of this change would
> > have been far-from-global as well.
> >
> > I have a hunch that information is copied from the lavf-internal codec
> > context to codecpar at one point, and then no further synchronization
> > is attempted.
>
> So, when checking the following diff:
>
> diff --git a/libavformat/demux.c b/libavformat/demux.c
> index ad7b5dbf83..6e483acb10 100644
> --- a/libavformat/demux.c
> +++ b/libavformat/demux.c
> @@ -2909,6 +2909,11 @@ int avformat_find_stream_info(AVFormatContext
> *ic, AVDictionary **options)
>          FFStream *const sti = ffstream(st);
>          const char *errmsg;
>
> +        av_log(ic, AV_LOG_VERBOSE,
> +               "stream %u (%sinitialized), codec context codec: %s,
> codecpar codec: %s\n",
> +               i, sti->avctx_inited ? "" : "not ",
> +               avcodec_get_name(sti->avctx->codec_id),
> avcodec_get_name(st->codecpar->codec_id));
> +
>          /* if no packet was ever seen, update context now for
> has_codec_parameters */
>          if (!sti->avctx_inited) {
>              if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO &&
>
>
> ..the result was that:
> - codec context was already noted as initialized
> - the codecpar codec was bin_data
> - the codec context one was still none.
>
> I hacked it to do the update in case of sti->need_context_update in
> the if visible in the diff, and it led to the update of the end
> result.
>
> Thus it seems like read_frame_internal - or some other function should
> probably check whether a context update was requested by the demuxer.
> That way it might be possible to stop the processing earlier as well.
>

After further checking, it seems like the MPEG-TS demuxer never
returns a packet for the data stream looking at the results from
ff_read_packet/read_frame_internal, which explains why the already
existing sti->need_context_update logic within read_frame_internal
does not get hit.

Jan
Jan Ekström April 25, 2022, 2:45 p.m. UTC | #10
On Mon, Apr 25, 2022 at 3:19 PM Jan Ekström <jeebjp@gmail.com> wrote:
>
> On Mon, Apr 25, 2022 at 2:33 PM Jan Ekström <jeebjp@gmail.com> wrote:
> >
> > On Mon, Apr 25, 2022 at 1:36 PM Jan Ekström <jeebjp@gmail.com> wrote:
> > >
> > > On Tue, Apr 19, 2022 at 11:06 PM Marton Balint <cus@passwd.hu> wrote:
> > > >
> > > >
> > > >
> > > > On Tue, 19 Apr 2022, Jan Ekström wrote:
> > > >
> > > > > On Tue, Apr 19, 2022 at 1:13 PM Jan Ekström <jeebjp@gmail.com> wrote:
> > > > >>
> > > > >> On Tue, Apr 19, 2022 at 3:00 AM Marton Balint <cus@passwd.hu> wrote:
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Thu, 14 Apr 2022, Jan Ekström wrote:
> > > > >> >
> > > > >> > > On Mon, Apr 11, 2022 at 1:50 PM Jan Ekström <jeebjp@gmail.com> wrote:
> > > > >> > >>
> > > > >> > >> From: Jan Ekström <jan.ekstrom@24i.com>
> > > > >> > >>
> > > > >> > >> Additionally, they should not be probed, as this is essentially
> > > > >> > >> various types of binary data.
> > > > >> > >>
> > > > >> > >> Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
> > > > >> > >> ---
> > > > >> > >
> > > > >> > > Ping.
> > > > >> > >
> > > > >> > > Basically this checks if we have an unknown stream with a private
> > > > >> > > stream type still at the end of the per-stream loop in PMT parsing,
> > > > >> > > and then cancels the stop of parsing that usually occurs as a PMT is
> > > > >> > > hit. Instead the logic will continue parsing further. When an SDT is
> > > > >> > > then found and a PMT for that program has already been received, it
> > > > >> > > will then stop header reading at that point.
> > > > >> >
> > > > >> > But why does it matter when the initial parsing is stopped? I mean it
> > > > >> > stops at the first PMT right now, nobody expects it to find all the
> > > > >> > programs and all the streams or all the stream codecs/parameters.
> > > > >> >
> > > > >> > I am saying, that ideally, the ts->stop_parse magic should not be needed
> > > > >> > to be changed to fix your issue and when an SDT is detected with the
> > > > >> > broadcast descriptor that should stop any existing data stream parsing. Do
> > > > >> > you know why it does not work like that?
> > > > >> >
> > > > >>
> > > > >> If the codec is unknown after header parsing, the general parsing
> > > > >> logic is utilized to probe which codec is possibly in that unknown
> > > > >> stream, and thus more data is read from that stream, which can cause
> > > > >> further delays.
> > > > >>
> > > > >> If the codec is known as data, it leads to no such result.
> > > > >>
> > > > >> Basically, the idea is to figure out whether a stream is a data stream
> > > > >> or not during header parsing, if possible.
> > > > >>
> > > > >
> > > > > Just double-checked and the difference is whether
> > > > > max_stream_analyze_duration gets utilized in
> > > > > libavformat/demux.c::avformat_find_stream_info .
> > > > >
> > > > > If a stream is marked as unknown, it will get checked for longer. If
> > > > > it is marked as a known "codec" it gets through quicker.
> > > >
> > > > The part of the patch which parses the SDT and sets the codec is fine.
> > > > But the fact that you have to set the codec during mpegts_read_header
> > > > to make it stop parsing definitely looks like some bug in some code
> > > > somewhere. It should be enough to set the codec sometime later in
> > > > mpegts_read_packet() (which is called during avformat_find_stream_info())
> > > >
> > > > Or to make it even more strange: comment out handle_packets() in
> > > > mpegts_read_header. And it will work just fine. So if nothing is parsed
> > > > in mpegts_read_header then it works. If something is parsed, then it does
> > > > not. Kind of unexpected...
> > > >
> > > > Regards,
> > > > Marton
> > >
> > > Hi,
> > >
> > > So I poked at this again and did the changes you requested. The result
> > > is that if the "continue parsing until SDT if you have these sorts of
> > > streams in-band" logic is removed, it leads to the FATE test ending up
> > > with the result of "codec_name=unknown". Which makes the test rather
> > > useless as it doesn't actually show that the stream is noted as a data
> > > stream. Additionally, I thought the logic made sense since as much as
> > > I dislike having information outside of PMT being utilized for PMT
> > > purposes, that is effectively what this SDT descriptor is utilized
> > > for. Given I specifically limited this logic to unknown streams within
> > > a specific stream identifier range, the effect of this change would
> > > have been far-from-global as well.
> > >
> > > I have a hunch that information is copied from the lavf-internal codec
> > > context to codecpar at one point, and then no further synchronization
> > > is attempted.
> >
> > So, when checking the following diff:
> >
> > diff --git a/libavformat/demux.c b/libavformat/demux.c
> > index ad7b5dbf83..6e483acb10 100644
> > --- a/libavformat/demux.c
> > +++ b/libavformat/demux.c
> > @@ -2909,6 +2909,11 @@ int avformat_find_stream_info(AVFormatContext
> > *ic, AVDictionary **options)
> >          FFStream *const sti = ffstream(st);
> >          const char *errmsg;
> >
> > +        av_log(ic, AV_LOG_VERBOSE,
> > +               "stream %u (%sinitialized), codec context codec: %s,
> > codecpar codec: %s\n",
> > +               i, sti->avctx_inited ? "" : "not ",
> > +               avcodec_get_name(sti->avctx->codec_id),
> > avcodec_get_name(st->codecpar->codec_id));
> > +
> >          /* if no packet was ever seen, update context now for
> > has_codec_parameters */
> >          if (!sti->avctx_inited) {
> >              if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO &&
> >
> >
> > ..the result was that:
> > - codec context was already noted as initialized
> > - the codecpar codec was bin_data
> > - the codec context one was still none.
> >
> > I hacked it to do the update in case of sti->need_context_update in
> > the if visible in the diff, and it led to the update of the end
> > result.
> >
> > Thus it seems like read_frame_internal - or some other function should
> > probably check whether a context update was requested by the demuxer.
> > That way it might be possible to stop the processing earlier as well.
> >
>
> After further checking, it seems like the MPEG-TS demuxer never
> returns a packet for the data stream looking at the results from
> ff_read_packet/read_frame_internal, which explains why the already
> existing sti->need_context_update logic within read_frame_internal
> does not get hit.

Now that I looked at the MPEG-TS packets I see a pretty obvious reason
why packets probably aren't getting through even though a PID filter
is being added.

As far as I can tell, the payload_unit_start_indicator (aka the
is_start variable) is never set to 1, thus most likely not triggering
the logic meant for PES packets.

Jan
diff mbox series

Patch

diff --git a/libavformat/mpegts.c b/libavformat/mpegts.c
index 49f7735123..d41f7fc8bd 100644
--- a/libavformat/mpegts.c
+++ b/libavformat/mpegts.c
@@ -2516,6 +2516,13 @@  static void pmt_cb(MpegTSFilter *filter, const uint8_t *section, int section_len
             }
         }
         p = desc_list_end;
+
+        if (!ts->pkt && (stream_type >= 0x80 && stream_type <= 0xFF) &&
+            st->codecpar->codec_id == AV_CODEC_ID_NONE)
+            // if we are reading headers, and still have a user private stream
+            // with no proper codec set, do not stop reading at PMT. Data
+            // streams are marked within SDT.
+            ts->stop_parse = 0;
     }
 
     if (!ts->pids[pcr_pid])
@@ -2699,6 +2706,7 @@  static void sdt_cb(MpegTSFilter *filter, const uint8_t *section, int section_len
     if (val < 0)
         return;
     for (;;) {
+        struct Program *program = NULL;
         sid = get16(&p, p_end);
         if (sid < 0)
             break;
@@ -2712,6 +2720,15 @@  static void sdt_cb(MpegTSFilter *filter, const uint8_t *section, int section_len
         desc_list_end  = p + desc_list_len;
         if (desc_list_end > p_end)
             break;
+
+        program = get_program(ts, sid);
+
+        if (!ts->pkt && program && program->pmt_found)
+            // if during header reading we have already received a PMT for
+            // this program and now have received an SDT for it, stop further
+            // reading at this point.
+            ts->stop_parse = 2;
+
         for (;;) {
             desc_tag = get8(&p, desc_list_end);
             if (desc_tag < 0)
@@ -2744,6 +2761,42 @@  static void sdt_cb(MpegTSFilter *filter, const uint8_t *section, int section_len
                 av_free(name);
                 av_free(provider_name);
                 break;
+            case 0x64: /* ETSI data broadcast descriptor; EN 300 468 6.2.11 */
+                if (desc_len < 3)
+                    // length of the always available header part, up to and
+                    // including the component_tag field.
+                    break;
+
+                {
+                    AVStream *st  = NULL;
+                    FFStream *sti = NULL;
+
+                    int data_broadcast_id = get16(&p, desc_end); // TS 101 162
+                    int component_tag     = get8(&p, desc_end);
+                    if (!component_tag)
+                        // no stream mapping according to component_tag
+                        break;
+
+                    av_log(ts->stream, AV_LOG_TRACE,
+                           "data broadcast id: %d, component tag: %d\n",
+                           data_broadcast_id, component_tag);
+
+                    if (!program)
+                        break;
+
+                    st = find_matching_stream(ts, 0, sid, component_tag + 1, 0,
+                                              program);
+                    if (!st)
+                        break;
+
+                    sti = ffstream(st);
+
+                    st->codecpar->codec_type = AVMEDIA_TYPE_DATA;
+                    st->codecpar->codec_id   = AV_CODEC_ID_BIN_DATA;
+                    sti->request_probe = sti->need_parsing = 0;
+                    sti->need_context_update = 1;
+                    break;
+                }
             default:
                 break;
             }
diff --git a/tests/fate/mpegts.mak b/tests/fate/mpegts.mak
index bbcbfc47b2..ae21ee87d0 100644
--- a/tests/fate/mpegts.mak
+++ b/tests/fate/mpegts.mak
@@ -19,6 +19,9 @@  FATE_MPEGTS_PROBE-$(call DEMDEC, MPEGTS) += fate-mpegts-probe-pmt-merge
 fate-mpegts-probe-pmt-merge: SRC = $(TARGET_SAMPLES)/mpegts/pmtchange.ts
 fate-mpegts-probe-pmt-merge: CMD = run $(PROBE_CODEC_NAME_COMMAND) -merge_pmt_versions 1 -i "$(SRC)"
 
+FATE_MPEGTS_PROBE-$(call DEMDEC, MPEGTS, MP2) += fate-mpegts-probe-sdt-data-stream
+fate-mpegts-probe-sdt-data-stream: SRC = $(TARGET_SAMPLES)/mpegts/mpegts_sdt_data_stream.ts
+fate-mpegts-probe-sdt-data-stream: CMD = run $(PROBE_CODEC_NAME_COMMAND) -i "$(SRC)"
 
 FATE_SAMPLES_FFPROBE += $(FATE_MPEGTS_PROBE-yes)
 
diff --git a/tests/ref/fate/mpegts-probe-sdt-data-stream b/tests/ref/fate/mpegts-probe-sdt-data-stream
new file mode 100644
index 0000000000..0b8e90962f
--- /dev/null
+++ b/tests/ref/fate/mpegts-probe-sdt-data-stream
@@ -0,0 +1,14 @@ 
+[PROGRAM]
+[STREAM]
+codec_name=mp2
+[/STREAM]
+[STREAM]
+codec_name=bin_data
+[/STREAM]
+[/PROGRAM]
+[STREAM]
+codec_name=mp2
+[/STREAM]
+[STREAM]
+codec_name=bin_data
+[/STREAM]