diff mbox

[FFmpeg-devel] mov: do not mark timed thumbnail tracks as attached picture

Message ID 20180126230236.15481-1-nfxjfg@googlemail.com
State New
Headers show

Commit Message

wm4 Jan. 26, 2018, 11:02 p.m. UTC
The AV_DISPOSITION_ATTACHED_PIC flag is meant only for exporting a
static picture (such as embedded cover art) as pseudo video track. The
requirement is that there is exactly 1 packet, and that normal demuxing
does not return it (otherwise avformat_queue_attached_pictures() would
add it a second time). It should never used on actual video tracks that
return packets when demuxing.

I considered keeping the current behavior if there is exactly 1 frame
(according to nb_index_entries), but that would require additional work
to make sure avformat_queue_attached_pictures() does not add it a second
time, so I didn't bother.

Reverts part of 697400eac07c0614f6b9f2e7615563982dbcbe4a and fixes
regressions with certain API users with such mp4 files.
---
 libavformat/mov.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

Comments

Marton Balint Jan. 27, 2018, 12:44 a.m. UTC | #1
On Sat, 27 Jan 2018, wm4 wrote:

> The AV_DISPOSITION_ATTACHED_PIC flag is meant only for exporting a
> static picture (such as embedded cover art) as pseudo video track. The
> requirement is that there is exactly 1 packet, and that normal demuxing
> does not return it (otherwise avformat_queue_attached_pictures() would
> add it a second time). It should never used on actual video tracks that
> return packets when demuxing.

Docs in avformat.h only says that AV_DISPOSITION_ATTACHED_PIC "usually" 
contains one packet.

>
> I considered keeping the current behavior if there is exactly 1 frame
> (according to nb_index_entries), but that would require additional work
> to make sure avformat_queue_attached_pictures() does not add it a second
> time, so I didn't bother.
>
> Reverts part of 697400eac07c0614f6b9f2e7615563982dbcbe4a and fixes
> regressions with certain API users with such mp4 files.
> ---
> libavformat/mov.c | 18 +-----------------
> 1 file changed, 1 insertion(+), 17 deletions(-)
>
> diff --git a/libavformat/mov.c b/libavformat/mov.c
> index 22faecfc17..f74be03359 100644
> --- a/libavformat/mov.c
> +++ b/libavformat/mov.c
> @@ -6335,23 +6335,7 @@ static void mov_read_chapters(AVFormatContext *s)
>         cur_pos = avio_tell(sc->pb);
>
>         if (st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
> -            st->disposition |= AV_DISPOSITION_ATTACHED_PIC | AV_DISPOSITION_TIMED_THUMBNAILS;
> -            if (st->nb_index_entries) {
> -                // Retrieve the first frame, if possible
> -                AVPacket pkt;
> -                AVIndexEntry *sample = &st->index_entries[0];
> -                if (avio_seek(sc->pb, sample->pos, SEEK_SET) != sample->pos) {
> -                    av_log(s, AV_LOG_ERROR, "Failed to retrieve first frame\n");
> -                    goto finish;
> -                }
> -
> -                if (av_get_packet(sc->pb, &pkt, sample->size) < 0)
> -                    goto finish;
> -
> -                st->attached_pic              = pkt;
> -                st->attached_pic.stream_index = st->index;
> -                st->attached_pic.flags       |= AV_PKT_FLAG_KEY;
> -            }
> +            st->disposition |= AV_DISPOSITION_TIMED_THUMBNAILS;

AV_DISPOSITION_TIMED_THUMBNAILS is only ever used with 
AV_DISPOSITION_ATTACHED_PIC according to the docs in avformat.h.

Like it or not, that is how the flag was introduced, so I'd rather not 
change it now. It made sense to introduce it this way, because checking 
for the ATTACHED_PIC flag was used (for example in ffmpeg when using the 
capital V stream speicifier) to search for ordinary video streams. By 
using this flag for timed thumbnails as well, applications did not have to 
check for an additional flag when searching for ordinary video streams.

So I am against this patch, probably better to fix the regression in the 
API user side, because it seems to me this is one of those cases where 
something will regress no matter what we do, so it is better to not cause 
new regressions and advise the API user to work around existing ones 
according to the slightly changed semantics of the API.

Regards,
Marton
wm4 Jan. 27, 2018, 1:15 a.m. UTC | #2
On Sat, 27 Jan 2018 01:44:04 +0100 (CET)
Marton Balint <cus@passwd.hu> wrote:

> On Sat, 27 Jan 2018, wm4 wrote:
> 
> > The AV_DISPOSITION_ATTACHED_PIC flag is meant only for exporting a
> > static picture (such as embedded cover art) as pseudo video track. The
> > requirement is that there is exactly 1 packet, and that normal demuxing
> > does not return it (otherwise avformat_queue_attached_pictures() would
> > add it a second time). It should never used on actual video tracks that
> > return packets when demuxing.  
> 
> Docs in avformat.h only says that AV_DISPOSITION_ATTACHED_PIC "usually" 
> contains one packet.

It also implies AVStream.attached_pic is set, which can be only one
picture.

> >
> > I considered keeping the current behavior if there is exactly 1 frame
> > (according to nb_index_entries), but that would require additional work
> > to make sure avformat_queue_attached_pictures() does not add it a second
> > time, so I didn't bother.
> >
> > Reverts part of 697400eac07c0614f6b9f2e7615563982dbcbe4a and fixes
> > regressions with certain API users with such mp4 files.
> > ---
> > libavformat/mov.c | 18 +-----------------
> > 1 file changed, 1 insertion(+), 17 deletions(-)
> >
> > diff --git a/libavformat/mov.c b/libavformat/mov.c
> > index 22faecfc17..f74be03359 100644
> > --- a/libavformat/mov.c
> > +++ b/libavformat/mov.c
> > @@ -6335,23 +6335,7 @@ static void mov_read_chapters(AVFormatContext *s)
> >         cur_pos = avio_tell(sc->pb);
> >
> >         if (st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
> > -            st->disposition |= AV_DISPOSITION_ATTACHED_PIC | AV_DISPOSITION_TIMED_THUMBNAILS;
> > -            if (st->nb_index_entries) {
> > -                // Retrieve the first frame, if possible
> > -                AVPacket pkt;
> > -                AVIndexEntry *sample = &st->index_entries[0];
> > -                if (avio_seek(sc->pb, sample->pos, SEEK_SET) != sample->pos) {
> > -                    av_log(s, AV_LOG_ERROR, "Failed to retrieve first frame\n");
> > -                    goto finish;
> > -                }
> > -
> > -                if (av_get_packet(sc->pb, &pkt, sample->size) < 0)
> > -                    goto finish;
> > -
> > -                st->attached_pic              = pkt;
> > -                st->attached_pic.stream_index = st->index;
> > -                st->attached_pic.flags       |= AV_PKT_FLAG_KEY;
> > -            }
> > +            st->disposition |= AV_DISPOSITION_TIMED_THUMBNAILS;  
> 
> AV_DISPOSITION_TIMED_THUMBNAILS is only ever used with 
> AV_DISPOSITION_ATTACHED_PIC according to the docs in avformat.h.

Well then there can be only 1 timed thumbnail. As documented, this flag
has been nonsense, and not even libavformat itself respected it. As I
wrote in the commit message, the first packet is added twice, once
injected by utils.c, and then again returned by mov.c.

How does this make any sense?

> Like it or not, that is how the flag was introduced, so I'd rather not 
> change it now. It made sense to introduce it this way, because checking 
> for the ATTACHED_PIC flag was used (for example in ffmpeg when using the 
> capital V stream speicifier) to search for ordinary video streams. By 
> using this flag for timed thumbnails as well, applications did not have to 
> check for an additional flag when searching for ordinary video streams.

This is also nonsense. ATTACHED_PIC is not meant for any stream
selection stuff that ffmpeg.c might do, it means that it's a cover art
picture imported from metadata. (To be honest that doesn't make sense
either, it's just a dumb hack that should never have existed in the
first place and that broke a lot of things when it was introduced.)

If you want some flag that means "ordinary video stream", it should
probably be introduced separately, instead of abusing ATTACHED_PIC for
it.

> So I am against this patch, probably better to fix the regression in the 
> API user side, because it seems to me this is one of those cases where 
> something will regress no matter what we do, so it is better to not cause 
> new regressions and advise the API user to work around existing ones 
> according to the slightly changed semantics of the API.

It's already "regressed" because it's been in a permanent state of
being buggy and making no sense at all.

I can fix the avformat.h description of AV_DISPOSITION_TIMED_THUMBNAILS.
Marton Balint Jan. 27, 2018, 5:07 p.m. UTC | #3
On Sat, 27 Jan 2018, wm4 wrote:

>> Docs in avformat.h only says that AV_DISPOSITION_ATTACHED_PIC "usually" 
>> contains one packet.
>
> It also implies AVStream.attached_pic is set, which can be only one
> picture.

From the docs I'd assume that attached_pic contains 
the _first_ packet of the stream if multiple packets are there for a 
stream with ATTACHED_PIC flag set.

>> AV_DISPOSITION_TIMED_THUMBNAILS is only ever used with 
>> AV_DISPOSITION_ATTACHED_PIC according to the docs in avformat.h.
>
> Well then there can be only 1 timed thumbnail.
>
> As documented, this flag
> has been nonsense, and not even libavformat itself respected it. As I
> wrote in the commit message, the first packet is added twice, once
> injected by utils.c, and then again returned by mov.c.
>
> How does this make any sense?

Returning the packet twice is a bug then. We should fix that instead of 
changing the semantics of flags.

>
>> Like it or not, that is how the flag was introduced, so I'd rather not 
>> change it now. It made sense to introduce it this way, because checking 
>> for the ATTACHED_PIC flag was used (for example in ffmpeg when using the 
>> capital V stream speicifier) to search for ordinary video streams. By 
>> using this flag for timed thumbnails as well, applications did not have to 
>> check for an additional flag when searching for ordinary video streams.
>
> This is also nonsense. ATTACHED_PIC is not meant for any stream
> selection stuff that ffmpeg.c might do, it means that it's a cover art
> picture imported from metadata. (To be honest that doesn't make sense
> either, it's just a dumb hack that should never have existed in the
> first place and that broke a lot of things when it was introduced.)
>
> If you want some flag that means "ordinary video stream", it should
> probably be introduced separately, instead of abusing ATTACHED_PIC for
> it.

Well, my intent was to able to select ordinary video streams when I 
introduced the "V" stream specifier, and that is how I tried to document 
it. Any patch that breaks this behaviour is a regression from my point of 
view.

>
>> So I am against this patch, probably better to fix the regression in the 
>> API user side, because it seems to me this is one of those cases where 
>> something will regress no matter what we do, so it is better to not cause 
>> new regressions and advise the API user to work around existing ones 
>> according to the slightly changed semantics of the API.
>
> It's already "regressed" because it's been in a permanent state of
> being buggy and making no sense at all.

Returning a packet twice is obviously wrong, other than that it's just 
how it works now.

> I can fix the avformat.h description of AV_DISPOSITION_TIMED_THUMBNAILS.

If you want to pursue this then
- make the docs consistent with your changes
- document the API change in APIChanges
- make sure ffmpeg.c "V" stream specifier does not regress
- ask some insight from the orignal patch author who introduced the
   TIMED_THUMBNAILS flag, he might have his own reasons or comments worth
   considering

Thanks,
Marton
wm4 Jan. 27, 2018, 9:03 p.m. UTC | #4
On Sat, 27 Jan 2018 18:07:53 +0100 (CET)
Marton Balint <cus@passwd.hu> wrote:

> On Sat, 27 Jan 2018, wm4 wrote:
> 
> >> Docs in avformat.h only says that AV_DISPOSITION_ATTACHED_PIC "usually" 
> >> contains one packet.  
> >
> > It also implies AVStream.attached_pic is set, which can be only one
> > picture.  
> 
> From the docs I'd assume that attached_pic contains 
> the _first_ packet of the stream if multiple packets are there for a 
> stream with ATTACHED_PIC flag set.
> 
> >> AV_DISPOSITION_TIMED_THUMBNAILS is only ever used with 
> >> AV_DISPOSITION_ATTACHED_PIC according to the docs in avformat.h.  
> >
> > Well then there can be only 1 timed thumbnail.
> >
> > As documented, this flag
> > has been nonsense, and not even libavformat itself respected it. As I
> > wrote in the commit message, the first packet is added twice, once
> > injected by utils.c, and then again returned by mov.c.
> >
> > How does this make any sense?  
> 
> Returning the packet twice is a bug then. We should fix that instead of 
> changing the semantics of flags.

Sure, but it's an example how all assumptions about ATTACHED_PICs got
broken. ffplay has the same problem btw.: it makes ATTACHED_PIC tracks
entirely static. It uses that flag just like my own code to distinguish
single picture pseudo video tracks from tracks that actually have
timestamped frames.

> >  
> >> Like it or not, that is how the flag was introduced, so I'd rather not 
> >> change it now. It made sense to introduce it this way, because checking 
> >> for the ATTACHED_PIC flag was used (for example in ffmpeg when using the 
> >> capital V stream speicifier) to search for ordinary video streams. By 
> >> using this flag for timed thumbnails as well, applications did not have to 
> >> check for an additional flag when searching for ordinary video streams.  
> >
> > This is also nonsense. ATTACHED_PIC is not meant for any stream
> > selection stuff that ffmpeg.c might do, it means that it's a cover art
> > picture imported from metadata. (To be honest that doesn't make sense
> > either, it's just a dumb hack that should never have existed in the
> > first place and that broke a lot of things when it was introduced.)
> >
> > If you want some flag that means "ordinary video stream", it should
> > probably be introduced separately, instead of abusing ATTACHED_PIC for
> > it.  
> 
> Well, my intent was to able to select ordinary video streams when I 
> introduced the "V" stream specifier, and that is how I tried to document 
> it. Any patch that breaks this behaviour is a regression from my point of 
> view.

Well, it was also a regression from my point of view.

> >  
> >> So I am against this patch, probably better to fix the regression in the 
> >> API user side, because it seems to me this is one of those cases where 
> >> something will regress no matter what we do, so it is better to not cause 
> >> new regressions and advise the API user to work around existing ones 
> >> according to the slightly changed semantics of the API.  
> >
> > It's already "regressed" because it's been in a permanent state of
> > being buggy and making no sense at all.  
> 
> Returning a packet twice is obviously wrong, other than that it's just 
> how it works now.
> 
> > I can fix the avformat.h description of AV_DISPOSITION_TIMED_THUMBNAILS.  
> 
> If you want to pursue this then
> - make the docs consistent with your changes
> - document the API change in APIChanges
> - make sure ffmpeg.c "V" stream specifier does not regress
> - ask some insight from the orignal patch author who introduced the
>    TIMED_THUMBNAILS flag, he might have his own reasons or comments worth
>    considering

He agreed to my patch.

anyway, I already added a crappy workaround for this to my code. But
I'll block any other overloaded uses of ATTACHED_PIC that might be
added in the future, because that'd mean me (and possibly other API
users) would regress again and would have to check another flag as
exception.
diff mbox

Patch

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 22faecfc17..f74be03359 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -6335,23 +6335,7 @@  static void mov_read_chapters(AVFormatContext *s)
         cur_pos = avio_tell(sc->pb);
 
         if (st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
-            st->disposition |= AV_DISPOSITION_ATTACHED_PIC | AV_DISPOSITION_TIMED_THUMBNAILS;
-            if (st->nb_index_entries) {
-                // Retrieve the first frame, if possible
-                AVPacket pkt;
-                AVIndexEntry *sample = &st->index_entries[0];
-                if (avio_seek(sc->pb, sample->pos, SEEK_SET) != sample->pos) {
-                    av_log(s, AV_LOG_ERROR, "Failed to retrieve first frame\n");
-                    goto finish;
-                }
-
-                if (av_get_packet(sc->pb, &pkt, sample->size) < 0)
-                    goto finish;
-
-                st->attached_pic              = pkt;
-                st->attached_pic.stream_index = st->index;
-                st->attached_pic.flags       |= AV_PKT_FLAG_KEY;
-            }
+            st->disposition |= AV_DISPOSITION_TIMED_THUMBNAILS;
         } else {
             st->codecpar->codec_type = AVMEDIA_TYPE_DATA;
             st->codecpar->codec_id = AV_CODEC_ID_BIN_DATA;