diff mbox series

[FFmpeg-devel] mov.c log qt ref external essence metadata

Message ID 20210306082941.974-1-emcodem@ffastrans.com
State New
Series [FFmpeg-devel] mov.c log qt ref external essence metadata

Checks

Context Check Description
andriy/x86_make success Make finished
andriy/x86_make_fate success Make fate finished
andriy/PPC64_make success Make finished
andriy/PPC64_make_fate success Make fate finished

Commit Message

emcodem March 6, 2021, 8:29 a.m. UTC
---
 libavformat/mov.c | 8 ++++++++
 1 file changed, 8 insertions(+)

In QuickTime reference files, exposing the parsed location info for external essences can be very handy for users

Comments

Jan Ekström March 6, 2021, 9:48 a.m. UTC | #1
On Sat, Mar 6, 2021 at 10:38 AM emcodem <emcodem@ffastrans.com> wrote:
>
> ---
>  libavformat/mov.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> In QuickTime reference files, exposing the parsed location info for external essences can be very handy for users
>

Unfortunately, as per the discussion we had yesterday on #ffmpeg, Data
References are not as simple as mov.c might make them seem.

So, if we think that a single MOV/MP4 Track is:
1. A set of decodable packets of a certain media type (as far as I
can tell that has been the limitation; while other things can change
during a Track, the media type is one that doesn't).
2. Which is then presented according to a virtual timeline (edit
lists, which we will in this case ignore since they get applied on top
of the decoded result on the presentation layer, and data references
are on the packet set level).

Thus if we go through the layers:
1. We have samples (packets in FFmpeg parlance, more or less; stsz
defines their sizes and so forth).
2. Samples get put into chunks, which are basically tuples of
(sample_description_index, data offset) - see stsc, stco, co64.
3. A Sample Description can be thought of as a tuple of (AVCodec, the
extradata (if any) required, data reference index); there is a list of
them in the Track's stsd box.
4. Then finally we get to the data reference list in the dref box of the track.

Currently as far as I can tell from reading mov_read_stsd /
ff_mov_read_stsd_entries, it does generate extradata buffer for each
sample description, but effectively only keeps a single data reference
around in the MOVStreamContext, skipping the whole chunk matching etc
part of things :) (if I am reading the code correctly, which I might
not be).

So yea, there's two questions:
1. Should this be exposed?
2. If it should be exposed, how? A set of metadata this should not be,
as this at the very least would end up being a weird set/list of byte
offsets/sizes and references :)

So yea, sorry for things not actually being as simple as they look by
the code in mov.c.

Jan
emcodem March 6, 2021, 12:11 p.m. UTC | #2
Am 2021-03-06 10:48, schrieb Jan Ekström:
> On Sat, Mar 6, 2021 at 10:38 AM emcodem <emcodem@ffastrans.com> wrote:
>> 
>> ---
>>  libavformat/mov.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>> 
>> In QuickTime reference files, exposing the parsed location info for
>> external essences can be very handy for users
>> 
> 
> Unfortunately, as per the discussion we had yesterday on #ffmpeg, Data
> References are not as simple as mov.c might make them seem.
> 

Thanks again for the chat yesterday! I thought I'd better open the topic 
here so I can do the work async :D

> So, if we think that a single MOV/MP4 Track is:
> 1. A set of decodable packets of a certain media type (as far as I
> can tell that has been the limitation; while other things can change
> during a Track, the media type is one that doesn't).
> 2. Which is then presented according to a virtual timeline (edit
> lists, which we will in this case ignore since they get applied on top
> of the decoded result on the presentation layer, and data references
> are on the packet set level).
> 
> Thus if we go through the layers:
> 1. We have samples (packets in FFmpeg parlance, more or less; stsz
> defines their sizes and so forth).
> 2. Samples get put into chunks, which are basically tuples of
> (sample_description_index, data offset) - see stsc, stco, co64.
> 3. A Sample Description can be thought of as a tuple of (AVCodec, the
> extradata (if any) required, data reference index); there is a list of
> them in the Track's stsd box.
> 4. Then finally we get to the data reference list in the dref box of 
> the track.
> 
> Currently as far as I can tell from reading mov_read_stsd /
> ff_mov_read_stsd_entries, it does generate extradata buffer for each
> sample description, but effectively only keeps a single data reference
> around in the MOVStreamContext, skipping the whole chunk matching etc
> part of things :) (if I am reading the code correctly, which I might
> not be).
> 

Hmmm, not sure why you refer to extradata - how is this connected to the 
dref, besides both being stored at stream context level? (But yes, I also 
understand that it would be overwritten for the current pseudo-track in 
case it is called multiple times; not sure if that can happen though.)

> So yea, there's two questions:
> 1. Should this be exposed?

Well, it is vital information. mov.c unfortunately misses functionality 
that the original QuickTime engine has: trying to resolve the referenced 
path in multiple different locations (e.g. trying every connected root 
device/drive letter), so it occasionally fails to process qt ref files.
Now I am not experienced enough to add this missing part cross-OS in C, 
but exposing it is cheap and simple. Once it's exposed, API users or 
scripters have a much easier time locating the media and setting cwd 
accordingly, or even working with the referenced media directly.

> 2. If it should be exposed, how? A set of metadata this should not be,
> as this at the very least would end up being a weird set/list of byte
> offsets/sizes and references :)
> 
> So yea, sorry for things not actually being as simple as they look by
> the code in mov.c.

How could it end up as a weird set of byte offsets/references? I mean, I 
totally see your point that dref is more like on the same level as the 
media type, so kind of top-level, but I'm missing an example of how to 
present an array of objects on that level.
What my code definitely misses is adding the dref_id, so I imagine the 
"key", e.g. dref_path, would be better presented as 
sprintf("dref_path_%d", sc->dref_id).

What do you think?

Patch

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 1c07cff6b5..e9625c0cf9 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -4275,6 +4275,14 @@  static int mov_read_trak(MOVContext *c, AVIOContext *pb, MOVAtom atom)
 
     if (sc->dref_id-1 < sc->drefs_count && sc->drefs[sc->dref_id-1].path) {
         MOVDref *dref = &sc->drefs[sc->dref_id - 1];
+
+        av_dict_set(&st->metadata, "dref_path", dref->path, 0);
+        av_dict_set(&st->metadata, "dref_dir", dref->dir, 0);
+        av_dict_set(&st->metadata, "dref_filename", dref->filename, 0);
+        av_dict_set(&st->metadata, "dref_volume", dref->volume, 0);
+        av_dict_set_int(&st->metadata, "dref_nlvl_from", dref->nlvl_from, 0);
+        av_dict_set_int(&st->metadata, "dref_nlvl_to", dref->nlvl_to, 0);
+
         if (c->enable_drefs) {
             if (mov_open_dref(c, &sc->pb, c->fc->url, dref) < 0)
                 av_log(c->fc, AV_LOG_ERROR,