diff mbox

[FFmpeg-devel,4/4] lavf/mov: Add support for edit list parsing.

Message ID 20160914205729.GS4975@nb4
State Not Applicable
Headers show

Commit Message

Michael Niedermayer Sept. 14, 2016, 8:57 p.m. UTC
On Wed, Sep 14, 2016 at 12:20:52AM -0700, Sasi Inguva wrote:
> On Tue, Sep 13, 2016 at 4:39 PM, Sasi Inguva <isasi@google.com> wrote:
> 
> > Sorry for the very late reply. I was busy with other things.
> >
> > On Sat, Sep 3, 2016 at 4:48 PM, Michael Niedermayer <
> > michael@niedermayer.cc> wrote:
> >
> >> On Sat, Sep 03, 2016 at 12:06:39PM -0700, Sasi Inguva wrote:
> >> > Hi Michael,
> >> >
> >> > In fact from audacity I see that out-ingu.wav out-mp3.wav are almost
> >> > equivalent,
> >>
> >> They do not match. (and that is alot more vissible if you scale the
> >> time axis up a bit)
> >>
> >> You also dont use the existing API for handling initial padding/skip
> >> And you didnt reply to this concern
> >> its probably not that hard to fix that ...
> >> instead of droping just at packet granularity you would set the stuff
> >> for droping at sample granularity (too)
> >>
> >
> > Yes. Looking at it more closely now, they don't matrch exactly and this is
> > because as you said, number of samples to drop is not exactly multiple of
> > number of packets. In that case we need to partially discard some samples
> > of a packet. This can be done by using AV_PKT_SIDE_DATA_SKIP_SAMPLES.  I
> > can change the code to use this packet side data instead of discard packet
> > flag, if it is ok.

ok


> >
> >
> >>
> >> also maybe you missed my question about your oppinion on the correct
> >> timestamp output:
> >> "My first question is, entirely independant of the implementation from
> >>  the patches. What is the correct output ? (my primary focus is on
> >>  the timestamps)"
> >>
> >> Iam curious because to me the timestamps of the dropped packets look
> >> wrong and id like to know your oppinion about this. Is this a
> >> implementation issue (if so please explain) or is there some reason
> >> independant of the implementation  (if so please explain too)
> >>
> >
> > The correct output according to me should be - we should show exactly the
> > same presentation timestmap indicated by the MOV stts, ctts atoms, for the
> > samples that fall inside the edits. As long as the PTS is according to what
> > the edit and stts,ctts atoms say. I don't really have to worry about what
> > the decoding timestamp for those packets should be (DTS might as well be
> > AV_NOPTS_VALUE for all packets) .
> >
> > In the current implementation, we directly assign the timestamp in the
> > AVIndex to  pkt->dts . http://git.videolan.org/?p=ffmpeg.git;a=blob;f=
> > libavformat/mov.c;h=6e80b93271a4f998af6ba1af795d7d7c5d67f5a1;hb=HEAD#l5332
> > . The implementation of the edit list code is such that, it needs to be
> > filled with "linear monotonically increasing timestamps for the
> > non-discarded packets " to have the samples denoted by the editlists to
> > have linear monotonically increasing timestamps, in essence, to avoid a big
> > gap between the presentation of the end of first edit list and the start of
> > next edit list because of the discarded packets.
> >
> Hence while parsing a new edit list we bump down the index entry timestamp
> > to (frame_duration + last non-discarded packet from previous edit list ) .
> > And that's why we get non-monotonic timestamps as a whole in the
> > AVIndexEntry .
> >
> >  I think my wording is confusing here. Just clarifying . For one editlist/
> within one editlist, the AVIndexEntry samples pertaining to that edit, need
> to be filled with "monotonic timestamps which increase in steps of their
> corresponding 'stts' atom entries ", to achieve the correct presentation
> timestamp for those samples.
> 
>  When we start parsing the next edit list entry, and start adding index
> entries to AVIndexEntry we need to avoid a big gap between the presentation
> of the end of first edit and the start of next edit, that might happen
> because of the DISCARD packets added to the AVIndexEntry while parsing the
> first edit. Hence we start adding index entries from a  bumped down value
> corresponding to (frame_duration + last non-discarded packet from previous
> edit ) . And that's why we get non-monotonic timestamps as a whole in the
> AVIndexEntry .
> 
> 
> > Let us take an example of a file with two edit lists. First edit list
> > ending on a B frame. This is what ffprobe -show_packet -pf compact looks
> > like . Where 'D' stands for discarded frame. ( I've attached a 6th patch to
> > ffprobe to achieve this formatting ).
> > ./ffprobe -show_packets -print_format compact mov-2elist-bpyramid-1elist-
> > ends-on-bref.mov
> > .....
> > packet|codec_type=video|stream_index=0|pts=10752|pts_
> > time=0.875000|dts=8192|dts_time=0.666667|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=1281|pos=39546|flags=__ - Pframe
> > packet|codec_type=video|stream_index=0|pts=9728|pts_
> > time=0.791667|dts=8704|dts_time=0.708333|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=749|pos=40827|flags=__ - B pyramidal
> > ref
> > packet|codec_type=video|stream_index=0|pts=9216|pts_
> > time=0.750000|dts=9216|dts_time=0.750000|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=487|pos=41576|flags=__ - B
> > packet|codec_type=video|stream_index=0|pts=10240|pts_
> > time=0.833333|dts=9728|dts_time=0.791667|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=486|pos=42063|flags=__ - B
> > packet|codec_type=video|stream_index=0|pts=12800|pts_
> > time=1.041667|dts=10240|dts_time=0.833333|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=3455|pos=42549|flags=_D - P
> > packet|codec_type=video|stream_index=0|pts=11776|pts_
> > time=0.958333|dts=10752|dts_time=0.875000|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=848|pos=46004|flags=_D - B pyramidal
> > ref
> > packet|codec_type=video|stream_index=0|pts=11264|pts_
> > time=0.916667|dts=11264|dts_time=0.916667|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=655|pos=46852|flags=__ - B
> > packet|codec_type=video|stream_index=0|pts=12288|pts_
> > time=1.000000|dts=11776|dts_time=0.958333|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=901|pos=47507|flags=_D - B
> > packet|codec_type=video|stream_index=0|pts=13312|pts_
> > time=1.083333|dts=12288|dts_time=1.000000|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=5112|pos=48408|flags=KD  - end 1st edit
> > packet|codec_type=video|stream_index=0|pts=11772|pts_
> > time=0.958008|dts=10748|dts_time=0.874674|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=5112|pos=48408|flags=K_ - start 2nd
> > edit
> > packet|codec_type=video|stream_index=0|pts=13820|pts_
> > time=1.124674|dts=11260|dts_time=0.916341|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=1179|pos=53520|flags=__
> > .....
> >
> > The implementation basically increases dts by the amounts that stts atom
> > indicates, and as you can see after the end of the first edit list and
> > start of the next edit list, it bumps down the dts from 12288 to 10650 to
> > make the pts linearly increasing from the last non-discarded frame 11264 ->
> > 11772 .
> >
> > As I understand from DTS semantics in ffmpeg we have two constraints on
> > DTS.
> > i) DTS should be monotonically increasing  ii) For every packet PTS >= DTS
> > should be true.
> >
> >  Just reordering PTS to DTS doesn't work either. For example, here is the
> > DTS  I get from "ffprobe -show_packets " when I do "pkt->dts =
> > AV_NOPTS_VALUE" for all packets in mov_read_packet function
> > http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/mov.c;h=
> > 6e80b93271a4f998af6ba1af795d7d7c5d67f5a1;hb=HEAD#l5332  . Ffmpeg should
> > use update_dts_from_pts function to reoder pts to dts -
> > http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/utils.c;h=
> > 76cbff4ef67937499a7c113dcaacb88389e6015e;hb=HEAD#l1005 .
> >
> > ./ffprobe -show_packets -print_format compact mov-2elist-bpyramid-1elist-
> > ends-on-bref.mov
> > packet|codec_type=video|stream_index=0|pts=0|pts_time=
> > 0.000000|dts=N/A|dts_time=N/A|duration=512|duration_time=0.
> > 041667|convergence_duration=N/A|convergence_duration_time=N/
> > A|size=5683|pos=36|flags=K_
> > packet|codec_type=video|stream_index=0|pts=2048|pts_
> > time=0.166667|dts=N/A|dts_time=N/A|duration=512|duration_time=0.041667|
> > convergence_duration=N/A|convergence_duration_time=N/A|
> > size=4260|pos=5719|flags=__
> > packet|codec_type=video|stream_index=0|pts=1024|pts_
> > time=0.083333|dts=0|dts_time=0.000000|duration=512|duration_time=0.041667|
> > convergence_duration=N/A|convergence_duration_time=N/A|
> > size=3098|pos=9979|flags=__
> > packet|codec_type=video|stream_index=0|pts=512|pts_
> > time=0.041667|dts=512|dts_time=0.041667|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=786|pos=13077|flags=__
> > ....
> > packet|codec_type=video|stream_index=0|pts=10752|pts_
> > time=0.875000|dts=8192|dts_time=0.666667|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=1281|pos=39546|flags=__
> > packet|codec_type=video|stream_index=0|pts=9728|pts_
> > time=0.791667|dts=8704|dts_time=0.708333|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=749|pos=40827|flags=__
> > packet|codec_type=video|stream_index=0|pts=9216|pts_
> > time=0.750000|dts=9216|dts_time=0.750000|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=487|pos=41576|flags=__
> > packet|codec_type=video|stream_index=0|pts=10240|pts_
> > time=0.833333|dts=9728|dts_time=0.791667|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=486|pos=42063|flags=__
> > packet|codec_type=video|stream_index=0|pts=12800|pts_
> > time=1.041667|dts=10240|dts_time=0.833333|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=3455|pos=42549|flags=_D
> > packet|codec_type=video|stream_index=0|pts=11776|pts_
> > time=0.958333|dts=10752|dts_time=0.875000|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=848|pos=46004|flags=_D
> > packet|codec_type=video|stream_index=0|pts=11264|pts_
> > time=0.916667|dts=11264|dts_time=0.916667|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=655|pos=46852|flags=__
> > packet|codec_type=video|stream_index=0|pts=12288|pts_
> > time=1.000000|dts=11776|dts_time=0.958333|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=901|pos=47507|flags=_D
> > packet|codec_type=video|stream_index=0|pts=13312|pts_
> > time=1.083333|dts=12288|dts_time=1.000000|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=5112|pos=48408|flags=KD
> > packet|codec_type=video|stream_index=0|pts=11772|pts_
> > time=0.958008|dts=11772|dts_time=0.958008|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=5112|pos=48408|flags=K_
> > packet|codec_type=video|stream_index=0|pts=13820|pts_
> > time=1.124674|dts=12800|dts_time=1.041667|duration=512|
> > duration_time=0.041667|convergence_duration=N/A|
> > convergence_duration_time=N/A|size=1179|pos=53520|flags=__
> > ....
> > As we can see the dts are still non monotonic.
> >
> > The problem here is that in case of edit lists, we are trying to make the
> > PTS "not grow" when the packets are being discarded. If we look at the
> > non-discarded packets in presentation order, in the above example the PTS
> > is approx. linearly growing 10752, 11264, 11772, with a difference of
> > frame_duration=512 approx. However there are 3 discarded frames that we are
> > outputting in between. So, to grow DTS monotonically linearly we need to
> > grow DTS at a higher frame rate i.e. smaller frame duration.
> >

> > If we want to make DTS  satisfy the two constraints described above , I
> > can recompute the DTS based on an assumed constant frame rate, and shift
> > the DTS so that PTS > DTS always, but this solution wont work for videos
> > with variable frame rates.
> > So in short , these are the options available I see for the DTS
> > i) Keep the change as it is , with non monotonic DTS
> > ii) Mark DTS=AV_NOPTS_VALUE for discarded frames. Maybe it is right
> > semantic to have DTS=N/A for some packets. I don't know.
> > iii) Recompute the DTS as stated above.

I guess i or ii are the least evil
staying with i and changing if it causes issues is probably least work



> >
> > Attaching all the 6 patches and the test file.

the mov patch doesnt apply cleanly, 1 patch per mail is preferred as
well

the ffprobe patch breaks fate
 Strings Metadata|8
-video|0|37|1.480000|34|1.360000|1|0.040000|N/A|N/A|24786|211456|K|1
+video|0|37|1.480000|34|1.360000|1|0.040000|N/A|N/A|24786|211456|K_|1
 Strings Metadata|8


also the mov patch breaks fate:

Test copy-trac236 failed. Look at tests/data/fate/copy-trac236.err for details.
make: *** [fate-copy-trac236] Error 1

either way, new fate samples uploaded

Thanks

[...]

Comments

Michael Niedermayer Sept. 16, 2016, 2:46 a.m. UTC | #1
On Thu, Sep 15, 2016 at 01:08:40PM -0700, Sasi Inguva wrote:
> On Wed, Sep 14, 2016 at 1:57 PM, Michael Niedermayer <michael@niedermayer.cc
> > wrote:
> 
> > On Wed, Sep 14, 2016 at 12:20:52AM -0700, Sasi Inguva wrote:
> > > On Tue, Sep 13, 2016 at 4:39 PM, Sasi Inguva <isasi@google.com> wrote:
> > >
> > > > Sorry for the very late reply. I was busy with other things.
> > > >
> > > > On Sat, Sep 3, 2016 at 4:48 PM, Michael Niedermayer <
> > > > michael@niedermayer.cc> wrote:
> > > >
> > > >> On Sat, Sep 03, 2016 at 12:06:39PM -0700, Sasi Inguva wrote:
> > > >> > Hi Michael,
> > > >> >
> > > >> > In fact from audacity I see that out-ingu.wav out-mp3.wav are almost
> > > >> > equivalent,
> > > >>
> > > >> They do not match. (and that is alot more vissible if you scale the
> > > >> time axis up a bit)
> > > >>
> > > >> You also dont use the existing API for handling initial padding/skip
> > > >> And you didnt reply to this concern
> > > >> its probably not that hard to fix that ...
> > > >> instead of droping just at packet granularity you would set the stuff
> > > >> for droping at sample granularity (too)
> > > >>
> > > >
> > > > Yes. Looking at it more closely now, they don't matrch exactly and
> > this is
> > > > because as you said, number of samples to drop is not exactly multiple
> > of
> > > > number of packets. In that case we need to partially discard some
> > samples
> > > > of a packet. This can be done by using AV_PKT_SIDE_DATA_SKIP_SAMPLES.
> > I
> > > > can change the code to use this packet side data instead of discard
> > packet
> > > > flag, if it is ok.
> >
> > ok
> >
> > I have put my foot in my mouth here. There is no way I can add
> AV_PKT_SIDE_DATA_SKIP_SAMPLES to each packet, unless I add a new field
> "skip_samples" in AVIndexEntry and populate it in mov_fix_index function.
> Adding another entry to AVIndexEntry seems contentious because we want to
> keep the AVIndexEntry as compact as possible, I presume. So what I wil do
> for now is just set st->skip_samples field in mov_fix_index function for
> the first audio edit list. So at least single edit list audio cases ,will
> be correct. In multiple edit list case, the the first edit will be
> correctly output but the second audio edit might have some samples extra.
> Also note that, there is no mechanism for doing "discard_padding" too, in
> MOV demuxer. If we have skip_samples and discard_padding fields in
> AVIndexEntry, then we can implement both of them correctly.
> 
> With the attached patch the test that you have showed, comparing
> out-ingu.wav and out-mp3.wav - are  exactly equivalent.

yes, the patch also improves some audio sync cases that where bad
but it breaks audio sync with faac

./ffmpeg-ref -f u8 -ar 16000 -ac 1 -i sync_audio.raw -acodec libfaac test-ref.mov
./ffmpeg_g -f u8 -ar 16000 -ac 1 -i sync_audio.raw -acodec libfaac test.mov
./ffmpeg-ref -i test-ref.mov test-ref.mov.wav
./ffmpeg_g -i test.mov test.mov.wav


[...]
> @@ -2756,6 +2757,343 @@ static int mov_read_sbgp(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>      return pb->eof_reached ? AVERROR_EOF : 0;
>  }
>  
> +/**
> + * Get ith edit list entry (media time, duration).
> + */
> +static int get_edit_list_entry(const MOVStreamContext *msc,
> +                               unsigned int edit_list_index,
> +                               int64_t *edit_list_media_time,
> +                               int64_t *edit_list_duration,
> +                               int64_t global_timescale)
> +{
> +    if (edit_list_index == msc->elst_count) {
> +        return 0;
> +    }
> +    *edit_list_media_time = msc->elst_data[edit_list_index].time;
> +    *edit_list_duration = msc->elst_data[edit_list_index].duration;
> +    /* duration is in global timescale units;convert to msc timescale */
> +    *edit_list_duration = av_rescale(*edit_list_duration, msc->time_scale,
> +                                     global_timescale);

global_timescale can be 0 here leading to division by 0

[...]
diff mbox

Patch

--- ./tests/ref/fate/copy-trac236       2016-09-14 15:56:10.845886847 +0200
+++ tests/data/fate/copy-trac236        2016-09-14 22:44:44.494403284 +0200
@@ -1,5 +1,5 @@ 
-9b95afdb39b426a33bc962889f820aed *tests/data/fate/copy-trac236.mov
-630802 tests/data/fate/copy-trac236.mov
+d6e3d97b522ce881ed29c5da74cc7e63 *tests/data/fate/copy-trac236.mov
+630810 tests/data/fate/copy-trac236.mov
 #tb 0: 100/2997
 #media_type 0: video
 #codec_id 0: rawvideo