diff mbox

[FFmpeg-devel,4/4] lavf/mov: Add support for edit list parsing.

Message ID CAGD_KHfgs54s5ji3seSZoL0MXjUVq--wZbim2rt+-T7Trjtj9w@mail.gmail.com
State Superseded
Headers show

Commit Message

Sasi Inguva Aug. 31, 2016, 1:37 a.m. UTC
On Sun, Aug 28, 2016 at 3:10 AM, Michael Niedermayer <michael@niedermayer.cc
> wrote:

> On Sat, Aug 27, 2016 at 03:30:24PM -0700, Sasi Inguva wrote:
> > On Fri, Aug 26, 2016 at 5:55 PM, Michael Niedermayer
> <michael@niedermayer.cc
> > > wrote:
> >
> > > On Fri, Aug 26, 2016 at 12:49:19PM -0700, Sasi Inguva wrote:
> > > > I think there is some bug in mp3 decoder which is making skip
> > > > samples -1431655766 for ~/tickets/5528/fire.mp3 . For now I have
> removed
> > > > the assert from the 3rd commit.
> > > > For the file one.mov , I think the audio has edit list with start
> time
> > > > correspending to the second sample - (which should be media time
> 1024 if
> > > I
> > > > guess correctly). This indicates that the first sample be used for
> > > encoder
> > > > delay.
> > > >  Previously without edit  list parsing , we used to offset the
> start_dts
> > > by
> > > > -1024 to make the second sample timestamp start from zero.
> > > >              sc->time_offset = start_time - empty_duration;
> > > > -            current_dts = -sc->time_offset;
> > > >              if (sc->ctts_count>0 && sc->stts_count>0 &&
> > > >
> > > > But now edit list parsing handles the rebasing of timestamps to zero,
> > > > because it will  assign increasing timestamps  starting from zero, to
> > > > samples present in the edit list.
> > >
> > > > Because the first sample is not in the
> > > > edit list, we mark it as DISCARD, which flag av_decode_audio4 will
> look
> > > at
> > > > and decode-and-discard that frame. So it wouldn't matter what the
> first
> > > > sample timestamp should be assigned because it is anyway discarded
> after
> > > > decode.
> > >
> > > current applications using libavformat have no knowledge of the
> > > discard flag you can add the DISCARD flag, you can set it on packets
> but
> > > applications written or built for libavformat 57 dont have to know
> > > this flag and can treat the packets like any other packet.
> > >
> >
> > Yes. they can treat the packet like any other packet. They don't have to
> > know about the discard flag.
> >
> > Adding this feature without a major version bump requires things to
> > > still work reasonable with the old semantics that apps are build
> > > around. That should be possible unless iam missing something
> >
> >
> > As I have pointed out earlier this code will change the timestamps of the
> > packets. In the case of multiple edit lists, the code will also repeat
> some
> > packets, and might make the timestamps non-monotonous. I don't know if
> the
> > last behavior is not  an acceptable expectation from av_read_frame.
>
> if timestamps repeat then it will not be possible to seek in the file
> by timestamp (to all positions) and i suspect also not by byte position.
> How would one seek then ?
> or do i misunderstand ?
>
>
In case of MOV  container, the seek is done using av_index_search_timestamp
function
http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/mov.c;h=f4999068519f1f06f6b5d84ca007148e74e5a82e;hb=HEAD#l5419
.

i) In case of single edit list , the timestamps will only be repeated but
not non-monotonous. In that case av_index_search_timestamp will still work
correctly, only that it will seek to any one of the packets with the same
timestamp. However when we decode the file, then all of the discarded
packets with similar timestamps should be discarded and only the
non-discarded packet will bet output. So in short,
./ffprobe -read_intervals 0.0  -show_frames -print_format compact
one-editlist-audio.mov
<https://drive.google.com/file/d/0Bz6XfEJZ-9N3ejNkMW9yU0k4ZFE/view?usp=sharing>
should
give exactly the same behavior as before while -show_packets will show more
discarded packets at the start.  I had to change the patch (4) a bit to
make the audio-seek on MOV to work  correctly, so please reapply the
attached patch to test.

ii) In case of multiple edit lists , timestamps might be non monotonous  so
existing av_index_search_timestamp implementation won't be correct, since
it assumes sorted timestamps. However making it work for discarded packets
is not that hard. I have attached a 5th patch that changes av_index_search
function. This fixes the issue in (i) also


> > However as I've pointed out, we are already showing the wrong packets for
> > videos with multiple edit lists by not parsing the edit lists currently,
> > which will introduce AVSync issues. So this patch wont make things any
> > worse for those cases in my view.
>
> Is it difficult to adjust the timestamps of discarded packets to avoid
> timestamp discontinuities ? (for the cases where this is possible of
> course only)
> the timestamps for discarded packets i would assume are meaningless in
> the new semenatics but they matter for the old semantics
> again, please correct me if iam wrong
>
> The way fix_index is implemented it is difficult to change the timestamps
to avoid discotinuities and still have the timeline the same as MOV edit
lists would intend.
The timestamps for discarded packets are meaningless to av_decode_*
functions because they parse the DISCARD flag and ignore the packets. I am
not sure, what you mean by semantics though, because I don't think I am
changing any user expectations defined by the mov_read_frame mov_seek_frame
functions . If by semantics, you mean that user expects to see
monotonically increasing timestamps for the "demuxed " packets then yes
that expectation changes to " monotonically increasing timestamps for the
"demuxed and non-discarded" packets" and user has to parse the discard
flag. And to reiterate , this semantic is only affected in the case of
files with multiple edit lists which are very less in number , since in
case of single edit list videos timestamps will only be repeated and still
are monotonically increasing.

As of now with 5th patch the seek also works with edit lists.  I can hide
this functionality under a flag like "support_edit_liist" in MOV demuxer,
and turn it off by default, if u think that is better.



> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> If you drop bombs on a foreign country and kill a hundred thousand
> innocent people, expect your government to call the consequence
> "unprovoked inhuman terrorist attacks" and use it to justify dropping
> more bombs and killing more people. The technology changed, the idea is
> old.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
>

Comments

Michael Niedermayer Aug. 31, 2016, 12:23 p.m. UTC | #1
On Tue, Aug 30, 2016 at 06:37:22PM -0700, Sasi Inguva wrote:
> On Sun, Aug 28, 2016 at 3:10 AM, Michael Niedermayer <michael@niedermayer.cc
> > wrote:
> 
> > On Sat, Aug 27, 2016 at 03:30:24PM -0700, Sasi Inguva wrote:
> > > On Fri, Aug 26, 2016 at 5:55 PM, Michael Niedermayer
> > <michael@niedermayer.cc
> > > > wrote:
> > >
> > > > On Fri, Aug 26, 2016 at 12:49:19PM -0700, Sasi Inguva wrote:
> > > > > I think there is some bug in mp3 decoder which is making skip
> > > > > samples -1431655766 for ~/tickets/5528/fire.mp3 . For now I have
> > removed
> > > > > the assert from the 3rd commit.
> > > > > For the file one.mov , I think the audio has edit list with start
> > time
> > > > > correspending to the second sample - (which should be media time
> > 1024 if
> > > > I
> > > > > guess correctly). This indicates that the first sample be used for
> > > > encoder
> > > > > delay.
> > > > >  Previously without edit  list parsing , we used to offset the
> > start_dts
> > > > by
> > > > > -1024 to make the second sample timestamp start from zero.
> > > > >              sc->time_offset = start_time - empty_duration;
> > > > > -            current_dts = -sc->time_offset;
> > > > >              if (sc->ctts_count>0 && sc->stts_count>0 &&
> > > > >
> > > > > But now edit list parsing handles the rebasing of timestamps to zero,
> > > > > because it will  assign increasing timestamps  starting from zero, to
> > > > > samples present in the edit list.
> > > >
> > > > > Because the first sample is not in the
> > > > > edit list, we mark it as DISCARD, which flag av_decode_audio4 will
> > look
> > > > at
> > > > > and decode-and-discard that frame. So it wouldn't matter what the
> > first
> > > > > sample timestamp should be assigned because it is anyway discarded
> > after
> > > > > decode.
> > > >
> > > > current applications using libavformat have no knowledge of the
> > > > discard flag you can add the DISCARD flag, you can set it on packets
> > but
> > > > applications written or built for libavformat 57 dont have to know
> > > > this flag and can treat the packets like any other packet.
> > > >
> > >
> > > Yes. they can treat the packet like any other packet. They don't have to
> > > know about the discard flag.
> > >
> > > Adding this feature without a major version bump requires things to
> > > > still work reasonable with the old semantics that apps are build
> > > > around. That should be possible unless iam missing something
> > >
> > >
> > > As I have pointed out earlier this code will change the timestamps of the
> > > packets. In the case of multiple edit lists, the code will also repeat
> > some
> > > packets, and might make the timestamps non-monotonous. I don't know if
> > the
> > > last behavior is not  an acceptable expectation from av_read_frame.
> >
> > if timestamps repeat then it will not be possible to seek in the file
> > by timestamp (to all positions) and i suspect also not by byte position.
> > How would one seek then ?
> > or do i misunderstand ?
> >
> >
> In case of MOV  container, the seek is done using av_index_search_timestamp
> function
> http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/mov.c;h=f4999068519f1f06f6b5d84ca007148e74e5a82e;hb=HEAD#l5419
> .
> 
> i) In case of single edit list , the timestamps will only be repeated but
> not non-monotonous. In that case av_index_search_timestamp will still work
> correctly, only that it will seek to any one of the packets with the same
> timestamp. However when we decode the file, then all of the discarded
> packets with similar timestamps should be discarded and only the
> non-discarded packet will bet output. So in short,
> ./ffprobe -read_intervals 0.0  -show_frames -print_format compact
> one-editlist-audio.mov
> <https://drive.google.com/file/d/0Bz6XfEJZ-9N3ejNkMW9yU0k4ZFE/view?usp=sharing>
> should
> give exactly the same behavior as before while -show_packets will show more
> discarded packets at the start.  I had to change the patch (4) a bit to
> make the audio-seek on MOV to work  correctly, so please reapply the
> attached patch to test.
> 
> ii) In case of multiple edit lists , timestamps might be non monotonous  so
> existing av_index_search_timestamp implementation won't be correct, since
> it assumes sorted timestamps. However making it work for discarded packets
> is not that hard. I have attached a 5th patch that changes av_index_search
> function. This fixes the issue in (i) also
> 
> 
> > > However as I've pointed out, we are already showing the wrong packets for
> > > videos with multiple edit lists by not parsing the edit lists currently,
> > > which will introduce AVSync issues. So this patch wont make things any
> > > worse for those cases in my view.
> >
> > Is it difficult to adjust the timestamps of discarded packets to avoid
> > timestamp discontinuities ? (for the cases where this is possible of
> > course only)
> > the timestamps for discarded packets i would assume are meaningless in
> > the new semenatics but they matter for the old semantics
> > again, please correct me if iam wrong
> >
> > The way fix_index is implemented it is difficult to change the timestamps
> to avoid discotinuities and still have the timeline the same as MOV edit
> lists would intend.

My first question is, entirely independant of the implementation from
the patches. What is the correct output ? (my primary focus is on
the timestamps)


Also if there are discontinuities, AVFMT_TS_DISCONT is normally set
(and this never happens for files with indexes but only for files
 that dont have indexes like mpeg*)
players will generally seek by file position if AVFMT_TS_DISCONT is
set (because timestamps are ambigous in that case)

iam not sure how to seek reliably by file position in a mov
with edit lists and stll have the file positions actually be file
positions of the frames, so this direction gets tricky too ...



> The timestamps for discarded packets are meaningless to av_decode_*
> functions because they parse the DISCARD flag and ignore the packets. I am
> not sure, what you mean by semantics though, because I don't think I am
> changing any user expectations defined by the mov_read_frame mov_seek_frame
> functions .

> If by semantics, you mean that user expects to see
> monotonically increasing timestamps for the "demuxed " packets then yes
> that expectation changes to " monotonically increasing timestamps for the
> "demuxed and non-discarded" packets" and user has to parse the discard
> flag.

Adding a flag that "must be parsed" would be a incompatible API change.
It would require a major version bump

also this patchset changes streamcopy
try:
./ffmpeg-ref -i matrixbench_mpeg2.mpg  -acodec mp3 -vn -t 1  test-ref.mov
./ffmpeg     -i matrixbench_mpeg2.mpg  -acodec mp3 -vn -t 1  test.mov
./ffmpeg-ref -i test-ref.mov -acodec copy test2-ref.mov
./ffmpeg     -i test.mov -acodec copy test2.mov

old code:
./ffmpeg-ref -i test-ref.mov -aframes 3 -f framecrc -
0,          0,          0,     1152,     4608, 0xa2a00df2
0,       1152,       1152,     1152,     4608, 0xa573dfd4
0,       2304,       2304,     1152,     4608, 0x8994a906
and
./ffmpeg-ref -i test2-ref.mov -aframes 3 -f framecrc -
0,          0,          0,     1152,     4608, 0xa2a00df2
0,       1152,       1152,     1152,     4608, 0xa573dfd4
0,       2304,       2304,     1152,     4608, 0x8994a906

new code:
./ffmpeg -i test.mov -aframes 3 -f framecrc
0,          0,          0,     1152,     4608, 0xa573dfd4
0,       1152,       1152,     1152,     4608, 0x8994a906
0,       2304,       2304,     1152,     4608, 0x824d1a30
and
./ffmpeg -i test2.mov -aframes 3 -f framecrc -
0,          0,          0,     1152,     4608, 0xa2a00df2
0,          1,          1,     1152,     4608, 0xa573dfd4
0,       1152,       1152,     1152,     4608, 0x8994a906


[...]
Sasi Inguva Sept. 2, 2016, 8 p.m. UTC | #2
On Aug 31, 2016 5:23 AM, "Michael Niedermayer" <michael@niedermayer.cc>
wrote:
>
> On Tue, Aug 30, 2016 at 06:37:22PM -0700, Sasi Inguva wrote:
> > On Sun, Aug 28, 2016 at 3:10 AM, Michael Niedermayer
<michael@niedermayer.cc
> > > wrote:
> >
> > > On Sat, Aug 27, 2016 at 03:30:24PM -0700, Sasi Inguva wrote:
> > > > On Fri, Aug 26, 2016 at 5:55 PM, Michael Niedermayer
> > > <michael@niedermayer.cc
> > > > > wrote:
> > > >
> > > > > On Fri, Aug 26, 2016 at 12:49:19PM -0700, Sasi Inguva wrote:
> > > > > > I think there is some bug in mp3 decoder which is making skip
> > > > > > samples -1431655766 for ~/tickets/5528/fire.mp3 . For now I have
> > > removed
> > > > > > the assert from the 3rd commit.
> > > > > > For the file one.mov , I think the audio has edit list with
start
> > > time
> > > > > > correspending to the second sample - (which should be media time
> > > 1024 if
> > > > > I
> > > > > > guess correctly). This indicates that the first sample be used
for
> > > > > encoder
> > > > > > delay.
> > > > > >  Previously without edit  list parsing , we used to offset the
> > > start_dts
> > > > > by
> > > > > > -1024 to make the second sample timestamp start from zero.
> > > > > >              sc->time_offset = start_time - empty_duration;
> > > > > > -            current_dts = -sc->time_offset;
> > > > > >              if (sc->ctts_count>0 && sc->stts_count>0 &&
> > > > > >
> > > > > > But now edit list parsing handles the rebasing of timestamps to
zero,
> > > > > > because it will  assign increasing timestamps  starting from
zero, to
> > > > > > samples present in the edit list.
> > > > >
> > > > > > Because the first sample is not in the
> > > > > > edit list, we mark it as DISCARD, which flag av_decode_audio4
will
> > > look
> > > > > at
> > > > > > and decode-and-discard that frame. So it wouldn't matter what
the
> > > first
> > > > > > sample timestamp should be assigned because it is anyway
discarded
> > > after
> > > > > > decode.
> > > > >
> > > > > current applications using libavformat have no knowledge of the
> > > > > discard flag you can add the DISCARD flag, you can set it on
packets
> > > but
> > > > > applications written or built for libavformat 57 dont have to know
> > > > > this flag and can treat the packets like any other packet.
> > > > >
> > > >
> > > > Yes. they can treat the packet like any other packet. They don't
have to
> > > > know about the discard flag.
> > > >
> > > > Adding this feature without a major version bump requires things to
> > > > > still work reasonable with the old semantics that apps are build
> > > > > around. That should be possible unless iam missing something
> > > >
> > > >
> > > > As I have pointed out earlier this code will change the timestamps
of the
> > > > packets. In the case of multiple edit lists, the code will also
repeat
> > > some
> > > > packets, and might make the timestamps non-monotonous. I don't know
if
> > > the
> > > > last behavior is not  an acceptable expectation from av_read_frame.
> > >
> > > if timestamps repeat then it will not be possible to seek in the file
> > > by timestamp (to all positions) and i suspect also not by byte
position.
> > > How would one seek then ?
> > > or do i misunderstand ?
> > >
> > >
> > In case of MOV  container, the seek is done using
av_index_search_timestamp
> > function
> > http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/
mov.c;h=f4999068519f1f06f6b5d84ca007148e74e5a82e;hb=HEAD#l5419
> > .
> >
> > i) In case of single edit list , the timestamps will only be repeated
but
> > not non-monotonous. In that case av_index_search_timestamp will still
work
> > correctly, only that it will seek to any one of the packets with the
same
> > timestamp. However when we decode the file, then all of the discarded
> > packets with similar timestamps should be discarded and only the
> > non-discarded packet will bet output. So in short,
> > ./ffprobe -read_intervals 0.0  -show_frames -print_format compact
> > one-editlist-audio.mov
> > <https://drive.google.com/file/d/0Bz6XfEJZ-9N3ejNkMW9yU0k4ZF
E/view?usp=sharing>
> > should
> > give exactly the same behavior as before while -show_packets will show
more
> > discarded packets at the start.  I had to change the patch (4) a bit to
> > make the audio-seek on MOV to work  correctly, so please reapply the
> > attached patch to test.
> >
> > ii) In case of multiple edit lists , timestamps might be non
monotonous  so
> > existing av_index_search_timestamp implementation won't be correct,
since
> > it assumes sorted timestamps. However making it work for discarded
packets
> > is not that hard. I have attached a 5th patch that changes
av_index_search
> > function. This fixes the issue in (i) also
> >
> >
> > > > However as I've pointed out, we are already showing the wrong
packets for
> > > > videos with multiple edit lists by not parsing the edit lists
currently,
> > > > which will introduce AVSync issues. So this patch wont make things
any
> > > > worse for those cases in my view.
> > >
> > > Is it difficult to adjust the timestamps of discarded packets to avoid
> > > timestamp discontinuities ? (for the cases where this is possible of
> > > course only)
> > > the timestamps for discarded packets i would assume are meaningless in
> > > the new semenatics but they matter for the old semantics
> > > again, please correct me if iam wrong
> > >
> > > The way fix_index is implemented it is difficult to change the
timestamps
> > to avoid discotinuities and still have the timeline the same as MOV edit
> > lists would intend.
>
> My first question is, entirely independant of the implementation from
> the patches. What is the correct output ? (my primary focus is on
> the timestamps)
>
>
> Also if there are discontinuities, AVFMT_TS_DISCONT is normally set
> (and this never happens for files with indexes but only for files
>  that dont have indexes like mpeg*)
> players will generally seek by file position if AVFMT_TS_DISCONT is
> set (because timestamps are ambigous in that case)
>
> iam not sure how to seek reliably by file position in a mov
> with edit lists and stll have the file positions actually be file
> positions of the frames, so this direction gets tricky too ...
>
I'll try do describe the behavior of the timestamps and seek after all
these patches are applied.
Reading:
i) Decoding (using av_codec_video* API) - Non-repeated monotonically
increasing timestamps.
ii) Just demuxing - Repeated timestamps in case of Single edit list .
Non-monotonic timestamps in case of multiple edit lists.

Seeking:
With the 5th patch seeking by "timestamp" using mov_read_seek function
works correctly. So if we are using ffmpeg API to seek it should work.
i) If AV_SEEK_FLAG_ANY  is set -
   before: seeks to incorrect packet because edit lists were not parsed
   now: seeks to the correct packet accounting for the edit lists.

ii) if AV_SEEK_FLAG_ANY is not set i.e. seek to KEYFRAME
    before:  finds the incorrect packet by timestamp , and then seeks to
the closest keyframe to that packet. If decoding/demuxing after seek,
decoder/demuxer outputs frames starting from that keyframe.
   now : finds the correct non-discarded packet by timestamp, and then
seeks to the closest keyframe to that packet, irrespective of, if the
closest keyframe is discarded or not-discarded .
             If decoding after seek and if that closest keyframe is marked
discard, then decoder will output frames starting from the next
non-discarded frame after the key frame.
             If demuxing after seek, we still output packets starting from
the discarded keyframe.

Now the only worry I think is, if some application is doing its own
analysis, based on the timestamps of the packets it "demuxes"  , without
accounting for the DISCARD flag. For example an application can compute
duration of the video as
max(timestamp) - min(timestamp) over all packets. If it doesn't account for
the DISCARD flag, it will be computing it wrong. However without edit list
parsing, such duration is already computed wrong because all the packets
are output, and not trimmed according to edit lists.

Another example can be an application implements its own transmuxing by
doing av_read_frame and passing the pts, dts without santizing, directly to
av_write_frame. In this case, if there are multiple edit lists in the input
MOV file, this application will fail because we have non-monotonic
timestamp being given to av_write_frame. (However previously it succeeded
but produced wrong video anyway) . I guess videos with multiple edit lists
are relatively few, and we already have other possible cases where ffmpeg
returns dts=N/A for some frames and the av_write_frame will fail on that
too. So this will add another one of those corner cases where this kind of
application will fail.


>
>
> > The timestamps for discarded packets are meaningless to av_decode_*
> > functions because they parse the DISCARD flag and ignore the packets. I
am
> > not sure, what you mean by semantics though, because I don't think I am
> > changing any user expectations defined by the mov_read_frame
mov_seek_frame
> > functions .
>
> > If by semantics, you mean that user expects to see
> > monotonically increasing timestamps for the "demuxed " packets then yes
> > that expectation changes to " monotonically increasing timestamps for
the
> > "demuxed and non-discarded" packets" and user has to parse the discard
> > flag.
>
> Adding a flag that "must be parsed" would be a incompatible API change.
> It would require a major version bump
>
> also this patchset changes streamcopy
> try:
> ./ffmpeg-ref -i matrixbench_mpeg2.mpg  -acodec mp3 -vn -t 1  test-ref.mov
> ./ffmpeg     -i matrixbench_mpeg2.mpg  -acodec mp3 -vn -t 1  test.mov
> ./ffmpeg-ref -i test-ref.mov -acodec copy test2-ref.mov
> ./ffmpeg     -i test.mov -acodec copy test2.mov
>
> old code:
> ./ffmpeg-ref -i test-ref.mov -aframes 3 -f framecrc -
> 0,          0,          0,     1152,     4608, 0xa2a00df2
> 0,       1152,       1152,     1152,     4608, 0xa573dfd4
> 0,       2304,       2304,     1152,     4608, 0x8994a906
> and
> ./ffmpeg-ref -i test2-ref.mov -aframes 3 -f framecrc -
> 0,          0,          0,     1152,     4608, 0xa2a00df2
> 0,       1152,       1152,     1152,     4608, 0xa573dfd4
> 0,       2304,       2304,     1152,     4608, 0x8994a906
>
> new code:
> ./ffmpeg -i test.mov -aframes 3 -f framecrc
> 0,          0,          0,     1152,     4608, 0xa573dfd4
> 0,       1152,       1152,     1152,     4608, 0x8994a906
> 0,       2304,       2304,     1152,     4608, 0x824d1a30
> and
> ./ffmpeg -i test2.mov -aframes 3 -f framecrc -
> 0,          0,          0,     1152,     4608, 0xa2a00df2
> 0,          1,          1,     1152,     4608, 0xa573dfd4
> 0,       1152,       1152,     1152,     4608, 0x8994a906
>

First phase of transmuxing :

test.mov and test-ref.mov both have an edit list with media time 1152 which
indicates that the first audio frame be not output (and perhaps it is a
silent packet used as encoder delay) . While ./ffmpeg correctly doesn't
output first frame ./ffmpeg-ref does.

Second phase of transmuxing:

test2.mov has an stts as such
EntryCount 4
Entry[0].SampleCount 1
Entry[0].SampleDelta 1
Entry[1].SampleCount 1
Entry[1].SampleDelta 1151
Entry[2].SampleCount 40
Entry[2].SampleDelta 1152
Entry[3].SampleCount 1
Entry[3].SampleDelta 768

 which makes the ffmpeg demuxer output the first two packet timestamps as 0
and 1 . Something must be going wrong while remuxing to MOV from test.mov
to test2.mov, but as long as demuxing and timestamps are concerned, they
seem correct to me.

>
> [...]
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Does the universe only have a finite lifespan? No, its going to go on
> forever, its just that you wont like living in it. -- Hiranya Peiri
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
Michael Niedermayer Sept. 2, 2016, 10:56 p.m. UTC | #3
On Fri, Sep 02, 2016 at 01:00:59PM -0700, Sasi Inguva wrote:
> On Aug 31, 2016 5:23 AM, "Michael Niedermayer" <michael@niedermayer.cc>
> wrote:
> >
> > On Tue, Aug 30, 2016 at 06:37:22PM -0700, Sasi Inguva wrote:
> > > On Sun, Aug 28, 2016 at 3:10 AM, Michael Niedermayer
> <michael@niedermayer.cc
> > > > wrote:
[...]
> >
> >
> > > The timestamps for discarded packets are meaningless to av_decode_*
> > > functions because they parse the DISCARD flag and ignore the packets. I
> am
> > > not sure, what you mean by semantics though, because I don't think I am
> > > changing any user expectations defined by the mov_read_frame
> mov_seek_frame
> > > functions .
> >
> > > If by semantics, you mean that user expects to see
> > > monotonically increasing timestamps for the "demuxed " packets then yes
> > > that expectation changes to " monotonically increasing timestamps for
> the
> > > "demuxed and non-discarded" packets" and user has to parse the discard
> > > flag.
> >
> > Adding a flag that "must be parsed" would be a incompatible API change.
> > It would require a major version bump
> >
> > also this patchset changes streamcopy
> > try:
> > ./ffmpeg-ref -i matrixbench_mpeg2.mpg  -acodec mp3 -vn -t 1  test-ref.mov
> > ./ffmpeg     -i matrixbench_mpeg2.mpg  -acodec mp3 -vn -t 1  test.mov
> > ./ffmpeg-ref -i test-ref.mov -acodec copy test2-ref.mov
> > ./ffmpeg     -i test.mov -acodec copy test2.mov
> >
> > old code:
> > ./ffmpeg-ref -i test-ref.mov -aframes 3 -f framecrc -
> > 0,          0,          0,     1152,     4608, 0xa2a00df2
> > 0,       1152,       1152,     1152,     4608, 0xa573dfd4
> > 0,       2304,       2304,     1152,     4608, 0x8994a906
> > and
> > ./ffmpeg-ref -i test2-ref.mov -aframes 3 -f framecrc -
> > 0,          0,          0,     1152,     4608, 0xa2a00df2
> > 0,       1152,       1152,     1152,     4608, 0xa573dfd4
> > 0,       2304,       2304,     1152,     4608, 0x8994a906
> >
> > new code:
> > ./ffmpeg -i test.mov -aframes 3 -f framecrc
> > 0,          0,          0,     1152,     4608, 0xa573dfd4
> > 0,       1152,       1152,     1152,     4608, 0x8994a906
> > 0,       2304,       2304,     1152,     4608, 0x824d1a30
> > and
> > ./ffmpeg -i test2.mov -aframes 3 -f framecrc -
> > 0,          0,          0,     1152,     4608, 0xa2a00df2
> > 0,          1,          1,     1152,     4608, 0xa573dfd4
> > 0,       1152,       1152,     1152,     4608, 0x8994a906
> >
> 
> First phase of transmuxing :
> 
> test.mov and test-ref.mov both have an edit list with media time 1152 which
> indicates that the first audio frame be not output (and perhaps it is a
> silent packet used as encoder delay) . While ./ffmpeg correctly doesn't
> output first frame ./ffmpeg-ref does.

Ill awnser the first part of your mail separatly, but first this

If you use audacity to compare the output, and it can probably be
most clearly seen with this:

./ffmpeg -f u8 -ar 16000 -ac 1 -i sync_audio.raw -acodec libmp3lame test-ingu.mov
./ffmpeg -f u8 -ar 16000 -ac 1 -i sync_audio.raw -acodec libmp3lame test.mp3
./ffmpeg -f u8 -ar 16000 -ac 1 -i sync_audio.raw ref.wav
./ffmpeg -i test-ingu.mov out-ingu.wav
./ffmpeg -i test.mp3 out-mp3.wav

you can clearly see that the mp3 encoding and decoding gives an exactly
matching output, there is no error in the initial padding /start skip

your patchset with mov does not achive this it also doesnt use the
existing framework for initial skip/padding

see libavformat/mp3dec.c for an example of the existing framework

[...]
Michael Niedermayer Sept. 2, 2016, 11:15 p.m. UTC | #4
On Fri, Sep 02, 2016 at 01:00:59PM -0700, Sasi Inguva wrote:
> On Aug 31, 2016 5:23 AM, "Michael Niedermayer" <michael@niedermayer.cc>
> wrote:
> >
> > On Tue, Aug 30, 2016 at 06:37:22PM -0700, Sasi Inguva wrote:
> > > On Sun, Aug 28, 2016 at 3:10 AM, Michael Niedermayer
> <michael@niedermayer.cc
> > > > wrote:
> > >
> > > > On Sat, Aug 27, 2016 at 03:30:24PM -0700, Sasi Inguva wrote:
> > > > > On Fri, Aug 26, 2016 at 5:55 PM, Michael Niedermayer
> > > > <michael@niedermayer.cc
> > > > > > wrote:
> > > > >
> > > > > > On Fri, Aug 26, 2016 at 12:49:19PM -0700, Sasi Inguva wrote:
> > > > > > > I think there is some bug in mp3 decoder which is making skip
> > > > > > > samples -1431655766 for ~/tickets/5528/fire.mp3 . For now I have
> > > > removed
> > > > > > > the assert from the 3rd commit.
> > > > > > > For the file one.mov , I think the audio has edit list with
> start
> > > > time
> > > > > > > correspending to the second sample - (which should be media time
> > > > 1024 if
> > > > > > I
> > > > > > > guess correctly). This indicates that the first sample be used
> for
> > > > > > encoder
> > > > > > > delay.
> > > > > > >  Previously without edit  list parsing , we used to offset the
> > > > start_dts
> > > > > > by
> > > > > > > -1024 to make the second sample timestamp start from zero.
> > > > > > >              sc->time_offset = start_time - empty_duration;
> > > > > > > -            current_dts = -sc->time_offset;
> > > > > > >              if (sc->ctts_count>0 && sc->stts_count>0 &&
> > > > > > >
> > > > > > > But now edit list parsing handles the rebasing of timestamps to
> zero,
> > > > > > > because it will  assign increasing timestamps  starting from
> zero, to
> > > > > > > samples present in the edit list.
> > > > > >
> > > > > > > Because the first sample is not in the
> > > > > > > edit list, we mark it as DISCARD, which flag av_decode_audio4
> will
> > > > look
> > > > > > at
> > > > > > > and decode-and-discard that frame. So it wouldn't matter what
> the
> > > > first
> > > > > > > sample timestamp should be assigned because it is anyway
> discarded
> > > > after
> > > > > > > decode.
> > > > > >
> > > > > > current applications using libavformat have no knowledge of the
> > > > > > discard flag you can add the DISCARD flag, you can set it on
> packets
> > > > but
> > > > > > applications written or built for libavformat 57 dont have to know
> > > > > > this flag and can treat the packets like any other packet.
> > > > > >
> > > > >
> > > > > Yes. they can treat the packet like any other packet. They don't
> have to
> > > > > know about the discard flag.
> > > > >
> > > > > Adding this feature without a major version bump requires things to
> > > > > > still work reasonable with the old semantics that apps are build
> > > > > > around. That should be possible unless iam missing something
> > > > >
> > > > >
> > > > > As I have pointed out earlier this code will change the timestamps
> of the
> > > > > packets. In the case of multiple edit lists, the code will also
> repeat
> > > > some
> > > > > packets, and might make the timestamps non-monotonous. I don't know
> if
> > > > the
> > > > > last behavior is not  an acceptable expectation from av_read_frame.
> > > >
> > > > if timestamps repeat then it will not be possible to seek in the file
> > > > by timestamp (to all positions) and i suspect also not by byte
> position.
> > > > How would one seek then ?
> > > > or do i misunderstand ?
> > > >
> > > >
> > > In case of MOV  container, the seek is done using
> av_index_search_timestamp
> > > function
> > > http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/
> mov.c;h=f4999068519f1f06f6b5d84ca007148e74e5a82e;hb=HEAD#l5419
> > > .
> > >
> > > i) In case of single edit list , the timestamps will only be repeated
> but
> > > not non-monotonous. In that case av_index_search_timestamp will still
> work
> > > correctly, only that it will seek to any one of the packets with the
> same
> > > timestamp. However when we decode the file, then all of the discarded
> > > packets with similar timestamps should be discarded and only the
> > > non-discarded packet will bet output. So in short,
> > > ./ffprobe -read_intervals 0.0  -show_frames -print_format compact
> > > one-editlist-audio.mov
> > > <https://drive.google.com/file/d/0Bz6XfEJZ-9N3ejNkMW9yU0k4ZF
> E/view?usp=sharing>
> > > should
> > > give exactly the same behavior as before while -show_packets will show
> more
> > > discarded packets at the start.  I had to change the patch (4) a bit to
> > > make the audio-seek on MOV to work  correctly, so please reapply the
> > > attached patch to test.
> > >
> > > ii) In case of multiple edit lists , timestamps might be non
> monotonous  so
> > > existing av_index_search_timestamp implementation won't be correct,
> since
> > > it assumes sorted timestamps. However making it work for discarded
> packets
> > > is not that hard. I have attached a 5th patch that changes
> av_index_search
> > > function. This fixes the issue in (i) also
> > >
> > >
> > > > > However as I've pointed out, we are already showing the wrong
> packets for
> > > > > videos with multiple edit lists by not parsing the edit lists
> currently,
> > > > > which will introduce AVSync issues. So this patch wont make things
> any
> > > > > worse for those cases in my view.
> > > >
> > > > Is it difficult to adjust the timestamps of discarded packets to avoid
> > > > timestamp discontinuities ? (for the cases where this is possible of
> > > > course only)
> > > > the timestamps for discarded packets i would assume are meaningless in
> > > > the new semenatics but they matter for the old semantics
> > > > again, please correct me if iam wrong
> > > >
> > > > The way fix_index is implemented it is difficult to change the
> timestamps
> > > to avoid discotinuities and still have the timeline the same as MOV edit
> > > lists would intend.
> >
> > My first question is, entirely independant of the implementation from
> > the patches. What is the correct output ? (my primary focus is on
> > the timestamps)
> >
> >
> > Also if there are discontinuities, AVFMT_TS_DISCONT is normally set
> > (and this never happens for files with indexes but only for files
> >  that dont have indexes like mpeg*)
> > players will generally seek by file position if AVFMT_TS_DISCONT is
> > set (because timestamps are ambigous in that case)
> >
> > iam not sure how to seek reliably by file position in a mov
> > with edit lists and stll have the file positions actually be file
> > positions of the frames, so this direction gets tricky too ...
> >
> I'll try do describe the behavior of the timestamps and seek after all
> these patches are applied.
> Reading:
> i) Decoding (using av_codec_video* API) - Non-repeated monotonically
> increasing timestamps.

> ii) Just demuxing - Repeated timestamps in case of Single edit list .

can you explain why you insist on this ?
why do you assign invalid timestamps to discarded packets ?
These packets are just normal packets in the current API, applications
will treat them like any other packet and the timestamps would likely
cause problems, confusion and bugreports


> Non-monotonic timestamps in case of multiple edit lists.

can you elaborate on how exactly they are non monotonic
you said above that they are monotonic after the decoder, i would
have expected that they would still be non monotonic after the
decoder for multiple edit lists
so maybe i misunderstood


[...]
Sasi Inguva Sept. 3, 2016, 7:06 p.m. UTC | #5
Hi Michael,

In fact from audacity I see that out-ingu.wav out-mp3.wav are almost
equivalent, though not exactly. However if I do the same RAW->MOV->WAV
conversion with head, I see that the out-head.wav is shifted to the right .
This is because the MOV file contains edit list
EntryCount 1
Entry[0].SegmentDuration 2000
Entry[0].MediaTime 1105which represents the encoder delay but without my
patch we are not parsing that entry as encoder delay and outputting the
first silent packet too. This existing line in HEAD does it for AAC  but
not mp3
http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/mov.c;h=f4999068519f1f06f6b5d84ca007148e74e5a82e;hb=HEAD#l2799
. With editlist patch we solve this problem for mp3 too. Please find
attached the audacity screenshot.


​

On Fri, Sep 2, 2016 at 3:56 PM, Michael Niedermayer <michael@niedermayer.cc>
wrote:

> On Fri, Sep 02, 2016 at 01:00:59PM -0700, Sasi Inguva wrote:
> > On Aug 31, 2016 5:23 AM, "Michael Niedermayer" <michael@niedermayer.cc>
> > wrote:
> > >
> > > On Tue, Aug 30, 2016 at 06:37:22PM -0700, Sasi Inguva wrote:
> > > > On Sun, Aug 28, 2016 at 3:10 AM, Michael Niedermayer
> > <michael@niedermayer.cc
> > > > > wrote:
> [...]
> > >
> > >
> > > > The timestamps for discarded packets are meaningless to av_decode_*
> > > > functions because they parse the DISCARD flag and ignore the
> packets. I
> > am
> > > > not sure, what you mean by semantics though, because I don't think I
> am
> > > > changing any user expectations defined by the mov_read_frame
> > mov_seek_frame
> > > > functions .
> > >
> > > > If by semantics, you mean that user expects to see
> > > > monotonically increasing timestamps for the "demuxed " packets then
> yes
> > > > that expectation changes to " monotonically increasing timestamps for
> > the
> > > > "demuxed and non-discarded" packets" and user has to parse the
> discard
> > > > flag.
> > >
> > > Adding a flag that "must be parsed" would be a incompatible API change.
> > > It would require a major version bump
> > >
> > > also this patchset changes streamcopy
> > > try:
> > > ./ffmpeg-ref -i matrixbench_mpeg2.mpg  -acodec mp3 -vn -t 1
> test-ref.mov
> > > ./ffmpeg     -i matrixbench_mpeg2.mpg  -acodec mp3 -vn -t 1  test.mov
> > > ./ffmpeg-ref -i test-ref.mov -acodec copy test2-ref.mov
> > > ./ffmpeg     -i test.mov -acodec copy test2.mov
> > >
> > > old code:
> > > ./ffmpeg-ref -i test-ref.mov -aframes 3 -f framecrc -
> > > 0,          0,          0,     1152,     4608, 0xa2a00df2
> > > 0,       1152,       1152,     1152,     4608, 0xa573dfd4
> > > 0,       2304,       2304,     1152,     4608, 0x8994a906
> > > and
> > > ./ffmpeg-ref -i test2-ref.mov -aframes 3 -f framecrc -
> > > 0,          0,          0,     1152,     4608, 0xa2a00df2
> > > 0,       1152,       1152,     1152,     4608, 0xa573dfd4
> > > 0,       2304,       2304,     1152,     4608, 0x8994a906
> > >
> > > new code:
> > > ./ffmpeg -i test.mov -aframes 3 -f framecrc
> > > 0,          0,          0,     1152,     4608, 0xa573dfd4
> > > 0,       1152,       1152,     1152,     4608, 0x8994a906
> > > 0,       2304,       2304,     1152,     4608, 0x824d1a30
> > > and
> > > ./ffmpeg -i test2.mov -aframes 3 -f framecrc -
> > > 0,          0,          0,     1152,     4608, 0xa2a00df2
> > > 0,          1,          1,     1152,     4608, 0xa573dfd4
> > > 0,       1152,       1152,     1152,     4608, 0x8994a906
> > >
> >
> > First phase of transmuxing :
> >
> > test.mov and test-ref.mov both have an edit list with media time 1152
> which
> > indicates that the first audio frame be not output (and perhaps it is a
> > silent packet used as encoder delay) . While ./ffmpeg correctly doesn't
> > output first frame ./ffmpeg-ref does.
>
> Ill awnser the first part of your mail separatly, but first this
>
> If you use audacity to compare the output, and it can probably be
> most clearly seen with this:
>
> ./ffmpeg -f u8 -ar 16000 -ac 1 -i sync_audio.raw -acodec libmp3lame
> test-ingu.mov
> ./ffmpeg -f u8 -ar 16000 -ac 1 -i sync_audio.raw -acodec libmp3lame
> test.mp3
> ./ffmpeg -f u8 -ar 16000 -ac 1 -i sync_audio.raw ref.wav
> ./ffmpeg -i test-ingu.mov out-ingu.wav
> ./ffmpeg -i test.mp3 out-mp3.wav
>
> you can clearly see that the mp3 encoding and decoding gives an exactly
> matching output, there is no error in the initial padding /start skip
>
> your patchset with mov does not achive this it also doesnt use the
> existing framework for initial skip/padding
>
> see libavformat/mp3dec.c for an example of the existing framework
>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> It is what and why we do it that matters, not just one of them.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
>
Michael Niedermayer Sept. 3, 2016, 11:48 p.m. UTC | #6
On Sat, Sep 03, 2016 at 12:06:39PM -0700, Sasi Inguva wrote:
> Hi Michael,
> 
> In fact from audacity I see that out-ingu.wav out-mp3.wav are almost
> equivalent,

They do not match. (and that is alot more vissible if you scale the
time axis up a bit)

You also dont use the existing API for handling initial padding/skip
And you didnt reply to this concern
its probably not that hard to fix that ...
instead of droping just at packet granularity you would set the stuff
for droping at sample granularity (too)


also maybe you missed my question about your oppinion on the correct
timestamp output:
"My first question is, entirely independant of the implementation from
 the patches. What is the correct output ? (my primary focus is on
 the timestamps)"

Iam curious because to me the timestamps of the dropped packets look
wrong and id like to know your oppinion about this. Is this a
implementation issue (if so please explain) or is there some reason
independant of the implementation  (if so please explain too)


[...]

> though not exactly. However if I do the same RAW->MOV->WAV
> conversion with head, I see that the out-head.wav is shifted to the right .

git master gets it totally wrong with mp3 in mov i agree here 100%

can you fix this exactly (not just almost)  ?

[...]
diff mbox

Patch

From f2a7a9d85728a1fa7d4afdc280a22aaaa4d91a8e Mon Sep 17 00:00:00 2001
From: Sasi Inguva <isasi@google.com>
Date: Tue, 30 Aug 2016 17:57:05 -0700
Subject: [PATCH 5/5] lavf/utils: Support av_index_search_timestamp in case of
 AVIndexEntry with discarded packets.

Signed-off-by: Sasi Inguva <isasi@google.com>
---
 libavformat/utils.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/libavformat/utils.c b/libavformat/utils.c
index d7f3c7a..f26ce58 100644
--- a/libavformat/utils.c
+++ b/libavformat/utils.c
@@ -1934,6 +1934,16 @@  int ff_index_search_timestamp(const AVIndexEntry *entries, int nb_entries,
 
     while (b - a > 1) {
         m         = (a + b) >> 1;
+
+        // Search for the next non-discarded packet.
+        while ((entries[m].flags & AVINDEX_DISCARD_FRAME) && m < b) {
+            m++;
+            if (m == b && entries[m].timestamp >= wanted_timestamp) {
+                m = b - 1;
+                break;
+            }
+        }
+
         timestamp = entries[m].timestamp;
         if (timestamp >= wanted_timestamp)
             b = m;
-- 
2.8.0.rc3.226.g39d4020