diff mbox

[FFmpeg-devel,4/4] lavf/mov: Add support for edit list parsing.

Message ID CAGD_KHdbkiFiYwawnkrzYMAy6wzkYO80Dk_A8mfdtUEJEQny9Q@mail.gmail.com
State Superseded
Headers show

Commit Message

Sasi Inguva Sept. 13, 2016, 11:39 p.m. UTC
Sorry for the very late reply. I was busy with other things.

On Sat, Sep 3, 2016 at 4:48 PM, Michael Niedermayer <michael@niedermayer.cc>
wrote:

> On Sat, Sep 03, 2016 at 12:06:39PM -0700, Sasi Inguva wrote:
> > Hi Michael,
> >
> > In fact from audacity I see that out-ingu.wav out-mp3.wav are almost
> > equivalent,
>
> They do not match. (and that is alot more vissible if you scale the
> time axis up a bit)
>
> You also dont use the existing API for handling initial padding/skip
> And you didnt reply to this concern
> its probably not that hard to fix that ...
> instead of droping just at packet granularity you would set the stuff
> for droping at sample granularity (too)
>

Yes. Looking at it more closely now, they don't matrch exactly and this is
because as you said, number of samples to drop is not exactly multiple of
number of packets. In that case we need to partially discard some samples
of a packet. This can be done by using AV_PKT_SIDE_DATA_SKIP_SAMPLES.  I
can change the code to use this packet side data instead of discard packet
flag, if it is ok.


>
> also maybe you missed my question about your oppinion on the correct
> timestamp output:
> "My first question is, entirely independant of the implementation from
>  the patches. What is the correct output ? (my primary focus is on
>  the timestamps)"
>
> Iam curious because to me the timestamps of the dropped packets look
> wrong and id like to know your oppinion about this. Is this a
> implementation issue (if so please explain) or is there some reason
> independant of the implementation  (if so please explain too)
>

The correct output according to me should be - we should show exactly the
same presentation timestmap indicated by the MOV stts, ctts atoms, for the
samples that fall inside the edits. As long as the PTS is according to what
the edit and stts,ctts atoms say. I don't really have to worry about what
the decoding timestamp for those packets should be (DTS might as well be
AV_NOPTS_VALUE for all packets) .

In the current implementation, we directly assign the timestamp in the
AVIndex to  pkt->dts .
http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/mov.c;h=6e80b93271a4f998af6ba1af795d7d7c5d67f5a1;hb=HEAD#l5332
. The implementation of the edit list code is such that, it needs to be
filled with "linear monotonically increasing timestamps for the
non-discarded packets " to have the samples denoted by the editlists to
have linear monotonically increasing timestamps, in essence, to avoid a big
gap between the presentation of the end of first edit list and the start of
next edit list because of the discarded packets. Hence while parsing a new
edit list we bump down the index entry timestamp to (frame_duration + last
non-discarded packet from previous edit list ) . And that's why we get
non-monotonic timestamps as a whole in the AVIndexEntry .

Let us take an example of a file with two edit lists. First edit list
ending on a B frame. This is what ffprobe -show_packet -pf compact looks
like . Where 'D' stands for discarded frame. ( I've attached a 6th patch to
ffprobe to achieve this formatting ).
./ffprobe -show_packets -print_format compact
mov-2elist-bpyramid-1elist-ends-on-bref.mov
.....
packet|codec_type=video|stream_index=0|pts=10752|pts_time=0.875000|dts=8192|dts_time=0.666667|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=1281|pos=39546|flags=__
- Pframe
packet|codec_type=video|stream_index=0|pts=9728|pts_time=0.791667|dts=8704|dts_time=0.708333|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=749|pos=40827|flags=__
- B pyramidal ref
packet|codec_type=video|stream_index=0|pts=9216|pts_time=0.750000|dts=9216|dts_time=0.750000|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=487|pos=41576|flags=__
- B
packet|codec_type=video|stream_index=0|pts=10240|pts_time=0.833333|dts=9728|dts_time=0.791667|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=486|pos=42063|flags=__
- B
packet|codec_type=video|stream_index=0|pts=12800|pts_time=1.041667|dts=10240|dts_time=0.833333|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=3455|pos=42549|flags=_D
- P
packet|codec_type=video|stream_index=0|pts=11776|pts_time=0.958333|dts=10752|dts_time=0.875000|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=848|pos=46004|flags=_D
- B pyramidal ref
packet|codec_type=video|stream_index=0|pts=11264|pts_time=0.916667|dts=11264|dts_time=0.916667|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=655|pos=46852|flags=__
- B
packet|codec_type=video|stream_index=0|pts=12288|pts_time=1.000000|dts=11776|dts_time=0.958333|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=901|pos=47507|flags=_D
- B
packet|codec_type=video|stream_index=0|pts=13312|pts_time=1.083333|dts=12288|dts_time=1.000000|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=5112|pos=48408|flags=KD
 - end 1st edit
packet|codec_type=video|stream_index=0|pts=11772|pts_time=0.958008|dts=10748|dts_time=0.874674|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=5112|pos=48408|flags=K_
- start 2nd edit
packet|codec_type=video|stream_index=0|pts=13820|pts_time=1.124674|dts=11260|dts_time=0.916341|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=1179|pos=53520|flags=__
.....

The implementation basically increases dts by the amounts that stts atom
indicates, and as you can see after the end of the first edit list and
start of the next edit list, it bumps down the dts from 12288 to 10650 to
make the pts linearly increasing from the last non-discarded frame 11264 ->
11772 .

As I understand from DTS semantics in ffmpeg we have two constraints on DTS.
i) DTS should be monotonically increasing  ii) For every packet PTS >= DTS
should be true.

 Just reordering PTS to DTS doesn't work either. For example, here is the
DTS  I get from "ffprobe -show_packets " when I do "pkt->dts =
AV_NOPTS_VALUE" for all packets in mov_read_packet function
http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/mov.c;h=6e80b93271a4f998af6ba1af795d7d7c5d67f5a1;hb=HEAD#l5332
 . Ffmpeg should use update_dts_from_pts function to reoder pts to dts -
http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/utils.c;h=76cbff4ef67937499a7c113dcaacb88389e6015e;hb=HEAD#l1005
.

./ffprobe -show_packets -print_format compact
mov-2elist-bpyramid-1elist-ends-on-bref.mov
packet|codec_type=video|stream_index=0|pts=0|pts_time=0.000000|dts=N/A|dts_time=N/A|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=5683|pos=36|flags=K_
packet|codec_type=video|stream_index=0|pts=2048|pts_time=0.166667|dts=N/A|dts_time=N/A|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=4260|pos=5719|flags=__
packet|codec_type=video|stream_index=0|pts=1024|pts_time=0.083333|dts=0|dts_time=0.000000|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=3098|pos=9979|flags=__
packet|codec_type=video|stream_index=0|pts=512|pts_time=0.041667|dts=512|dts_time=0.041667|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=786|pos=13077|flags=__
....
packet|codec_type=video|stream_index=0|pts=10752|pts_time=0.875000|dts=8192|dts_time=0.666667|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=1281|pos=39546|flags=__
packet|codec_type=video|stream_index=0|pts=9728|pts_time=0.791667|dts=8704|dts_time=0.708333|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=749|pos=40827|flags=__
packet|codec_type=video|stream_index=0|pts=9216|pts_time=0.750000|dts=9216|dts_time=0.750000|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=487|pos=41576|flags=__
packet|codec_type=video|stream_index=0|pts=10240|pts_time=0.833333|dts=9728|dts_time=0.791667|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=486|pos=42063|flags=__
packet|codec_type=video|stream_index=0|pts=12800|pts_time=1.041667|dts=10240|dts_time=0.833333|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=3455|pos=42549|flags=_D
packet|codec_type=video|stream_index=0|pts=11776|pts_time=0.958333|dts=10752|dts_time=0.875000|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=848|pos=46004|flags=_D
packet|codec_type=video|stream_index=0|pts=11264|pts_time=0.916667|dts=11264|dts_time=0.916667|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=655|pos=46852|flags=__
packet|codec_type=video|stream_index=0|pts=12288|pts_time=1.000000|dts=11776|dts_time=0.958333|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=901|pos=47507|flags=_D
packet|codec_type=video|stream_index=0|pts=13312|pts_time=1.083333|dts=12288|dts_time=1.000000|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=5112|pos=48408|flags=KD
packet|codec_type=video|stream_index=0|pts=11772|pts_time=0.958008|dts=11772|dts_time=0.958008|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=5112|pos=48408|flags=K_
packet|codec_type=video|stream_index=0|pts=13820|pts_time=1.124674|dts=12800|dts_time=1.041667|duration=512|duration_time=0.041667|convergence_duration=N/A|convergence_duration_time=N/A|size=1179|pos=53520|flags=__
....
As we can see the dts are still non monotonic.

The problem here is that in case of edit lists, we are trying to make the
PTS "not grow" when the packets are being discarded. If we look at the
non-discarded packets in presentation order, in the above example the PTS
is approx. linearly growing 10752, 11264, 11772, with a difference of
frame_duration=512 approx. However there are 3 discarded frames that we are
outputting in between. So, to grow DTS monotonically linearly we need to
grow DTS at a higher frame rate i.e. smaller frame duration.

If we want to make DTS  satisfy the two constraints described above , I can
recompute the DTS based on an assumed constant frame rate, and shift the
DTS so that PTS > DTS always, but this solution wont work for videos with
variable frame rates.
So in short , these are the options available I see for the DTS
i) Keep the change as it is , with non monotonic DTS
ii) Mark DTS=AV_NOPTS_VALUE for discarded frames. Maybe it is right
semantic to have DTS=N/A for some packets. I don't know.
iii) Recompute the DTS as stated above.

Attaching all the 6 patches and the test file.


> [...]
>
> > though not exactly. However if I do the same RAW->MOV->WAV
> > conversion with head, I see that the out-head.wav is shifted to the
> right .
>
> git master gets it totally wrong with mp3 in mov i agree here 100%
>
> can you fix this exactly (not just almost)  ?
>
>
Yes, I think I can fix this as described above by using SKIP_SAMPLES side
data.


> [...]
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> I do not agree with what you have to say, but I'll defend to the death your
> right to say it. -- Voltaire
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
>

Comments

Sasi Inguva Sept. 14, 2016, 7:20 a.m. UTC | #1
On Tue, Sep 13, 2016 at 4:39 PM, Sasi Inguva <isasi@google.com> wrote:

> Sorry for the very late reply. I was busy with other things.
>
> On Sat, Sep 3, 2016 at 4:48 PM, Michael Niedermayer <
> michael@niedermayer.cc> wrote:
>
>> On Sat, Sep 03, 2016 at 12:06:39PM -0700, Sasi Inguva wrote:
>> > Hi Michael,
>> >
>> > In fact from audacity I see that out-ingu.wav out-mp3.wav are almost
>> > equivalent,
>>
>> They do not match. (and that is alot more vissible if you scale the
>> time axis up a bit)
>>
>> You also dont use the existing API for handling initial padding/skip
>> And you didnt reply to this concern
>> its probably not that hard to fix that ...
>> instead of droping just at packet granularity you would set the stuff
>> for droping at sample granularity (too)
>>
>
> Yes. Looking at it more closely now, they don't matrch exactly and this is
> because as you said, number of samples to drop is not exactly multiple of
> number of packets. In that case we need to partially discard some samples
> of a packet. This can be done by using AV_PKT_SIDE_DATA_SKIP_SAMPLES.  I
> can change the code to use this packet side data instead of discard packet
> flag, if it is ok.
>
>
>>
>> also maybe you missed my question about your oppinion on the correct
>> timestamp output:
>> "My first question is, entirely independant of the implementation from
>>  the patches. What is the correct output ? (my primary focus is on
>>  the timestamps)"
>>
>> Iam curious because to me the timestamps of the dropped packets look
>> wrong and id like to know your oppinion about this. Is this a
>> implementation issue (if so please explain) or is there some reason
>> independant of the implementation  (if so please explain too)
>>
>
> The correct output according to me should be - we should show exactly the
> same presentation timestmap indicated by the MOV stts, ctts atoms, for the
> samples that fall inside the edits. As long as the PTS is according to what
> the edit and stts,ctts atoms say. I don't really have to worry about what
> the decoding timestamp for those packets should be (DTS might as well be
> AV_NOPTS_VALUE for all packets) .
>
> In the current implementation, we directly assign the timestamp in the
> AVIndex to  pkt->dts . http://git.videolan.org/?p=ffmpeg.git;a=blob;f=
> libavformat/mov.c;h=6e80b93271a4f998af6ba1af795d7d7c5d67f5a1;hb=HEAD#l5332
> . The implementation of the edit list code is such that, it needs to be
> filled with "linear monotonically increasing timestamps for the
> non-discarded packets " to have the samples denoted by the editlists to
> have linear monotonically increasing timestamps, in essence, to avoid a big
> gap between the presentation of the end of first edit list and the start of
> next edit list because of the discarded packets.
>
Hence while parsing a new edit list we bump down the index entry timestamp
> to (frame_duration + last non-discarded packet from previous edit list ) .
> And that's why we get non-monotonic timestamps as a whole in the
> AVIndexEntry .
>
>  I think my wording is confusing here. Just clarifying . For one editlist/
within one editlist, the AVIndexEntry samples pertaining to that edit, need
to be filled with "monotonic timestamps which increase in steps of their
corresponding 'stts' atom entries ", to achieve the correct presentation
timestamp for those samples.

 When we start parsing the next edit list entry, and start adding index
entries to AVIndexEntry we need to avoid a big gap between the presentation
of the end of first edit and the start of next edit, that might happen
because of the DISCARD packets added to the AVIndexEntry while parsing the
first edit. Hence we start adding index entries from a  bumped down value
corresponding to (frame_duration + last non-discarded packet from previous
edit ) . And that's why we get non-monotonic timestamps as a whole in the
AVIndexEntry .


> Let us take an example of a file with two edit lists. First edit list
> ending on a B frame. This is what ffprobe -show_packet -pf compact looks
> like . Where 'D' stands for discarded frame. ( I've attached a 6th patch to
> ffprobe to achieve this formatting ).
> ./ffprobe -show_packets -print_format compact mov-2elist-bpyramid-1elist-
> ends-on-bref.mov
> .....
> packet|codec_type=video|stream_index=0|pts=10752|pts_
> time=0.875000|dts=8192|dts_time=0.666667|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=1281|pos=39546|flags=__ - Pframe
> packet|codec_type=video|stream_index=0|pts=9728|pts_
> time=0.791667|dts=8704|dts_time=0.708333|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=749|pos=40827|flags=__ - B pyramidal
> ref
> packet|codec_type=video|stream_index=0|pts=9216|pts_
> time=0.750000|dts=9216|dts_time=0.750000|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=487|pos=41576|flags=__ - B
> packet|codec_type=video|stream_index=0|pts=10240|pts_
> time=0.833333|dts=9728|dts_time=0.791667|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=486|pos=42063|flags=__ - B
> packet|codec_type=video|stream_index=0|pts=12800|pts_
> time=1.041667|dts=10240|dts_time=0.833333|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=3455|pos=42549|flags=_D - P
> packet|codec_type=video|stream_index=0|pts=11776|pts_
> time=0.958333|dts=10752|dts_time=0.875000|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=848|pos=46004|flags=_D - B pyramidal
> ref
> packet|codec_type=video|stream_index=0|pts=11264|pts_
> time=0.916667|dts=11264|dts_time=0.916667|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=655|pos=46852|flags=__ - B
> packet|codec_type=video|stream_index=0|pts=12288|pts_
> time=1.000000|dts=11776|dts_time=0.958333|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=901|pos=47507|flags=_D - B
> packet|codec_type=video|stream_index=0|pts=13312|pts_
> time=1.083333|dts=12288|dts_time=1.000000|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=5112|pos=48408|flags=KD  - end 1st edit
> packet|codec_type=video|stream_index=0|pts=11772|pts_
> time=0.958008|dts=10748|dts_time=0.874674|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=5112|pos=48408|flags=K_ - start 2nd
> edit
> packet|codec_type=video|stream_index=0|pts=13820|pts_
> time=1.124674|dts=11260|dts_time=0.916341|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=1179|pos=53520|flags=__
> .....
>
> The implementation basically increases dts by the amounts that stts atom
> indicates, and as you can see after the end of the first edit list and
> start of the next edit list, it bumps down the dts from 12288 to 10650 to
> make the pts linearly increasing from the last non-discarded frame 11264 ->
> 11772 .
>
> As I understand from DTS semantics in ffmpeg we have two constraints on
> DTS.
> i) DTS should be monotonically increasing  ii) For every packet PTS >= DTS
> should be true.
>
>  Just reordering PTS to DTS doesn't work either. For example, here is the
> DTS  I get from "ffprobe -show_packets " when I do "pkt->dts =
> AV_NOPTS_VALUE" for all packets in mov_read_packet function
> http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/mov.c;h=
> 6e80b93271a4f998af6ba1af795d7d7c5d67f5a1;hb=HEAD#l5332  . Ffmpeg should
> use update_dts_from_pts function to reoder pts to dts -
> http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/utils.c;h=
> 76cbff4ef67937499a7c113dcaacb88389e6015e;hb=HEAD#l1005 .
>
> ./ffprobe -show_packets -print_format compact mov-2elist-bpyramid-1elist-
> ends-on-bref.mov
> packet|codec_type=video|stream_index=0|pts=0|pts_time=
> 0.000000|dts=N/A|dts_time=N/A|duration=512|duration_time=0.
> 041667|convergence_duration=N/A|convergence_duration_time=N/
> A|size=5683|pos=36|flags=K_
> packet|codec_type=video|stream_index=0|pts=2048|pts_
> time=0.166667|dts=N/A|dts_time=N/A|duration=512|duration_time=0.041667|
> convergence_duration=N/A|convergence_duration_time=N/A|
> size=4260|pos=5719|flags=__
> packet|codec_type=video|stream_index=0|pts=1024|pts_
> time=0.083333|dts=0|dts_time=0.000000|duration=512|duration_time=0.041667|
> convergence_duration=N/A|convergence_duration_time=N/A|
> size=3098|pos=9979|flags=__
> packet|codec_type=video|stream_index=0|pts=512|pts_
> time=0.041667|dts=512|dts_time=0.041667|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=786|pos=13077|flags=__
> ....
> packet|codec_type=video|stream_index=0|pts=10752|pts_
> time=0.875000|dts=8192|dts_time=0.666667|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=1281|pos=39546|flags=__
> packet|codec_type=video|stream_index=0|pts=9728|pts_
> time=0.791667|dts=8704|dts_time=0.708333|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=749|pos=40827|flags=__
> packet|codec_type=video|stream_index=0|pts=9216|pts_
> time=0.750000|dts=9216|dts_time=0.750000|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=487|pos=41576|flags=__
> packet|codec_type=video|stream_index=0|pts=10240|pts_
> time=0.833333|dts=9728|dts_time=0.791667|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=486|pos=42063|flags=__
> packet|codec_type=video|stream_index=0|pts=12800|pts_
> time=1.041667|dts=10240|dts_time=0.833333|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=3455|pos=42549|flags=_D
> packet|codec_type=video|stream_index=0|pts=11776|pts_
> time=0.958333|dts=10752|dts_time=0.875000|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=848|pos=46004|flags=_D
> packet|codec_type=video|stream_index=0|pts=11264|pts_
> time=0.916667|dts=11264|dts_time=0.916667|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=655|pos=46852|flags=__
> packet|codec_type=video|stream_index=0|pts=12288|pts_
> time=1.000000|dts=11776|dts_time=0.958333|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=901|pos=47507|flags=_D
> packet|codec_type=video|stream_index=0|pts=13312|pts_
> time=1.083333|dts=12288|dts_time=1.000000|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=5112|pos=48408|flags=KD
> packet|codec_type=video|stream_index=0|pts=11772|pts_
> time=0.958008|dts=11772|dts_time=0.958008|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=5112|pos=48408|flags=K_
> packet|codec_type=video|stream_index=0|pts=13820|pts_
> time=1.124674|dts=12800|dts_time=1.041667|duration=512|
> duration_time=0.041667|convergence_duration=N/A|
> convergence_duration_time=N/A|size=1179|pos=53520|flags=__
> ....
> As we can see the dts are still non monotonic.
>
> The problem here is that in case of edit lists, we are trying to make the
> PTS "not grow" when the packets are being discarded. If we look at the
> non-discarded packets in presentation order, in the above example the PTS
> is approx. linearly growing 10752, 11264, 11772, with a difference of
> frame_duration=512 approx. However there are 3 discarded frames that we are
> outputting in between. So, to grow DTS monotonically linearly we need to
> grow DTS at a higher frame rate i.e. smaller frame duration.
>
> If we want to make DTS  satisfy the two constraints described above , I
> can recompute the DTS based on an assumed constant frame rate, and shift
> the DTS so that PTS > DTS always, but this solution wont work for videos
> with variable frame rates.
> So in short , these are the options available I see for the DTS
> i) Keep the change as it is , with non monotonic DTS
> ii) Mark DTS=AV_NOPTS_VALUE for discarded frames. Maybe it is right
> semantic to have DTS=N/A for some packets. I don't know.
> iii) Recompute the DTS as stated above.
>
> Attaching all the 6 patches and the test file.
>
>
>> [...]
>>
>> > though not exactly. However if I do the same RAW->MOV->WAV
>> > conversion with head, I see that the out-head.wav is shifted to the
>> right .
>>
>> git master gets it totally wrong with mp3 in mov i agree here 100%
>>
>> can you fix this exactly (not just almost)  ?
>>
>>
> Yes, I think I can fix this as described above by using SKIP_SAMPLES side
> data.
>
>
>> [...]
>>
>> --
>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>>
>> I do not agree with what you have to say, but I'll defend to the death
>> your
>> right to say it. -- Voltaire
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>>
>
diff mbox

Patch

From f2a7a9d85728a1fa7d4afdc280a22aaaa4d91a8e Mon Sep 17 00:00:00 2001
From: Sasi Inguva <isasi@google.com>
Date: Tue, 30 Aug 2016 17:57:05 -0700
Subject: [PATCH 5/6] lavf/utils: Support av_index_search_timestamp in case of
 AVIndexEntry with discarded packets.

Signed-off-by: Sasi Inguva <isasi@google.com>
---
 libavformat/utils.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/libavformat/utils.c b/libavformat/utils.c
index d7f3c7a..f26ce58 100644
--- a/libavformat/utils.c
+++ b/libavformat/utils.c
@@ -1934,6 +1934,16 @@  int ff_index_search_timestamp(const AVIndexEntry *entries, int nb_entries,
 
     while (b - a > 1) {
         m         = (a + b) >> 1;
+
+        // Search for the next non-discarded packet.
+        while ((entries[m].flags & AVINDEX_DISCARD_FRAME) && m < b) {
+            m++;
+            if (m == b && entries[m].timestamp >= wanted_timestamp) {
+                m = b - 1;
+                break;
+            }
+        }
+
         timestamp = entries[m].timestamp;
         if (timestamp >= wanted_timestamp)
             b = m;
-- 
2.8.0.rc3.226.g39d4020