diff mbox

[FFmpeg-devel] hevc_mp4toannexb: Do not duplicate parameter sets

Message ID 20190721045236.30572-1-andriy.gelman@gmail.com
State New
Headers show

Commit Message

Andriy Gelman July 21, 2019, 4:52 a.m. UTC
From: Andriy Gelman <andriy.gelman@gmail.com>

Fixes #7799

Currently, the mp4toannexb filter always inserts extradata at the start
of each IRAP unit. This can lead to duplication of parameter sets if the
demuxed packet from mdat atom already contains a version of the
parameters.

As in ticket #7799 this can also lead to decoding errors when the
parameter sets of the IRAP frames are different from the ones stored in
extradata.

This commit avoids duplicating the parameter sets if they are already
present in the demuxed packet.

This commit also makes an update to the hevc-bsf-mp4toannexb fate
test since the result before this patch contained duplicate vps/sps/pps
nal units.
---
 libavcodec/hevc_mp4toannexb_bsf.c | 9 ++++++++-
 tests/fate/hevc.mak               | 2 +-
 2 files changed, 9 insertions(+), 2 deletions(-)

Comments

Andreas Rheinhardt July 21, 2019, 10:47 a.m. UTC | #1
Andriy Gelman:
> From: Andriy Gelman <andriy.gelman@gmail.com>
> 
> Fixes #7799
> 
> Currently, the mp4toannexb filter always inserts extradata at the start
> of each IRAP unit. This can lead to duplication of parameter sets if the
> demuxed packet from mdat atom already contains a version of the
> parameters.
> 
> As in ticket #7799 this can also lead to decoding errors when the
> parameter sets of the IRAP frames are different from the ones stored in
> extradata.
> 
> This commit avoids duplicating the parameter sets if they are already
> present in the demuxed packet.
> 
> This commit also makes an update to the hevc-bsf-mp4toannexb fate
> test since the result before this patch contained duplicate vps/sps/pps
> nal units.
> ---
>  libavcodec/hevc_mp4toannexb_bsf.c | 9 ++++++++-
>  tests/fate/hevc.mak               | 2 +-
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/libavcodec/hevc_mp4toannexb_bsf.c b/libavcodec/hevc_mp4toannexb_bsf.c
> index 09bce5b34c..5c27306b09 100644
> --- a/libavcodec/hevc_mp4toannexb_bsf.c
> +++ b/libavcodec/hevc_mp4toannexb_bsf.c
> @@ -123,6 +123,7 @@ static int hevc_mp4toannexb_filter(AVBSFContext *ctx, AVPacket *out)
>  
>      int got_irap = 0;
>      int i, ret = 0;
> +    int vps_detected, sps_detected, pps_detected = 0;
>  
You are only initiallizing pps_detected.

>      ret = ff_bsf_get_packet(ctx, &in);
>      if (ret < 0)
> @@ -146,9 +147,15 @@ static int hevc_mp4toannexb_filter(AVBSFContext *ctx, AVPacket *out)
>  
>          nalu_type = (bytestream2_peek_byte(&gb) >> 1) & 0x3f;
>  
> +        switch (nalu_type) {
> +          case HEVC_NAL_VPS: vps_detected = 1; break;
> +          case HEVC_NAL_SPS: sps_detected = 1; break;
> +          case HEVC_NAL_PPS: pps_detected = 1; break;
> +        }
> +
>          /* prepend extradata to IRAP frames */
>          is_irap       = nalu_type >= 16 && nalu_type <= 23;
> -        add_extradata = is_irap && !got_irap;
> +        add_extradata = is_irap && !got_irap && !(vps_detected && sps_detected && pps_detected);
>          extra_size    = add_extradata * ctx->par_out->extradata_size;
>          got_irap     |= is_irap;
>  
There are two things that I don't like about this approach:
1. Image an input file like this: VPS, SPS and PPS in extradata and
then after a few GOPs there is an inband PPS as part of a VPS, SPS and
PPS combination where the PPS differs from the PPS in the extradata.
After this change in parameter sets there are no further inband
parameter sets. With your proposal, the extradata would again be
inserted into access units with IRAP frames after the change in
extradata and it would be the old, outdated extradata, thereby
breaking decoding (the original mp4 file would probably not be able
fully seekable in this case, but that would be another story).
2. If the sample in #7799 contained only the parameter set that
actually changed inband, your proposal would still add a whole
VPS+SPS+PPS combination and therefore still make the sample unplayable.

- Andreas
Andriy Gelman Aug. 13, 2019, 4:49 a.m. UTC | #2
Andreas,

On Sun, 21. Jul 10:47, Andreas Rheinhardt wrote:
> Andriy Gelman:
> > From: Andriy Gelman <andriy.gelman@gmail.com>
> > 
> > Fixes #7799
> > 
> > Currently, the mp4toannexb filter always inserts extradata at the start
> > of each IRAP unit. This can lead to duplication of parameter sets if the
> > demuxed packet from mdat atom already contains a version of the
> > parameters.
> > 
> > As in ticket #7799 this can also lead to decoding errors when the
> > parameter sets of the IRAP frames are different from the ones stored in
> > extradata.
> > 
> > This commit avoids duplicating the parameter sets if they are already
> > present in the demuxed packet.
> > 
> > This commit also makes an update to the hevc-bsf-mp4toannexb fate
> > test since the result before this patch contained duplicate vps/sps/pps
> > nal units.
> > ---
> >  libavcodec/hevc_mp4toannexb_bsf.c | 9 ++++++++-
> >  tests/fate/hevc.mak               | 2 +-
> >  2 files changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/libavcodec/hevc_mp4toannexb_bsf.c b/libavcodec/hevc_mp4toannexb_bsf.c
> > index 09bce5b34c..5c27306b09 100644
> > --- a/libavcodec/hevc_mp4toannexb_bsf.c
> > +++ b/libavcodec/hevc_mp4toannexb_bsf.c
> > @@ -123,6 +123,7 @@ static int hevc_mp4toannexb_filter(AVBSFContext *ctx, AVPacket *out)
> >  
> >      int got_irap = 0;
> >      int i, ret = 0;
> > +    int vps_detected, sps_detected, pps_detected = 0;
> >  
> You are only initiallizing pps_detected.
> 
> >      ret = ff_bsf_get_packet(ctx, &in);
> >      if (ret < 0)
> > @@ -146,9 +147,15 @@ static int hevc_mp4toannexb_filter(AVBSFContext *ctx, AVPacket *out)
> >  
> >          nalu_type = (bytestream2_peek_byte(&gb) >> 1) & 0x3f;
> >  
> > +        switch (nalu_type) {
> > +          case HEVC_NAL_VPS: vps_detected = 1; break;
> > +          case HEVC_NAL_SPS: sps_detected = 1; break;
> > +          case HEVC_NAL_PPS: pps_detected = 1; break;
> > +        }
> > +
> >          /* prepend extradata to IRAP frames */
> >          is_irap       = nalu_type >= 16 && nalu_type <= 23;
> > -        add_extradata = is_irap && !got_irap;
> > +        add_extradata = is_irap && !got_irap && !(vps_detected && sps_detected && pps_detected);
> >          extra_size    = add_extradata * ctx->par_out->extradata_size;
> >          got_irap     |= is_irap;
> >  
> There are two things that I don't like about this approach:
> 1. Image an input file like this: VPS, SPS and PPS in extradata and
> then after a few GOPs there is an inband PPS as part of a VPS, SPS and
> PPS combination where the PPS differs from the PPS in the extradata.
> After this change in parameter sets there are no further inband
> parameter sets. With your proposal, the extradata would again be
> inserted into access units with IRAP frames after the change in
> extradata and it would be the old, outdated extradata, thereby
> breaking decoding (the original mp4 file would probably not be able
> fully seekable in this case, but that would be another story).
> 2. If the sample in #7799 contained only the parameter set that
> actually changed inband, your proposal would still add a whole
> VPS+SPS+PPS combination and therefore still make the sample unplayable.
> 

Thanks for your feedback. 
I believe I'm quite close to finishing the new patch. 

The general workflow in the updated mp4toannexb_filter function is: 
  1. Convert input avcc packet to annexb.
  2. Split the annexb bytestream into NAL units using ff_h2645_packet_splits function
  3. If VPS/SPS/PPS nal units are present, parse them to get their id and reference to higher level parameter set.
  4. New VPS/SPS/PPS parameter sets are compared their 'cached version'. The cached versions are stored in a struct similar to HEVCParamSets. If the new parameter set is
  different, then update the cached version.
  5. Update extradata if any of the cached parameter sets have changed.  
  6. Insert new extradata at the first IRAP nal unit (i.e.
  same as original code).

The only part that still remains to be done is dealing with SEI nals. What I'm unsure about is whether SEIs from past extradata should be inserted at the start of future IRAP frames. The current code seems to do this. 
I think the consistent approach is to add SEI prefix/suffix into extradata. But, also update SEI if new version are encountered in the bytestream. 

Andriy
Andreas Rheinhardt Aug. 13, 2019, 6:24 a.m. UTC | #3
Andriy Gelman:
> Andreas,
> 
> On Sun, 21. Jul 10:47, Andreas Rheinhardt wrote:
>> Andriy Gelman:
>>> From: Andriy Gelman <andriy.gelman@gmail.com>
>>>
>>> Fixes #7799
>>>
>>> Currently, the mp4toannexb filter always inserts extradata at the start
>>> of each IRAP unit. This can lead to duplication of parameter sets if the
>>> demuxed packet from mdat atom already contains a version of the
>>> parameters.
>>>
>>> As in ticket #7799 this can also lead to decoding errors when the
>>> parameter sets of the IRAP frames are different from the ones stored in
>>> extradata.
>>>
>>> This commit avoids duplicating the parameter sets if they are already
>>> present in the demuxed packet.
>>>
>>> This commit also makes an update to the hevc-bsf-mp4toannexb fate
>>> test since the result before this patch contained duplicate vps/sps/pps
>>> nal units.
>>> ---
>>>  libavcodec/hevc_mp4toannexb_bsf.c | 9 ++++++++-
>>>  tests/fate/hevc.mak               | 2 +-
>>>  2 files changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/libavcodec/hevc_mp4toannexb_bsf.c b/libavcodec/hevc_mp4toannexb_bsf.c
>>> index 09bce5b34c..5c27306b09 100644
>>> --- a/libavcodec/hevc_mp4toannexb_bsf.c
>>> +++ b/libavcodec/hevc_mp4toannexb_bsf.c
>>> @@ -123,6 +123,7 @@ static int hevc_mp4toannexb_filter(AVBSFContext *ctx, AVPacket *out)
>>>  
>>>      int got_irap = 0;
>>>      int i, ret = 0;
>>> +    int vps_detected, sps_detected, pps_detected = 0;
>>>  
>> You are only initiallizing pps_detected.
>>
>>>      ret = ff_bsf_get_packet(ctx, &in);
>>>      if (ret < 0)
>>> @@ -146,9 +147,15 @@ static int hevc_mp4toannexb_filter(AVBSFContext *ctx, AVPacket *out)
>>>  
>>>          nalu_type = (bytestream2_peek_byte(&gb) >> 1) & 0x3f;
>>>  
>>> +        switch (nalu_type) {
>>> +          case HEVC_NAL_VPS: vps_detected = 1; break;
>>> +          case HEVC_NAL_SPS: sps_detected = 1; break;
>>> +          case HEVC_NAL_PPS: pps_detected = 1; break;
>>> +        }
>>> +
>>>          /* prepend extradata to IRAP frames */
>>>          is_irap       = nalu_type >= 16 && nalu_type <= 23;
>>> -        add_extradata = is_irap && !got_irap;
>>> +        add_extradata = is_irap && !got_irap && !(vps_detected && sps_detected && pps_detected);
>>>          extra_size    = add_extradata * ctx->par_out->extradata_size;
>>>          got_irap     |= is_irap;
>>>  
>> There are two things that I don't like about this approach:
>> 1. Image an input file like this: VPS, SPS and PPS in extradata and
>> then after a few GOPs there is an inband PPS as part of a VPS, SPS and
>> PPS combination where the PPS differs from the PPS in the extradata.
>> After this change in parameter sets there are no further inband
>> parameter sets. With your proposal, the extradata would again be
>> inserted into access units with IRAP frames after the change in
>> extradata and it would be the old, outdated extradata, thereby
>> breaking decoding (the original mp4 file would probably not be able
>> fully seekable in this case, but that would be another story).
>> 2. If the sample in #7799 contained only the parameter set that
>> actually changed inband, your proposal would still add a whole
>> VPS+SPS+PPS combination and therefore still make the sample unplayable.
>>
> 
> Thanks for your feedback. 
> I believe I'm quite close to finishing the new patch. 
> 
> The general workflow in the updated mp4toannexb_filter function is: 
>   1. Convert input avcc packet to annexb.
>   2. Split the annexb bytestream into NAL units using ff_h2645_packet_splits function
This function is quite slow (it checks the whole NAL unit for 0x03
escapes and if it finds any, it unescapes the whole NAL unit, i.e. it
copies the NAL unit with the 0x03 escapes removed). Do you limit
unescaping to the relevant NAL units only?
>   3. If VPS/SPS/PPS nal units are present, parse them to get their id and reference to higher level parameter set.
>   4. New VPS/SPS/PPS parameter sets are compared their 'cached version'. The cached versions are stored in a struct similar to HEVCParamSets. If the new parameter set is
>   different, then update the cached version.
>   5. Update extradata if any of the cached parameter sets have changed.
If I am not mistaken, then you are not allowed to modify the extradata
after init. All you could do is send side data of
AV_PKT_DATA_NEW_EXTRADATA type. But it is probably enough to simply
add the new extradata in-band.

>   6. Insert new extradata at the first IRAP nal unit (i.e.
>   same as original code).
> 
> The only part that still remains to be done is dealing with SEI nals. What I'm unsure about is whether SEIs from past extradata should be inserted at the start of future IRAP frames. The current code seems to do this. 
> I think the consistent approach is to add SEI prefix/suffix into extradata. But, also update SEI if new version are encountered in the bytestream.
New version means same payload type, but different payload? That's at
least the only generic way of doing this I can think. But the problem
is that the persistence of the SEIs depends on many factors (involving
ids), so that I don't think that 100% correct generic way of handling
SEIs is possible. If I were you, I'd go the path of least resistance
and add the SEIs in the extradata to the output extradata and dump
these SEIs into the first access unit.

- Andreas
Andriy Gelman Aug. 19, 2019, 3:19 a.m. UTC | #4
Andreas, 

On Tue, 13. Aug 06:24, Andreas Rheinhardt wrote:
> Andriy Gelman:
> > Andreas,
> > 
> > On Sun, 21. Jul 10:47, Andreas Rheinhardt wrote:
> >> Andriy Gelman:
> >>> From: Andriy Gelman <andriy.gelman@gmail.com>
> >>>
> >>> Fixes #7799
> >>>
> >>> Currently, the mp4toannexb filter always inserts extradata at the start
> >>> of each IRAP unit. This can lead to duplication of parameter sets if the
> >>> demuxed packet from mdat atom already contains a version of the
> >>> parameters.
> >>>
> >>> As in ticket #7799 this can also lead to decoding errors when the
> >>> parameter sets of the IRAP frames are different from the ones stored in
> >>> extradata.
> >>>
> >>> This commit avoids duplicating the parameter sets if they are already
> >>> present in the demuxed packet.
> >>>
> >>> This commit also makes an update to the hevc-bsf-mp4toannexb fate
> >>> test since the result before this patch contained duplicate vps/sps/pps
> >>> nal units.
> >>> ---
> >>>  libavcodec/hevc_mp4toannexb_bsf.c | 9 ++++++++-
> >>>  tests/fate/hevc.mak               | 2 +-
> >>>  2 files changed, 9 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/libavcodec/hevc_mp4toannexb_bsf.c b/libavcodec/hevc_mp4toannexb_bsf.c
> >>> index 09bce5b34c..5c27306b09 100644
> >>> --- a/libavcodec/hevc_mp4toannexb_bsf.c
> >>> +++ b/libavcodec/hevc_mp4toannexb_bsf.c
> >>> @@ -123,6 +123,7 @@ static int hevc_mp4toannexb_filter(AVBSFContext *ctx, AVPacket *out)
> >>>  
> >>>      int got_irap = 0;
> >>>      int i, ret = 0;
> >>> +    int vps_detected, sps_detected, pps_detected = 0;
> >>>  
> >> You are only initiallizing pps_detected.
> >>
> >>>      ret = ff_bsf_get_packet(ctx, &in);
> >>>      if (ret < 0)
> >>> @@ -146,9 +147,15 @@ static int hevc_mp4toannexb_filter(AVBSFContext *ctx, AVPacket *out)
> >>>  
> >>>          nalu_type = (bytestream2_peek_byte(&gb) >> 1) & 0x3f;
> >>>  
> >>> +        switch (nalu_type) {
> >>> +          case HEVC_NAL_VPS: vps_detected = 1; break;
> >>> +          case HEVC_NAL_SPS: sps_detected = 1; break;
> >>> +          case HEVC_NAL_PPS: pps_detected = 1; break;
> >>> +        }
> >>> +
> >>>          /* prepend extradata to IRAP frames */
> >>>          is_irap       = nalu_type >= 16 && nalu_type <= 23;
> >>> -        add_extradata = is_irap && !got_irap;
> >>> +        add_extradata = is_irap && !got_irap && !(vps_detected && sps_detected && pps_detected);
> >>>          extra_size    = add_extradata * ctx->par_out->extradata_size;
> >>>          got_irap     |= is_irap;
> >>>  
> >> There are two things that I don't like about this approach:
> >> 1. Image an input file like this: VPS, SPS and PPS in extradata and
> >> then after a few GOPs there is an inband PPS as part of a VPS, SPS and
> >> PPS combination where the PPS differs from the PPS in the extradata.
> >> After this change in parameter sets there are no further inband
> >> parameter sets. With your proposal, the extradata would again be
> >> inserted into access units with IRAP frames after the change in
> >> extradata and it would be the old, outdated extradata, thereby
> >> breaking decoding (the original mp4 file would probably not be able
> >> fully seekable in this case, but that would be another story).
> >> 2. If the sample in #7799 contained only the parameter set that
> >> actually changed inband, your proposal would still add a whole
> >> VPS+SPS+PPS combination and therefore still make the sample unplayable.
> >>
> > 
> > Thanks for your feedback. 
> > I believe I'm quite close to finishing the new patch. 
> > 
> > The general workflow in the updated mp4toannexb_filter function is: 
> >   1. Convert input avcc packet to annexb.
> >   2. Split the annexb bytestream into NAL units using ff_h2645_packet_splits function
> This function is quite slow (it checks the whole NAL unit for 0x03
> escapes and if it finds any, it unescapes the whole NAL unit, i.e. it
> copies the NAL unit with the 0x03 escapes removed). Do you limit
> unescaping to the relevant NAL units only?

Yes I agree, ff_h2645_packet_splits is not the best option.
We only need the escaped version for the SPS parameter set.

I hope you can review the current version. I'll work on
writing a custom ff_h2645_packet_splits after your feedback.

> >   3. If VPS/SPS/PPS nal units are present, parse them to get their id and reference to higher level parameter set.
> >   4. New VPS/SPS/PPS parameter sets are compared their 'cached version'. The cached versions are stored in a struct similar to HEVCParamSets. If the new parameter set is
> >   different, then update the cached version.
> >   5. Update extradata if any of the cached parameter sets have changed.
> If I am not mistaken, then you are not allowed to modify the extradata
> after init. All you could do is send side data of
> AV_PKT_DATA_NEW_EXTRADATA type. But it is probably enough to simply
> add the new extradata in-band.
> 
I have changed this. The original extradata is not modified. 

> >   6. Insert new extradata at the first IRAP nal unit (i.e.
> >   same as original code).
> > 
> > The only part that still remains to be done is dealing with SEI nals. What I'm unsure about is whether SEIs from past extradata should be inserted at the start of future IRAP frames. The current code seems to do this. 
> > I think the consistent approach is to add SEI prefix/suffix into extradata. But, also update SEI if new version are encountered in the bytestream.
> New version means same payload type, but different payload? That's at
> least the only generic way of doing this I can think. But the problem
> is that the persistence of the SEIs depends on many factors (involving
> ids), so that I don't think that 100% correct generic way of handling
> SEIs is possible. If I were you, I'd go the path of least resistance
> and add the SEIs in the extradata to the output extradata and dump
> these SEIs into the first access unit.

If I encounter a new sei prefix, I overwrite the version we
have cached. I hope it's clearer in the code. 

For sei suffix, it didn't make sense to keep track, as these
are inserted at the end of the nal units as in the example 
from tests/data/hevc-mp4.mov

> 
> - Andreas

Thanks, 
Andriy
diff mbox

Patch

diff --git a/libavcodec/hevc_mp4toannexb_bsf.c b/libavcodec/hevc_mp4toannexb_bsf.c
index 09bce5b34c..5c27306b09 100644
--- a/libavcodec/hevc_mp4toannexb_bsf.c
+++ b/libavcodec/hevc_mp4toannexb_bsf.c
@@ -123,6 +123,7 @@  static int hevc_mp4toannexb_filter(AVBSFContext *ctx, AVPacket *out)
 
     int got_irap = 0;
     int i, ret = 0;
+    int vps_detected, sps_detected, pps_detected = 0;
 
     ret = ff_bsf_get_packet(ctx, &in);
     if (ret < 0)
@@ -146,9 +147,15 @@  static int hevc_mp4toannexb_filter(AVBSFContext *ctx, AVPacket *out)
 
         nalu_type = (bytestream2_peek_byte(&gb) >> 1) & 0x3f;
 
+        switch (nalu_type) {
+          case HEVC_NAL_VPS: vps_detected = 1; break;
+          case HEVC_NAL_SPS: sps_detected = 1; break;
+          case HEVC_NAL_PPS: pps_detected = 1; break;
+        }
+
         /* prepend extradata to IRAP frames */
         is_irap       = nalu_type >= 16 && nalu_type <= 23;
-        add_extradata = is_irap && !got_irap;
+        add_extradata = is_irap && !got_irap && !(vps_detected && sps_detected && pps_detected);
         extra_size    = add_extradata * ctx->par_out->extradata_size;
         got_irap     |= is_irap;
 
diff --git a/tests/fate/hevc.mak b/tests/fate/hevc.mak
index 559c3898bc..4f812b0834 100644
--- a/tests/fate/hevc.mak
+++ b/tests/fate/hevc.mak
@@ -238,7 +238,7 @@  FATE_HEVC-$(call ALLYES, HEVC_DEMUXER MOV_DEMUXER HEVC_MP4TOANNEXB_BSF MOV_MUXER
 fate-hevc-bsf-mp4toannexb: tests/data/hevc-mp4.mov
 fate-hevc-bsf-mp4toannexb: CMD = md5 -i $(TARGET_PATH)/tests/data/hevc-mp4.mov -c:v copy -fflags +bitexact -f hevc
 fate-hevc-bsf-mp4toannexb: CMP = oneline
-fate-hevc-bsf-mp4toannexb: REF = 1873662a3af1848c37e4eb25722c8df9
+fate-hevc-bsf-mp4toannexb: REF = 3c9d998a3aa2b9e0fb1c1f434952bf8b
 
 fate-hevc-skiploopfilter: CMD = framemd5 -skip_loop_filter nokey -i $(TARGET_SAMPLES)/hevc-conformance/SAO_D_Samsung_5.bit -sws_flags bitexact
 FATE_HEVC += fate-hevc-skiploopfilter