diff mbox series

[FFmpeg-devel,v2] avformat/mov: add option max_stts_delta

Message ID 20211123131106.8652-1-ffmpeg@gyani.pro
State New
Headers show
Series [FFmpeg-devel,v2] avformat/mov: add option max_stts_delta
Related show

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished
andriy/make_ppc success Make finished
andriy/make_fate_ppc success Make fate finished

Commit Message

Gyan Doshi Nov. 23, 2021, 1:11 p.m. UTC
Very high stts sample deltas may occasionally be intended but usually
they are written in error or used to store a negative value for dts correction
when treated as signed 32-bit integers.

This option lets the user set an upper limit, beyond which the delta
is clamped to 1. Negative values of under 1 second are used to adjust
dts.

Unit is the track time scale. Default is INT_MAX which maintains current handling.
---
 doc/demuxers.texi  |  6 ++++++
 libavformat/isom.h |  1 +
 libavformat/mov.c  | 14 +++++++++-----
 3 files changed, 16 insertions(+), 5 deletions(-)

Comments

Michael Niedermayer Nov. 23, 2021, 7:46 p.m. UTC | #1
On Tue, Nov 23, 2021 at 06:41:06PM +0530, Gyan Doshi wrote:
> Very high stts sample deltas may occasionally be intended but usually
> they are written in error or used to store a negative value for dts correction
> when treated as signed 32-bit integers.
> 
> This option lets the user set an upper limit, beyond which the delta
> is clamped to 1. Negative values of under 1 second are used to adjust
> dts.
> 
> Unit is the track time scale. Default is INT_MAX which maintains current handling.
> ---
>  doc/demuxers.texi  |  6 ++++++
>  libavformat/isom.h |  1 +
>  libavformat/mov.c  | 14 +++++++++-----
>  3 files changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/doc/demuxers.texi b/doc/demuxers.texi
> index cab8a7072c..15078b9b1b 100644
> --- a/doc/demuxers.texi
> +++ b/doc/demuxers.texi
> @@ -713,6 +713,12 @@ specify.
>  
>  @item decryption_key
>  16-byte key, in hex, to decrypt files encrypted using ISO Common Encryption (CENC/AES-128 CTR; ISO/IEC 23001-7).
> +
> +@item max_stts_delta
> +The sample offsets stored in a track's stts box are 32-bit unsigned integers. However, very large values usually indicate
> +a value written by error or a storage of a small negative value as a way to correct accumulated DTS delay.
> +Range is 0 to UINT_MAX. Default is INT_MAX.
> +
>  @end table
>  
>  @subsection Audible AAX
> diff --git a/libavformat/isom.h b/libavformat/isom.h
> index ef8f19b18c..625dea8421 100644
> --- a/libavformat/isom.h
> +++ b/libavformat/isom.h
> @@ -305,6 +305,7 @@ typedef struct MOVContext {
>      int32_t movie_display_matrix[3][3]; ///< display matrix from mvhd
>      int have_read_mfra_size;
>      uint32_t mfra_size;
> +    uint32_t max_stts_delta;
>  } MOVContext;
>  
>  int ff_mp4_read_descr_len(AVIOContext *pb);
> diff --git a/libavformat/mov.c b/libavformat/mov.c
> index 451cb78bbf..bbda07ac42 100644
> --- a/libavformat/mov.c
> +++ b/libavformat/mov.c
> @@ -3965,14 +3965,17 @@ static void mov_build_index(MOVContext *mov, AVStream *st)
>                  current_offset += sample_size;
>                  stream_size += sample_size;
>  
> -                /* A negative sample duration is invalid based on the spec,
> -                 * but some samples need it to correct the DTS. */
> -                if (sc->stts_data[stts_index].duration < 0) {
> +                /* STTS sample offsets are uint32 but some files store it as int32
> +                 * with negative values used to correct DTS delays.
> +                   There may be abnormally large values as well. */
> +                if (sc->stts_data[stts_index].duration > mov->max_stts_delta) {
> +                    // assume high delta is a negative correction if less than 1 second
> +                    int32_t delta_magnitude = *((int32_t *)&sc->stts_data[stts_index].duration);
>                      av_log(mov->fc, AV_LOG_WARNING,
> -                           "Invalid SampleDelta %d in STTS, at %d st:%d\n",
> +                           "Correcting too large SampleDelta %u in STTS, at %d st:%d.\n",
>                             sc->stts_data[stts_index].duration, stts_index,
>                             st->index);
> -                    dts_correction += sc->stts_data[stts_index].duration - 1;
> +                    dts_correction += (delta_magnitude < 0 && FFABS(delta_magnitude) < sc->time_scale ? delta_magnitude - 1 : 0);
>                      sc->stts_data[stts_index].duration = 1;
>                  }
>                  current_dts += sc->stts_data[stts_index].duration;
> @@ -8566,6 +8569,7 @@ static const AVOption mov_options[] = {
>      { "decryption_key", "The media decryption key (hex)", OFFSET(decryption_key), AV_OPT_TYPE_BINARY, .flags = AV_OPT_FLAG_DECODING_PARAM },
>      { "enable_drefs", "Enable external track support.", OFFSET(enable_drefs), AV_OPT_TYPE_BOOL,
>          {.i64 = 0}, 0, 1, FLAGS },
> +    { "max_stts_delta", "treat offsets above this value as invalid", OFFSET(max_stts_delta), AV_OPT_TYPE_INT, {.i64 = INT_MAX}, 0, UINT_MAX, .flags = AV_OPT_FLAG_DECODING_PARAM },
>  
>      { NULL },
>  };

The stts is used in other places, the value parsed as unsigned will cause
problems there too

the cast is also fragile as it will break when someone tries to change it
to int64

FFABS(delta_magnitude) is also undefined for INT_MIN

also please select the default so that it works with all real world files

I have not yet seen an ambigous file, +6 month vs -10ms is quite clear
what where the largest values your files used ?

thx


[...]
Gyan Doshi Nov. 24, 2021, 5:28 a.m. UTC | #2
On 2021-11-24 01:16 am, Michael Niedermayer wrote:
> On Tue, Nov 23, 2021 at 06:41:06PM +0530, Gyan Doshi wrote:
>> Very high stts sample deltas may occasionally be intended but usually
>> they are written in error or used to store a negative value for dts correction
>> when treated as signed 32-bit integers.
>>
>> This option lets the user set an upper limit, beyond which the delta
>> is clamped to 1. Negative values of under 1 second are used to adjust
>> dts.
>>
>> Unit is the track time scale. Default is INT_MAX which maintains current handling.
>> ---
>>   doc/demuxers.texi  |  6 ++++++
>>   libavformat/isom.h |  1 +
>>   libavformat/mov.c  | 14 +++++++++-----
>>   3 files changed, 16 insertions(+), 5 deletions(-)
>>
>> diff --git a/doc/demuxers.texi b/doc/demuxers.texi
>> index cab8a7072c..15078b9b1b 100644
>> --- a/doc/demuxers.texi
>> +++ b/doc/demuxers.texi
>> @@ -713,6 +713,12 @@ specify.
>>   
>>   @item decryption_key
>>   16-byte key, in hex, to decrypt files encrypted using ISO Common Encryption (CENC/AES-128 CTR; ISO/IEC 23001-7).
>> +
>> +@item max_stts_delta
>> +The sample offsets stored in a track's stts box are 32-bit unsigned integers. However, very large values usually indicate
>> +a value written by error or a storage of a small negative value as a way to correct accumulated DTS delay.
>> +Range is 0 to UINT_MAX. Default is INT_MAX.
>> +
>>   @end table
>>   
>>   @subsection Audible AAX
>> diff --git a/libavformat/isom.h b/libavformat/isom.h
>> index ef8f19b18c..625dea8421 100644
>> --- a/libavformat/isom.h
>> +++ b/libavformat/isom.h
>> @@ -305,6 +305,7 @@ typedef struct MOVContext {
>>       int32_t movie_display_matrix[3][3]; ///< display matrix from mvhd
>>       int have_read_mfra_size;
>>       uint32_t mfra_size;
>> +    uint32_t max_stts_delta;
>>   } MOVContext;
>>   
>>   int ff_mp4_read_descr_len(AVIOContext *pb);
>> diff --git a/libavformat/mov.c b/libavformat/mov.c
>> index 451cb78bbf..bbda07ac42 100644
>> --- a/libavformat/mov.c
>> +++ b/libavformat/mov.c
>> @@ -3965,14 +3965,17 @@ static void mov_build_index(MOVContext *mov, AVStream *st)
>>                   current_offset += sample_size;
>>                   stream_size += sample_size;
>>   
>> -                /* A negative sample duration is invalid based on the spec,
>> -                 * but some samples need it to correct the DTS. */
>> -                if (sc->stts_data[stts_index].duration < 0) {
>> +                /* STTS sample offsets are uint32 but some files store it as int32
>> +                 * with negative values used to correct DTS delays.
>> +                   There may be abnormally large values as well. */
>> +                if (sc->stts_data[stts_index].duration > mov->max_stts_delta) {
>> +                    // assume high delta is a negative correction if less than 1 second
>> +                    int32_t delta_magnitude = *((int32_t *)&sc->stts_data[stts_index].duration);
>>                       av_log(mov->fc, AV_LOG_WARNING,
>> -                           "Invalid SampleDelta %d in STTS, at %d st:%d\n",
>> +                           "Correcting too large SampleDelta %u in STTS, at %d st:%d.\n",
>>                              sc->stts_data[stts_index].duration, stts_index,
>>                              st->index);
>> -                    dts_correction += sc->stts_data[stts_index].duration - 1;
>> +                    dts_correction += (delta_magnitude < 0 && FFABS(delta_magnitude) < sc->time_scale ? delta_magnitude - 1 : 0);
>>                       sc->stts_data[stts_index].duration = 1;
>>                   }
>>                   current_dts += sc->stts_data[stts_index].duration;
>> @@ -8566,6 +8569,7 @@ static const AVOption mov_options[] = {
>>       { "decryption_key", "The media decryption key (hex)", OFFSET(decryption_key), AV_OPT_TYPE_BINARY, .flags = AV_OPT_FLAG_DECODING_PARAM },
>>       { "enable_drefs", "Enable external track support.", OFFSET(enable_drefs), AV_OPT_TYPE_BOOL,
>>           {.i64 = 0}, 0, 1, FLAGS },
>> +    { "max_stts_delta", "treat offsets above this value as invalid", OFFSET(max_stts_delta), AV_OPT_TYPE_INT, {.i64 = INT_MAX}, 0, UINT_MAX, .flags = AV_OPT_FLAG_DECODING_PARAM },
>>   
>>       { NULL },
>>   };
> The stts is used in other places, the value parsed as unsigned will cause
> problems there too
I see stts duration used in 3 places in mov.c

mov_read_stts(), which my earlier patch changed.

mov_build_index(), which this patch changes,

mov_read_trak() where frame rate is populated for CFR streams and is 
called very shortly after mov_build_index().
I don't think that can break for non-malformed files as the only 
duration entry in the stream is not likely to be > INT_MAX
In any case, after this patch, with default option value, it will see 
the same values as earlier.

> the cast is also fragile as it will break when someone tries to change it
> to int64

In what circumstances would that happen?

The syntax of stts sample_delta has been unchanged since the start.
I see  'unsigned int(32) sample_delta;' in the 2005 2nd ed.
Same in the 2015 5th ed.

> FFABS(delta_magnitude) is also undefined for INT_MIN

Will add a check for that.

> also please select the default so that it works with all real world files

No single value can all work for all files. Hence the option.

The standard says this,

"The time to sample boxes must give non-zero durations for all samples 
with the possible exception of the last
one. Durations in the ‘stts’ box are strictly positive (non-zero), 
except for the very last entry, which may be zero"

This part is normative. mp4-negative-stts-problem is out-of-spec by 
using signed stts deltas. It uses 'mp42' for ftyp and 'isom' as a 
compatible brand.

The default value emulates pre-patch handling. So my in-spec file 
wouldn't parse well, but users would have the ability to allow it.
FATE passes.

My file had slightly different offsets of 7 hrs in both audio and video 
tracks with a time scale of 90000 in both tracks, recorded using Wowza.

(The stts delta is not a media packet duration. It is a decoding interval.
"The decoding time is defined in the Decoding Time to Sample Box, giving 
time deltas between successive decoding times.")

This is not an academic exercise.

Regards,
Gyan
Michael Niedermayer Nov. 24, 2021, 7:26 p.m. UTC | #3
On Wed, Nov 24, 2021 at 10:58:00AM +0530, Gyan Doshi wrote:
> 
> 
> On 2021-11-24 01:16 am, Michael Niedermayer wrote:
> > On Tue, Nov 23, 2021 at 06:41:06PM +0530, Gyan Doshi wrote:
> > > Very high stts sample deltas may occasionally be intended but usually
> > > they are written in error or used to store a negative value for dts correction
> > > when treated as signed 32-bit integers.
> > > 
> > > This option lets the user set an upper limit, beyond which the delta
> > > is clamped to 1. Negative values of under 1 second are used to adjust
> > > dts.
> > > 
> > > Unit is the track time scale. Default is INT_MAX which maintains current handling.
> > > ---
> > >   doc/demuxers.texi  |  6 ++++++
> > >   libavformat/isom.h |  1 +
> > >   libavformat/mov.c  | 14 +++++++++-----
> > >   3 files changed, 16 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/doc/demuxers.texi b/doc/demuxers.texi
> > > index cab8a7072c..15078b9b1b 100644
> > > --- a/doc/demuxers.texi
> > > +++ b/doc/demuxers.texi
> > > @@ -713,6 +713,12 @@ specify.
> > >   @item decryption_key
> > >   16-byte key, in hex, to decrypt files encrypted using ISO Common Encryption (CENC/AES-128 CTR; ISO/IEC 23001-7).
> > > +
> > > +@item max_stts_delta
> > > +The sample offsets stored in a track's stts box are 32-bit unsigned integers. However, very large values usually indicate
> > > +a value written by error or a storage of a small negative value as a way to correct accumulated DTS delay.
> > > +Range is 0 to UINT_MAX. Default is INT_MAX.
> > > +
> > >   @end table
> > >   @subsection Audible AAX
> > > diff --git a/libavformat/isom.h b/libavformat/isom.h
> > > index ef8f19b18c..625dea8421 100644
> > > --- a/libavformat/isom.h
> > > +++ b/libavformat/isom.h
> > > @@ -305,6 +305,7 @@ typedef struct MOVContext {
> > >       int32_t movie_display_matrix[3][3]; ///< display matrix from mvhd
> > >       int have_read_mfra_size;
> > >       uint32_t mfra_size;
> > > +    uint32_t max_stts_delta;
> > >   } MOVContext;
> > >   int ff_mp4_read_descr_len(AVIOContext *pb);
> > > diff --git a/libavformat/mov.c b/libavformat/mov.c
> > > index 451cb78bbf..bbda07ac42 100644
> > > --- a/libavformat/mov.c
> > > +++ b/libavformat/mov.c
> > > @@ -3965,14 +3965,17 @@ static void mov_build_index(MOVContext *mov, AVStream *st)
> > >                   current_offset += sample_size;
> > >                   stream_size += sample_size;
> > > -                /* A negative sample duration is invalid based on the spec,
> > > -                 * but some samples need it to correct the DTS. */
> > > -                if (sc->stts_data[stts_index].duration < 0) {
> > > +                /* STTS sample offsets are uint32 but some files store it as int32
> > > +                 * with negative values used to correct DTS delays.
> > > +                   There may be abnormally large values as well. */
> > > +                if (sc->stts_data[stts_index].duration > mov->max_stts_delta) {
> > > +                    // assume high delta is a negative correction if less than 1 second
> > > +                    int32_t delta_magnitude = *((int32_t *)&sc->stts_data[stts_index].duration);
> > >                       av_log(mov->fc, AV_LOG_WARNING,
> > > -                           "Invalid SampleDelta %d in STTS, at %d st:%d\n",
> > > +                           "Correcting too large SampleDelta %u in STTS, at %d st:%d.\n",
> > >                              sc->stts_data[stts_index].duration, stts_index,
> > >                              st->index);
> > > -                    dts_correction += sc->stts_data[stts_index].duration - 1;
> > > +                    dts_correction += (delta_magnitude < 0 && FFABS(delta_magnitude) < sc->time_scale ? delta_magnitude - 1 : 0);
> > >                       sc->stts_data[stts_index].duration = 1;
> > >                   }
> > >                   current_dts += sc->stts_data[stts_index].duration;
> > > @@ -8566,6 +8569,7 @@ static const AVOption mov_options[] = {
> > >       { "decryption_key", "The media decryption key (hex)", OFFSET(decryption_key), AV_OPT_TYPE_BINARY, .flags = AV_OPT_FLAG_DECODING_PARAM },
> > >       { "enable_drefs", "Enable external track support.", OFFSET(enable_drefs), AV_OPT_TYPE_BOOL,
> > >           {.i64 = 0}, 0, 1, FLAGS },
> > > +    { "max_stts_delta", "treat offsets above this value as invalid", OFFSET(max_stts_delta), AV_OPT_TYPE_INT, {.i64 = INT_MAX}, 0, UINT_MAX, .flags = AV_OPT_FLAG_DECODING_PARAM },
> > >       { NULL },
> > >   };
> > The stts is used in other places, the value parsed as unsigned will cause
> > problems there too
> I see stts duration used in 3 places in mov.c
> 
> mov_read_stts(), which my earlier patch changed.
> 
> mov_build_index(), which this patch changes,
> 
> mov_read_trak() where frame rate is populated for CFR streams and is called
> very shortly after mov_build_index().
> I don't think that can break for non-malformed files as the only duration
> entry in the stream is not likely to be > INT_MAX
> In any case, after this patch, with default option value, it will see the
> same values as earlier.
> 

If iam not mistaken there is code that adds the values up to initilize some
duration. this works with 64bit so the small negative values interpreted as
unsigned will result in a wrong result i would expect
that may very well not be the only issue


> > the cast is also fragile as it will break when someone tries to change it
> > to int64
> 
> In what circumstances would that happen?

when someone tries to change the code.
having both signed and unsigned 32bit in a 32bit element screams
almost to have someone change this and simplify the code or fix some
bugs related to it


[...]
> > also please select the default so that it works with all real world files
> 
> No single value can all work for all files. Hence the option.

but what i wrote was about all real world files.
Sure you can make a file that fails but so far the files i have seen and
the ones you told me about all would work with 49% of all possibly defaults
why do you pick a default that doesnt work with them and then argue about
that ?
Lets try to fix the code not win the argument please

[...]

> My file had slightly different offsets of 7 hrs in both audio and video
> tracks with a time scale of 90000 in both tracks, recorded using Wowza.

[...]
Gyan Doshi Nov. 25, 2021, 5:21 a.m. UTC | #4
On 2021-11-25 12:56 am, Michael Niedermayer wrote:
> On Wed, Nov 24, 2021 at 10:58:00AM +0530, Gyan Doshi wrote:
>>
>> On 2021-11-24 01:16 am, Michael Niedermayer wrote:
>>> On Tue, Nov 23, 2021 at 06:41:06PM +0530, Gyan Doshi wrote:
>>>> Very high stts sample deltas may occasionally be intended but usually
>>>> they are written in error or used to store a negative value for dts correction
>>>> when treated as signed 32-bit integers.
>>>>
>>>> This option lets the user set an upper limit, beyond which the delta
>>>> is clamped to 1. Negative values of under 1 second are used to adjust
>>>> dts.
>>>>
>>>> Unit is the track time scale. Default is INT_MAX which maintains current handling.
>>>> ---
>>>>    doc/demuxers.texi  |  6 ++++++
>>>>    libavformat/isom.h |  1 +
>>>>    libavformat/mov.c  | 14 +++++++++-----
>>>>    3 files changed, 16 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/doc/demuxers.texi b/doc/demuxers.texi
>>>> index cab8a7072c..15078b9b1b 100644
>>>> --- a/doc/demuxers.texi
>>>> +++ b/doc/demuxers.texi
>>>> @@ -713,6 +713,12 @@ specify.
>>>>    @item decryption_key
>>>>    16-byte key, in hex, to decrypt files encrypted using ISO Common Encryption (CENC/AES-128 CTR; ISO/IEC 23001-7).
>>>> +
>>>> +@item max_stts_delta
>>>> +The sample offsets stored in a track's stts box are 32-bit unsigned integers. However, very large values usually indicate
>>>> +a value written by error or a storage of a small negative value as a way to correct accumulated DTS delay.
>>>> +Range is 0 to UINT_MAX. Default is INT_MAX.
>>>> +
>>>>    @end table
>>>>    @subsection Audible AAX
>>>> diff --git a/libavformat/isom.h b/libavformat/isom.h
>>>> index ef8f19b18c..625dea8421 100644
>>>> --- a/libavformat/isom.h
>>>> +++ b/libavformat/isom.h
>>>> @@ -305,6 +305,7 @@ typedef struct MOVContext {
>>>>        int32_t movie_display_matrix[3][3]; ///< display matrix from mvhd
>>>>        int have_read_mfra_size;
>>>>        uint32_t mfra_size;
>>>> +    uint32_t max_stts_delta;
>>>>    } MOVContext;
>>>>    int ff_mp4_read_descr_len(AVIOContext *pb);
>>>> diff --git a/libavformat/mov.c b/libavformat/mov.c
>>>> index 451cb78bbf..bbda07ac42 100644
>>>> --- a/libavformat/mov.c
>>>> +++ b/libavformat/mov.c
>>>> @@ -3965,14 +3965,17 @@ static void mov_build_index(MOVContext *mov, AVStream *st)
>>>>                    current_offset += sample_size;
>>>>                    stream_size += sample_size;
>>>> -                /* A negative sample duration is invalid based on the spec,
>>>> -                 * but some samples need it to correct the DTS. */
>>>> -                if (sc->stts_data[stts_index].duration < 0) {
>>>> +                /* STTS sample offsets are uint32 but some files store it as int32
>>>> +                 * with negative values used to correct DTS delays.
>>>> +                   There may be abnormally large values as well. */
>>>> +                if (sc->stts_data[stts_index].duration > mov->max_stts_delta) {
>>>> +                    // assume high delta is a negative correction if less than 1 second
>>>> +                    int32_t delta_magnitude = *((int32_t *)&sc->stts_data[stts_index].duration);
>>>>                        av_log(mov->fc, AV_LOG_WARNING,
>>>> -                           "Invalid SampleDelta %d in STTS, at %d st:%d\n",
>>>> +                           "Correcting too large SampleDelta %u in STTS, at %d st:%d.\n",
>>>>                               sc->stts_data[stts_index].duration, stts_index,
>>>>                               st->index);
>>>> -                    dts_correction += sc->stts_data[stts_index].duration - 1;
>>>> +                    dts_correction += (delta_magnitude < 0 && FFABS(delta_magnitude) < sc->time_scale ? delta_magnitude - 1 : 0);
>>>>                        sc->stts_data[stts_index].duration = 1;
>>>>                    }
>>>>                    current_dts += sc->stts_data[stts_index].duration;
>>>> @@ -8566,6 +8569,7 @@ static const AVOption mov_options[] = {
>>>>        { "decryption_key", "The media decryption key (hex)", OFFSET(decryption_key), AV_OPT_TYPE_BINARY, .flags = AV_OPT_FLAG_DECODING_PARAM },
>>>>        { "enable_drefs", "Enable external track support.", OFFSET(enable_drefs), AV_OPT_TYPE_BOOL,
>>>>            {.i64 = 0}, 0, 1, FLAGS },
>>>> +    { "max_stts_delta", "treat offsets above this value as invalid", OFFSET(max_stts_delta), AV_OPT_TYPE_INT, {.i64 = INT_MAX}, 0, UINT_MAX, .flags = AV_OPT_FLAG_DECODING_PARAM },
>>>>        { NULL },
>>>>    };
>>> The stts is used in other places, the value parsed as unsigned will cause
>>> problems there too
>> I see stts duration used in 3 places in mov.c
>>
>> mov_read_stts(), which my earlier patch changed.
>>
>> mov_build_index(), which this patch changes,
>>
>> mov_read_trak() where frame rate is populated for CFR streams and is called
>> very shortly after mov_build_index().
>> I don't think that can break for non-malformed files as the only duration
>> entry in the stream is not likely to be > INT_MAX
>> In any case, after this patch, with default option value, it will see the
>> same values as earlier.
>>
> If iam not mistaken there is code that adds the values up to initilize some
> duration. this works with 64bit so the small negative values interpreted as
> unsigned will result in a wrong result i would expect
> that may very well not be the only issue

Yeah, I see the stream duration being set in mov_read_stts().
Interestingly,  after 153639cb9cf, the unadjusted deltas were being 
added, which would have included all negative deltas, small or large.

153639cb9cf  shifted the original negative delta adjustment code from 
mov_read_stts to mov_build_index. Was that necessary?


>>> the cast is also fragile as it will break when someone tries to change it
>>> to int64
>> In what circumstances would that happen?
> when someone tries to change the code.
> having both signed and unsigned 32bit in a 32bit element screams

Originally, negative values never left mov_read_stts. Maybe that's best.

Anyway, FATE passes, so it may only affect edge cases if it does. But I 
will walk through the code to check.


> [...]
>>> also please select the default so that it works with all real world files
>> No single value can all work for all files. Hence the option.
> but what i wrote was about all real world files.
> Sure you can make a file that fails but so far the files i have seen and
> the ones you told me about all would work with 49% of all possibly defaults
> why do you pick a default that doesnt work with them and then argue about
> that ?
> Lets try to fix the code not win the argument please

I picked the default to keep the old behaviour. 99% of files I see have 
all valid offsets.
Among the rest, large values are mostly errors. Very few are intended.
What default do you suggest?

Regards,
Gyan
Michael Niedermayer Nov. 25, 2021, 5:03 p.m. UTC | #5
On Thu, Nov 25, 2021 at 10:51:15AM +0530, Gyan Doshi wrote:
> 
> 
> On 2021-11-25 12:56 am, Michael Niedermayer wrote:
> > On Wed, Nov 24, 2021 at 10:58:00AM +0530, Gyan Doshi wrote:
> > > 
> > > On 2021-11-24 01:16 am, Michael Niedermayer wrote:
> > > > On Tue, Nov 23, 2021 at 06:41:06PM +0530, Gyan Doshi wrote:
> > > > > Very high stts sample deltas may occasionally be intended but usually
> > > > > they are written in error or used to store a negative value for dts correction
> > > > > when treated as signed 32-bit integers.
> > > > > 
> > > > > This option lets the user set an upper limit, beyond which the delta
> > > > > is clamped to 1. Negative values of under 1 second are used to adjust
> > > > > dts.
> > > > > 
> > > > > Unit is the track time scale. Default is INT_MAX which maintains current handling.
> > > > > ---
> > > > >    doc/demuxers.texi  |  6 ++++++
> > > > >    libavformat/isom.h |  1 +
> > > > >    libavformat/mov.c  | 14 +++++++++-----
> > > > >    3 files changed, 16 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/doc/demuxers.texi b/doc/demuxers.texi
> > > > > index cab8a7072c..15078b9b1b 100644
> > > > > --- a/doc/demuxers.texi
> > > > > +++ b/doc/demuxers.texi
> > > > > @@ -713,6 +713,12 @@ specify.
> > > > >    @item decryption_key
> > > > >    16-byte key, in hex, to decrypt files encrypted using ISO Common Encryption (CENC/AES-128 CTR; ISO/IEC 23001-7).
> > > > > +
> > > > > +@item max_stts_delta
> > > > > +The sample offsets stored in a track's stts box are 32-bit unsigned integers. However, very large values usually indicate
> > > > > +a value written by error or a storage of a small negative value as a way to correct accumulated DTS delay.
> > > > > +Range is 0 to UINT_MAX. Default is INT_MAX.
> > > > > +
> > > > >    @end table
> > > > >    @subsection Audible AAX
> > > > > diff --git a/libavformat/isom.h b/libavformat/isom.h
> > > > > index ef8f19b18c..625dea8421 100644
> > > > > --- a/libavformat/isom.h
> > > > > +++ b/libavformat/isom.h
> > > > > @@ -305,6 +305,7 @@ typedef struct MOVContext {
> > > > >        int32_t movie_display_matrix[3][3]; ///< display matrix from mvhd
> > > > >        int have_read_mfra_size;
> > > > >        uint32_t mfra_size;
> > > > > +    uint32_t max_stts_delta;
> > > > >    } MOVContext;
> > > > >    int ff_mp4_read_descr_len(AVIOContext *pb);
> > > > > diff --git a/libavformat/mov.c b/libavformat/mov.c
> > > > > index 451cb78bbf..bbda07ac42 100644
> > > > > --- a/libavformat/mov.c
> > > > > +++ b/libavformat/mov.c
> > > > > @@ -3965,14 +3965,17 @@ static void mov_build_index(MOVContext *mov, AVStream *st)
> > > > >                    current_offset += sample_size;
> > > > >                    stream_size += sample_size;
> > > > > -                /* A negative sample duration is invalid based on the spec,
> > > > > -                 * but some samples need it to correct the DTS. */
> > > > > -                if (sc->stts_data[stts_index].duration < 0) {
> > > > > +                /* STTS sample offsets are uint32 but some files store it as int32
> > > > > +                 * with negative values used to correct DTS delays.
> > > > > +                   There may be abnormally large values as well. */
> > > > > +                if (sc->stts_data[stts_index].duration > mov->max_stts_delta) {
> > > > > +                    // assume high delta is a negative correction if less than 1 second
> > > > > +                    int32_t delta_magnitude = *((int32_t *)&sc->stts_data[stts_index].duration);
> > > > >                        av_log(mov->fc, AV_LOG_WARNING,
> > > > > -                           "Invalid SampleDelta %d in STTS, at %d st:%d\n",
> > > > > +                           "Correcting too large SampleDelta %u in STTS, at %d st:%d.\n",
> > > > >                               sc->stts_data[stts_index].duration, stts_index,
> > > > >                               st->index);
> > > > > -                    dts_correction += sc->stts_data[stts_index].duration - 1;
> > > > > +                    dts_correction += (delta_magnitude < 0 && FFABS(delta_magnitude) < sc->time_scale ? delta_magnitude - 1 : 0);
> > > > >                        sc->stts_data[stts_index].duration = 1;
> > > > >                    }
> > > > >                    current_dts += sc->stts_data[stts_index].duration;
> > > > > @@ -8566,6 +8569,7 @@ static const AVOption mov_options[] = {
> > > > >        { "decryption_key", "The media decryption key (hex)", OFFSET(decryption_key), AV_OPT_TYPE_BINARY, .flags = AV_OPT_FLAG_DECODING_PARAM },
> > > > >        { "enable_drefs", "Enable external track support.", OFFSET(enable_drefs), AV_OPT_TYPE_BOOL,
> > > > >            {.i64 = 0}, 0, 1, FLAGS },
> > > > > +    { "max_stts_delta", "treat offsets above this value as invalid", OFFSET(max_stts_delta), AV_OPT_TYPE_INT, {.i64 = INT_MAX}, 0, UINT_MAX, .flags = AV_OPT_FLAG_DECODING_PARAM },
> > > > >        { NULL },
> > > > >    };
> > > > The stts is used in other places, the value parsed as unsigned will cause
> > > > problems there too
> > > I see stts duration used in 3 places in mov.c
> > > 
> > > mov_read_stts(), which my earlier patch changed.
> > > 
> > > mov_build_index(), which this patch changes,
> > > 
> > > mov_read_trak() where frame rate is populated for CFR streams and is called
> > > very shortly after mov_build_index().
> > > I don't think that can break for non-malformed files as the only duration
> > > entry in the stream is not likely to be > INT_MAX
> > > In any case, after this patch, with default option value, it will see the
> > > same values as earlier.
> > > 
> > If iam not mistaken there is code that adds the values up to initilize some
> > duration. this works with 64bit so the small negative values interpreted as
> > unsigned will result in a wrong result i would expect
> > that may very well not be the only issue
> 
> Yeah, I see the stream duration being set in mov_read_stts().
> Interestingly,  after 153639cb9cf, the unadjusted deltas were being added,
> which would have included all negative deltas, small or large.

did you check which variant leads to the correct duration ?
i naively thought that as it was before was correct but i didnt check
that so i can very well be wrong


> 
> 153639cb9cf  shifted the original negative delta adjustment code from
> mov_read_stts to mov_build_index. Was that necessary?

thats andreas commit, 5 years ago, i cannot awnser this 


> 
> 
> > > > the cast is also fragile as it will break when someone tries to change it
> > > > to int64
> > > In what circumstances would that happen?
> > when someone tries to change the code.
> > having both signed and unsigned 32bit in a 32bit element screams
> 
> Originally, negative values never left mov_read_stts. Maybe that's best.
> 
> Anyway, FATE passes, so it may only affect edge cases if it does. But I will
> walk through the code to check.
> 
> 
> > [...]
> > > > also please select the default so that it works with all real world files
> > > No single value can all work for all files. Hence the option.
> > but what i wrote was about all real world files.
> > Sure you can make a file that fails but so far the files i have seen and
> > the ones you told me about all would work with 49% of all possibly defaults
> > why do you pick a default that doesnt work with them and then argue about
> > that ?
> > Lets try to fix the code not win the argument please
> 
> I picked the default to keep the old behaviour. 99% of files I see have all
> valid offsets.
> Among the rest, large values are mostly errors. Very few are intended.
> What default do you suggest?

I would suggest something that treats small negative ones as negative
signed but leaves all the rest as unsigned. I dont really know what
"small" should be here but what the file here used was around -10ms
so if we take a common timebase and 10seconds that should hopefully
catch all these negative cases and leave >99% of the space for unsigned
 
thx

[...]
diff mbox series

Patch

diff --git a/doc/demuxers.texi b/doc/demuxers.texi
index cab8a7072c..15078b9b1b 100644
--- a/doc/demuxers.texi
+++ b/doc/demuxers.texi
@@ -713,6 +713,12 @@  specify.
 
 @item decryption_key
 16-byte key, in hex, to decrypt files encrypted using ISO Common Encryption (CENC/AES-128 CTR; ISO/IEC 23001-7).
+
+@item max_stts_delta
+The sample offsets stored in a track's stts box are 32-bit unsigned integers. However, very large values usually indicate
+a value written by error or a storage of a small negative value as a way to correct accumulated DTS delay.
+Range is 0 to UINT_MAX. Default is INT_MAX.
+
 @end table
 
 @subsection Audible AAX
diff --git a/libavformat/isom.h b/libavformat/isom.h
index ef8f19b18c..625dea8421 100644
--- a/libavformat/isom.h
+++ b/libavformat/isom.h
@@ -305,6 +305,7 @@  typedef struct MOVContext {
     int32_t movie_display_matrix[3][3]; ///< display matrix from mvhd
     int have_read_mfra_size;
     uint32_t mfra_size;
+    uint32_t max_stts_delta;
 } MOVContext;
 
 int ff_mp4_read_descr_len(AVIOContext *pb);
diff --git a/libavformat/mov.c b/libavformat/mov.c
index 451cb78bbf..bbda07ac42 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -3965,14 +3965,17 @@  static void mov_build_index(MOVContext *mov, AVStream *st)
                 current_offset += sample_size;
                 stream_size += sample_size;
 
-                /* A negative sample duration is invalid based on the spec,
-                 * but some samples need it to correct the DTS. */
-                if (sc->stts_data[stts_index].duration < 0) {
+                /* STTS sample offsets are uint32 but some files store it as int32
+                 * with negative values used to correct DTS delays.
+                   There may be abnormally large values as well. */
+                if (sc->stts_data[stts_index].duration > mov->max_stts_delta) {
+                    // assume high delta is a negative correction if less than 1 second
+                    int32_t delta_magnitude = *((int32_t *)&sc->stts_data[stts_index].duration);
                     av_log(mov->fc, AV_LOG_WARNING,
-                           "Invalid SampleDelta %d in STTS, at %d st:%d\n",
+                           "Correcting too large SampleDelta %u in STTS, at %d st:%d.\n",
                            sc->stts_data[stts_index].duration, stts_index,
                            st->index);
-                    dts_correction += sc->stts_data[stts_index].duration - 1;
+                    dts_correction += (delta_magnitude < 0 && FFABS(delta_magnitude) < sc->time_scale ? delta_magnitude - 1 : 0);
                     sc->stts_data[stts_index].duration = 1;
                 }
                 current_dts += sc->stts_data[stts_index].duration;
@@ -8566,6 +8569,7 @@  static const AVOption mov_options[] = {
     { "decryption_key", "The media decryption key (hex)", OFFSET(decryption_key), AV_OPT_TYPE_BINARY, .flags = AV_OPT_FLAG_DECODING_PARAM },
     { "enable_drefs", "Enable external track support.", OFFSET(enable_drefs), AV_OPT_TYPE_BOOL,
         {.i64 = 0}, 0, 1, FLAGS },
+    { "max_stts_delta", "treat offsets above this value as invalid", OFFSET(max_stts_delta), AV_OPT_TYPE_INT, {.i64 = INT_MAX}, 0, UINT_MAX, .flags = AV_OPT_FLAG_DECODING_PARAM },
 
     { NULL },
 };