[FFmpeg-devel,13/21] avformat/matroskadec: Improve length check

Submitted by Oliver Collyer via ffmpeg-devel on March 27, 2019, 11:18 a.m.

Details

Message ID 20190327111852.3784-14-andreas.rheinhardt@googlemail.com
State New
Headers show

Commit Message

Oliver Collyer via ffmpeg-devel March 27, 2019, 11:18 a.m.
The earlier code had three flaws:

1. The case of an unknown-sized element inside a finite-sized element
(which is against the specifications) was not caught.

2. The error message wasn't helpful: It compared the length of the child
with the offset of the end of the parent and claimed that the first
exceeds the latter, although that is not necessarily true.

3. Unknown-sized elements that are not parsed can't be skipped. Given
that according to the Matroska specifications only the segment and the
clusters can be of unknown-size, this is handled by not allowing any
other units to have infinite size whereas the earlier code would seek
back by 1 byte upon encountering an infinite-size element that ought
to be skipped.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com>
---
 libavformat/matroskadec.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

Comments

Michael Niedermayer April 5, 2019, 10:08 a.m.
On Wed, Mar 27, 2019 at 12:18:44PM +0100, Andreas Rheinhardt via ffmpeg-devel wrote:
> The earlier code had three flaws:
> 
> 1. The case of an unknown-sized element inside a finite-sized element
> (which is against the specifications) was not caught.
> 
> 2. The error message wasn't helpful: It compared the length of the child
> with the offset of the end of the parent and claimed that the first
> exceeds the latter, although that is not necessarily true.
> 
> 3. Unknown-sized elements that are not parsed can't be skipped. Given
> that according to the Matroska specifications only the segment and the
> clusters can be of unknown-size, this is handled by not allowing any
> other units to have infinite size whereas the earlier code would seek
> back by 1 byte upon encountering an infinite-size element that ought
> to be skipped.
> 
> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com>
> ---
>  libavformat/matroskadec.c | 26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)

will apply

thanks


[...]
Steve Lhomme April 7, 2019, 7:17 a.m.
On 3/27/2019 12:18 PM, Andreas Rheinhardt via ffmpeg-devel wrote:
> The earlier code had three flaws:
>
> 1. The case of an unknown-sized element inside a finite-sized element
> (which is against the specifications) was not caught.
>
> 2. The error message wasn't helpful: It compared the length of the child
> with the offset of the end of the parent and claimed that the first
> exceeds the latter, although that is not necessarily true.
>
> 3. Unknown-sized elements that are not parsed can't be skipped. Given
> that according to the Matroska specifications only the segment and the
> clusters can be of unknown-size, this is handled by not allowing any

This restriction is rather new. There might be old files that use 
infinite sizes in other places.

> other units to have infinite size whereas the earlier code would seek
> back by 1 byte upon encountering an infinite-size element that ought
> to be skipped.
>
> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com>
> ---
>   libavformat/matroskadec.c | 26 ++++++++++++++++++++++----
>   1 file changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
> index 6fa324c0cc..0179e5426e 100644
> --- a/libavformat/matroskadec.c
> +++ b/libavformat/matroskadec.c
> @@ -1180,11 +1180,29 @@ static int ebml_parse_elem(MatroskaDemuxContext *matroska,
>           if (matroska->num_levels > 0) {
>               MatroskaLevel *level = &matroska->levels[matroska->num_levels - 1];
>               int64_t pos = avio_tell(pb);
> -            if (level->length != EBML_UNKNOWN_LENGTH &&
> -                (pos + length) > (level->start + level->length)) {
> +
> +            if (length != EBML_UNKNOWN_LENGTH &&
> +                level->length != EBML_UNKNOWN_LENGTH) {
> +                uint64_t elem_end = pos + length,
> +                        level_end = level->start + level->length;
> +
> +                if (level_end < elem_end) {
> +                    av_log(matroska->ctx, AV_LOG_ERROR,
> +                           "Element at 0x%"PRIx64" ending at 0x%"PRIx64" exceeds "
> +                           "containing master element ending at 0x%"PRIx64"\n",
> +                           pos, elem_end, level_end);
> +                    return AVERROR_INVALIDDATA;
> +                }
> +            } else if (level->length != EBML_UNKNOWN_LENGTH) {
> +                av_log(matroska->ctx, AV_LOG_ERROR, "Unknown-sized element "
> +                       "at 0x%"PRIx64" inside parent with finite size\n", pos);
> +                return AVERROR_INVALIDDATA;

IMO there's no reason to return an error here. You can log the error and 
still parse the data, it should end up fine (if the code is done as such 
that you can never read past your parents boundaries). At least libebml 
can deal with that.

> +            } else if (length == EBML_UNKNOWN_LENGTH && id != MATROSKA_ID_CLUSTER) {
> +                // According to the specifications only clusters and segments
> +                // are allowed to be unknown-sized.
>                   av_log(matroska->ctx, AV_LOG_ERROR,
> -                       "Invalid length 0x%"PRIx64" > 0x%"PRIx64" in parent\n",
> -                       length, level->start + level->length);
> +                       "Found unknown-sized element other than a cluster at "
> +                       "0x%"PRIx64". Dropping the invalid element.\n", pos);
>                   return AVERROR_INVALIDDATA;
>               }
>           }
> -- 
> 2.19.2
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Oliver Collyer via ffmpeg-devel April 7, 2019, 10:54 a.m.
Steve Lhomme:
> On 3/27/2019 12:18 PM, Andreas Rheinhardt via ffmpeg-devel wrote:
>> The earlier code had three flaws:
>>
>> 1. The case of an unknown-sized element inside a finite-sized element
>> (which is against the specifications) was not caught.
>>
>> 2. The error message wasn't helpful: It compared the length of the
>> child
>> with the offset of the end of the parent and claimed that the first
>> exceeds the latter, although that is not necessarily true.
>>
>> 3. Unknown-sized elements that are not parsed can't be skipped. Given
>> that according to the Matroska specifications only the segment and the
>> clusters can be of unknown-size, this is handled by not allowing any
> 
> This restriction is rather new. There might be old files that use
> infinite sizes in other places.
> 
Indeed there are. The file mentioned by Dale
(https://ffmpeg.org/pipermail/ffmpeg-devel/2019-February/240376.html;
https://cs.chromium.org/chromium/src/media/test/data/bear-320x240-live.webm)
has cues where every master element (even CueTrackPositions) are
written as unknown-sized element (with length fields coded on eight
bytes). The Cues aren't referenced in a Seekhead at the beginning
though, so that FFmpeg doesn't try to read them. If they were
referenced, neither current nor patched FFmpeg would be able to
properly parse them and patched FFmpeg would furthermore give an error
message.

When exactly has this restriction been enacted?

I'll see if I find a way to support such elements (although I am not
sure whether it is worthwhile).

>> other units to have infinite size whereas the earlier code would seek
>> back by 1 byte upon encountering an infinite-size element that ought
>> to be skipped.
>>
>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com>
>> ---
>>   libavformat/matroskadec.c | 26 ++++++++++++++++++++++----
>>   1 file changed, 22 insertions(+), 4 deletions(-)
>>
>> diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
>> index 6fa324c0cc..0179e5426e 100644
>> --- a/libavformat/matroskadec.c
>> +++ b/libavformat/matroskadec.c
>> @@ -1180,11 +1180,29 @@ static int
>> ebml_parse_elem(MatroskaDemuxContext *matroska,
>>           if (matroska->num_levels > 0) {
>>               MatroskaLevel *level =
>> &matroska->levels[matroska->num_levels - 1];
>>               int64_t pos = avio_tell(pb);
>> -            if (level->length != EBML_UNKNOWN_LENGTH &&
>> -                (pos + length) > (level->start + level->length)) {
>> +
>> +            if (length != EBML_UNKNOWN_LENGTH &&
>> +                level->length != EBML_UNKNOWN_LENGTH) {
>> +                uint64_t elem_end = pos + length,
>> +                        level_end = level->start + level->length;
>> +
>> +                if (level_end < elem_end) {
>> +                    av_log(matroska->ctx, AV_LOG_ERROR,
>> +                           "Element at 0x%"PRIx64" ending at
>> 0x%"PRIx64" exceeds "
>> +                           "containing master element ending at
>> 0x%"PRIx64"\n",
>> +                           pos, elem_end, level_end);
>> +                    return AVERROR_INVALIDDATA;
>> +                }
>> +            } else if (level->length != EBML_UNKNOWN_LENGTH) {
>> +                av_log(matroska->ctx, AV_LOG_ERROR, "Unknown-sized
>> element "
>> +                       "at 0x%"PRIx64" inside parent with finite
>> size\n", pos);
>> +                return AVERROR_INVALIDDATA;
> 
> IMO there's no reason to return an error here. You can log the error
> and still parse the data, it should end up fine (if the code is done
> as such that you can never read past your parents boundaries). At
> least libebml can deal with that.
> 
Ok, treating such elements as if they extended to the end of their
master element is easily doable.

>> +            } else if (length == EBML_UNKNOWN_LENGTH && id !=
>> MATROSKA_ID_CLUSTER) {
>> +                // According to the specifications only clusters
>> and segments
>> +                // are allowed to be unknown-sized.
>>                   av_log(matroska->ctx, AV_LOG_ERROR,
>> -                       "Invalid length 0x%"PRIx64" > 0x%"PRIx64" in
>> parent\n",
>> -                       length, level->start + level->length);
>> +                       "Found unknown-sized element other than a
>> cluster at "
>> +                       "0x%"PRIx64". Dropping the invalid
>> element.\n", pos);
>>                   return AVERROR_INVALIDDATA;
>>               }
>>           }
>> -- 
>> 2.19.2
>>
Steve Lhomme April 8, 2019, 6:55 a.m.
> On April 7, 2019 at 12:54 PM Andreas Rheinhardt via ffmpeg-devel <ffmpeg-devel@ffmpeg.org> wrote:
> 
> 
> Steve Lhomme:
> > On 3/27/2019 12:18 PM, Andreas Rheinhardt via ffmpeg-devel wrote:
> >> The earlier code had three flaws:
> >>
> >> 1. The case of an unknown-sized element inside a finite-sized element
> >> (which is against the specifications) was not caught.
> >>
> >> 2. The error message wasn't helpful: It compared the length of the
> >> child
> >> with the offset of the end of the parent and claimed that the first
> >> exceeds the latter, although that is not necessarily true.
> >>
> >> 3. Unknown-sized elements that are not parsed can't be skipped. Given
> >> that according to the Matroska specifications only the segment and the
> >> clusters can be of unknown-size, this is handled by not allowing any
> > 
> > This restriction is rather new. There might be old files that use
> > infinite sizes in other places.
> > 
> Indeed there are. The file mentioned by Dale
> (https://ffmpeg.org/pipermail/ffmpeg-devel/2019-February/240376.html;
> https://cs.chromium.org/chromium/src/media/test/data/bear-320x240-live.webm)
> has cues where every master element (even CueTrackPositions) are
> written as unknown-sized element (with length fields coded on eight
> bytes). The Cues aren't referenced in a Seekhead at the beginning
> though, so that FFmpeg doesn't try to read them. If they were
> referenced, neither current nor patched FFmpeg would be able to
> properly parse them and patched FFmpeg would furthermore give an error
> message.
> 
> When exactly has this restriction been enacted?

That would be 35e0909ef4fc34ca7cb15adba5e7c7ebf5daeacb in the Matroska specifications (2016/10/31). Before that any master element was allowed to use the unknown length feature. Now only Segment and Cluster are. Given WebM is based on the Matroska specifications on matroska.org I would assume they don't have this restriction yet.

> I'll see if I find a way to support such elements (although I am not
> sure whether it is worthwhile).
> 
> >> other units to have infinite size whereas the earlier code would seek
> >> back by 1 byte upon encountering an infinite-size element that ought
> >> to be skipped.
> >>
> >> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com>
> >> ---
> >>   libavformat/matroskadec.c | 26 ++++++++++++++++++++++----
> >>   1 file changed, 22 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
> >> index 6fa324c0cc..0179e5426e 100644
> >> --- a/libavformat/matroskadec.c
> >> +++ b/libavformat/matroskadec.c
> >> @@ -1180,11 +1180,29 @@ static int
> >> ebml_parse_elem(MatroskaDemuxContext *matroska,
> >>           if (matroska->num_levels > 0) {
> >>               MatroskaLevel *level =
> >> &matroska->levels[matroska->num_levels - 1];
> >>               int64_t pos = avio_tell(pb);
> >> -            if (level->length != EBML_UNKNOWN_LENGTH &&
> >> -                (pos + length) > (level->start + level->length)) {
> >> +
> >> +            if (length != EBML_UNKNOWN_LENGTH &&
> >> +                level->length != EBML_UNKNOWN_LENGTH) {
> >> +                uint64_t elem_end = pos + length,
> >> +                        level_end = level->start + level->length;
> >> +
> >> +                if (level_end < elem_end) {
> >> +                    av_log(matroska->ctx, AV_LOG_ERROR,
> >> +                           "Element at 0x%"PRIx64" ending at
> >> 0x%"PRIx64" exceeds "
> >> +                           "containing master element ending at
> >> 0x%"PRIx64"\n",
> >> +                           pos, elem_end, level_end);
> >> +                    return AVERROR_INVALIDDATA;
> >> +                }
> >> +            } else if (level->length != EBML_UNKNOWN_LENGTH) {
> >> +                av_log(matroska->ctx, AV_LOG_ERROR, "Unknown-sized
> >> element "
> >> +                       "at 0x%"PRIx64" inside parent with finite
> >> size\n", pos);
> >> +                return AVERROR_INVALIDDATA;
> > 
> > IMO there's no reason to return an error here. You can log the error
> > and still parse the data, it should end up fine (if the code is done
> > as such that you can never read past your parents boundaries). At
> > least libebml can deal with that.
> > 
> Ok, treating such elements as if they extended to the end of their
> master element is easily doable.
> 
> >> +            } else if (length == EBML_UNKNOWN_LENGTH && id !=
> >> MATROSKA_ID_CLUSTER) {
> >> +                // According to the specifications only clusters
> >> and segments
> >> +                // are allowed to be unknown-sized.
> >>                   av_log(matroska->ctx, AV_LOG_ERROR,
> >> -                       "Invalid length 0x%"PRIx64" > 0x%"PRIx64" in
> >> parent\n",
> >> -                       length, level->start + level->length);
> >> +                       "Found unknown-sized element other than a
> >> cluster at "
> >> +                       "0x%"PRIx64". Dropping the invalid
> >> element.\n", pos);
> >>                   return AVERROR_INVALIDDATA;
> >>               }
> >>           }
> >> -- 
> >> 2.19.2
> >>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

Patch hide | download patch | download mbox

diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
index 6fa324c0cc..0179e5426e 100644
--- a/libavformat/matroskadec.c
+++ b/libavformat/matroskadec.c
@@ -1180,11 +1180,29 @@  static int ebml_parse_elem(MatroskaDemuxContext *matroska,
         if (matroska->num_levels > 0) {
             MatroskaLevel *level = &matroska->levels[matroska->num_levels - 1];
             int64_t pos = avio_tell(pb);
-            if (level->length != EBML_UNKNOWN_LENGTH &&
-                (pos + length) > (level->start + level->length)) {
+
+            if (length != EBML_UNKNOWN_LENGTH &&
+                level->length != EBML_UNKNOWN_LENGTH) {
+                uint64_t elem_end = pos + length,
+                        level_end = level->start + level->length;
+
+                if (level_end < elem_end) {
+                    av_log(matroska->ctx, AV_LOG_ERROR,
+                           "Element at 0x%"PRIx64" ending at 0x%"PRIx64" exceeds "
+                           "containing master element ending at 0x%"PRIx64"\n",
+                           pos, elem_end, level_end);
+                    return AVERROR_INVALIDDATA;
+                }
+            } else if (level->length != EBML_UNKNOWN_LENGTH) {
+                av_log(matroska->ctx, AV_LOG_ERROR, "Unknown-sized element "
+                       "at 0x%"PRIx64" inside parent with finite size\n", pos);
+                return AVERROR_INVALIDDATA;
+            } else if (length == EBML_UNKNOWN_LENGTH && id != MATROSKA_ID_CLUSTER) {
+                // According to the specifications only clusters and segments
+                // are allowed to be unknown-sized.
                 av_log(matroska->ctx, AV_LOG_ERROR,
-                       "Invalid length 0x%"PRIx64" > 0x%"PRIx64" in parent\n",
-                       length, level->start + level->length);
+                       "Found unknown-sized element other than a cluster at "
+                       "0x%"PRIx64". Dropping the invalid element.\n", pos);
                 return AVERROR_INVALIDDATA;
             }
         }