Message ID | 20190327111852.3784-6-andreas.rheinhardt@googlemail.com |
---|---|
State | New |
Headers | show |
On 3/27/2019 12:18 PM, Andreas Rheinhardt via ffmpeg-devel wrote: > By default, the data_offset member of the AVFormatInternal of the > AVFormatContext associated with the MatroskaDemuxContext has not been > initialized explicitly by any Matroska-specific function, so that it was > initialized by default to the offset at the end of matroska_read_header, > i.e. usually to the offset of the length field of the first encountered > cluster. This meant that in case that the Matroska-specific seek-code > fails because there are no index entries for the target track a seek to > data_offset would be performed and ordinary parsing would start from > there which is nonsense: The length field would be treated as EBML ID and > (if the length field is not longer than four bytes (EBML numbers that > long are rejected as invalid EBML IDs)) and whatever comes next would be > treated as its EBML size although it simply isn't. > > Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com> > --- > libavformat/matroskadec.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c > index 49f8ff4082..f9811b54a1 100644 > --- a/libavformat/matroskadec.c > +++ b/libavformat/matroskadec.c > @@ -2651,6 +2651,9 @@ static int matroska_read_header(AVFormatContext *s) > pos = avio_tell(matroska->ctx->pb); > res = ebml_parse(matroska, matroska_segment, matroska); > } > + /* Set data_offset as it might be needed later by seek_frame_generic. */ > + if (matroska->current_id) I'm surprised this doesn't error out if a (level 1) ID is not found here. > + s->internal->data_offset = avio_tell(matroska->ctx->pb) - 4; The "- 4" is OK as long as level 1 elements are always 4 bytes (which is the case). But if matroska_resync() ever exits if it finds an EBML Void or CRC-32 then this will break. The code is safe for now but may not be future proof. > matroska_execute_seekhead(matroska); > > if (!matroska->time_scale) > -- > 2.19.2 > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Steve Lhomme: > On 3/27/2019 12:18 PM, Andreas Rheinhardt via ffmpeg-devel wrote: >> By default, the data_offset member of the AVFormatInternal of the >> AVFormatContext associated with the MatroskaDemuxContext has not been >> initialized explicitly by any Matroska-specific function, so that it >> was >> initialized by default to the offset at the end of >> matroska_read_header, >> i.e. usually to the offset of the length field of the first encountered >> cluster. This meant that in case that the Matroska-specific seek-code >> fails because there are no index entries for the target track a seek to >> data_offset would be performed and ordinary parsing would start from >> there which is nonsense: The length field would be treated as EBML >> ID and >> (if the length field is not longer than four bytes (EBML numbers that >> long are rejected as invalid EBML IDs)) and whatever comes next >> would be >> treated as its EBML size although it simply isn't. >> >> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com> >> --- >> libavformat/matroskadec.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c >> index 49f8ff4082..f9811b54a1 100644 >> --- a/libavformat/matroskadec.c >> +++ b/libavformat/matroskadec.c >> @@ -2651,6 +2651,9 @@ static int >> matroska_read_header(AVFormatContext *s) >> pos = avio_tell(matroska->ctx->pb); >> res = ebml_parse(matroska, matroska_segment, matroska); >> } >> + /* Set data_offset as it might be needed later by >> seek_frame_generic. */ >> + if (matroska->current_id) > > I'm surprised this doesn't error out if a (level 1) ID is not found here. >There are two ways how ebml_parse can return 1 in the earlier call: It can find (and consume) a cluster ID; or it can hit EOF if it is in the mode for parsing live header files (these don't contain clusters at all). In the latter case, matroska->current_id is unset. And if we are in the normal (non-live-header) mode and no cluster has been found, then we will end up at fail anyway (when matroska_resync eventually hits the end of input). Thinking about this, there is something wrong with this approach: A Matroska file needn't contain a cluster at all (e.g. a chapter- or tags- or attachment-only file is not against the spec if I am not mistaken). But this is orthogonal to the patch at hand. >> + s->internal->data_offset = avio_tell(matroska->ctx->pb) - 4; > > The "- 4" is OK as long as level 1 elements are always 4 bytes (which > is the case). But if matroska_resync() ever exits if it finds an EBML > Void or CRC-32 then this will break. > > The code is safe for now but may not be future proof. > As has already been said: If matroska->current_id is set at this point, then it contains a cluster ID, not an arbitrary ID, because the IDs that matroska_resync finds are fed to ebml_parse before this check and a level 1 ID different than a cluster is parsed. Nevertheless, the check should probably be changed to explicitly check for a cluster ID, just to make this more readable. >> matroska_execute_seekhead(matroska); >> if (!matroska->time_scale) >> -- >> 2.19.2
diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c index 49f8ff4082..f9811b54a1 100644 --- a/libavformat/matroskadec.c +++ b/libavformat/matroskadec.c @@ -2651,6 +2651,9 @@ static int matroska_read_header(AVFormatContext *s) pos = avio_tell(matroska->ctx->pb); res = ebml_parse(matroska, matroska_segment, matroska); } + /* Set data_offset as it might be needed later by seek_frame_generic. */ + if (matroska->current_id) + s->internal->data_offset = avio_tell(matroska->ctx->pb) - 4; matroska_execute_seekhead(matroska); if (!matroska->time_scale)
By default, the data_offset member of the AVFormatInternal of the AVFormatContext associated with the MatroskaDemuxContext has not been initialized explicitly by any Matroska-specific function, so that it was initialized by default to the offset at the end of matroska_read_header, i.e. usually to the offset of the length field of the first encountered cluster. This meant that in case that the Matroska-specific seek-code fails because there are no index entries for the target track a seek to data_offset would be performed and ordinary parsing would start from there which is nonsense: The length field would be treated as EBML ID and (if the length field is not longer than four bytes (EBML numbers that long are rejected as invalid EBML IDs)) and whatever comes next would be treated as its EBML size although it simply isn't. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com> --- libavformat/matroskadec.c | 3 +++ 1 file changed, 3 insertions(+)