diff mbox

[FFmpeg-devel,2/3] avformat: reject FFmpeg-style merged side data in raw packets

Message ID 20170308124012.27793-2-nfxjfg@googlemail.com
State New
Headers show

Commit Message

wm4 March 8, 2017, 12:40 p.m. UTC
It looks like this could lead to security issues, as side data readers
will for example rely on side data allocation sizes to be as large as
needed for correct operation. If such files exist at all, they also
should be brought out of circulation, so fully reject them. Under normal
circumstances, nothing creates such files.

To avoid problems for now, we let the concat and hls demuxers do this
(they merely return previously-demuxed packets, whose side data might
have been merged by libavformat itself after demuxing). The
special-cases will be removed in the next commit.
---
 libavformat/utils.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Michael Niedermayer March 8, 2017, 2:36 p.m. UTC | #1
On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:
> It looks like this could lead to security issues, as side data readers

anything could, but side data wasnt excluded from fuzzing, in fact
ive seen fuzzers find and trigger the split code
also a demuxer and user app could always set side data to something
bad.
All uses of side data must assume the data to be untrusted, because
it is even without all the split code.


> will for example rely on side data allocation sizes to be as large as
> needed for correct operation.

Is this really so ?
we allocate at least AV_INPUT_BUFFER_PADDING_SIZE, accesses beyond
that require a check on the size.

also size checks are needed for the case where a lib gets replaced by
one with newer version that has a bigger struct.


> If such files exist at all, they also
> should be brought out of circulation, so fully reject them. Under normal
> circumstances, nothing creates such files.
> 
> To avoid problems for now, we let the concat and hls demuxers do this
> (they merely return previously-demuxed packets, whose side data might
> have been merged by libavformat itself after demuxing). The
> special-cases will be removed in the next commit.
> ---
>  libavformat/utils.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/libavformat/utils.c b/libavformat/utils.c
> index 37d7024465..68f3c977d6 100644
> --- a/libavformat/utils.c
> +++ b/libavformat/utils.c
> @@ -840,6 +840,15 @@ int ff_read_packet(AVFormatContext *s, AVPacket *pkt)
>              *pkt = tmp;
>          }
>  
> +        if (strcmp(s->iformat->name, "concat") && strcmp(s->iformat->name, "hls,applehttp")

can strcmp() be avoided here ?
it would be run per packet which is rather ugly


> +            && av_packet_split_side_data(pkt) == 1) {

this could fail with ENOMEM which complicates things

if we do block such packets, its prbobably better to add a new
static inline function to a header that checks if a packet has
splitable side data and use that in av_packet_split_side_data() and
here

Iam suggesting "static inline" here to make backporting easier so no
new symbol is added to libavcodec that is needed by libavformat

also it may be interresting to disable this check for fuzzing so
side data can be fuzzed in a wider range of cases and any past
testcases that happen to use this can still be used for regression
testing

[...]
wm4 March 8, 2017, 3:06 p.m. UTC | #2
On Wed, 8 Mar 2017 15:36:25 +0100
Michael Niedermayer <michael@niedermayer.cc> wrote:

> On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:
> > It looks like this could lead to security issues, as side data readers  
> 
> anything could, but side data wasnt excluded from fuzzing, in fact
> ive seen fuzzers find and trigger the split code

The split code runs on _any_ input data, so this is not surprising. My
intention is to remove merging/splitting completely (the patches don't
go that far yet).

> also a demuxer and user app could always set side data to something
> bad.
> All uses of side data must assume the data to be untrusted, because
> it is even without all the split code.

This depends. For example, AV_PKT_DATA_NEW_EXTRADATA is of course 100%
defined by untrusted data (except that the size fits into memory). On
the other hand, I expect something like AV_PKT_DATA_DISPLAYMATRIX to
have at least the correct size. FFmpeg code also does this sometimes,
for example dump_sidedata() doesn't check the size of this particular
side data type.

In my own FFmpeg API using code I assume that side data has the correct
minimum size for a given type, and I will blame any potentially
security-relevant deviations on FFmpeg. I'm sure most other API users
also will, and they're probably not even aware that this us supposedly
possible.

Besides, fewer ways for untrusted input data to leak into
complex/obscure data structures is an improvement for security.

> 
> 
> > will for example rely on side data allocation sizes to be as large as
> > needed for correct operation.  
> 
> Is this really so ?
> we allocate at least AV_INPUT_BUFFER_PADDING_SIZE, accesses beyond
> that require a check on the size.
> 

Except av_frame_new_side_data() doesn't do that, and often side data is
copied from packets to frames without additional checks (see
ff_init_buffer_info() for e.g. the DISPLAYMATRIX case).

> also size checks are needed for the case where a lib gets replaced by
> one with newer version that has a bigger struct.

Oh really? We never do this. Normal API structs are also considered
appendable, so compiling against a newer API and then linking against
an older version doesn't work. This is exactly the same case.

> 
> > If such files exist at all, they also
> > should be brought out of circulation, so fully reject them. Under normal
> > circumstances, nothing creates such files.
> > 
> > To avoid problems for now, we let the concat and hls demuxers do this
> > (they merely return previously-demuxed packets, whose side data might
> > have been merged by libavformat itself after demuxing). The
> > special-cases will be removed in the next commit.
> > ---
> >  libavformat/utils.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/libavformat/utils.c b/libavformat/utils.c
> > index 37d7024465..68f3c977d6 100644
> > --- a/libavformat/utils.c
> > +++ b/libavformat/utils.c
> > @@ -840,6 +840,15 @@ int ff_read_packet(AVFormatContext *s, AVPacket *pkt)
> >              *pkt = tmp;
> >          }
> >  
> > +        if (strcmp(s->iformat->name, "concat") && strcmp(s->iformat->name, "hls,applehttp")  
> 
> can strcmp() be avoided here ?
> it would be run per packet which is rather ugly

We could add a flag, but considering the next commit removes these
strcmp()s again, and that strcmp() on very short strings is very cheap,
I don't think it's worth doing this.

> 
> > +            && av_packet_split_side_data(pkt) == 1) {  
> 
> this could fail with ENOMEM which complicates things

I can add a check for ENOMEM too. This should be the only thing that
looks like a failure, but could work in a separate call (like the one
on the libavcodec side).

> 
> if we do block such packets, its prbobably better to add a new
> static inline function to a header that checks if a packet has
> splitable side data and use that in av_packet_split_side_data() and
> here

Not sure what you mean here.

> Iam suggesting "static inline" here to make backporting easier so no
> new symbol is added to libavcodec that is needed by libavformat

What's the purpose?

> also it may be interresting to disable this check for fuzzing so
> side data can be fuzzed in a wider range of cases and any past
> testcases that happen to use this can still be used for regression
> testing

I think what you want is fault injection for memory errors, seems out
of scope here.
Michael Niedermayer March 8, 2017, 4:11 p.m. UTC | #3
On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:
> On Wed, 8 Mar 2017 15:36:25 +0100
> Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:
> > > It looks like this could lead to security issues, as side data readers  
[...]
> > also size checks are needed for the case where a lib gets replaced by
> > one with newer version that has a bigger struct.
> 
> Oh really? We never do this. Normal API structs are also considered
> appendable, so compiling against a newer API and then linking against
> an older version doesn't work. This is exactly the same case.

no its not

what you call normal structs are allocated by an allocator that is
part of the lib that defines it, the struct, and lib dependancies
ensure that its new. Any allocated struct as a result is large
enough though possibly not every field is set

with side data the code using it sets the size explicitly that makes
the size generally hardcoded in the lib using the code theres no
longer a common allocator (some exceptions exist).
The size a lib allocates that way is the
compile time sizeof() which may differ from another lib
and side data can be passed in both directions between libs not just
in the direction of their dependancy
so you can end up with a smaller side data and that means you have to
do checks.



[...]

> > 
> > > +            && av_packet_split_side_data(pkt) == 1) {  
> > 
> > this could fail with ENOMEM which complicates things
> 
> I can add a check for ENOMEM too. This should be the only thing that
> looks like a failure, but could work in a separate call (like the one
> on the libavcodec side).
> 
> > 

> > if we do block such packets, its prbobably better to add a new
> > static inline function to a header that checks if a packet has
> > splitable side data and use that in av_packet_split_side_data() and
> > here
> 
> Not sure what you mean here.
> 
> > Iam suggesting "static inline" here to make backporting easier so no
> > new symbol is added to libavcodec that is needed by libavformat
> 
> What's the purpose?

Its simpler, cleaner and faster


> 
> > also it may be interresting to disable this check for fuzzing so
> > side data can be fuzzed in a wider range of cases and any past
> > testcases that happen to use this can still be used for regression
> > testing
> 
> I think what you want is fault injection for memory errors, seems out
> of scope here.

no, i want fuzzing to continue to fuzz side data, it did so in the
past and it should continue to do so.

side data is in libav maybe native structs, in FFmpeg it was bytestream
like packet data[], thats at least how i viewed it.



[...]
wm4 March 8, 2017, 4:26 p.m. UTC | #4
On Wed, 8 Mar 2017 17:11:12 +0100
Michael Niedermayer <michael@niedermayer.cc> wrote:

> On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:
> > On Wed, 8 Mar 2017 15:36:25 +0100
> > Michael Niedermayer <michael@niedermayer.cc> wrote:
> >   
> > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:  
> > > > It looks like this could lead to security issues, as side data readers    
> [...]
> > > also size checks are needed for the case where a lib gets replaced by
> > > one with newer version that has a bigger struct.  
> > 
> > Oh really? We never do this. Normal API structs are also considered
> > appendable, so compiling against a newer API and then linking against
> > an older version doesn't work. This is exactly the same case.  
> 
> no its not
> 
> what you call normal structs are allocated by an allocator that is
> part of the lib that defines it, the struct, and lib dependancies
> ensure that its new. Any allocated struct as a result is large
> enough though possibly not every field is set
> 
> with side data the code using it sets the size explicitly that makes
> the size generally hardcoded in the lib using the code theres no
> longer a common allocator (some exceptions exist).
> The size a lib allocates that way is the
> compile time sizeof() which may differ from another lib
> and side data can be passed in both directions between libs not just
> in the direction of their dependancy
> so you can end up with a smaller side data and that means you have to
> do checks.

This is wrong. side data which has structs have corresponding functions
to get their allocation size. Of course that's all very error prone and
hard to use correctly and some were added only recently because the
API had holes, but that's how the libav* APIs are for now.

Besides, manually checking struct sizes/allocations makes for an even
worse ABI compatibility concept than FFmpeg currently follows. (Worse
as in ease of use and accidental incompatibilities and UB triggered as
consequence.)

> 
> 
> [...]
> 
> > >   
> > > > +            && av_packet_split_side_data(pkt) == 1) {    
> > > 
> > > this could fail with ENOMEM which complicates things  
> > 
> > I can add a check for ENOMEM too. This should be the only thing that
> > looks like a failure, but could work in a separate call (like the one
> > on the libavcodec side).
> >   
> > >   
> 
> > > if we do block such packets, its prbobably better to add a new
> > > static inline function to a header that checks if a packet has
> > > splitable side data and use that in av_packet_split_side_data() and
> > > here  
> > 
> > Not sure what you mean here.
> >   
> > > Iam suggesting "static inline" here to make backporting easier so no
> > > new symbol is added to libavcodec that is needed by libavformat  
> > 
> > What's the purpose?  
> 
> Its simpler, cleaner and faster

I mean of that function, or why we should care about symbols present.

> 
> >   
> > > also it may be interresting to disable this check for fuzzing so
> > > side data can be fuzzed in a wider range of cases and any past
> > > testcases that happen to use this can still be used for regression
> > > testing  
> > 
> > I think what you want is fault injection for memory errors, seems out
> > of scope here.  
> 
> no, i want fuzzing to continue to fuzz side data, it did so in the
> past and it should continue to do so.

You can fuzz side data as much as you can fuzz AVFrame or
AVCodecContext. I believe randomly changing in-memory data structures
is referred to as fault injection, not fuzzing.

> side data is in libav maybe native structs, in FFmpeg it was bytestream

No. FFmpeg has at least 1 struct-defined side data that Libav doesn't
have at all (mastering display data).

> like packet data[], thats at least how i viewed it.

You may have viewed it this way, but that's not how it actually is.
Like I said in my another reply, even FFmpeg code occasionally uses
side data without checking the side data size.

Also, in Libav side data contents cannot come directly from raw
files, and thus has potentially fewer security issues than FFmpeg.
Michael Niedermayer March 8, 2017, 6:03 p.m. UTC | #5
On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:
> On Wed, 8 Mar 2017 17:11:12 +0100
> Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:
> > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > >   
> > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:  
> > > > > It looks like this could lead to security issues, as side data readers    
> > [...]
> > > > also size checks are needed for the case where a lib gets replaced by
> > > > one with newer version that has a bigger struct.  
> > > 
> > > Oh really? We never do this. Normal API structs are also considered
> > > appendable, so compiling against a newer API and then linking against
> > > an older version doesn't work. This is exactly the same case.  
> > 
> > no its not
> > 
> > what you call normal structs are allocated by an allocator that is
> > part of the lib that defines it, the struct, and lib dependancies
> > ensure that its new. Any allocated struct as a result is large
> > enough though possibly not every field is set
> > 
> > with side data the code using it sets the size explicitly that makes
> > the size generally hardcoded in the lib using the code theres no
> > longer a common allocator (some exceptions exist).
> > The size a lib allocates that way is the
> > compile time sizeof() which may differ from another lib
> > and side data can be passed in both directions between libs not just
> > in the direction of their dependancy
> > so you can end up with a smaller side data and that means you have to
> > do checks.
> 
> This is wrong.

> side data which has structs have corresponding functions
> to get their allocation size. Of course that's all very error prone and
> hard to use correctly and some were added only recently because the
> API had holes, but that's how the libav* APIs are for now.

you talk about something else here.

fact is the allocated side data uses hardcoded size values often
anyone can look at
git grep -A3 new_side_data

theres is sizeof() use and there are litteral numbers also

if these ever change, checks on the size become needed
which was the original thing i meant in this sub argumentation.
checks are needed, or something else in their place
is needed in that case



> 
> Besides, manually checking struct sizes/allocations makes for an even
> worse ABI compatibility concept than FFmpeg currently follows. (Worse
> as in ease of use and accidental incompatibilities and UB triggered as
> consequence.)
> 

> > 
> > 
> > [...]
> > 
> > > >   
> > > > > +            && av_packet_split_side_data(pkt) == 1) {    
> > > > 
> > > > this could fail with ENOMEM which complicates things  
> > > 
> > > I can add a check for ENOMEM too. This should be the only thing that
> > > looks like a failure, but could work in a separate call (like the one
> > > on the libavcodec side).
> > >   
> > > >   
> > 
> > > > if we do block such packets, its prbobably better to add a new
> > > > static inline function to a header that checks if a packet has
> > > > splitable side data and use that in av_packet_split_side_data() and
> > > > here  
> > > 
> > > Not sure what you mean here.
> > >   
> > > > Iam suggesting "static inline" here to make backporting easier so no
> > > > new symbol is added to libavcodec that is needed by libavformat  
> > > 
> > > What's the purpose?  
> > 
> > Its simpler, cleaner and faster
> 
> I mean of that function, or why we should care about symbols present.

i think i explained this but ill try again all more verbosly

using a functiom to check for splitable sidedata instead of
spliting the side data is cleaner as we dont want to split it we just
want to check if theres any.

Its simpler as we dont have to deal with errors from the split code
and also dont need to deal with the case that split happened.

its faster as a static inline function as it would be inlined

its easier to backport as theres no new symbol added to another lib
on which libavformat depends. (that is if its static inline vs non
static)
If we add a new symbol we should bump minor but we cannot do that in
many past releases due to master using the next minor.Its easier thus
with no new symbol.


[...]
Michael Niedermayer March 8, 2017, 6:20 p.m. UTC | #6
On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:
> On Wed, 8 Mar 2017 17:11:12 +0100
> Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:
> > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > >   
> > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:  
[...]
> >
> > >   
> > > > also it may be interresting to disable this check for fuzzing so
> > > > side data can be fuzzed in a wider range of cases and any past
> > > > testcases that happen to use this can still be used for regression
> > > > testing  
> > > 
> > > I think what you want is fault injection for memory errors, seems out
> > > of scope here.  
> > 
> > no, i want fuzzing to continue to fuzz side data, it did so in the
> > past and it should continue to do so.
> 
> You can fuzz side data as much as you can fuzz AVFrame or
> AVCodecContext. I believe randomly changing in-memory data structures
> is referred to as fault injection, not fuzzing.

it doesnt really matter what you call it, but it was done and the
patch breaks it if theres no option to disable it or something else

[...]
wm4 March 8, 2017, 6:31 p.m. UTC | #7
On Wed, 8 Mar 2017 19:03:21 +0100
Michael Niedermayer <michael@niedermayer.cc> wrote:

> On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:
> > On Wed, 8 Mar 2017 17:11:12 +0100
> > Michael Niedermayer <michael@niedermayer.cc> wrote:
> >   
> > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:  
> > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > >     
> > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:    
> > > > > > It looks like this could lead to security issues, as side data readers      
> > > [...]  
> > > > > also size checks are needed for the case where a lib gets replaced by
> > > > > one with newer version that has a bigger struct.    
> > > > 
> > > > Oh really? We never do this. Normal API structs are also considered
> > > > appendable, so compiling against a newer API and then linking against
> > > > an older version doesn't work. This is exactly the same case.    
> > > 
> > > no its not
> > > 
> > > what you call normal structs are allocated by an allocator that is
> > > part of the lib that defines it, the struct, and lib dependancies
> > > ensure that its new. Any allocated struct as a result is large
> > > enough though possibly not every field is set
> > > 
> > > with side data the code using it sets the size explicitly that makes
> > > the size generally hardcoded in the lib using the code theres no
> > > longer a common allocator (some exceptions exist).
> > > The size a lib allocates that way is the
> > > compile time sizeof() which may differ from another lib
> > > and side data can be passed in both directions between libs not just
> > > in the direction of their dependancy
> > > so you can end up with a smaller side data and that means you have to
> > > do checks.  
> > 
> > This is wrong.  
> 
> > side data which has structs have corresponding functions
> > to get their allocation size. Of course that's all very error prone and
> > hard to use correctly and some were added only recently because the
> > API had holes, but that's how the libav* APIs are for now.  
> 
> you talk about something else here.
> 
> fact is the allocated side data uses hardcoded size values often
> anyone can look at
> git grep -A3 new_side_data
> 
> theres is sizeof() use and there are litteral numbers also

You have to use whatever is correct in each specific case. Using a
number or sizeof() argument for new_side_data is simply an API
violation in some cases, similar to e.g. using
av_malloc(sizeof(AVFrame)). There are a few.

I don't know why you want to "check" these uses with FATE. As I've said
in the other thread, that's like letting FATE check sizeof(AVFrame).

The right way is to check it in new_side_data, or have an API that is
not so hard to use incorrectly. This has been discussed before, when
we added new functions to add display mastering data or something
similar in an ABI-safe way.

> if these ever change, checks on the size become needed
> which was the original thing i meant in this sub argumentation.
> checks are needed, or something else in their place
> is needed in that case

And FATE-checking the sizes is the wrong thing to do. At least for the
side data types whose byte layouts are defined by the C ABI like
MASTERING_DISPLAY_METADATA, not by something wire-like as for
example the SKIP_SAMPLES ones.

> 
> 
> > 
> > Besides, manually checking struct sizes/allocations makes for an even
> > worse ABI compatibility concept than FFmpeg currently follows. (Worse
> > as in ease of use and accidental incompatibilities and UB triggered as
> > consequence.)
> >   
> 
> > > 
> > > 
> > > [...]
> > >   
> > > > >     
> > > > > > +            && av_packet_split_side_data(pkt) == 1) {      
> > > > > 
> > > > > this could fail with ENOMEM which complicates things    
> > > > 
> > > > I can add a check for ENOMEM too. This should be the only thing that
> > > > looks like a failure, but could work in a separate call (like the one
> > > > on the libavcodec side).
> > > >     
> > > > >     
> > >   
> > > > > if we do block such packets, its prbobably better to add a new
> > > > > static inline function to a header that checks if a packet has
> > > > > splitable side data and use that in av_packet_split_side_data() and
> > > > > here    
> > > > 
> > > > Not sure what you mean here.
> > > >     
> > > > > Iam suggesting "static inline" here to make backporting easier so no
> > > > > new symbol is added to libavcodec that is needed by libavformat    
> > > > 
> > > > What's the purpose?    
> > > 
> > > Its simpler, cleaner and faster  
> > 
> > I mean of that function, or why we should care about symbols present.  
> 
> i think i explained this but ill try again all more verbosly
> 
> using a functiom to check for splitable sidedata instead of
> spliting the side data is cleaner as we dont want to split it we just
> want to check if theres any.
> 
> Its simpler as we dont have to deal with errors from the split code
> and also dont need to deal with the case that split happened.

I don't see much simplicity in code duplication, putting code into
public headers (which also means you have to make sure this
compiles with C++), reducing compile times, and potentially exposing
implementation details.

In my opinion, this is a non-issue, since the split check call should go
away as soon as libavcodec doesn't try to split side data anymore.

> its faster as a static inline function as it would be inlined

This is in a code path whose performance is bound by raw I/O and maybe
things like memory allocation. The speed argument is not going to work.

> its easier to backport as theres no new symbol added to another lib
> on which libavformat depends. (that is if its static inline vs non
> static)
> If we add a new symbol we should bump minor but we cannot do that in
> many past releases due to master using the next minor.Its easier thus
> with no new symbol.

Or we just stay with the split call, which in the normal case will
run and exit quickly.
wm4 March 8, 2017, 6:32 p.m. UTC | #8
On Wed, 8 Mar 2017 19:20:15 +0100
Michael Niedermayer <michael@niedermayer.cc> wrote:

> On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:
> > On Wed, 8 Mar 2017 17:11:12 +0100
> > Michael Niedermayer <michael@niedermayer.cc> wrote:
> >   
> > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:  
> > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > >     
> > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:    
> [...]
> > >  
> > > >     
> > > > > also it may be interresting to disable this check for fuzzing so
> > > > > side data can be fuzzed in a wider range of cases and any past
> > > > > testcases that happen to use this can still be used for regression
> > > > > testing    
> > > > 
> > > > I think what you want is fault injection for memory errors, seems out
> > > > of scope here.    
> > > 
> > > no, i want fuzzing to continue to fuzz side data, it did so in the
> > > past and it should continue to do so.  
> > 
> > You can fuzz side data as much as you can fuzz AVFrame or
> > AVCodecContext. I believe randomly changing in-memory data structures
> > is referred to as fault injection, not fuzzing.  
> 
> it doesnt really matter what you call it, but it was done and the
> patch breaks it if theres no option to disable it or something else

Well, you can't do it anymore. Why are you so afraid that a potential
error source is being eliminated?

Of course you're still free to randomly change random memory locations
and experience new bugs, but I'm not sure why this would matter for
anything.
wm4 March 8, 2017, 6:35 p.m. UTC | #9
On Wed, 8 Mar 2017 19:20:15 +0100
Michael Niedermayer <michael@niedermayer.cc> wrote:

> On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:
> > On Wed, 8 Mar 2017 17:11:12 +0100
> > Michael Niedermayer <michael@niedermayer.cc> wrote:
> >   
> > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:  
> > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > >     
> > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:    
> [...]
> > >  
> > > >     
> > > > > also it may be interresting to disable this check for fuzzing so
> > > > > side data can be fuzzed in a wider range of cases and any past
> > > > > testcases that happen to use this can still be used for regression
> > > > > testing    
> > > > 
> > > > I think what you want is fault injection for memory errors, seems out
> > > > of scope here.    
> > > 
> > > no, i want fuzzing to continue to fuzz side data, it did so in the
> > > past and it should continue to do so.  
> > 
> > You can fuzz side data as much as you can fuzz AVFrame or
> > AVCodecContext. I believe randomly changing in-memory data structures
> > is referred to as fault injection, not fuzzing.  
> 
> it doesnt really matter what you call it, but it was done and the
> patch breaks it if theres no option to disable it or something else

PS: honestly, I think you're trolling with this. The possibility that
random packets could be interpreted as side-data doesn't look like
something that was accounted for, and I highly doubt this plays a role
for fuzzing (or ever occurred in a fuzzing case). Why are you so
hell-bent in finally preventing this arguably dangerous corner case? It
makes no sense at all. Explain yourself.
James Almer March 8, 2017, 6:51 p.m. UTC | #10
On 3/8/2017 3:35 PM, wm4 wrote:
> On Wed, 8 Mar 2017 19:20:15 +0100
> Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
>> On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:
>>> On Wed, 8 Mar 2017 17:11:12 +0100
>>> Michael Niedermayer <michael@niedermayer.cc> wrote:
>>>   
>>>> On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:  
>>>>> On Wed, 8 Mar 2017 15:36:25 +0100
>>>>> Michael Niedermayer <michael@niedermayer.cc> wrote:
>>>>>     
>>>>>> On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:    
>> [...]
>>>>  
>>>>>     
>>>>>> also it may be interresting to disable this check for fuzzing so
>>>>>> side data can be fuzzed in a wider range of cases and any past
>>>>>> testcases that happen to use this can still be used for regression
>>>>>> testing    
>>>>>
>>>>> I think what you want is fault injection for memory errors, seems out
>>>>> of scope here.    
>>>>
>>>> no, i want fuzzing to continue to fuzz side data, it did so in the
>>>> past and it should continue to do so.  
>>>
>>> You can fuzz side data as much as you can fuzz AVFrame or
>>> AVCodecContext. I believe randomly changing in-memory data structures
>>> is referred to as fault injection, not fuzzing.  
>>
>> it doesnt really matter what you call it, but it was done and the
>> patch breaks it if theres no option to disable it or something else
> 
> PS: honestly, I think you're trolling with this.

Can the two of you stop accusing the other of trolling? If there's some
miscommunication then try to work around it instead of assuming malice
from the other side.
The discussion so far has been fairly civil and technical, so please
don't escalate it for no gain.
Michael Niedermayer March 8, 2017, 6:56 p.m. UTC | #11
On Wed, Mar 08, 2017 at 07:31:27PM +0100, wm4 wrote:
> On Wed, 8 Mar 2017 19:03:21 +0100
> Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> > On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:
> > > On Wed, 8 Mar 2017 17:11:12 +0100
> > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > >   
> > > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:  
> > > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > >     
> > > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:    
> > > > > > > It looks like this could lead to security issues, as side data readers      
> > > > [...]  
> > > > > > also size checks are needed for the case where a lib gets replaced by
> > > > > > one with newer version that has a bigger struct.    
> > > > > 
> > > > > Oh really? We never do this. Normal API structs are also considered
> > > > > appendable, so compiling against a newer API and then linking against
> > > > > an older version doesn't work. This is exactly the same case.    
> > > > 
> > > > no its not
> > > > 
> > > > what you call normal structs are allocated by an allocator that is
> > > > part of the lib that defines it, the struct, and lib dependancies
> > > > ensure that its new. Any allocated struct as a result is large
> > > > enough though possibly not every field is set
> > > > 
> > > > with side data the code using it sets the size explicitly that makes
> > > > the size generally hardcoded in the lib using the code theres no
> > > > longer a common allocator (some exceptions exist).
> > > > The size a lib allocates that way is the
> > > > compile time sizeof() which may differ from another lib
> > > > and side data can be passed in both directions between libs not just
> > > > in the direction of their dependancy
> > > > so you can end up with a smaller side data and that means you have to
> > > > do checks.  
> > > 
> > > This is wrong.  
> > 
> > > side data which has structs have corresponding functions
> > > to get their allocation size. Of course that's all very error prone and
> > > hard to use correctly and some were added only recently because the
> > > API had holes, but that's how the libav* APIs are for now.  
> > 
> > you talk about something else here.
> > 
> > fact is the allocated side data uses hardcoded size values often
> > anyone can look at
> > git grep -A3 new_side_data
> > 
> > theres is sizeof() use and there are litteral numbers also
> 
> You have to use whatever is correct in each specific case. Using a
> number or sizeof() argument for new_side_data is simply an API
> violation in some cases, similar to e.g. using
> av_malloc(sizeof(AVFrame)). There are a few.
> 

> I don't know why you want to "check" these uses with FATE. As I've said
> in the other thread, that's like letting FATE check sizeof(AVFrame).
> 
> The right way is to check it in new_side_data, or have an API that is
> not so hard to use incorrectly. This has been discussed before, when
> we added new functions to add display mastering data or something
> similar in an ABI-safe way.
> 
> > if these ever change, checks on the size become needed
> > which was the original thing i meant in this sub argumentation.
> > checks are needed, or something else in their place
> > is needed in that case
> 
> And FATE-checking the sizes is the wrong thing to do. At least for the
> side data types whose byte layouts are defined by the C ABI like
> MASTERING_DISPLAY_METADATA, not by something wire-like as for
> example the SKIP_SAMPLES ones.

wrong thread or you totally misunderstand me

what you originally said:
> It looks like this could lead to security issues, as side data readers
> will for example rely on side data allocation sizes to be as large as
> needed for correct operation.

And what i replied:
    [...]

    also size checks are needed for the case where a lib gets replaced by
    one with newer version that has a bigger struct.

Now fact is that for cases where you link to a lib with a differently
sized struct or more generally any side data (which is what was meant)
if its not using an allocator it needs a check in the code or something
else, thats what i meant and thats from where this bizare sub
discussion started where you seem to keep disagreeing about things i
did not say

Iam not talking about fate here, thats a seperate thing


> 
> > 
> > 
> > > 
> > > Besides, manually checking struct sizes/allocations makes for an even
> > > worse ABI compatibility concept than FFmpeg currently follows. (Worse
> > > as in ease of use and accidental incompatibilities and UB triggered as
> > > consequence.)
> > >   
> > 
> > > > 
> > > > 
> > > > [...]
> > > >   
> > > > > >     
> > > > > > > +            && av_packet_split_side_data(pkt) == 1) {      
> > > > > > 
> > > > > > this could fail with ENOMEM which complicates things    
> > > > > 
> > > > > I can add a check for ENOMEM too. This should be the only thing that
> > > > > looks like a failure, but could work in a separate call (like the one
> > > > > on the libavcodec side).
> > > > >     
> > > > > >     
> > > >   
> > > > > > if we do block such packets, its prbobably better to add a new
> > > > > > static inline function to a header that checks if a packet has
> > > > > > splitable side data and use that in av_packet_split_side_data() and
> > > > > > here    
> > > > > 
> > > > > Not sure what you mean here.
> > > > >     
> > > > > > Iam suggesting "static inline" here to make backporting easier so no
> > > > > > new symbol is added to libavcodec that is needed by libavformat    
> > > > > 
> > > > > What's the purpose?    
> > > > 
> > > > Its simpler, cleaner and faster  
> > > 
> > > I mean of that function, or why we should care about symbols present.  
> > 
> > i think i explained this but ill try again all more verbosly
> > 
> > using a functiom to check for splitable sidedata instead of
> > spliting the side data is cleaner as we dont want to split it we just
> > want to check if theres any.
> > 
> > Its simpler as we dont have to deal with errors from the split code
> > and also dont need to deal with the case that split happened.
> 
> I don't see much simplicity in code duplication, putting code into
> public headers (which also means you have to make sure this
> compiles with C++), reducing compile times, and potentially exposing
> implementation details.

It can be in a private header, it doesnt need to be public


[...]
Michael Niedermayer March 8, 2017, 7:15 p.m. UTC | #12
On Wed, Mar 08, 2017 at 07:35:50PM +0100, wm4 wrote:
> On Wed, 8 Mar 2017 19:20:15 +0100
> Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> > On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:
> > > On Wed, 8 Mar 2017 17:11:12 +0100
> > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > >   
> > > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:  
> > > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > >     
> > > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:    
> > [...]
> > > >  
> > > > >     
> > > > > > also it may be interresting to disable this check for fuzzing so
> > > > > > side data can be fuzzed in a wider range of cases and any past
> > > > > > testcases that happen to use this can still be used for regression
> > > > > > testing    
> > > > > 
> > > > > I think what you want is fault injection for memory errors, seems out
> > > > > of scope here.    
> > > > 
> > > > no, i want fuzzing to continue to fuzz side data, it did so in the
> > > > past and it should continue to do so.  
> > > 
> > > You can fuzz side data as much as you can fuzz AVFrame or
> > > AVCodecContext. I believe randomly changing in-memory data structures
> > > is referred to as fault injection, not fuzzing.  
> > 
> > it doesnt really matter what you call it, but it was done and the
> > patch breaks it if theres no option to disable it or something else
> 
> PS: honestly, I think you're trolling with this. The possibility that
> random packets could be interpreted as side-data doesn't look like
> something that was accounted for, and I highly doubt this plays a role
> for fuzzing (or ever occurred in a fuzzing case). Why are you so
> hell-bent in finally preventing this arguably dangerous corner case? It
> makes no sense at all. Explain yourself.

It plays a role, for example the timeout in 731 goes away if
i change FF_MERGE_MARKER and the value prior to teh change is in the
sample according to my hex editor

so the fuzzer seems capable to create data to get past the check
iam not trolling and this does play a role, ive seen other issues
that used the side data split

how the fuzzer does this i dont know but i can imagine a few ways
like looking at the individual bytes of the comparission and evolving
towards breaking it or maybe some input file was generated by
libavformat leaking the marker into it somehow.


[...]
wm4 March 8, 2017, 7:16 p.m. UTC | #13
On Wed, 8 Mar 2017 19:56:42 +0100
Michael Niedermayer <michael@niedermayer.cc> wrote:

> On Wed, Mar 08, 2017 at 07:31:27PM +0100, wm4 wrote:
> > On Wed, 8 Mar 2017 19:03:21 +0100
> > Michael Niedermayer <michael@niedermayer.cc> wrote:
> >   
> > > On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:  
> > > > On Wed, 8 Mar 2017 17:11:12 +0100
> > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > >     
> > > > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:    
> > > > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > > >       
> > > > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:      
> > > > > > > > It looks like this could lead to security issues, as side data readers        
> > > > > [...]    
> > > > > > > also size checks are needed for the case where a lib gets replaced by
> > > > > > > one with newer version that has a bigger struct.      
> > > > > > 
> > > > > > Oh really? We never do this. Normal API structs are also considered
> > > > > > appendable, so compiling against a newer API and then linking against
> > > > > > an older version doesn't work. This is exactly the same case.      
> > > > > 
> > > > > no its not
> > > > > 
> > > > > what you call normal structs are allocated by an allocator that is
> > > > > part of the lib that defines it, the struct, and lib dependancies
> > > > > ensure that its new. Any allocated struct as a result is large
> > > > > enough though possibly not every field is set
> > > > > 
> > > > > with side data the code using it sets the size explicitly that makes
> > > > > the size generally hardcoded in the lib using the code theres no
> > > > > longer a common allocator (some exceptions exist).
> > > > > The size a lib allocates that way is the
> > > > > compile time sizeof() which may differ from another lib
> > > > > and side data can be passed in both directions between libs not just
> > > > > in the direction of their dependancy
> > > > > so you can end up with a smaller side data and that means you have to
> > > > > do checks.    
> > > > 
> > > > This is wrong.    
> > >   
> > > > side data which has structs have corresponding functions
> > > > to get their allocation size. Of course that's all very error prone and
> > > > hard to use correctly and some were added only recently because the
> > > > API had holes, but that's how the libav* APIs are for now.    
> > > 
> > > you talk about something else here.
> > > 
> > > fact is the allocated side data uses hardcoded size values often
> > > anyone can look at
> > > git grep -A3 new_side_data
> > > 
> > > theres is sizeof() use and there are litteral numbers also  
> > 
> > You have to use whatever is correct in each specific case. Using a
> > number or sizeof() argument for new_side_data is simply an API
> > violation in some cases, similar to e.g. using
> > av_malloc(sizeof(AVFrame)). There are a few.
> >   
> 
> > I don't know why you want to "check" these uses with FATE. As I've said
> > in the other thread, that's like letting FATE check sizeof(AVFrame).
> > 
> > The right way is to check it in new_side_data, or have an API that is
> > not so hard to use incorrectly. This has been discussed before, when
> > we added new functions to add display mastering data or something
> > similar in an ABI-safe way.
> >   
> > > if these ever change, checks on the size become needed
> > > which was the original thing i meant in this sub argumentation.
> > > checks are needed, or something else in their place
> > > is needed in that case  
> > 
> > And FATE-checking the sizes is the wrong thing to do. At least for the
> > side data types whose byte layouts are defined by the C ABI like
> > MASTERING_DISPLAY_METADATA, not by something wire-like as for
> > example the SKIP_SAMPLES ones.  
> 
> wrong thread or you totally misunderstand me
> 
> what you originally said:
> > It looks like this could lead to security issues, as side data readers
> > will for example rely on side data allocation sizes to be as large as
> > needed for correct operation.  
> 
> And what i replied:
>     [...]
> 
>     also size checks are needed for the case where a lib gets replaced by
>     one with newer version that has a bigger struct.

Yes, and that's why I said somewhere that this is the same as other
"appendable" structs in the libav* API. For example, AVFrame is
supposed to be extendable, which is why everything outside of libavutil
(the lib that defines the struct) must use av_frame_alloc() to allocate
AVFrame. sizeof(AVFrame) is not part of the ABI.

There is _no_ intention that an API user compiled against, say,
libavutil version 1.2.100 can link to libavutil version 1.1.100,
because 1.2.100 might have extended some structs (like AVFrame).

So where does your argument for size checks come from? We don't do
anything like this. There is no av_get_avframe_size(), and there's no
code that would do "if(av_get_avframe_size() < sizeof(AVFRame))". But
for side data we're suddenly supposed to do this?

For side data we have e.g. av_mastering_display_metadata_alloc() and 
av_mastering_display_metadata_create_side_data(). Nowhere are you
supposed to know or check the side data size.

(I thought some side data types have an API function that returns the
size of the corresponding struct, but I didn't find one. My mistake.)

And finally, my claim that it's a security problem: if some code
for example accesses mastering display side data, and the side data is
too small, that code will obviously read out of bounds. There isn't
even a way to check this, short of using sizeof(AVMastering...), which
is not part of the ABI. This is why arbitrary side data read from files
should be prevented.

Some side-data types are defined like a file format - these generally
use explicit size (as in, numbers in the code), and you accessed them
with AV_WB32/AV_RB32 etc. - these are different.

> Now fact is that for cases where you link to a lib with a differently
> sized struct or more generally any side data (which is what was meant)
> if its not using an allocator it needs a check in the code or something
> else, thats what i meant and thats from where this bizare sub
> discussion started where you seem to keep disagreeing about things i
> did not say
> 
> Iam not talking about fate here, thats a seperate thing

OK, then I don't know what you're arguing about. Isn't the ABI
situation pretty clear?

> 
> >   
> > > 
> > >   
> > > > 
> > > > Besides, manually checking struct sizes/allocations makes for an even
> > > > worse ABI compatibility concept than FFmpeg currently follows. (Worse
> > > > as in ease of use and accidental incompatibilities and UB triggered as
> > > > consequence.)
> > > >     
> > >   
> > > > > 
> > > > > 
> > > > > [...]
> > > > >     
> > > > > > >       
> > > > > > > > +            && av_packet_split_side_data(pkt) == 1) {        
> > > > > > > 
> > > > > > > this could fail with ENOMEM which complicates things      
> > > > > > 
> > > > > > I can add a check for ENOMEM too. This should be the only thing that
> > > > > > looks like a failure, but could work in a separate call (like the one
> > > > > > on the libavcodec side).
> > > > > >       
> > > > > > >       
> > > > >     
> > > > > > > if we do block such packets, its prbobably better to add a new
> > > > > > > static inline function to a header that checks if a packet has
> > > > > > > splitable side data and use that in av_packet_split_side_data() and
> > > > > > > here      
> > > > > > 
> > > > > > Not sure what you mean here.
> > > > > >       
> > > > > > > Iam suggesting "static inline" here to make backporting easier so no
> > > > > > > new symbol is added to libavcodec that is needed by libavformat      
> > > > > > 
> > > > > > What's the purpose?      
> > > > > 
> > > > > Its simpler, cleaner and faster    
> > > > 
> > > > I mean of that function, or why we should care about symbols present.    
> > > 
> > > i think i explained this but ill try again all more verbosly
> > > 
> > > using a functiom to check for splitable sidedata instead of
> > > spliting the side data is cleaner as we dont want to split it we just
> > > want to check if theres any.
> > > 
> > > Its simpler as we dont have to deal with errors from the split code
> > > and also dont need to deal with the case that split happened.  
> > 
> > I don't see much simplicity in code duplication, putting code into
> > public headers (which also means you have to make sure this
> > compiles with C++), reducing compile times, and potentially exposing
> > implementation details.  
> 
> It can be in a private header, it doesnt need to be public

OK, that could probably be reduced to the magic footer.
wm4 March 8, 2017, 7:23 p.m. UTC | #14
On Wed, 8 Mar 2017 20:15:24 +0100
Michael Niedermayer <michael@niedermayer.cc> wrote:

> On Wed, Mar 08, 2017 at 07:35:50PM +0100, wm4 wrote:
> > On Wed, 8 Mar 2017 19:20:15 +0100
> > Michael Niedermayer <michael@niedermayer.cc> wrote:
> >   
> > > On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:  
> > > > On Wed, 8 Mar 2017 17:11:12 +0100
> > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > >     
> > > > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:    
> > > > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > > >       
> > > > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:      
> > > [...]  
> > > > >    
> > > > > >       
> > > > > > > also it may be interresting to disable this check for fuzzing so
> > > > > > > side data can be fuzzed in a wider range of cases and any past
> > > > > > > testcases that happen to use this can still be used for regression
> > > > > > > testing      
> > > > > > 
> > > > > > I think what you want is fault injection for memory errors, seems out
> > > > > > of scope here.      
> > > > > 
> > > > > no, i want fuzzing to continue to fuzz side data, it did so in the
> > > > > past and it should continue to do so.    
> > > > 
> > > > You can fuzz side data as much as you can fuzz AVFrame or
> > > > AVCodecContext. I believe randomly changing in-memory data structures
> > > > is referred to as fault injection, not fuzzing.    
> > > 
> > > it doesnt really matter what you call it, but it was done and the
> > > patch breaks it if theres no option to disable it or something else  
> > 
> > PS: honestly, I think you're trolling with this. The possibility that
> > random packets could be interpreted as side-data doesn't look like
> > something that was accounted for, and I highly doubt this plays a role
> > for fuzzing (or ever occurred in a fuzzing case). Why are you so
> > hell-bent in finally preventing this arguably dangerous corner case? It
> > makes no sense at all. Explain yourself.  
> 
> It plays a role, for example the timeout in 731 goes away if
> i change FF_MERGE_MARKER and the value prior to teh change is in the
> sample according to my hex editor
> 
> so the fuzzer seems capable to create data to get past the check
> iam not trolling and this does play a role, ive seen other issues
> that used the side data split
> 
> how the fuzzer does this i dont know but i can imagine a few ways
> like looking at the individual bytes of the comparission and evolving
> towards breaking it or maybe some input file was generated by
> libavformat leaking the marker into it somehow.

If you argue this way, the fuzzer will expose issues that can happen
with the normal data flow too. (For example, if a demuxer sets a field
in a side data struct to a value read from the file - obviously the
fuzzer will be capable of exposing bugs related in readers of this
field.)

Normally you would be glad that an issue the fuzzer previously exposed
goes away completely.
Michael Niedermayer March 8, 2017, 7:42 p.m. UTC | #15
On Wed, Mar 08, 2017 at 08:23:26PM +0100, wm4 wrote:
> On Wed, 8 Mar 2017 20:15:24 +0100
> Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> > On Wed, Mar 08, 2017 at 07:35:50PM +0100, wm4 wrote:
> > > On Wed, 8 Mar 2017 19:20:15 +0100
> > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > >   
> > > > On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:  
> > > > > On Wed, 8 Mar 2017 17:11:12 +0100
> > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > >     
> > > > > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:    
> > > > > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > > > >       
> > > > > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:      
> > > > [...]  
> > > > > >    
> > > > > > >       
> > > > > > > > also it may be interresting to disable this check for fuzzing so
> > > > > > > > side data can be fuzzed in a wider range of cases and any past
> > > > > > > > testcases that happen to use this can still be used for regression
> > > > > > > > testing      
> > > > > > > 
> > > > > > > I think what you want is fault injection for memory errors, seems out
> > > > > > > of scope here.      
> > > > > > 
> > > > > > no, i want fuzzing to continue to fuzz side data, it did so in the
> > > > > > past and it should continue to do so.    
> > > > > 
> > > > > You can fuzz side data as much as you can fuzz AVFrame or
> > > > > AVCodecContext. I believe randomly changing in-memory data structures
> > > > > is referred to as fault injection, not fuzzing.    
> > > > 
> > > > it doesnt really matter what you call it, but it was done and the
> > > > patch breaks it if theres no option to disable it or something else  
> > > 
> > > PS: honestly, I think you're trolling with this. The possibility that
> > > random packets could be interpreted as side-data doesn't look like
> > > something that was accounted for, and I highly doubt this plays a role
> > > for fuzzing (or ever occurred in a fuzzing case). Why are you so
> > > hell-bent in finally preventing this arguably dangerous corner case? It
> > > makes no sense at all. Explain yourself.  
> > 
> > It plays a role, for example the timeout in 731 goes away if
> > i change FF_MERGE_MARKER and the value prior to teh change is in the
> > sample according to my hex editor
> > 
> > so the fuzzer seems capable to create data to get past the check
> > iam not trolling and this does play a role, ive seen other issues
> > that used the side data split
> > 
> > how the fuzzer does this i dont know but i can imagine a few ways
> > like looking at the individual bytes of the comparission and evolving
> > towards breaking it or maybe some input file was generated by
> > libavformat leaking the marker into it somehow.
> 
> If you argue this way, the fuzzer will expose issues that can happen
> with the normal data flow too. (For example, if a demuxer sets a field
> in a side data struct to a value read from the file - obviously the
> fuzzer will be capable of exposing bugs related in readers of this
> field.)

> 
> Normally you would be glad that an issue the fuzzer previously exposed
> goes away completely.

It doesnt go completely away, it just closes an easy way to introduce
side data

but yes, iam in fact happy if the path marker->libavformat->libavcodec
goes away

its the details iam not completely happy with


[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you fake or manipulate statistics in a paper in physics you will never
get a job again.
If you fake or manipulate statistics in a paper in medicin you will get
a job for life at the pharma industry.
Michael Niedermayer March 8, 2017, 7:54 p.m. UTC | #16
On Wed, Mar 08, 2017 at 07:32:26PM +0100, wm4 wrote:
> On Wed, 8 Mar 2017 19:20:15 +0100
> Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> > On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:
> > > On Wed, 8 Mar 2017 17:11:12 +0100
> > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > >   
> > > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:  
> > > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > >     
> > > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:    
> > [...]
> > > >  
> > > > >     
> > > > > > also it may be interresting to disable this check for fuzzing so
> > > > > > side data can be fuzzed in a wider range of cases and any past
> > > > > > testcases that happen to use this can still be used for regression
> > > > > > testing    
> > > > > 
> > > > > I think what you want is fault injection for memory errors, seems out
> > > > > of scope here.    
> > > > 
> > > > no, i want fuzzing to continue to fuzz side data, it did so in the
> > > > past and it should continue to do so.  
> > > 
> > > You can fuzz side data as much as you can fuzz AVFrame or
> > > AVCodecContext. I believe randomly changing in-memory data structures
> > > is referred to as fault injection, not fuzzing.  
> > 
> > it doesnt really matter what you call it, but it was done and the
> > patch breaks it if theres no option to disable it or something else
> 
> Well, you can't do it anymore. Why are you so afraid that a potential
> error source is being eliminated?

well its rather hidden, harder to test for not eliminated.


[...]
wm4 March 8, 2017, 7:59 p.m. UTC | #17
On Wed, 8 Mar 2017 20:42:17 +0100
Michael Niedermayer <michael@niedermayer.cc> wrote:

> On Wed, Mar 08, 2017 at 08:23:26PM +0100, wm4 wrote:
> > On Wed, 8 Mar 2017 20:15:24 +0100
> > Michael Niedermayer <michael@niedermayer.cc> wrote:
> >   
> > > On Wed, Mar 08, 2017 at 07:35:50PM +0100, wm4 wrote:  
> > > > On Wed, 8 Mar 2017 19:20:15 +0100
> > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > >     
> > > > > On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:    
> > > > > > On Wed, 8 Mar 2017 17:11:12 +0100
> > > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > > >       
> > > > > > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:      
> > > > > > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > > > > >         
> > > > > > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:        
> > > > > [...]    
> > > > > > >      
> > > > > > > >         
> > > > > > > > > also it may be interresting to disable this check for fuzzing so
> > > > > > > > > side data can be fuzzed in a wider range of cases and any past
> > > > > > > > > testcases that happen to use this can still be used for regression
> > > > > > > > > testing        
> > > > > > > > 
> > > > > > > > I think what you want is fault injection for memory errors, seems out
> > > > > > > > of scope here.        
> > > > > > > 
> > > > > > > no, i want fuzzing to continue to fuzz side data, it did so in the
> > > > > > > past and it should continue to do so.      
> > > > > > 
> > > > > > You can fuzz side data as much as you can fuzz AVFrame or
> > > > > > AVCodecContext. I believe randomly changing in-memory data structures
> > > > > > is referred to as fault injection, not fuzzing.      
> > > > > 
> > > > > it doesnt really matter what you call it, but it was done and the
> > > > > patch breaks it if theres no option to disable it or something else    
> > > > 
> > > > PS: honestly, I think you're trolling with this. The possibility that
> > > > random packets could be interpreted as side-data doesn't look like
> > > > something that was accounted for, and I highly doubt this plays a role
> > > > for fuzzing (or ever occurred in a fuzzing case). Why are you so
> > > > hell-bent in finally preventing this arguably dangerous corner case? It
> > > > makes no sense at all. Explain yourself.    
> > > 
> > > It plays a role, for example the timeout in 731 goes away if
> > > i change FF_MERGE_MARKER and the value prior to teh change is in the
> > > sample according to my hex editor
> > > 
> > > so the fuzzer seems capable to create data to get past the check
> > > iam not trolling and this does play a role, ive seen other issues
> > > that used the side data split
> > > 
> > > how the fuzzer does this i dont know but i can imagine a few ways
> > > like looking at the individual bytes of the comparission and evolving
> > > towards breaking it or maybe some input file was generated by
> > > libavformat leaking the marker into it somehow.  
> > 
> > If you argue this way, the fuzzer will expose issues that can happen
> > with the normal data flow too. (For example, if a demuxer sets a field
> > in a side data struct to a value read from the file - obviously the
> > fuzzer will be capable of exposing bugs related in readers of this
> > field.)  
> 
> > 
> > Normally you would be glad that an issue the fuzzer previously exposed
> > goes away completely.  
> 
> It doesnt go completely away, it just closes an easy way to introduce
> side data

Well, in the first place it prevents accidental interpretation of input
data as merged side data. You seem to be sad about the fact that the
fuzzer won't be able to test such accidentally interpreted side data,
which makes no sense to me.

> but yes, iam in fact happy if the path marker->libavformat->libavcodec
> goes away

Like what?

> its the details iam not completely happy with

Which is? Other than the ENOMEM issue (will fix), strcmp (you didn't
reply to my remarks), and the idea to introduce a special function to
check for merged side data (I'm hesitant, maybe will do).
wm4 March 8, 2017, 8:01 p.m. UTC | #18
On Wed, 8 Mar 2017 20:54:43 +0100
Michael Niedermayer <michael@niedermayer.cc> wrote:

> On Wed, Mar 08, 2017 at 07:32:26PM +0100, wm4 wrote:
> > On Wed, 8 Mar 2017 19:20:15 +0100
> > Michael Niedermayer <michael@niedermayer.cc> wrote:
> >   
> > > On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:  
> > > > On Wed, 8 Mar 2017 17:11:12 +0100
> > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > >     
> > > > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:    
> > > > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > > >       
> > > > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:      
> > > [...]  
> > > > >    
> > > > > >       
> > > > > > > also it may be interresting to disable this check for fuzzing so
> > > > > > > side data can be fuzzed in a wider range of cases and any past
> > > > > > > testcases that happen to use this can still be used for regression
> > > > > > > testing      
> > > > > > 
> > > > > > I think what you want is fault injection for memory errors, seems out
> > > > > > of scope here.      
> > > > > 
> > > > > no, i want fuzzing to continue to fuzz side data, it did so in the
> > > > > past and it should continue to do so.    
> > > > 
> > > > You can fuzz side data as much as you can fuzz AVFrame or
> > > > AVCodecContext. I believe randomly changing in-memory data structures
> > > > is referred to as fault injection, not fuzzing.    
> > > 
> > > it doesnt really matter what you call it, but it was done and the
> > > patch breaks it if theres no option to disable it or something else  
> > 
> > Well, you can't do it anymore. Why are you so afraid that a potential
> > error source is being eliminated?  
> 
> well its rather hidden, harder to test for not eliminated.

How is it not eliminated?
Michael Niedermayer March 8, 2017, 10:34 p.m. UTC | #19
On Wed, Mar 08, 2017 at 09:01:32PM +0100, wm4 wrote:
> On Wed, 8 Mar 2017 20:54:43 +0100
> Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> > On Wed, Mar 08, 2017 at 07:32:26PM +0100, wm4 wrote:
> > > On Wed, 8 Mar 2017 19:20:15 +0100
> > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > >   
> > > > On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:  
> > > > > On Wed, 8 Mar 2017 17:11:12 +0100
> > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > >     
> > > > > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:    
> > > > > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > > > >       
> > > > > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:      
> > > > [...]  
> > > > > >    
> > > > > > >       
> > > > > > > > also it may be interresting to disable this check for fuzzing so
> > > > > > > > side data can be fuzzed in a wider range of cases and any past
> > > > > > > > testcases that happen to use this can still be used for regression
> > > > > > > > testing      
> > > > > > > 
> > > > > > > I think what you want is fault injection for memory errors, seems out
> > > > > > > of scope here.      
> > > > > > 
> > > > > > no, i want fuzzing to continue to fuzz side data, it did so in the
> > > > > > past and it should continue to do so.    
> > > > > 
> > > > > You can fuzz side data as much as you can fuzz AVFrame or
> > > > > AVCodecContext. I believe randomly changing in-memory data structures
> > > > > is referred to as fault injection, not fuzzing.    
> > > > 
> > > > it doesnt really matter what you call it, but it was done and the
> > > > patch breaks it if theres no option to disable it or something else  
> > > 
> > > Well, you can't do it anymore. Why are you so afraid that a potential
> > > error source is being eliminated?  
> > 
> > well its rather hidden, harder to test for not eliminated.
> 
> How is it not eliminated?

every side data we support can be generated by something otherwise
we wouldnt have that side data type

[...]
wm4 March 9, 2017, 6:50 a.m. UTC | #20
On Wed, 8 Mar 2017 23:34:39 +0100
Michael Niedermayer <michael@niedermayer.cc> wrote:

> On Wed, Mar 08, 2017 at 09:01:32PM +0100, wm4 wrote:
> > On Wed, 8 Mar 2017 20:54:43 +0100
> > Michael Niedermayer <michael@niedermayer.cc> wrote:
> >   
> > > On Wed, Mar 08, 2017 at 07:32:26PM +0100, wm4 wrote:  
> > > > On Wed, 8 Mar 2017 19:20:15 +0100
> > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > >     
> > > > > On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:    
> > > > > > On Wed, 8 Mar 2017 17:11:12 +0100
> > > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > > >       
> > > > > > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:      
> > > > > > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > > > > >         
> > > > > > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:        
> > > > > [...]    
> > > > > > >      
> > > > > > > >         
> > > > > > > > > also it may be interresting to disable this check for fuzzing so
> > > > > > > > > side data can be fuzzed in a wider range of cases and any past
> > > > > > > > > testcases that happen to use this can still be used for regression
> > > > > > > > > testing        
> > > > > > > > 
> > > > > > > > I think what you want is fault injection for memory errors, seems out
> > > > > > > > of scope here.        
> > > > > > > 
> > > > > > > no, i want fuzzing to continue to fuzz side data, it did so in the
> > > > > > > past and it should continue to do so.      
> > > > > > 
> > > > > > You can fuzz side data as much as you can fuzz AVFrame or
> > > > > > AVCodecContext. I believe randomly changing in-memory data structures
> > > > > > is referred to as fault injection, not fuzzing.      
> > > > > 
> > > > > it doesnt really matter what you call it, but it was done and the
> > > > > patch breaks it if theres no option to disable it or something else    
> > > > 
> > > > Well, you can't do it anymore. Why are you so afraid that a potential
> > > > error source is being eliminated?    
> > > 
> > > well its rather hidden, harder to test for not eliminated.  
> > 
> > How is it not eliminated?  
> 
> every side data we support can be generated by something otherwise

Like what? You're not answering, just spouting confused stuff. What
kind of reply is this? 

> we wouldnt have that side data type

????????????????
Michael Niedermayer March 9, 2017, 11 a.m. UTC | #21
On Thu, Mar 09, 2017 at 07:50:14AM +0100, wm4 wrote:
> On Wed, 8 Mar 2017 23:34:39 +0100
> Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> > On Wed, Mar 08, 2017 at 09:01:32PM +0100, wm4 wrote:
> > > On Wed, 8 Mar 2017 20:54:43 +0100
> > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > >   
> > > > On Wed, Mar 08, 2017 at 07:32:26PM +0100, wm4 wrote:  
> > > > > On Wed, 8 Mar 2017 19:20:15 +0100
> > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > >     
> > > > > > On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:    
> > > > > > > On Wed, 8 Mar 2017 17:11:12 +0100
> > > > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > > > >       
> > > > > > > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:      
> > > > > > > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > > > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > > > > > >         
> > > > > > > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:        
> > > > > > [...]    
> > > > > > > >      
> > > > > > > > >         
> > > > > > > > > > also it may be interresting to disable this check for fuzzing so
> > > > > > > > > > side data can be fuzzed in a wider range of cases and any past
> > > > > > > > > > testcases that happen to use this can still be used for regression
> > > > > > > > > > testing        
> > > > > > > > > 
> > > > > > > > > I think what you want is fault injection for memory errors, seems out
> > > > > > > > > of scope here.        
> > > > > > > > 
> > > > > > > > no, i want fuzzing to continue to fuzz side data, it did so in the
> > > > > > > > past and it should continue to do so.      
> > > > > > > 
> > > > > > > You can fuzz side data as much as you can fuzz AVFrame or
> > > > > > > AVCodecContext. I believe randomly changing in-memory data structures
> > > > > > > is referred to as fault injection, not fuzzing.      
> > > > > > 
> > > > > > it doesnt really matter what you call it, but it was done and the
> > > > > > patch breaks it if theres no option to disable it or something else    
> > > > > 
> > > > > Well, you can't do it anymore. Why are you so afraid that a potential
> > > > > error source is being eliminated?    
> > > > 
> > > > well its rather hidden, harder to test for not eliminated.  
> > > 
> > > How is it not eliminated?  
> > 
> > every side data we support can be generated by something otherwise
> 
> Like what? You're not answering, just spouting confused stuff. What
> kind of reply is this? 
> 
> > we wouldnt have that side data type
> 
> ????????????????

This is very basic really but lets elaborate
for each side data type T
possiblity A
nothing uses side data type T

possiblity B
something uses side data type T


Its the same with a codec, either a codec is used in some case or
its used in no case.

If something is used in no case then it has been eliminated as you
describe.
If somehing is still used in a case it has not been eliminated

If as you describe side data has been eliminated then you could
remove side data as a whole from the source code.

If you cannot remove side data or a specific side data type from
the source code then it has not been eliminated

your change removes one way for an attacker to set side data but
by the fact that you dont remove any of the side data types its
clear you are aware of that every is still in use in some code path.

a attacker may need to use a specific container format to set a
specific side data type or may depend on a specific demuxer lib or
application that allows him to set a side data type.

now if you remove every way to set side data for an attacker then
you can remove that side data type as a whole from the code.
Of course that removes whatever the side data is for.

Let me provide a specific example
If a container suports changing extradata mid stream it will either
be support or not.
if any demuxer supports it then you have not eliminated the possiblity
for an attacker

I hope writing a elaborate reply will not lead to this discussion
to shift onto some unrelated detail


[...]
Nicolas George March 9, 2017, 11:16 a.m. UTC | #22
Le nonidi 19 ventôse, an CCXXV, Michael Niedermayer a écrit :
> This is very basic really but lets elaborate
> for each side data type T
> possiblity A
> nothing uses side data type T
> 
> possiblity B
> something uses side data type T
> 
> Its the same with a codec, either a codec is used in some case or
> its used in no case.
> 
> If something is used in no case then it has been eliminated as you
> describe.
> If somehing is still used in a case it has not been eliminated
> 
> If as you describe side data has been eliminated then you could
> remove side data as a whole from the source code.
> 
> If you cannot remove side data or a specific side data type from
> the source code then it has not been eliminated
> 
> your change removes one way for an attacker to set side data but
> by the fact that you dont remove any of the side data types its
> clear you are aware of that every is still in use in some code path.
> 
> a attacker may need to use a specific container format to set a
> specific side data type or may depend on a specific demuxer lib or
> application that allows him to set a side data type.
> 
> now if you remove every way to set side data for an attacker then
> you can remove that side data type as a whole from the code.
> Of course that removes whatever the side data is for.
> 
> Let me provide a specific example
> If a container suports changing extradata mid stream it will either
> be support or not.
> if any demuxer supports it then you have not eliminated the possiblity
> for an attacker
> 
> I hope writing a elaborate reply will not lead to this discussion
> to shift onto some unrelated detail

You are rehashing a lot of obvious facts, but you do not address the
important questions.

Side data is useful. It is a badly designed API, because it adds a lot
of complexity for a hypothetical benefit but fails to reap that benefit.
As such, it should be kept but enhanced, either by removing the
complexity or by actually reaping the benefit.

But it is not the topic of this discussion.

MERGED side data is a completely brainded design that should never have
been written.

I am purposefully not looking at the archive to find out who is
responsible for this mess. Now is not the time to point fingers but to
fix the code.

Now, please answer this very specific question:

If someone were to REMOVE ALL AND EVERY use of
av_packet_merge_side_data() and av_packet_split_side_data(), what would
be the actual bad consequences?

But before you start with fuzzing or anything similar, let me stop you:
fuzzing exposes bugs that can be triggered by crafted inputs. If fuzzing
can not trigger it, that means the bug does not exist, period.

Regards,
wm4 March 9, 2017, 11:16 a.m. UTC | #23
On Thu, 9 Mar 2017 12:00:38 +0100
Michael Niedermayer <michael@niedermayer.cc> wrote:

> On Thu, Mar 09, 2017 at 07:50:14AM +0100, wm4 wrote:
> > On Wed, 8 Mar 2017 23:34:39 +0100
> > Michael Niedermayer <michael@niedermayer.cc> wrote:
> >   
> > > On Wed, Mar 08, 2017 at 09:01:32PM +0100, wm4 wrote:  
> > > > On Wed, 8 Mar 2017 20:54:43 +0100
> > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > >     
> > > > > On Wed, Mar 08, 2017 at 07:32:26PM +0100, wm4 wrote:    
> > > > > > On Wed, 8 Mar 2017 19:20:15 +0100
> > > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > > >       
> > > > > > > On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:      
> > > > > > > > On Wed, 8 Mar 2017 17:11:12 +0100
> > > > > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > > > > >         
> > > > > > > > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:        
> > > > > > > > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > > > > > > > Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > > > > > > > >           
> > > > > > > > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:          
> > > > > > > [...]      
> > > > > > > > >        
> > > > > > > > > >           
> > > > > > > > > > > also it may be interresting to disable this check for fuzzing so
> > > > > > > > > > > side data can be fuzzed in a wider range of cases and any past
> > > > > > > > > > > testcases that happen to use this can still be used for regression
> > > > > > > > > > > testing          
> > > > > > > > > > 
> > > > > > > > > > I think what you want is fault injection for memory errors, seems out
> > > > > > > > > > of scope here.          
> > > > > > > > > 
> > > > > > > > > no, i want fuzzing to continue to fuzz side data, it did so in the
> > > > > > > > > past and it should continue to do so.        
> > > > > > > > 
> > > > > > > > You can fuzz side data as much as you can fuzz AVFrame or
> > > > > > > > AVCodecContext. I believe randomly changing in-memory data structures
> > > > > > > > is referred to as fault injection, not fuzzing.        
> > > > > > > 
> > > > > > > it doesnt really matter what you call it, but it was done and the
> > > > > > > patch breaks it if theres no option to disable it or something else      
> > > > > > 
> > > > > > Well, you can't do it anymore. Why are you so afraid that a potential
> > > > > > error source is being eliminated?      
> > > > > 
> > > > > well its rather hidden, harder to test for not eliminated.    
> > > > 
> > > > How is it not eliminated?    
> > > 
> > > every side data we support can be generated by something otherwise  
> > 
> > Like what? You're not answering, just spouting confused stuff. What
> > kind of reply is this? 
> >   
> > > we wouldnt have that side data type  
> > 
> > ????????????????  
> 
> This is very basic really but lets elaborate
> for each side data type T
> possiblity A
> nothing uses side data type T
> 
> possiblity B
> something uses side data type T
> 
> 
> Its the same with a codec, either a codec is used in some case or
> its used in no case.
> 
> If something is used in no case then it has been eliminated as you
> describe.
> If somehing is still used in a case it has not been eliminated
> 
> If as you describe side data has been eliminated then you could
> remove side data as a whole from the source code.
> 
> If you cannot remove side data or a specific side data type from
> the source code then it has not been eliminated
> 
> your change removes one way for an attacker to set side data but
> by the fact that you dont remove any of the side data types its
> clear you are aware of that every is still in use in some code path.
> 
> a attacker may need to use a specific container format to set a
> specific side data type or may depend on a specific demuxer lib or
> application that allows him to set a side data type.
> 
> now if you remove every way to set side data for an attacker then
> you can remove that side data type as a whole from the code.
> Of course that removes whatever the side data is for.
> 
> Let me provide a specific example
> If a container suports changing extradata mid stream it will either
> be support or not.
> if any demuxer supports it then you have not eliminated the possiblity
> for an attacker
> 
> I hope writing a elaborate reply will not lead to this discussion
> to shift onto some unrelated detail

OK, but that has absolutely nothing to do with the issue at hand.

Of course side data can still be used after this patch!

Now if you could get back on topic...
wm4 March 9, 2017, 11:31 a.m. UTC | #24
On Thu, 9 Mar 2017 12:16:09 +0100
Nicolas George <george@nsup.org> wrote:

> Now, please answer this very specific question:
> 
> If someone were to REMOVE ALL AND EVERY use of
> av_packet_merge_side_data() and av_packet_split_side_data(), what would
> be the actual bad consequences?

Simply that API users, which pass only the packet data itself from
libavformat to libavcodec, will break in cases where the side data is
"required". The argument is that such API users exist because they
cannot pass the AVPacket reference along, but have to go through their
own layers, which can transport only the raw packet data itself and
maybe timestamps.

> But before you start with fuzzing or anything similar, let me stop you:
> fuzzing exposes bugs that can be triggered by crafted inputs. If fuzzing
> can not trigger it, that means the bug does not exist, period.

I think the argument is that it's easier to fuzz side-data related
things if side data can be accidentally read from raw packets that look
like they have merged side data.
Michael Niedermayer March 9, 2017, 11:42 a.m. UTC | #25
On Thu, Mar 09, 2017 at 12:16:09PM +0100, Nicolas George wrote:
> Le nonidi 19 ventôse, an CCXXV, Michael Niedermayer a écrit :
> > This is very basic really but lets elaborate
> > for each side data type T
> > possiblity A
> > nothing uses side data type T
> > 
> > possiblity B
> > something uses side data type T
> > 
> > Its the same with a codec, either a codec is used in some case or
> > its used in no case.
> > 
> > If something is used in no case then it has been eliminated as you
> > describe.
> > If somehing is still used in a case it has not been eliminated
> > 
> > If as you describe side data has been eliminated then you could
> > remove side data as a whole from the source code.
> > 
> > If you cannot remove side data or a specific side data type from
> > the source code then it has not been eliminated
> > 
> > your change removes one way for an attacker to set side data but
> > by the fact that you dont remove any of the side data types its
> > clear you are aware of that every is still in use in some code path.
> > 
> > a attacker may need to use a specific container format to set a
> > specific side data type or may depend on a specific demuxer lib or
> > application that allows him to set a side data type.
> > 
> > now if you remove every way to set side data for an attacker then
> > you can remove that side data type as a whole from the code.
> > Of course that removes whatever the side data is for.
> > 
> > Let me provide a specific example
> > If a container suports changing extradata mid stream it will either
> > be support or not.
> > if any demuxer supports it then you have not eliminated the possiblity
> > for an attacker
> > 
> > I hope writing a elaborate reply will not lead to this discussion
> > to shift onto some unrelated detail
> 
> You are rehashing a lot of obvious facts, but you do not address the
> important questions.

yes, i was trying to clarify a reply that was apparently unclear and
not understood.
its like
statement -> point out disagreement -> do not understand -> clarify
and
clarify -> "You are rehashing a lot of obvious facts"

Sorry if that felt off topic, it probably was

[...]
Nicolas George March 9, 2017, 11:45 a.m. UTC | #26
Le nonidi 19 ventôse, an CCXXV, Michael Niedermayer a écrit :
> yes, i was trying to clarify a reply that was apparently unclear and
> not understood.
> its like
> statement -> point out disagreement -> do not understand -> clarify
> and
> clarify -> "You are rehashing a lot of obvious facts"
> 
> Sorry if that felt off topic, it probably was

Ok, but please answer the actual question in my mail.

Regards,
Michael Niedermayer March 9, 2017, 12:12 p.m. UTC | #27
On Thu, Mar 09, 2017 at 12:16:09PM +0100, Nicolas George wrote:
> Le nonidi 19 ventôse, an CCXXV, Michael Niedermayer a écrit :
> > This is very basic really but lets elaborate
> > for each side data type T
> > possiblity A
> > nothing uses side data type T
> > 
> > possiblity B
> > something uses side data type T
> > 
> > Its the same with a codec, either a codec is used in some case or
> > its used in no case.
> > 
> > If something is used in no case then it has been eliminated as you
> > describe.
> > If somehing is still used in a case it has not been eliminated
> > 
> > If as you describe side data has been eliminated then you could
> > remove side data as a whole from the source code.
> > 
> > If you cannot remove side data or a specific side data type from
> > the source code then it has not been eliminated
> > 
> > your change removes one way for an attacker to set side data but
> > by the fact that you dont remove any of the side data types its
> > clear you are aware of that every is still in use in some code path.
> > 
> > a attacker may need to use a specific container format to set a
> > specific side data type or may depend on a specific demuxer lib or
> > application that allows him to set a side data type.
> > 
> > now if you remove every way to set side data for an attacker then
> > you can remove that side data type as a whole from the code.
> > Of course that removes whatever the side data is for.
> > 
> > Let me provide a specific example
> > If a container suports changing extradata mid stream it will either
> > be support or not.
> > if any demuxer supports it then you have not eliminated the possiblity
> > for an attacker
> > 
> > I hope writing a elaborate reply will not lead to this discussion
> > to shift onto some unrelated detail
> 
> You are rehashing a lot of obvious facts, but you do not address the
> important questions.
> 
> Side data is useful. It is a badly designed API, because it adds a lot
> of complexity for a hypothetical benefit but fails to reap that benefit.
> As such, it should be kept but enhanced, either by removing the
> complexity or by actually reaping the benefit.
> 

> But it is not the topic of this discussion.

> 
> MERGED side data is a completely brainded design that should never have
> been written.

Iam not sure its on topic, but it gave FFmpeg an advantage
over our competition at the time when this code was added.


> 
> I am purposefully not looking at the archive to find out who is
> responsible for this mess. Now is not the time to point fingers but to
> fix the code.
> 
> Now, please answer this very specific question:
> 

> If someone were to REMOVE ALL AND EVERY use of
> av_packet_merge_side_data() and av_packet_split_side_data(), what would
> be the actual bad consequences?

* All applications using libavformat and libavcodec would need to
preserve the side data between the layers, they currently do not
have to. Thats also a API/ABI change requiring a major bump

* Any testcases that used this to inject side data will no longer
  function as regression tests

* It reduces (but does not eliminate) the ways by which a fuzzer can
  inject side data.


> 
> But before you start with fuzzing or anything similar, let me stop you:
> fuzzing exposes bugs that can be triggered by crafted inputs.

> If fuzzing
> can not trigger it, that means the bug does not exist, period.

This may be formally correct if you have infinite time but we dont.

A specific example would be a raw h264 file used as input to the fuzzer
with the split code the fuzzer can inject side data by adding a
8 byte value, fuzzers actually manage to do this

without split code the fuzzer can still inject side data but it needs
to build the right container around the h264 frames, like a matroska
file that supports the specific type of side data.
This is much harder for a fuzzer to do
So the fuzzer then could only do that with cases that are mkv files
to begin with which reduces the attack surface probed by the fuzzer


[...]
diff mbox

Patch

diff --git a/libavformat/utils.c b/libavformat/utils.c
index 37d7024465..68f3c977d6 100644
--- a/libavformat/utils.c
+++ b/libavformat/utils.c
@@ -840,6 +840,15 @@  int ff_read_packet(AVFormatContext *s, AVPacket *pkt)
             *pkt = tmp;
         }
 
+        if (strcmp(s->iformat->name, "concat") && strcmp(s->iformat->name, "hls,applehttp")
+            && av_packet_split_side_data(pkt) == 1) {
+            av_log(s, AV_LOG_ERROR,
+                   "FFmpeg-style merged side data found in raw packet. "
+                   "The packet is rejected for security reasons.\n");
+            av_packet_unref(pkt);
+            return AVERROR_INVALIDDATA;
+        }
+
         if ((s->flags & AVFMT_FLAG_DISCARD_CORRUPT) &&
             (pkt->flags & AV_PKT_FLAG_CORRUPT)) {
             av_log(s, AV_LOG_WARNING,