diff mbox

[FFmpeg-devel,1/8] avcodec/cinepak: Require 1 bit per 4x4 block as minimum input

Message ID e6e6ed749639994dba07928ff851070f3050ad2f.camel@acc.umu.se
State New
Headers show

Commit Message

Tomas Härdin Aug. 18, 2019, 8:47 a.m. UTC
sön 2019-08-18 klockan 02:35 +0200 skrev Tomas Härdin:
> lör 2019-08-17 klockan 17:33 +0200 skrev Michael Niedermayer:
> > On Sat, Aug 17, 2019 at 12:26:27AM +0200, Tomas Härdin wrote:
> > > fre 2019-08-16 klockan 14:57 +0200 skrev Tomas Härdin:
> > > 
> > > I feel I should point out that being conservative here is at odds with
> > > the general "best effort" approach taken in this project. These toy
> > > codecs are useful as illustrative examples of this contradiction. I'm
> > > sure there are many more examples of files that can cause ffmpeg to do
> > > a lot more work than expected, for almost every codec. I know afl-fuzz
> > > is likely to find out that it can make the ZMBV decoder do a lot of
> > > work for a small input file, because I've seen it do that with gzip.
> > > 
> > > The user base for cinepak is of course miniscule, so I doubt anyone's
> > > going to complain that their broken CVID files don't work any more. I
> > > certainly don't care since cinepakenc only puts out valid files. 
> > > But
> > > again, for other formats we're just going to have to tell users to put
> > > ffmpeg inside a sandbox and stick a CPU limit on it. Even ffprobe is
> > > vulnerable to DoS-y things.
> > 
> > yes
> > 
> > the question ATM is just what to do here about this codec ?
> > apply the patch ?
> > change it ?
> 
> Well for a start, the file is 65535 x 209 pixels, 3166 frames. I
> wouldn't call decoding that @ 263 fps particularly slow
> 
> Second, it's not the decoder which is slow. If I comment out the
> "*got_frame = 1;" then the test also runs fast. I'm not sure what
> happens elsewhere with the decoded buffer, but I suspect there's a
> bunch of useless malloc()/memset()ing going on. Maybe the decoder is
> using ff_reget_buffer() or av_frame_ref() incorrectly, I'm not sure.

I did some investigation, it is indeed ff_reget_buffer(). It copies the
frame data for some reason. The fix is simple in this case: just call
ff_get_buffer() once in cinepak_decode_init() and keep overwriting the
same frame.

> As I said on IRC, this class of problems will exist for every codec.
> Cinepak is easy to decode, even at these resolutions. Just imagine what
> will happens when someone feeds in a 65535x209 av1 stream..

And related to this, ff_reget_buffer() is used for a lot of these
codecs which only overwrite pixels in the old frame. flicvideo, gifdec,
msrle, roqvideodec and others probably have the same flaw.

Patched attached.

/Tomas

Comments

Michael Niedermayer Aug. 18, 2019, 9:43 a.m. UTC | #1
On Sun, Aug 18, 2019 at 10:47:26AM +0200, Tomas Härdin wrote:
> sön 2019-08-18 klockan 02:35 +0200 skrev Tomas Härdin:
> > lör 2019-08-17 klockan 17:33 +0200 skrev Michael Niedermayer:
> > > On Sat, Aug 17, 2019 at 12:26:27AM +0200, Tomas Härdin wrote:
> > > > fre 2019-08-16 klockan 14:57 +0200 skrev Tomas Härdin:
> > > > 
> > > > I feel I should point out that being conservative here is at odds with
> > > > the general "best effort" approach taken in this project. These toy
> > > > codecs are useful as illustrative examples of this contradiction. I'm
> > > > sure there are many more examples of files that can cause ffmpeg to do
> > > > a lot more work than expected, for almost every codec. I know afl-fuzz
> > > > is likely to find out that it can make the ZMBV decoder do a lot of
> > > > work for a small input file, because I've seen it do that with gzip.
> > > > 
> > > > The user base for cinepak is of course miniscule, so I doubt anyone's
> > > > going to complain that their broken CVID files don't work any more. I
> > > > certainly don't care since cinepakenc only puts out valid files. 
> > > > But
> > > > again, for other formats we're just going to have to tell users to put
> > > > ffmpeg inside a sandbox and stick a CPU limit on it. Even ffprobe is
> > > > vulnerable to DoS-y things.
> > > 
> > > yes
> > > 
> > > the question ATM is just what to do here about this codec ?
> > > apply the patch ?
> > > change it ?
> > 
> > Well for a start, the file is 65535 x 209 pixels, 3166 frames. I
> > wouldn't call decoding that @ 263 fps particularly slow
> > 
> > Second, it's not the decoder which is slow. If I comment out the
> > "*got_frame = 1;" then the test also runs fast. I'm not sure what
> > happens elsewhere with the decoded buffer, but I suspect there's a
> > bunch of useless malloc()/memset()ing going on. Maybe the decoder is
> > using ff_reget_buffer() or av_frame_ref() incorrectly, I'm not sure.
> 
> I did some investigation, it is indeed ff_reget_buffer(). It copies the
> frame data for some reason. The fix is simple in this case: just call
> ff_get_buffer() once in cinepak_decode_init() and keep overwriting the
> same frame.
> 
> > As I said on IRC, this class of problems will exist for every codec.
> > Cinepak is easy to decode, even at these resolutions. Just imagine what
> > will happens when someone feeds in a 65535x209 av1 stream..
> 
> And related to this, ff_reget_buffer() is used for a lot of these
> codecs which only overwrite pixels in the old frame. flicvideo, gifdec,
> msrle, roqvideodec and others probably have the same flaw.

not calling any form of *get_buffer per frame breaks decoding into
user supplied buffers.

If you check the documentation of the get_buffer2 callback

" This callback is called at the beginning of each frame to get data buffer(s) for it."

That would not be possible if its just called once in init

and yes i too wish there was a magic fix but i think most things that
look like magic fixes have a fatal flaw. But maybe iam missing something
in fact i hope that iam missing something and that there is a magic fix

PS: if you think of changing the API, i dont think its the API.
I mean every user application will read the frames it receives, so
even if inside the decoder we just return the same frame with 2 pixels
different the user doesnt know this and has to read the whole frame.
The problem is moved around but its still there.

Thanks

[...]
Paul B Mahol Aug. 18, 2019, 10 a.m. UTC | #2
On Sun, Aug 18, 2019 at 11:44 AM Michael Niedermayer <michael@niedermayer.cc>
wrote:

> On Sun, Aug 18, 2019 at 10:47:26AM +0200, Tomas Härdin wrote:
> > sön 2019-08-18 klockan 02:35 +0200 skrev Tomas Härdin:
> > > lör 2019-08-17 klockan 17:33 +0200 skrev Michael Niedermayer:
> > > > On Sat, Aug 17, 2019 at 12:26:27AM +0200, Tomas Härdin wrote:
> > > > > fre 2019-08-16 klockan 14:57 +0200 skrev Tomas Härdin:
> > > > >
> > > > > I feel I should point out that being conservative here is at odds
> with
> > > > > the general "best effort" approach taken in this project. These toy
> > > > > codecs are useful as illustrative examples of this contradiction.
> I'm
> > > > > sure there are many more examples of files that can cause ffmpeg
> to do
> > > > > a lot more work than expected, for almost every codec. I know
> afl-fuzz
> > > > > is likely to find out that it can make the ZMBV decoder do a lot of
> > > > > work for a small input file, because I've seen it do that with
> gzip.
> > > > >
> > > > > The user base for cinepak is of course miniscule, so I doubt
> anyone's
> > > > > going to complain that their broken CVID files don't work any
> more. I
> > > > > certainly don't care since cinepakenc only puts out valid files.
> > > > > But
> > > > > again, for other formats we're just going to have to tell users to
> put
> > > > > ffmpeg inside a sandbox and stick a CPU limit on it. Even ffprobe
> is
> > > > > vulnerable to DoS-y things.
> > > >
> > > > yes
> > > >
> > > > the question ATM is just what to do here about this codec ?
> > > > apply the patch ?
> > > > change it ?
> > >
> > > Well for a start, the file is 65535 x 209 pixels, 3166 frames. I
> > > wouldn't call decoding that @ 263 fps particularly slow
> > >
> > > Second, it's not the decoder which is slow. If I comment out the
> > > "*got_frame = 1;" then the test also runs fast. I'm not sure what
> > > happens elsewhere with the decoded buffer, but I suspect there's a
> > > bunch of useless malloc()/memset()ing going on. Maybe the decoder is
> > > using ff_reget_buffer() or av_frame_ref() incorrectly, I'm not sure.
> >
> > I did some investigation, it is indeed ff_reget_buffer(). It copies the
> > frame data for some reason. The fix is simple in this case: just call
> > ff_get_buffer() once in cinepak_decode_init() and keep overwriting the
> > same frame.
> >
> > > As I said on IRC, this class of problems will exist for every codec.
> > > Cinepak is easy to decode, even at these resolutions. Just imagine what
> > > will happens when someone feeds in a 65535x209 av1 stream..
> >
> > And related to this, ff_reget_buffer() is used for a lot of these
> > codecs which only overwrite pixels in the old frame. flicvideo, gifdec,
> > msrle, roqvideodec and others probably have the same flaw.
>
> not calling any form of *get_buffer per frame breaks decoding into
> user supplied buffers.
>
> If you check the documentation of the get_buffer2 callback
>
> " This callback is called at the beginning of each frame to get data
> buffer(s) for it."
>
> That would not be possible if its just called once in init
>
> and yes i too wish there was a magic fix but i think most things that
> look like magic fixes have a fatal flaw. But maybe iam missing something
> in fact i hope that iam missing something and that there is a magic fix
>

Magic fix is enabling reference counted frames in fuzzer.


>
> PS: if you think of changing the API, i dont think its the API.
> I mean every user application will read the frames it receives, so
> even if inside the decoder we just return the same frame with 2 pixels
> different the user doesnt know this and has to read the whole frame.
> The problem is moved around but its still there.
>
> Thanks
>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> During times of universal deceit, telling the truth becomes a
> revolutionary act. -- George Orwell
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Michael Niedermayer Aug. 18, 2019, 10:19 a.m. UTC | #3
On Sun, Aug 18, 2019 at 12:00:45PM +0200, Paul B Mahol wrote:
> On Sun, Aug 18, 2019 at 11:44 AM Michael Niedermayer <michael@niedermayer.cc>
> wrote:
> 
> > On Sun, Aug 18, 2019 at 10:47:26AM +0200, Tomas Härdin wrote:
> > > sön 2019-08-18 klockan 02:35 +0200 skrev Tomas Härdin:
> > > > lör 2019-08-17 klockan 17:33 +0200 skrev Michael Niedermayer:
> > > > > On Sat, Aug 17, 2019 at 12:26:27AM +0200, Tomas Härdin wrote:
> > > > > > fre 2019-08-16 klockan 14:57 +0200 skrev Tomas Härdin:
> > > > > >
> > > > > > I feel I should point out that being conservative here is at odds
> > with
> > > > > > the general "best effort" approach taken in this project. These toy
> > > > > > codecs are useful as illustrative examples of this contradiction.
> > I'm
> > > > > > sure there are many more examples of files that can cause ffmpeg
> > to do
> > > > > > a lot more work than expected, for almost every codec. I know
> > afl-fuzz
> > > > > > is likely to find out that it can make the ZMBV decoder do a lot of
> > > > > > work for a small input file, because I've seen it do that with
> > gzip.
> > > > > >
> > > > > > The user base for cinepak is of course miniscule, so I doubt
> > anyone's
> > > > > > going to complain that their broken CVID files don't work any
> > more. I
> > > > > > certainly don't care since cinepakenc only puts out valid files.
> > > > > > But
> > > > > > again, for other formats we're just going to have to tell users to
> > put
> > > > > > ffmpeg inside a sandbox and stick a CPU limit on it. Even ffprobe
> > is
> > > > > > vulnerable to DoS-y things.
> > > > >
> > > > > yes
> > > > >
> > > > > the question ATM is just what to do here about this codec ?
> > > > > apply the patch ?
> > > > > change it ?
> > > >
> > > > Well for a start, the file is 65535 x 209 pixels, 3166 frames. I
> > > > wouldn't call decoding that @ 263 fps particularly slow
> > > >
> > > > Second, it's not the decoder which is slow. If I comment out the
> > > > "*got_frame = 1;" then the test also runs fast. I'm not sure what
> > > > happens elsewhere with the decoded buffer, but I suspect there's a
> > > > bunch of useless malloc()/memset()ing going on. Maybe the decoder is
> > > > using ff_reget_buffer() or av_frame_ref() incorrectly, I'm not sure.
> > >
> > > I did some investigation, it is indeed ff_reget_buffer(). It copies the
> > > frame data for some reason. The fix is simple in this case: just call
> > > ff_get_buffer() once in cinepak_decode_init() and keep overwriting the
> > > same frame.
> > >
> > > > As I said on IRC, this class of problems will exist for every codec.
> > > > Cinepak is easy to decode, even at these resolutions. Just imagine what
> > > > will happens when someone feeds in a 65535x209 av1 stream..
> > >
> > > And related to this, ff_reget_buffer() is used for a lot of these
> > > codecs which only overwrite pixels in the old frame. flicvideo, gifdec,
> > > msrle, roqvideodec and others probably have the same flaw.
> >
> > not calling any form of *get_buffer per frame breaks decoding into
> > user supplied buffers.
> >
> > If you check the documentation of the get_buffer2 callback
> >
> > " This callback is called at the beginning of each frame to get data
> > buffer(s) for it."
> >
> > That would not be possible if its just called once in init
> >
> > and yes i too wish there was a magic fix but i think most things that
> > look like magic fixes have a fatal flaw. But maybe iam missing something
> > in fact i hope that iam missing something and that there is a magic fix
> >
> 
> Magic fix is enabling reference counted frames in fuzzer.

That is covered by the part below which you maybe did not read


> 
> 
> >
> > PS: if you think of changing the API, i dont think its the API.

> > I mean every user application will read the frames it receives, so
> > even if inside the decoder we just return the same frame with 2 pixels
> > different the user doesnt know this and has to read the whole frame.
> > The problem is moved around but its still there.

This above

Its a very specific feature of the fuzzer currently that it does not
read the returned frame. But basically every real application will 
read it, it would be rather pointless to decode if its not read.

Thanks

[...]
Paul B Mahol Aug. 18, 2019, 10:25 a.m. UTC | #4
On Sun, Aug 18, 2019 at 12:19 PM Michael Niedermayer <michael@niedermayer.cc>
wrote:

> On Sun, Aug 18, 2019 at 12:00:45PM +0200, Paul B Mahol wrote:
> > On Sun, Aug 18, 2019 at 11:44 AM Michael Niedermayer
> <michael@niedermayer.cc>
> > wrote:
> >
> > > On Sun, Aug 18, 2019 at 10:47:26AM +0200, Tomas Härdin wrote:
> > > > sön 2019-08-18 klockan 02:35 +0200 skrev Tomas Härdin:
> > > > > lör 2019-08-17 klockan 17:33 +0200 skrev Michael Niedermayer:
> > > > > > On Sat, Aug 17, 2019 at 12:26:27AM +0200, Tomas Härdin wrote:
> > > > > > > fre 2019-08-16 klockan 14:57 +0200 skrev Tomas Härdin:
> > > > > > >
> > > > > > > I feel I should point out that being conservative here is at
> odds
> > > with
> > > > > > > the general "best effort" approach taken in this project.
> These toy
> > > > > > > codecs are useful as illustrative examples of this
> contradiction.
> > > I'm
> > > > > > > sure there are many more examples of files that can cause
> ffmpeg
> > > to do
> > > > > > > a lot more work than expected, for almost every codec. I know
> > > afl-fuzz
> > > > > > > is likely to find out that it can make the ZMBV decoder do a
> lot of
> > > > > > > work for a small input file, because I've seen it do that with
> > > gzip.
> > > > > > >
> > > > > > > The user base for cinepak is of course miniscule, so I doubt
> > > anyone's
> > > > > > > going to complain that their broken CVID files don't work any
> > > more. I
> > > > > > > certainly don't care since cinepakenc only puts out valid
> files.
> > > > > > > But
> > > > > > > again, for other formats we're just going to have to tell
> users to
> > > put
> > > > > > > ffmpeg inside a sandbox and stick a CPU limit on it. Even
> ffprobe
> > > is
> > > > > > > vulnerable to DoS-y things.
> > > > > >
> > > > > > yes
> > > > > >
> > > > > > the question ATM is just what to do here about this codec ?
> > > > > > apply the patch ?
> > > > > > change it ?
> > > > >
> > > > > Well for a start, the file is 65535 x 209 pixels, 3166 frames. I
> > > > > wouldn't call decoding that @ 263 fps particularly slow
> > > > >
> > > > > Second, it's not the decoder which is slow. If I comment out the
> > > > > "*got_frame = 1;" then the test also runs fast. I'm not sure what
> > > > > happens elsewhere with the decoded buffer, but I suspect there's a
> > > > > bunch of useless malloc()/memset()ing going on. Maybe the decoder
> is
> > > > > using ff_reget_buffer() or av_frame_ref() incorrectly, I'm not
> sure.
> > > >
> > > > I did some investigation, it is indeed ff_reget_buffer(). It copies
> the
> > > > frame data for some reason. The fix is simple in this case: just call
> > > > ff_get_buffer() once in cinepak_decode_init() and keep overwriting
> the
> > > > same frame.
> > > >
> > > > > As I said on IRC, this class of problems will exist for every
> codec.
> > > > > Cinepak is easy to decode, even at these resolutions. Just imagine
> what
> > > > > will happens when someone feeds in a 65535x209 av1 stream..
> > > >
> > > > And related to this, ff_reget_buffer() is used for a lot of these
> > > > codecs which only overwrite pixels in the old frame. flicvideo,
> gifdec,
> > > > msrle, roqvideodec and others probably have the same flaw.
> > >
> > > not calling any form of *get_buffer per frame breaks decoding into
> > > user supplied buffers.
> > >
> > > If you check the documentation of the get_buffer2 callback
> > >
> > > " This callback is called at the beginning of each frame to get data
> > > buffer(s) for it."
> > >
> > > That would not be possible if its just called once in init
> > >
> > > and yes i too wish there was a magic fix but i think most things that
> > > look like magic fixes have a fatal flaw. But maybe iam missing
> something
> > > in fact i hope that iam missing something and that there is a magic fix
> > >
> >
> > Magic fix is enabling reference counted frames in fuzzer.
>
> That is covered by the part below which you maybe did not read
>
>
> >
> >
> > >
> > > PS: if you think of changing the API, i dont think its the API.
>
> > > I mean every user application will read the frames it receives, so
> > > even if inside the decoder we just return the same frame with 2 pixels
> > > different the user doesnt know this and has to read the whole frame.
> > > The problem is moved around but its still there.
>
> This above
>
> Its a very specific feature of the fuzzer currently that it does not
> read the returned frame. But basically every real application will
> read it, it would be rather pointless to decode if its not read.
>
>
No, you are very mistaken and confused.


> Thanks
>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> I do not agree with what you have to say, but I'll defend to the death your
> right to say it. -- Voltaire
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Tomas Härdin Aug. 18, 2019, 11:40 a.m. UTC | #5
sön 2019-08-18 klockan 12:19 +0200 skrev Michael Niedermayer:
> On Sun, Aug 18, 2019 at 12:00:45PM +0200, Paul B Mahol wrote:
> > On Sun, Aug 18, 2019 at 11:44 AM Michael Niedermayer <michael@niedermayer.cc>
> > wrote:
> > 
> > > On Sun, Aug 18, 2019 at 10:47:26AM +0200, Tomas Härdin wrote:
> > > > sön 2019-08-18 klockan 02:35 +0200 skrev Tomas Härdin:
> > > > 
> > > > I did some investigation, it is indeed ff_reget_buffer(). It copies the
> > > > frame data for some reason. The fix is simple in this case: just call
> > > > ff_get_buffer() once in cinepak_decode_init() and keep overwriting the
> > > > same frame.
> > > > 
> > > > > As I said on IRC, this class of problems will exist for every codec.
> > > > > Cinepak is easy to decode, even at these resolutions. Just imagine what
> > > > > will happens when someone feeds in a 65535x209 av1 stream..
> > > > 
> > > > And related to this, ff_reget_buffer() is used for a lot of these
> > > > codecs which only overwrite pixels in the old frame. flicvideo, gifdec,
> > > > msrle, roqvideodec and others probably have the same flaw.
> > > 
> > > not calling any form of *get_buffer per frame breaks decoding into
> > > user supplied buffers.
> > > 
> > > If you check the documentation of the get_buffer2 callback
> > > 
> > > " This callback is called at the beginning of each frame to get data
> > > buffer(s) for it."
> > > 
> > > That would not be possible if its just called once in init

Sorry, I'm a bit rusty on lavc internals.

> > > and yes i too wish there was a magic fix but i think most things that
> > > look like magic fixes have a fatal flaw. But maybe iam missing something
> > > in fact i hope that iam missing something and that there is a magic fix
> > > 
> > 
> > Magic fix is enabling reference counted frames in fuzzer.
> 
> That is covered by the part below which you maybe did not read
> 
> > > PS: if you think of changing the API, i dont think its the API.
> > > I mean every user application will read the frames it receives, so
> > > even if inside the decoder we just return the same frame with 2 pixels
> > > different the user doesnt know this and has to read the whole frame.
> > > The problem is moved around but its still there.

Copying is still slower than not copying. Enabling refcounting fixes
the timeout issue here, and will likely silence a whole bunch of false
positives for this class of files.

It would be beneficial to have a consistent way of signalling that a
frame didn't change, since a bunch of codecs have that property.
Currently it's a mix of discontinuous timestamps, longer frame
durations and repeating identical frames.

> so feeding really crazy resolutions into a decoder requires some
> small but proportional amout of input bytes.
> double the width and the minimum input bytes double sort of.

Lavc is already very lenient with what it accepts. How do you detect
the difference between "this packet is too small to decode to an entire
frame" from "this packet is too small but we could still get a few MBs
out of it"?

FTR I've been in favor of rejecting broken files for a while now, and
I'm in favor of something like checking that the number of bytes in all
strips add up to some minimum. This can be combined with a pre-parsing
step that also rejects the input if any chunk has incorrect size.

> codecs OTOH which allow coding a whole frame in 10bytes input
> independant of the resolution behave worse in this way 
> as that can produce orders of magnitude more load per bandwidth
> the attacker needs.

Users are ultimately responsible for limiting how much resources their
programs can use. As you say, getting high amplification isn't hard.
Not even ffprobe is safe from this. I've been in talks with the
peertube people about this very topic not too long ago.

Stuff like this is why I've been harping on about parsers and verifying
things, and being very specific about what we accept. Unfortunately
FFculture seems to be against this.

/Tomas
Michael Niedermayer Aug. 18, 2019, 12:18 p.m. UTC | #6
On Sun, Aug 18, 2019 at 01:40:01PM +0200, Tomas Härdin wrote:
> sön 2019-08-18 klockan 12:19 +0200 skrev Michael Niedermayer:
> > On Sun, Aug 18, 2019 at 12:00:45PM +0200, Paul B Mahol wrote:
> > > On Sun, Aug 18, 2019 at 11:44 AM Michael Niedermayer <michael@niedermayer.cc>
> > > wrote:
> > > 
> > > > On Sun, Aug 18, 2019 at 10:47:26AM +0200, Tomas Härdin wrote:
> > > > > sön 2019-08-18 klockan 02:35 +0200 skrev Tomas Härdin:
> > > > > 
> > > > > I did some investigation, it is indeed ff_reget_buffer(). It copies the
> > > > > frame data for some reason. The fix is simple in this case: just call
> > > > > ff_get_buffer() once in cinepak_decode_init() and keep overwriting the
> > > > > same frame.
> > > > > 
> > > > > > As I said on IRC, this class of problems will exist for every codec.
> > > > > > Cinepak is easy to decode, even at these resolutions. Just imagine what
> > > > > > will happens when someone feeds in a 65535x209 av1 stream..
> > > > > 
> > > > > And related to this, ff_reget_buffer() is used for a lot of these
> > > > > codecs which only overwrite pixels in the old frame. flicvideo, gifdec,
> > > > > msrle, roqvideodec and others probably have the same flaw.
> > > > 
> > > > not calling any form of *get_buffer per frame breaks decoding into
> > > > user supplied buffers.
> > > > 
> > > > If you check the documentation of the get_buffer2 callback
> > > > 
> > > > " This callback is called at the beginning of each frame to get data
> > > > buffer(s) for it."
> > > > 
> > > > That would not be possible if its just called once in init
> 
> Sorry, I'm a bit rusty on lavc internals.
> 
> > > > and yes i too wish there was a magic fix but i think most things that
> > > > look like magic fixes have a fatal flaw. But maybe iam missing something
> > > > in fact i hope that iam missing something and that there is a magic fix
> > > > 
> > > 
> > > Magic fix is enabling reference counted frames in fuzzer.
> > 
> > That is covered by the part below which you maybe did not read
> > 
> > > > PS: if you think of changing the API, i dont think its the API.
> > > > I mean every user application will read the frames it receives, so
> > > > even if inside the decoder we just return the same frame with 2 pixels
> > > > different the user doesnt know this and has to read the whole frame.
> > > > The problem is moved around but its still there.
> 
> Copying is still slower than not copying. Enabling refcounting fixes
> the timeout issue here, and will likely silence a whole bunch of false
> positives for this class of files.

it makes probably sense to enable ref counting but we should
immedeatly in the next or a previous commit make the fuzzer read the frames
from the  decoder. Thats what basically every user app would do.
Otherwise we would have a bunch of issues closed and then reopened
later.

an alternative viewpoint to this would be to set the refcounting flag
from the input so the fuzzer itself has control over it and we test
both codepathes. This would improve coverage


> 
> It would be beneficial to have a consistent way of signalling that a
> frame didn't change, since a bunch of codecs have that property.
> Currently it's a mix of discontinuous timestamps, longer frame
> durations and repeating identical frames.

yes, i strongly agree.


> 
> > so feeding really crazy resolutions into a decoder requires some
> > small but proportional amout of input bytes.
> > double the width and the minimum input bytes double sort of.
> 
> Lavc is already very lenient with what it accepts. How do you detect
> the difference between "this packet is too small to decode to an entire
> frame" from "this packet is too small but we could still get a few MBs
> out of it"?

In reality this is actually not hard because a frame that is smaller
than the minimum valid size is generally for many codecs so small 
it really wont contain anything usefull to decode.
And we have discard_damaged_percentage where the user can tune it too.
Patches using discard_damaged_percentage are sometimes objected too
though so it is not consistently used.

Thanks

[...]
Tomas Härdin Aug. 18, 2019, 12:49 p.m. UTC | #7
sön 2019-08-18 klockan 14:18 +0200 skrev Michael Niedermayer:
> On Sun, Aug 18, 2019 at 01:40:01PM +0200, Tomas Härdin wrote:
> > sön 2019-08-18 klockan 12:19 +0200 skrev Michael Niedermayer:
> > > On Sun, Aug 18, 2019 at 12:00:45PM +0200, Paul B Mahol wrote:
> > > > On Sun, Aug 18, 2019 at 11:44 AM Michael Niedermayer <michael@niedermayer.cc>
> > > > > and yes i too wish there was a magic fix but i think most things that
> > > > > look like magic fixes have a fatal flaw. But maybe iam missing something
> > > > > in fact i hope that iam missing something and that there is a magic fix
> > > > > 
> > > > 
> > > > Magic fix is enabling reference counted frames in fuzzer.
> > > 
> > > That is covered by the part below which you maybe did not read
> > > 
> > > > > PS: if you think of changing the API, i dont think its the API.
> > > > > I mean every user application will read the frames it receives, so
> > > > > even if inside the decoder we just return the same frame with 2 pixels
> > > > > different the user doesnt know this and has to read the whole frame.
> > > > > The problem is moved around but its still there.
> > 
> > Copying is still slower than not copying. Enabling refcounting fixes
> > the timeout issue here, and will likely silence a whole bunch of false
> > positives for this class of files.
> 
> it makes probably sense to enable ref counting but we should
> immedeatly in the next or a previous commit make the fuzzer read the frames
> from the  decoder. Thats what basically every user app would do.
> Otherwise we would have a bunch of issues closed and then reopened
> later.

Why should we care how much work the user does? We're fuzzing lavc
here, not user programs. Certain use cases are decode-only (ffprobe
-show_frames for example)

> an alternative viewpoint to this would be to set the refcounting flag
> from the input so the fuzzer itself has control over it and we test
> both codepathes. This would improve coverage

Not a bad idea. But as this example shows, not refcounting has
performance implications. The fuzzer should be more lenient with
timeout in that case, imo.

> > It would be beneficial to have a consistent way of signalling that a
> > frame didn't change, since a bunch of codecs have that property.
> > Currently it's a mix of discontinuous timestamps, longer frame
> > durations and repeating identical frames.
> 
> yes, i strongly agree.
>
> > > so feeding really crazy resolutions into a decoder requires some
> > > small but proportional amout of input bytes.
> > > double the width and the minimum input bytes double sort of.
> > 
> > Lavc is already very lenient with what it accepts. How do you detect
> > the difference between "this packet is too small to decode to an entire
> > frame" from "this packet is too small but we could still get a few MBs
> > out of it"?
> 
> In reality this is actually not hard because a frame that is smaller
> than the minimum valid size is generally for many codecs so small 
> it really wont contain anything usefull to decode.
> And we have discard_damaged_percentage where the user can tune it too.
> Patches using discard_damaged_percentage are sometimes objected too
> though so it is not consistently used.

I remain skeptical of this, since it makes the parsing more complicated
and slower, and increases mental load. There's a sliding scale of
strictness here, and if I try to implement even something basic as "the
length in the header should match the length of the actual data" then
FATE breaks. We can of course change FATE to accept that changed
strictness.

I get the feeling whenver we put in checks like this it's only a matter
of time before the fuzzers find out some other way to trip up the
decoders. This is why I advocate for maximum strictness unless there's
a good reason to not be strict.

/Tomas
Michael Niedermayer Aug. 18, 2019, 5:04 p.m. UTC | #8
On Sun, Aug 18, 2019 at 02:49:39PM +0200, Tomas Härdin wrote:
> sön 2019-08-18 klockan 14:18 +0200 skrev Michael Niedermayer:
> > On Sun, Aug 18, 2019 at 01:40:01PM +0200, Tomas Härdin wrote:
> > > sön 2019-08-18 klockan 12:19 +0200 skrev Michael Niedermayer:
> > > > On Sun, Aug 18, 2019 at 12:00:45PM +0200, Paul B Mahol wrote:
> > > > > On Sun, Aug 18, 2019 at 11:44 AM Michael Niedermayer <michael@niedermayer.cc>
> > > > > > and yes i too wish there was a magic fix but i think most things that
> > > > > > look like magic fixes have a fatal flaw. But maybe iam missing something
> > > > > > in fact i hope that iam missing something and that there is a magic fix
> > > > > > 
> > > > > 
> > > > > Magic fix is enabling reference counted frames in fuzzer.
> > > > 
> > > > That is covered by the part below which you maybe did not read
> > > > 
> > > > > > PS: if you think of changing the API, i dont think its the API.
> > > > > > I mean every user application will read the frames it receives, so
> > > > > > even if inside the decoder we just return the same frame with 2 pixels
> > > > > > different the user doesnt know this and has to read the whole frame.
> > > > > > The problem is moved around but its still there.
> > > 
> > > Copying is still slower than not copying. Enabling refcounting fixes
> > > the timeout issue here, and will likely silence a whole bunch of false
> > > positives for this class of files.
> > 
> > it makes probably sense to enable ref counting but we should
> > immedeatly in the next or a previous commit make the fuzzer read the frames
> > from the  decoder. Thats what basically every user app would do.
> > Otherwise we would have a bunch of issues closed and then reopened
> > later.
> 
> Why should we care how much work the user does? We're fuzzing lavc
> here, not user programs. Certain use cases are decode-only (ffprobe
> -show_frames for example)

thats a valid point of view

The user though has few options if she gets many frames, she can just
continue processing or stop, she doesnt know how (in)valid the stream is

OTOH libavcodec knows the codec bitstream, and can check various things

so just moving the responsibility to the user app would move it where
its much harder to fix well
That is currently, we could export all kinds of metadata about the
amount of errors. Then a user app could decide what to do.
It would become duplicated code between user apps though...


> 
> > an alternative viewpoint to this would be to set the refcounting flag
> > from the input so the fuzzer itself has control over it and we test
> > both codepathes. This would improve coverage
> 
> Not a bad idea. But as this example shows, not refcounting has
> performance implications. The fuzzer should be more lenient with
> timeout in that case, imo.

this is possible, yes

Thanks

[...]
Tomas Härdin Aug. 19, 2019, 8:56 a.m. UTC | #9
sön 2019-08-18 klockan 19:04 +0200 skrev Michael Niedermayer:
> On Sun, Aug 18, 2019 at 02:49:39PM +0200, Tomas Härdin wrote:
> > sön 2019-08-18 klockan 14:18 +0200 skrev Michael Niedermayer:
> > > On Sun, Aug 18, 2019 at 01:40:01PM +0200, Tomas Härdin wrote:
> > > > sön 2019-08-18 klockan 12:19 +0200 skrev Michael Niedermayer:
> > > > > On Sun, Aug 18, 2019 at 12:00:45PM +0200, Paul B Mahol wrote:
> > > > > > On Sun, Aug 18, 2019 at 11:44 AM Michael Niedermayer <michael@niedermayer.cc>
> > > > > > > and yes i too wish there was a magic fix but i think most things that
> > > > > > > look like magic fixes have a fatal flaw. But maybe iam missing something
> > > > > > > in fact i hope that iam missing something and that there is a magic fix
> > > > > > > 
> > > > > > 
> > > > > > Magic fix is enabling reference counted frames in fuzzer.
> > > > > 
> > > > > That is covered by the part below which you maybe did not read
> > > > > 
> > > > > > > PS: if you think of changing the API, i dont think its the API.
> > > > > > > I mean every user application will read the frames it receives, so
> > > > > > > even if inside the decoder we just return the same frame with 2 pixels
> > > > > > > different the user doesnt know this and has to read the whole frame.
> > > > > > > The problem is moved around but its still there.
> > > > 
> > > > Copying is still slower than not copying. Enabling refcounting fixes
> > > > the timeout issue here, and will likely silence a whole bunch of false
> > > > positives for this class of files.
> > > 
> > > it makes probably sense to enable ref counting but we should
> > > immedeatly in the next or a previous commit make the fuzzer read the frames
> > > from the  decoder. Thats what basically every user app would do.
> > > Otherwise we would have a bunch of issues closed and then reopened
> > > later.
> > 
> > Why should we care how much work the user does? We're fuzzing lavc
> > here, not user programs. Certain use cases are decode-only (ffprobe
> > -show_frames for example)
> 
> thats a valid point of view
> 
> The user though has few options if she gets many frames, she can just
> continue processing or stop, she doesnt know how (in)valid the stream is

This is way too abstract. Who is this user, specifically? If we were
talking about ffmpeg, handbrake, peertube or something else then we
could actually help them if they're using the API incorrectly. Worrying
about hypotheticals is a waste of time. You can't test them.

> OTOH libavcodec knows the codec bitstream, and can check various things
> 
> so just moving the responsibility to the user app would move it where
> its much harder to fix well
> That is currently, we could export all kinds of metadata about the
> amount of errors. Then a user app could decide what to do.
> It would become duplicated code between user apps though...

No one outside of forensics or satellite TV is going to care about
that, and if they do they can pay for it, and we can maintain tests for
it. It's not hard for a user to add checks like "if resolution > 3840 ×
 2160 then reject the file", which makes a lot of these fuzzy "anti-
DoS" patches pointless busywork. It just causes developer fatigue.

/Tomas
diff mbox

Patch

From c85f23ca4c42f9ce27f512be897214b8689c1c94 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tomas=20H=C3=A4rdin?= <tjoppen@acc.umu.se>
Date: Sun, 18 Aug 2019 10:30:59 +0200
Subject: [PATCH] libavcodec/cinepak: Avoid copying frame data

We can just keep overwriting the same frame.
This speeds up the decoder, especially for very
large frames, since skip MBs are turned into nops.

clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_CINEPAK_fuzzer-5695832885559296
32577 ms -> 42 ms
---
 libavcodec/cinepak.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/libavcodec/cinepak.c b/libavcodec/cinepak.c
index aeb15de0ed..e3e6ecfdf0 100644
--- a/libavcodec/cinepak.c
+++ b/libavcodec/cinepak.c
@@ -424,6 +424,7 @@  static int cinepak_decode (CinepakContext *s)
 static av_cold int cinepak_decode_init(AVCodecContext *avctx)
 {
     CinepakContext *s = avctx->priv_data;
+    int ret;
 
     s->avctx = avctx;
     s->width = (avctx->width + 3) & ~3;
@@ -444,6 +445,9 @@  static av_cold int cinepak_decode_init(AVCodecContext *avctx)
     if (!s->frame)
         return AVERROR(ENOMEM);
 
+    if ((ret = ff_get_buffer(avctx, s->frame, 0)) < 0)
+        return ret;
+
     return 0;
 }
 
@@ -473,9 +477,6 @@  static int cinepak_decode_frame(AVCodecContext *avctx,
         return ret;
     }
 
-    if ((ret = ff_reget_buffer(avctx, s->frame)) < 0)
-        return ret;
-
     if (s->palette_video) {
         int size;
         const uint8_t *pal = av_packet_get_side_data(avpkt, AV_PKT_DATA_PALETTE, &size);
-- 
2.20.1