[FFmpeg-devel,2/2] avcodec/fitsdec: Prevent division by 0 with huge data_max

Submitted by Michael Niedermayer on July 15, 2019, 10:50 p.m.

Details

Message ID 20190715225039.9689-2-michael@niedermayer.cc
State New

Commit Message

Michael Niedermayer July 15, 2019, 10:50 p.m.
Fixes: division by 0
Fixes: 15657/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FITS_fuzzer-5738154838982656

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/fitsdec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Reimar Döffinger July 16, 2019, 6:34 a.m.
On 16.07.2019, at 00:50, Michael Niedermayer <michael@niedermayer.cc> wrote:

> Fixes: division by 0
> Fixes: 15657/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FITS_fuzzer-5738154838982656
> 
> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
> libavcodec/fitsdec.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libavcodec/fitsdec.c b/libavcodec/fitsdec.c
> index 4f452422ef..fe941f199d 100644
> --- a/libavcodec/fitsdec.c
> +++ b/libavcodec/fitsdec.c
> @@ -174,7 +174,7 @@ static int fits_read_header(AVCodecContext *avctx, const uint8_t **ptr, FITSHead
>             return AVERROR_INVALIDDATA;
>         }
>         av_log(avctx, AV_LOG_WARNING, "data min/max indicates a blank image\n");
> -        header->data_max ++;
> +        header->data_max += fabs(header->data_max) / 10000000 + 1;

This is really non-obvious, both by itself, in where/how it causes the division by 0 and that the error here isn't worse than the division by 0 for example, and why this constant was chosen.
Also why a division and not a multiply by the inverse?
Why not * (1.0f / (1 << 24)) for example, which for single-precision IEEE I think should result in exactly 1 ULP (well, possibly 2 with rounding) increments?
Why is this even using floating-point? And why not double-precision at least?

Michael Niedermayer July 16, 2019, 6:31 p.m.
On Tue, Jul 16, 2019 at 08:34:14AM +0200, Reimar Döffinger wrote:
> On 16.07.2019, at 00:50, Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> > Fixes: division by 0
> > Fixes: 15657/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FITS_fuzzer-5738154838982656
> > 
> > Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > ---
> > libavcodec/fitsdec.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/libavcodec/fitsdec.c b/libavcodec/fitsdec.c
> > index 4f452422ef..fe941f199d 100644
> > --- a/libavcodec/fitsdec.c
> > +++ b/libavcodec/fitsdec.c
> > @@ -174,7 +174,7 @@ static int fits_read_header(AVCodecContext *avctx, const uint8_t **ptr, FITSHead
> >             return AVERROR_INVALIDDATA;
> >         }
> >         av_log(avctx, AV_LOG_WARNING, "data min/max indicates a blank image\n");
> > -        header->data_max ++;
> > +        header->data_max += fabs(header->data_max) / 10000000 + 1;
> 
> This is really non-obvious, both by itself, in where/how it causes the division by 0 and that the error here isn't worse than the division by 0 for example, and why this constant was chosen.

division by 0 occurred in:
*dst++ = ((t - header.data_min) * ((1 << (sizeof(type) * 8)) - 1)) / (header.data_max - header.data_min); \


> Also why a division and not a multiply by the inverse?

no reason, this could be changed


> Why not * (1.0f / (1 << 24)) for example, which for single-precision IEEE I think should result in exactly 1 ULP (well, possibly 2 with rounding) increments?

the division by 0 could occur with files which contain only one color, or
otherwise corrupted / odd files.
What the code tries to do is make a reasonably small change away from the
division by 0.
The whole FITS implementation scales the input range to the output range.
If the input range is 0, as it can be on a constant-color input, this hits
a singularity. As this is basically 0/0, anything could be output and would
be equally wrong, so the exact value, like / 10000000 or another, doesn't
really matter.
For these values to matter, how the decoder interprets data would first
have to be changed:
" Also to interpret data, values are linearly scaled using min-max scaling but not RGB images."
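
Why the plain `data_max ++` does not help for huge values can be shown with a small sketch. It assumes `data_min` and `data_max` are `double` (not confirmed in this thread); the helper names are hypothetical and only mirror the two versions of the patched line.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper mirroring the old line: header->data_max ++; */
static double bump_old(double data_max)
{
    return data_max + 1;
}

/* Hypothetical helper mirroring the patched line. */
static double bump_new(double data_max)
{
    return data_max + (fabs(data_max) / 10000000 + 1);
}

/* Denominator of the scaling expression after the bump, for the
 * "blank image" case where data_max starts out equal to data_min. */
static double range_after(double data_min, double (*bump)(double))
{
    volatile double data_max = bump(data_min); /* force rounding to double */
    return data_max - data_min;
}

static void check_range(void)
{
    assert(range_after(5.0, bump_old) == 1.0);  /* small values: ++ is enough */
    assert(range_after(1e30, bump_old) == 0.0); /* huge values: +1 is lost to rounding */
    assert(range_after(1e30, bump_new) > 0.0);  /* relative step survives rounding */
}
```

With the old line the denominator stays 0 once `data_max` is large enough that adding 1 no longer changes it; the relative term in the patched line is what keeps the range nonzero in that case.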



> Why is this even using floating-point? And why not double-precision at least?


[...]

Reimar Döffinger July 18, 2019, 6:21 a.m.
On 16.07.2019, at 20:31, Michael Niedermayer <michael@niedermayer.cc> wrote:

> On Tue, Jul 16, 2019 at 08:34:14AM +0200, Reimar Döffinger wrote:
>> On 16.07.2019, at 00:50, Michael Niedermayer <michael@niedermayer.cc> wrote:
>> 
>>> Fixes: division by 0
>>> Fixes: 15657/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FITS_fuzzer-5738154838982656
>>> 
>>> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
>>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
>>> ---
>>> libavcodec/fitsdec.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/libavcodec/fitsdec.c b/libavcodec/fitsdec.c
>>> index 4f452422ef..fe941f199d 100644
>>> --- a/libavcodec/fitsdec.c
>>> +++ b/libavcodec/fitsdec.c
>>> @@ -174,7 +174,7 @@ static int fits_read_header(AVCodecContext *avctx, const uint8_t **ptr, FITSHead
>>>            return AVERROR_INVALIDDATA;
>>>        }
>>>        av_log(avctx, AV_LOG_WARNING, "data min/max indicates a blank image\n");
>>> -        header->data_max ++;
>>> +        header->data_max += fabs(header->data_max) / 10000000 + 1;
>> 
>> This is really non-obvious, both by itself, in where/how it causes the division by 0 and that the error here isn't worse than the division by 0 for example, and why this constant was chosen.
> 
> division by 0 occurred in:
> *dst++ = ((t - header.data_min) * ((1 << (sizeof(type) * 8)) - 1)) / (header.data_max - header.data_min); \

I looked at the code, and now it makes even less sense to me.
First, why is that reported as an error at all?
Dividing by 0 is well defined for floating-point.
Next, your patch handles only one corner-case while not handling others.
For example, data_min and data_max can also be inf or NaN, which equally make no sense (and result in a division by NaN, which can hardly be better than a division by 0).
Next, bzero is applied to data_min and data_max which can cause precision issues, so it's a bit questionable but maybe non-trivial to fix completely.
However as data_max is never used but only the delta, most of these issues can be fixed much more thoroughly (and improve performance) by calculating and storing only that delta, and before applying bzero. Then delta can simply be overridden to 1 if it is fishy (not a normal or 0).
Of course there is a question if values above data_max are supposed to result in output > 1 or be clamped, but I guess that issue can be ignored.
As the code pays no particular attention to precision issues, it would also be a question whether calculating the inverse and using multiplications instead of divisions would make sense, but that admittedly is just an optimization.
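
The delta-based approach described above could look roughly like this. It is a sketch under assumptions, not the decoder's code: `safe_delta` and `scale_factor` are hypothetical names, the values are assumed to be `double`, the `bzero` handling is left out, and `data_max < data_min` is assumed to have been rejected earlier.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: scaling range that is always safe to divide by. */
static double safe_delta(double data_min, double data_max)
{
    double delta = data_max - data_min;
    /* isnormal() is false for 0, subnormals, inf and NaN */
    if (!isnormal(delta))
        delta = 1.0;
    return delta;
}

/* Hypothetical helper: precomputed factor so the per-sample work is a
 * multiply, out = (t - data_min) * factor. bits must be below 32. */
static double scale_factor(double data_min, double data_max, unsigned bits)
{
    return (double)((1u << bits) - 1) / safe_delta(data_min, data_max);
}

static void check_delta(void)
{
    assert(safe_delta(0.0, 10.0) == 10.0);
    assert(safe_delta(7.0, 7.0) == 1.0);            /* blank image */
    assert(safe_delta(-INFINITY, INFINITY) == 1.0); /* infinite range */
    assert(safe_delta(NAN, 1.0) == 1.0);            /* NaN input */
    assert(scale_factor(0.0, 255.0, 8) == 1.0);
}
```

Checking the delta once covers the zero, inf and NaN cases together, and the precomputed factor replaces the per-sample division with a multiplication.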

Michael Niedermayer July 18, 2019, 10:54 a.m.
On Thu, Jul 18, 2019 at 08:21:21AM +0200, Reimar Döffinger wrote:
> 
> 
> On 16.07.2019, at 20:31, Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> > On Tue, Jul 16, 2019 at 08:34:14AM +0200, Reimar Döffinger wrote:
> >> On 16.07.2019, at 00:50, Michael Niedermayer <michael@niedermayer.cc> wrote:
> >> 
> >>> Fixes: division by 0
> >>> Fixes: 15657/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FITS_fuzzer-5738154838982656
> >>> 
> >>> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> >>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> >>> ---
> >>> libavcodec/fitsdec.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>> 
> >>> diff --git a/libavcodec/fitsdec.c b/libavcodec/fitsdec.c
> >>> index 4f452422ef..fe941f199d 100644
> >>> --- a/libavcodec/fitsdec.c
> >>> +++ b/libavcodec/fitsdec.c
> >>> @@ -174,7 +174,7 @@ static int fits_read_header(AVCodecContext *avctx, const uint8_t **ptr, FITSHead
> >>>            return AVERROR_INVALIDDATA;
> >>>        }
> >>>        av_log(avctx, AV_LOG_WARNING, "data min/max indicates a blank image\n");
> >>> -        header->data_max ++;
> >>> +        header->data_max += fabs(header->data_max) / 10000000 + 1;
> >> 
> >> This is really non-obvious, both by itself, in where/how it causes the division by 0 and that the error here isn't worse than the division by 0 for example, and why this constant was chosen.
> > 
> > division by 0 occurred in:
> > *dst++ = ((t - header.data_min) * ((1 << (sizeof(type) * 8)) - 1)) / (header.data_max - header.data_min); \
> 
> I looked at the code, and now it makes even less sense to me.
> First, why is that reported as an error at all?
> Dividing by 0 is well defined for floating-point.

at least at the point where it's stored in an integer it becomes painful
for the compiler to find a way to do it.


> Next, your patch handles only one corner-case while not handling others.
> For example, data_min and data_max can also be inf or NaN, which equally make no sense (and result in a division by NaN, which can hardly be better than a division by 0).
> Next, bzero is applied to data_min and data_max which can cause precision issues, so it's a bit questionable but maybe non-trivial to fix completely.
> However as data_max is never used but only the delta, most of these issues can be fixed much more thoroughly (and improve performance) by calculating and storing only that delta, and before applying bzero. Then delta can simply be overridden to 1 if it is fishy (not a normal or 0).
> Of course there is a question if values above data_max are supposed to result in output > 1 or be clamped, but I guess that issue can be ignored.
> As the code pays no particular attention to precision issues, it would also be a question whether calculating the inverse and using multiplications instead of divisions would make sense, but that admittedly is just an optimization.

I am not sure if inf is a problem (from a very quick look), that would get
divided to 0 I guess. nan would be an issue, I didn't think of this, I will
redo this and post a better patch

Thanks

[...]

Reimar Döffinger July 18, 2019, 9:39 p.m.
On 18.07.2019, at 12:54, Michael Niedermayer <michael@niedermayer.cc> wrote:

> On Thu, Jul 18, 2019 at 08:21:21AM +0200, Reimar Döffinger wrote:
>> 
>> 
>> On 16.07.2019, at 20:31, Michael Niedermayer <michael@niedermayer.cc> wrote:
>> 
>>> On Tue, Jul 16, 2019 at 08:34:14AM +0200, Reimar Döffinger wrote:
>>>> On 16.07.2019, at 00:50, Michael Niedermayer <michael@niedermayer.cc> wrote:
>>>> 
>>>>> Fixes: division by 0
>>>>> Fixes: 15657/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FITS_fuzzer-5738154838982656
>>>>> 
>>>>> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
>>>>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
>>>>> ---
>>>>> libavcodec/fitsdec.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>> 
>>>>> diff --git a/libavcodec/fitsdec.c b/libavcodec/fitsdec.c
>>>>> index 4f452422ef..fe941f199d 100644
>>>>> --- a/libavcodec/fitsdec.c
>>>>> +++ b/libavcodec/fitsdec.c
>>>>> @@ -174,7 +174,7 @@ static int fits_read_header(AVCodecContext *avctx, const uint8_t **ptr, FITSHead
>>>>>           return AVERROR_INVALIDDATA;
>>>>>       }
>>>>>       av_log(avctx, AV_LOG_WARNING, "data min/max indicates a blank image\n");
>>>>> -        header->data_max ++;
>>>>> +        header->data_max += fabs(header->data_max) / 10000000 + 1;
>>>> 
>>>> This is really non-obvious, both by itself, in where/how it causes the division by 0 and that the error here isn't worse than the division by 0 for example, and why this constant was chosen.
>>> 
>>> division by 0 occurred in:
>>> *dst++ = ((t - header.data_min) * ((1 << (sizeof(type) * 8)) - 1)) / (header.data_max - header.data_min); \
>> 
>> I looked at the code, and now it makes even less sense to me.
>> First, why is that reported as an error at all?
>> Dividing by 0 is well defined for floating-point.
> 
> at least at the point where it's stored in an integer it becomes painful
> for the compiler to find a way to do it.

Hm, I am not quite following. The division results in inf or nan, and those get converted to integer the usual way (not sure how well that is defined in C, but it's not a division by 0 error).

> I am not sure if inf is a problem (from a very quick look), that would get
> divided to 0 I guess. nan would be an issue, I didn't think of this, I will
> redo this and post a better patch

inf / inf = nan
From my point of view if 0.0 / 0.0 is considered an issue that seems like it should apply for inf as well.
When it comes to actually correct code, there might also be the question what to do in that case.
Could be considered "bad data/normalization range, just skip the whole scaling".
Or could define it like OpenGL (and other) normalization operations:
-inf becomes -1, inf becomes 1 all else 0. But that would need a special code case, and I guess it's not worth it.
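
The special cases discussed above can be checked with a small sketch, assuming IEEE 754 `double` arithmetic. `scale` is a hypothetical helper that mirrors the shape of the scaling expression but stops before the store into an integer; converting a NaN or infinite result to an integer type is not defined by the C standard, so that step is deliberately left out.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: the scaling division without the integer store. */
static double scale(double t, double data_min, double data_max, double out_max)
{
    return (t - data_min) * out_max / (data_max - data_min);
}

static void check_special(void)
{
    assert(isnan(scale(3.0, 3.0, 3.0, 255.0)));           /* 0 / 0 */
    assert(isinf(scale(4.0, 3.0, 3.0, 255.0)));           /* finite / 0 */
    assert(isnan(scale(INFINITY, 0.0, INFINITY, 255.0))); /* inf / inf */
    assert(scale(5.0, 0.0, INFINITY, 255.0) == 0.0);      /* finite / inf */
}
```

An infinite `data_max` with finite samples does divide down to 0, while an infinite sample together with an infinite range gives NaN, the same as the 0/0 case.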

Patch

diff --git a/libavcodec/fitsdec.c b/libavcodec/fitsdec.c
index 4f452422ef..fe941f199d 100644
--- a/libavcodec/fitsdec.c
+++ b/libavcodec/fitsdec.c
@@ -174,7 +174,7 @@  static int fits_read_header(AVCodecContext *avctx, const uint8_t **ptr, FITSHead
             return AVERROR_INVALIDDATA;
         }
         av_log(avctx, AV_LOG_WARNING, "data min/max indicates a blank image\n");
-        header->data_max ++;
+        header->data_max += fabs(header->data_max) / 10000000 + 1;
     }
 
     return 0;