diff mbox series

[FFmpeg-devel] avcodec/golomb: Prevent shift by negative number

Message ID 20200710134803.4347-1-andreas.rheinhardt@gmail.com
State Superseded
Headers show
Series [FFmpeg-devel] avcodec/golomb: Prevent shift by negative number | expand

Checks

Context Check Description
andriy/default pending
andriy/make success Make finished
andriy/make_fate success Make fate finished

Commit Message

Andreas Rheinhardt July 10, 2020, 1:48 p.m. UTC
This happened in get_ue_golomb() if the cached bitstream reader was in
use, because there was no check to handle the case of the read value
not being in the range 0..8190.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
---
 libavcodec/golomb.h | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Lynne July 10, 2020, 2:48 p.m. UTC | #1
Jul 10, 2020, 14:48 by andreas.rheinhardt@gmail.com:

> This happened in get_ue_golomb() if the cached bitstream reader was in
> use, because there was no check to handle the case of the read value
> not being in the range 0..8190.
>
> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
> ---
>  libavcodec/golomb.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/libavcodec/golomb.h b/libavcodec/golomb.h
> index 7fd46a91bd..5bfcfe085f 100644
> --- a/libavcodec/golomb.h
> +++ b/libavcodec/golomb.h
> @@ -66,6 +66,10 @@ static inline int get_ue_golomb(GetBitContext *gb)
>  return ff_ue_golomb_vlc_code[buf];
>  } else {
>  int log = 2 * av_log2(buf) - 31;
> +        if (log < 0) {
> +            av_log(NULL, AV_LOG_ERROR, "Invalid UE golomb code\n");
> +            return AVERROR_INVALIDDATA;
> +        }
>  buf >>= log;
>  buf--;
>  skip_bits_long(gb, 32 - log);
>

That's in an extremely hot path. Any alternatives?
Derek Buitenhuis July 10, 2020, 3:03 p.m. UTC | #2
On 10/07/2020 15:48, Lynne wrote:
> That's in an extremely hot path. Any alternatives?

Anothe thing of note is that basically nothing in the entire
codebase checks the return value of this function. Possibly
literally nothing.

- Derek
Andreas Rheinhardt July 10, 2020, 3:12 p.m. UTC | #3
Lynne:
> Jul 10, 2020, 14:48 by andreas.rheinhardt@gmail.com:
> 
>> This happened in get_ue_golomb() if the cached bitstream reader was in
>> use, because there was no check to handle the case of the read value
>> not being in the range 0..8190.
>>
>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
>> ---
>>  libavcodec/golomb.h | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/libavcodec/golomb.h b/libavcodec/golomb.h
>> index 7fd46a91bd..5bfcfe085f 100644
>> --- a/libavcodec/golomb.h
>> +++ b/libavcodec/golomb.h
>> @@ -66,6 +66,10 @@ static inline int get_ue_golomb(GetBitContext *gb)
>>  return ff_ue_golomb_vlc_code[buf];
>>  } else {
>>  int log = 2 * av_log2(buf) - 31;
>> +        if (log < 0) {
>> +            av_log(NULL, AV_LOG_ERROR, "Invalid UE golomb code\n");
>> +            return AVERROR_INVALIDDATA;
>> +        }
>>  buf >>= log;
>>  buf--;
>>  skip_bits_long(gb, 32 - log);
>>
> 
> That's in an extremely hot path. Any alternatives?

If I knew a better solution, I'd have implemented it; I would also have
done the same to the codepath for the noncached bitstream reader (which
is even hotter). Notice that I actually only copied the check from the
noncached codepath (where one even has to check for log < 7 because one
might only have 25 valid bits).

There is one thing that is improvable though: The av_log(). In
situations where the return value is checked and a dedicated error
message emitted, it is unnecessary to have an av_log() here (in
particular one that logs to NULL).

- Andreas
Tomas Härdin July 13, 2020, 7:04 p.m. UTC | #4
fre 2020-07-10 klockan 15:48 +0200 skrev Andreas Rheinhardt:
> This happened in get_ue_golomb() if the cached bitstream reader was
> in
> use, because there was no check to handle the case of the read value
> not being in the range 0..8190.
> 
> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
> ---
>  libavcodec/golomb.h | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/libavcodec/golomb.h b/libavcodec/golomb.h
> index 7fd46a91bd..5bfcfe085f 100644
> --- a/libavcodec/golomb.h
> +++ b/libavcodec/golomb.h
> @@ -66,6 +66,10 @@ static inline int get_ue_golomb(GetBitContext *gb)
>          return ff_ue_golomb_vlc_code[buf];
>      } else {
>          int log = 2 * av_log2(buf) - 31;
> +        if (log < 0) {
> +            av_log(NULL, AV_LOG_ERROR, "Invalid UE golomb code\n");
> +            return AVERROR_INVALIDDATA;
> +        }

How come invalid codes can even be present like this? Seems
inefficient. Or is the decoder wrong? Maybe Michael wants to chime in
here, since he wrote the original implementation.

Reading a bit about Exp-Golomb it seems counting trailing zeroes would 
be the way to go. There's even a builtin for it, __builtin_ctz().
Alternatively a table-driven approach could be used.

/Tomas
Andreas Rheinhardt July 13, 2020, 7:19 p.m. UTC | #5
Tomas Härdin:
> fre 2020-07-10 klockan 15:48 +0200 skrev Andreas Rheinhardt:
>> This happened in get_ue_golomb() if the cached bitstream reader was
>> in
>> use, because there was no check to handle the case of the read value
>> not being in the range 0..8190.
>>
>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
>> ---
>>  libavcodec/golomb.h | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/libavcodec/golomb.h b/libavcodec/golomb.h
>> index 7fd46a91bd..5bfcfe085f 100644
>> --- a/libavcodec/golomb.h
>> +++ b/libavcodec/golomb.h
>> @@ -66,6 +66,10 @@ static inline int get_ue_golomb(GetBitContext *gb)
>>          return ff_ue_golomb_vlc_code[buf];
>>      } else {
>>          int log = 2 * av_log2(buf) - 31;
>> +        if (log < 0) {
>> +            av_log(NULL, AV_LOG_ERROR, "Invalid UE golomb code\n");
>> +            return AVERROR_INVALIDDATA;
>> +        }
> 
> How come invalid codes can even be present like this? Seems
> inefficient. Or is the decoder wrong? Maybe Michael wants to chime in
> here, since he wrote the original implementation.
> 
This function is only supposed to be used when the expected value of the
exp-golomb code is in the range 0..8190 (or other words: if there are
not more than 12 leading zeroes). And just because a specification says
that a value must be smaller than that does not mean that it is actually
fulfilled in real files.

(Btw: The cached bitstream reader version of this function can actually
handle 16 leading zeroes (and this patch only errors out if there are
more); the limit of 12 is imposed because the noncached bitstream reader
can't handle more.)

> Reading a bit about Exp-Golomb it seems counting trailing zeroes would 
> be the way to go. There's even a builtin for it, __builtin_ctz().
> Alternatively a table-driven approach could be used.
> 
Here is the code in question for the cached bitstream reader:

    buf = show_bits_long(gb, 32);

    if (buf >= (1 << 27)) {
        buf >>= 32 - 9;
        skip_bits_long(gb, ff_golomb_vlc_len[buf]);

        return ff_ue_golomb_vlc_code[buf];
    } else {
        int log = 2 * av_log2(buf) - 31;
        buf >>= log;
        buf--;
        skip_bits_long(gb, 32 - log);

        return buf;
    }

So you see we are already using a table-driven approach for small values
(namely for those values where the number of leading zeroes is <= 4) and
if we are outside of that range, an approach using av_log2 (which for
supported systems is implemented using __builtin_clz() (not trailing
zeroes!)) is used.

- Andreas
Michael Niedermayer July 13, 2020, 8:07 p.m. UTC | #6
On Mon, Jul 13, 2020 at 09:04:30PM +0200, Tomas Härdin wrote:
> fre 2020-07-10 klockan 15:48 +0200 skrev Andreas Rheinhardt:
> > This happened in get_ue_golomb() if the cached bitstream reader was
> > in
> > use, because there was no check to handle the case of the read value
> > not being in the range 0..8190.
> > 
> > Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
> > ---
> >  libavcodec/golomb.h | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/libavcodec/golomb.h b/libavcodec/golomb.h
> > index 7fd46a91bd..5bfcfe085f 100644
> > --- a/libavcodec/golomb.h
> > +++ b/libavcodec/golomb.h
> > @@ -66,6 +66,10 @@ static inline int get_ue_golomb(GetBitContext *gb)
> >          return ff_ue_golomb_vlc_code[buf];
> >      } else {
> >          int log = 2 * av_log2(buf) - 31;
> > +        if (log < 0) {
> > +            av_log(NULL, AV_LOG_ERROR, "Invalid UE golomb code\n");
> > +            return AVERROR_INVALIDDATA;
> > +        }
> 
> How come invalid codes can even be present like this? Seems
> inefficient. Or is the decoder wrong? Maybe Michael wants to chime in
> here, since he wrote the original implementation.

The codepath of the non cached one does a check too. It should be
done consistently.
If the check causes a meassurable speedloss, then it can be implemented
without a check. For example by using asm for the shift operation, or maybe
compilers have some nicer function doing a never undefined shift. If such
function does not exist, its something that maybe someone should add to
gcc & clang because other projects could benefit from it too in similar
situations.
Of course all such improvments make the code more complex as a fallback
with check is needed if none of the non standard "safe" functions exist.
So it only makes sense to go that way if this actually affects speed.
It is after all not on the main codepath which is handled with a table

thx

[...]
Andreas Rheinhardt July 14, 2020, 4:10 p.m. UTC | #7
Michael Niedermayer:
> On Mon, Jul 13, 2020 at 09:04:30PM +0200, Tomas Härdin wrote:
>> fre 2020-07-10 klockan 15:48 +0200 skrev Andreas Rheinhardt:
>>> This happened in get_ue_golomb() if the cached bitstream reader was
>>> in
>>> use, because there was no check to handle the case of the read value
>>> not being in the range 0..8190.
>>>
>>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
>>> ---
>>>  libavcodec/golomb.h | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/libavcodec/golomb.h b/libavcodec/golomb.h
>>> index 7fd46a91bd..5bfcfe085f 100644
>>> --- a/libavcodec/golomb.h
>>> +++ b/libavcodec/golomb.h
>>> @@ -66,6 +66,10 @@ static inline int get_ue_golomb(GetBitContext *gb)
>>>          return ff_ue_golomb_vlc_code[buf];
>>>      } else {
>>>          int log = 2 * av_log2(buf) - 31;
>>> +        if (log < 0) {
>>> +            av_log(NULL, AV_LOG_ERROR, "Invalid UE golomb code\n");
>>> +            return AVERROR_INVALIDDATA;
>>> +        }
>>
>> How come invalid codes can even be present like this? Seems
>> inefficient. Or is the decoder wrong? Maybe Michael wants to chime in
>> here, since he wrote the original implementation.
> 
> The codepath of the non cached one does a check too. It should be
> done consistently.

Does this imply that you want me to error out for log < 7 although the
cached bitstream reader is able to read the number even when log is in
the range of 0..6? (The reason for this is that one only gets 25 valid
bits in the noncached version.)

> If the check causes a meassurable speedloss, then it can be implemented
> without a check. For example by using asm for the shift operation, or maybe
> compilers have some nicer function doing a never undefined shift. If such
> function does not exist, its something that maybe someone should add to
> gcc & clang because other projects could benefit from it too in similar
> situations.

22e960ad478e568f4094971a58c6ad8f549c0180 (which enabled the check and
added the av_log() in the noncached branch) claims no measurable speedloss.
I have not found any such compiler builtin function for this purpose;
neither GCC nor Clang have one.
Notice that if we had such an always defined shift, we could use it to
make put_bits() write 0-32 bits at once if this always defined shift had
the property that 0U << 32 equals 0.

- Andreas
Michael Niedermayer July 14, 2020, 8:27 p.m. UTC | #8
On Tue, Jul 14, 2020 at 06:10:39PM +0200, Andreas Rheinhardt wrote:
> Michael Niedermayer:
> > On Mon, Jul 13, 2020 at 09:04:30PM +0200, Tomas Härdin wrote:
> >> fre 2020-07-10 klockan 15:48 +0200 skrev Andreas Rheinhardt:
> >>> This happened in get_ue_golomb() if the cached bitstream reader was
> >>> in
> >>> use, because there was no check to handle the case of the read value
> >>> not being in the range 0..8190.
> >>>
> >>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
> >>> ---
> >>>  libavcodec/golomb.h | 4 ++++
> >>>  1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/libavcodec/golomb.h b/libavcodec/golomb.h
> >>> index 7fd46a91bd..5bfcfe085f 100644
> >>> --- a/libavcodec/golomb.h
> >>> +++ b/libavcodec/golomb.h
> >>> @@ -66,6 +66,10 @@ static inline int get_ue_golomb(GetBitContext *gb)
> >>>          return ff_ue_golomb_vlc_code[buf];
> >>>      } else {
> >>>          int log = 2 * av_log2(buf) - 31;
> >>> +        if (log < 0) {
> >>> +            av_log(NULL, AV_LOG_ERROR, "Invalid UE golomb code\n");
> >>> +            return AVERROR_INVALIDDATA;
> >>> +        }
> >>
> >> How come invalid codes can even be present like this? Seems
> >> inefficient. Or is the decoder wrong? Maybe Michael wants to chime in
> >> here, since he wrote the original implementation.
> > 
> > The codepath of the non cached one does a check too. It should be
> > done consistently.
> 
> Does this imply that you want me to error out for log < 7 although the
> cached bitstream reader is able to read the number even when log is in
> the range of 0..6? (The reason for this is that one only gets 25 valid
> bits in the noncached version.)

is it intended to be valid for this function to be used for longer codes ?
if not it should fail because otherwise a bug in the code would be missed
also it would result in fewer differences and easier testing as both
would behave the same

thx

[...]
Andreas Rheinhardt July 14, 2020, 8:31 p.m. UTC | #9
Michael Niedermayer:
> On Tue, Jul 14, 2020 at 06:10:39PM +0200, Andreas Rheinhardt wrote:
>> Michael Niedermayer:
>>> On Mon, Jul 13, 2020 at 09:04:30PM +0200, Tomas Härdin wrote:
>>>> fre 2020-07-10 klockan 15:48 +0200 skrev Andreas Rheinhardt:
>>>>> This happened in get_ue_golomb() if the cached bitstream reader was
>>>>> in
>>>>> use, because there was no check to handle the case of the read value
>>>>> not being in the range 0..8190.
>>>>>
>>>>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
>>>>> ---
>>>>>  libavcodec/golomb.h | 4 ++++
>>>>>  1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/libavcodec/golomb.h b/libavcodec/golomb.h
>>>>> index 7fd46a91bd..5bfcfe085f 100644
>>>>> --- a/libavcodec/golomb.h
>>>>> +++ b/libavcodec/golomb.h
>>>>> @@ -66,6 +66,10 @@ static inline int get_ue_golomb(GetBitContext *gb)
>>>>>          return ff_ue_golomb_vlc_code[buf];
>>>>>      } else {
>>>>>          int log = 2 * av_log2(buf) - 31;
>>>>> +        if (log < 0) {
>>>>> +            av_log(NULL, AV_LOG_ERROR, "Invalid UE golomb code\n");
>>>>> +            return AVERROR_INVALIDDATA;
>>>>> +        }
>>>>
>>>> How come invalid codes can even be present like this? Seems
>>>> inefficient. Or is the decoder wrong? Maybe Michael wants to chime in
>>>> here, since he wrote the original implementation.
>>>
>>> The codepath of the non cached one does a check too. It should be
>>> done consistently.
>>
>> Does this imply that you want me to error out for log < 7 although the
>> cached bitstream reader is able to read the number even when log is in
>> the range of 0..6? (The reason for this is that one only gets 25 valid
>> bits in the noncached version.)
> 
> is it intended to be valid for this function to be used for longer codes ?
> if not it should fail because otherwise a bug in the code would be missed
> also it would result in fewer differences and easier testing as both
> would behave the same
> 
Ok, will change locally.

- Andreas
diff mbox series

Patch

diff --git a/libavcodec/golomb.h b/libavcodec/golomb.h
index 7fd46a91bd..5bfcfe085f 100644
--- a/libavcodec/golomb.h
+++ b/libavcodec/golomb.h
@@ -66,6 +66,10 @@  static inline int get_ue_golomb(GetBitContext *gb)
         return ff_ue_golomb_vlc_code[buf];
     } else {
         int log = 2 * av_log2(buf) - 31;
+        if (log < 0) {
+            av_log(NULL, AV_LOG_ERROR, "Invalid UE golomb code\n");
+            return AVERROR_INVALIDDATA;
+        }
         buf >>= log;
         buf--;
         skip_bits_long(gb, 32 - log);