diff mbox series

[FFmpeg-devel] seek: Fix crashes in ff_seek_frame_binary if built with latest Clang 14

Message ID 20211018094056.3979756-1-martin@martin.st
State Accepted
Commit ab792634197e364ca1bb194f9abe36836e42f12d
Headers show
Series [FFmpeg-devel] seek: Fix crashes in ff_seek_frame_binary if built with latest Clang 14 | expand

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished
andriy/make_ppc success Make finished
andriy/make_fate_ppc success Make fate finished

Commit Message

Martin Storsjö Oct. 18, 2021, 9:40 a.m. UTC
Passing an uninitialized variable as argument to a function is
undefined behaviour (UB). The compiler can assume that UB does not
happen.

Hence, the compiler can assume that the variables are never
uninitialized when passed as argument, which means that the codepaths
that initializes them must be taken.

In ff_seek_frame_binary, this means that the compiler can assume
that the codepaths that initialize pos_min and pos_max are taken,
which means that the conditions "if (sti->index_entries)" and
"if (index >= 0)" can be optimized out.

Current Clang git versions (upcoming Clang 14) enabled an optimization
that does this, which broke the current version of this function
(which intentionally left the variables uninitialized, but silencing
warnings about being uninitialized). See [1] for discussion on
the matter.

[1] https://reviews.llvm.org/D105169#3069555

Signed-off-by: Martin Storsjö <martin@martin.st>
---
 libavformat/seek.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Martin Storsjö Oct. 18, 2021, 11:44 a.m. UTC | #1
On Mon, 18 Oct 2021, Paul B Mahol wrote:

> lgtm

Thanks, pushed.

// Martin
Andreas Rheinhardt Nov. 6, 2021, 5:54 p.m. UTC | #2
Martin Storsjö:
> Passing an uninitialized variable as argument to a function is
> undefined behaviour (UB). The compiler can assume that UB does not
> happen.
> 
> Hence, the compiler can assume that the variables are never
> uninitialized when passed as argument, which means that the codepaths
> that initializes them must be taken.
> 
> In ff_seek_frame_binary, this means that the compiler can assume
> that the codepaths that initialize pos_min and pos_max are taken,
> which means that the conditions "if (sti->index_entries)" and
> "if (index >= 0)" can be optimized out.
> 
> Current Clang git versions (upcoming Clang 14) enabled an optimization
> that does this, which broke the current version of this function
> (which intentionally left the variables uninitialized, but silencing
> warnings about being uninitialized). See [1] for discussion on
> the matter.
> 
> [1] https://reviews.llvm.org/D105169#3069555
> 
> Signed-off-by: Martin Storsjö <martin@martin.st>
> ---
>  libavformat/seek.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libavformat/seek.c b/libavformat/seek.c
> index 40169736df..405ca316b3 100644
> --- a/libavformat/seek.c
> +++ b/libavformat/seek.c
> @@ -283,7 +283,7 @@ int ff_seek_frame_binary(AVFormatContext *s, int stream_index,
>                           int64_t target_ts, int flags)
>  {
>      const AVInputFormat *const avif = s->iformat;
> -    int64_t av_uninit(pos_min), av_uninit(pos_max), pos, pos_limit;
> +    int64_t pos_min = 0, pos_max = 0, pos, pos_limit;
>      int64_t ts_min, ts_max, ts;
>      int index;
>      int64_t ret;
> 

I already wanted to write that the compiler is wrong, but it seems it is
not, as C11 differs from C99 in this respect (C11 6.3.2.1 2):

"If the lvalue designates an
 object of automatic storage duration that could have been declared with
the register storage class (never had its address taken), and that
object is uninitialized (not declared
 with an initializer and no assignment to it has been performed prior to
use), the behavior
 is undefined."

For GCC and Clang av_uninit(x) is defined as x=x. And that is a problem:
In case this macro is used to declare an automatic variable that is
could be declared with the register storage class the
pseudo-initialization is UB according to the above. So I think we will
have to modify the macro to make it safe. Or just stop using it.
(How could such a hack ever end up in a public header?)

- Andreas
Martin Storsjö Nov. 6, 2021, 6:16 p.m. UTC | #3
On Sat, 6 Nov 2021, Andreas Rheinhardt wrote:

> Martin Storsjö:
>> Passing an uninitialized variable as argument to a function is
>> undefined behaviour (UB). The compiler can assume that UB does not
>> happen.
>> 
>> Hence, the compiler can assume that the variables are never
>> uninitialized when passed as argument, which means that the codepaths
>> that initializes them must be taken.
>> 
>> In ff_seek_frame_binary, this means that the compiler can assume
>> that the codepaths that initialize pos_min and pos_max are taken,
>> which means that the conditions "if (sti->index_entries)" and
>> "if (index >= 0)" can be optimized out.
>> 
>> Current Clang git versions (upcoming Clang 14) enabled an optimization
>> that does this, which broke the current version of this function
>> (which intentionally left the variables uninitialized, but silencing
>> warnings about being uninitialized). See [1] for discussion on
>> the matter.
>> 
>> [1] https://reviews.llvm.org/D105169#3069555
>> 
>> Signed-off-by: Martin Storsjö <martin@martin.st>
>> ---
>>  libavformat/seek.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/libavformat/seek.c b/libavformat/seek.c
>> index 40169736df..405ca316b3 100644
>> --- a/libavformat/seek.c
>> +++ b/libavformat/seek.c
>> @@ -283,7 +283,7 @@ int ff_seek_frame_binary(AVFormatContext *s, int stream_index,
>>                           int64_t target_ts, int flags)
>>  {
>>      const AVInputFormat *const avif = s->iformat;
>> -    int64_t av_uninit(pos_min), av_uninit(pos_max), pos, pos_limit;
>> +    int64_t pos_min = 0, pos_max = 0, pos, pos_limit;
>>      int64_t ts_min, ts_max, ts;
>>      int index;
>>      int64_t ret;
>> 
>
> I already wanted to write that the compiler is wrong, but it seems it is
> not, as C11 differs from C99 in this respect (C11 6.3.2.1 2):
>
> "If the lvalue designates an
> object of automatic storage duration that could have been declared with
> the register storage class (never had its address taken), and that
> object is uninitialized (not declared
> with an initializer and no assignment to it has been performed prior to
> use), the behavior
> is undefined."
>
> For GCC and Clang av_uninit(x) is defined as x=x. And that is a problem:
> In case this macro is used to declare an automatic variable that is
> could be declared with the register storage class the
> pseudo-initialization is UB according to the above. So I think we will
> have to modify the macro to make it safe. Or just stop using it.
> (How could such a hack ever end up in a public header?)

FWIW, most of the issue here comes from the fact that the uninitialized 
value is passed as a parameter - in that respect, av_uninit() isn't any 
better or worse than just leaving the variable plain uninitialized. (Not 
that one really should reason around where UB is ok...)

Also, I haven't tried to read the standard wrt that, but I would expect 
that passing an uninitialized value as parameter was UB even before C11?

I haven't tracked the new feature upstream that closely, the change in 
clang/llvm that regressed the old code here might have been reverted 
and/or reapplied (for other reasons) though, not sure what the current 
state is.

// Martin
Andreas Rheinhardt Nov. 6, 2021, 7:04 p.m. UTC | #4
Martin Storsjö:
> On Sat, 6 Nov 2021, Andreas Rheinhardt wrote:
> 
>> Martin Storsjö:
>>> Passing an uninitialized variable as argument to a function is
>>> undefined behaviour (UB). The compiler can assume that UB does not
>>> happen.
>>>
>>> Hence, the compiler can assume that the variables are never
>>> uninitialized when passed as argument, which means that the codepaths
>>> that initializes them must be taken.
>>>
>>> In ff_seek_frame_binary, this means that the compiler can assume
>>> that the codepaths that initialize pos_min and pos_max are taken,
>>> which means that the conditions "if (sti->index_entries)" and
>>> "if (index >= 0)" can be optimized out.
>>>
>>> Current Clang git versions (upcoming Clang 14) enabled an optimization
>>> that does this, which broke the current version of this function
>>> (which intentionally left the variables uninitialized, but silencing
>>> warnings about being uninitialized). See [1] for discussion on
>>> the matter.
>>>
>>> [1] https://reviews.llvm.org/D105169#3069555
>>>
>>> Signed-off-by: Martin Storsjö <martin@martin.st>
>>> ---
>>>  libavformat/seek.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/libavformat/seek.c b/libavformat/seek.c
>>> index 40169736df..405ca316b3 100644
>>> --- a/libavformat/seek.c
>>> +++ b/libavformat/seek.c
>>> @@ -283,7 +283,7 @@ int ff_seek_frame_binary(AVFormatContext *s, int
>>> stream_index,
>>>                           int64_t target_ts, int flags)
>>>  {
>>>      const AVInputFormat *const avif = s->iformat;
>>> -    int64_t av_uninit(pos_min), av_uninit(pos_max), pos, pos_limit;
>>> +    int64_t pos_min = 0, pos_max = 0, pos, pos_limit;
>>>      int64_t ts_min, ts_max, ts;
>>>      int index;
>>>      int64_t ret;
>>>
>>
>> I already wanted to write that the compiler is wrong, but it seems it is
>> not, as C11 differs from C99 in this respect (C11 6.3.2.1 2):
>>
>> "If the lvalue designates an
>> object of automatic storage duration that could have been declared with
>> the register storage class (never had its address taken), and that
>> object is uninitialized (not declared
>> with an initializer and no assignment to it has been performed prior to
>> use), the behavior
>> is undefined."
>>
>> For GCC and Clang av_uninit(x) is defined as x=x. And that is a problem:
>> In case this macro is used to declare an automatic variable that is
>> could be declared with the register storage class the
>> pseudo-initialization is UB according to the above. So I think we will
>> have to modify the macro to make it safe. Or just stop using it.
>> (How could such a hack ever end up in a public header?)
> 
> FWIW, most of the issue here comes from the fact that the uninitialized
> value is passed as a parameter - in that respect, av_uninit() isn't any
> better or worse than just leaving the variable plain uninitialized. (Not
> that one really should reason around where UB is ok...)
> 

It is UB even when it is not used as a function argument at all. While
there seems to be no current compiler that uses this against us, there
is no guarantee that it will stay this way. After all, this patch is
about an unpleasant surprise and there might be more of that in the future.

> Also, I haven't tried to read the standard wrt that, but I would expect
> that passing an uninitialized value as parameter was UB even before C11?
> 

The whole clause which I cited does not exist in old standards. C99's
non-normative annex J.2 contains the following item in the list of
undefined behaviour: "The value of an object with automatic storage
duration is used while it is
 indeterminate (6.2.4, 6.7.8, 6.8)." Yet I have not found a normative
statement to back this up. It is explicitly mentioned that using a trap
representation is UB. If the type in question has a trap representation,
then the compiler may of course always presume that every indeterminate
object is a trap representation; but if it is known that the type in
question does not permit trap representations (like the uintX_t types),
then I don't see why this should be UB.
But in any case it can be inferred from the annex entry that it was
always the intention of the standards committee for it to be UB. (And I
don't know whether the LLVM patch actually checked for the C version;
I'd be surprised if it did.)

> I haven't tracked the new feature upstream that closely, the change in
> clang/llvm that regressed the old code here might have been reverted
> and/or reapplied (for other reasons) though, not sure what the current
> state is.
> 

I don't even see a reason to track said patch.

- Andreas
Martin Storsjö Nov. 6, 2021, 8:04 p.m. UTC | #5
On Sat, 6 Nov 2021, Andreas Rheinhardt wrote:

> Martin Storsjö:
>> On Sat, 6 Nov 2021, Andreas Rheinhardt wrote:
>> 
>>> For GCC and Clang av_uninit(x) is defined as x=x. And that is a problem:
>>> In case this macro is used to declare an automatic variable that is
>>> could be declared with the register storage class the
>>> pseudo-initialization is UB according to the above. So I think we will
>>> have to modify the macro to make it safe. Or just stop using it.
>>> (How could such a hack ever end up in a public header?)
>> 
>> FWIW, most of the issue here comes from the fact that the uninitialized
>> value is passed as a parameter - in that respect, av_uninit() isn't any
>> better or worse than just leaving the variable plain uninitialized. (Not
>> that one really should reason around where UB is ok...)
>> 
>
> It is UB even when it is not used as a function argument at all. While
> there seems to be no current compiler that uses this against us, there
> is no guarantee that it will stay this way. After all, this patch is
> about an unpleasant surprise and there might be more of that in the future.

Yes, that's true, as leaving a variable uninitialized then initializing it 
before actual use is fine, as long as it's not used uninitialized - while 
here we have the UB already in the initialization.

However - in the case of the Clang optimization breaking our code, there's 
no difference between leaving it entirely uninitialized or using the x=x 
trick though: In both cases, the compiler concluded that as the values 
passed to the function call must be initialized at that point, the 
codepaths that initialize the values must have been taken, and thus 
optimizing out the conditions.

// Martin
Andreas Rheinhardt Nov. 6, 2021, 8:38 p.m. UTC | #6
Andreas Rheinhardt:
> Martin Storsjö:
>> On Sat, 6 Nov 2021, Andreas Rheinhardt wrote:
>>
>>> Martin Storsjö:
>>>> Passing an uninitialized variable as argument to a function is
>>>> undefined behaviour (UB). The compiler can assume that UB does not
>>>> happen.
>>>>
>>>> Hence, the compiler can assume that the variables are never
>>>> uninitialized when passed as argument, which means that the codepaths
>>>> that initializes them must be taken.
>>>>
>>>> In ff_seek_frame_binary, this means that the compiler can assume
>>>> that the codepaths that initialize pos_min and pos_max are taken,
>>>> which means that the conditions "if (sti->index_entries)" and
>>>> "if (index >= 0)" can be optimized out.
>>>>
>>>> Current Clang git versions (upcoming Clang 14) enabled an optimization
>>>> that does this, which broke the current version of this function
>>>> (which intentionally left the variables uninitialized, but silencing
>>>> warnings about being uninitialized). See [1] for discussion on
>>>> the matter.
>>>>
>>>> [1] https://reviews.llvm.org/D105169#3069555
>>>>
>>>> Signed-off-by: Martin Storsjö <martin@martin.st>
>>>> ---
>>>>  libavformat/seek.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/libavformat/seek.c b/libavformat/seek.c
>>>> index 40169736df..405ca316b3 100644
>>>> --- a/libavformat/seek.c
>>>> +++ b/libavformat/seek.c
>>>> @@ -283,7 +283,7 @@ int ff_seek_frame_binary(AVFormatContext *s, int
>>>> stream_index,
>>>>                           int64_t target_ts, int flags)
>>>>  {
>>>>      const AVInputFormat *const avif = s->iformat;
>>>> -    int64_t av_uninit(pos_min), av_uninit(pos_max), pos, pos_limit;
>>>> +    int64_t pos_min = 0, pos_max = 0, pos, pos_limit;
>>>>      int64_t ts_min, ts_max, ts;
>>>>      int index;
>>>>      int64_t ret;
>>>>
>>>
>>> I already wanted to write that the compiler is wrong, but it seems it is
>>> not, as C11 differs from C99 in this respect (C11 6.3.2.1 2):
>>>
>>> "If the lvalue designates an
>>> object of automatic storage duration that could have been declared with
>>> the register storage class (never had its address taken), and that
>>> object is uninitialized (not declared
>>> with an initializer and no assignment to it has been performed prior to
>>> use), the behavior
>>> is undefined."
>>>
>>> For GCC and Clang av_uninit(x) is defined as x=x. And that is a problem:
>>> In case this macro is used to declare an automatic variable that is
>>> could be declared with the register storage class the
>>> pseudo-initialization is UB according to the above. So I think we will
>>> have to modify the macro to make it safe. Or just stop using it.
>>> (How could such a hack ever end up in a public header?)
>>
>> FWIW, most of the issue here comes from the fact that the uninitialized
>> value is passed as a parameter - in that respect, av_uninit() isn't any
>> better or worse than just leaving the variable plain uninitialized. (Not
>> that one really should reason around where UB is ok...)
>>
> 
> It is UB even when it is not used as a function argument at all. While
> there seems to be no current compiler that uses this against us, there
> is no guarantee that it will stay this way. After all, this patch is
> about an unpleasant surprise and there might be more of that in the future.
> 
>> Also, I haven't tried to read the standard wrt that, but I would expect
>> that passing an uninitialized value as parameter was UB even before C11?
>>
> 
> The whole clause which I cited does not exist in old standards. C99's
> non-normative annex J.2 contains the following item in the list of
> undefined behaviour: "The value of an object with automatic storage
> duration is used while it is
>  indeterminate (6.2.4, 6.7.8, 6.8)." Yet I have not found a normative
> statement to back this up. It is explicitly mentioned that using a trap
> representation is UB. If the type in question has a trap representation,
> then the compiler may of course always presume that every indeterminate
> object is a trap representation; but if it is known that the type in
> question does not permit trap representations (like the uintX_t types),
> then I don't see why this should be UB.

Addition: Using indeterminate values was definitely UB in C90 by the
very definition of UB:
"undefined behavior: Behavior, upon use of a non-portable or erroneous
program construct, of erroneous data, or of indeterminately valued
objects, for which the standard imposes no requirements".
C99 and C11 define UB as "behavior, upon use of a nonportable or
erroneous program construct or of erroneous data,
 for which this International Standard imposes no requirements".
So it was always intended that using indeterminate values is UB. This
seems to be just an oversight on part of the authors of C99.

- Andreas
diff mbox series

Patch

diff --git a/libavformat/seek.c b/libavformat/seek.c
index 40169736df..405ca316b3 100644
--- a/libavformat/seek.c
+++ b/libavformat/seek.c
@@ -283,7 +283,7 @@  int ff_seek_frame_binary(AVFormatContext *s, int stream_index,
                          int64_t target_ts, int flags)
 {
     const AVInputFormat *const avif = s->iformat;
-    int64_t av_uninit(pos_min), av_uninit(pos_max), pos, pos_limit;
+    int64_t pos_min = 0, pos_max = 0, pos, pos_limit;
     int64_t ts_min, ts_max, ts;
     int index;
     int64_t ret;