Message ID | 20211018094056.3979756-1-martin@martin.st |
---|---|
State | Accepted |
Commit | ab792634197e364ca1bb194f9abe36836e42f12d |
Series | [FFmpeg-devel] seek: Fix crashes in ff_seek_frame_binary if built with latest Clang 14 |
Context | Check | Description |
---|---|---|
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
andriy/make_ppc | success | Make finished |
andriy/make_fate_ppc | success | Make fate finished |
On Mon, 18 Oct 2021, Paul B Mahol wrote:
> lgtm
Thanks, pushed.
// Martin
Martin Storsjö:
> Passing an uninitialized variable as argument to a function is
> undefined behaviour (UB). The compiler can assume that UB does not
> happen.
>
> Hence, the compiler can assume that the variables are never
> uninitialized when passed as argument, which means that the codepaths
> that initialize them must be taken.
>
> In ff_seek_frame_binary, this means that the compiler can assume
> that the codepaths that initialize pos_min and pos_max are taken,
> which means that the conditions "if (sti->index_entries)" and
> "if (index >= 0)" can be optimized out.
>
> Current Clang git versions (upcoming Clang 14) enabled an optimization
> that does this, which broke the current version of this function
> (which intentionally left the variables uninitialized, but silencing
> warnings about being uninitialized). See [1] for discussion on
> the matter.
>
> [1] https://reviews.llvm.org/D105169#3069555
>
> Signed-off-by: Martin Storsjö <martin@martin.st>
> ---
>  libavformat/seek.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libavformat/seek.c b/libavformat/seek.c
> index 40169736df..405ca316b3 100644
> --- a/libavformat/seek.c
> +++ b/libavformat/seek.c
> @@ -283,7 +283,7 @@ int ff_seek_frame_binary(AVFormatContext *s, int stream_index,
>                           int64_t target_ts, int flags)
>  {
>      const AVInputFormat *const avif = s->iformat;
> -    int64_t av_uninit(pos_min), av_uninit(pos_max), pos, pos_limit;
> +    int64_t pos_min = 0, pos_max = 0, pos, pos_limit;
>      int64_t ts_min, ts_max, ts;
>      int index;
>      int64_t ret;
>

I already wanted to write that the compiler is wrong, but it seems it is
not, as C11 differs from C99 in this respect (C11 6.3.2.1 2):

"If the lvalue designates an object of automatic storage duration that
could have been declared with the register storage class (never had its
address taken), and that object is uninitialized (not declared with an
initializer and no assignment to it has been performed prior to use),
the behavior is undefined."

For GCC and Clang av_uninit(x) is defined as x=x. And that is a problem:
In case this macro is used to declare an automatic variable that
could be declared with the register storage class the
pseudo-initialization is UB according to the above. So I think we will
have to modify the macro to make it safe. Or just stop using it.
(How could such a hack ever end up in a public header?)

- Andreas
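To make the x=x concern concrete, here is a minimal sketch of what the old declaration in seek.c expands to. The names `av_uninit_sketch`, `expanded_decl`, `is_self_initialized` and `patched_decl` are hypothetical and exist only for this illustration; the one thing taken from the message above is that the macro's replacement text is `x=x`. The expansion is stringified rather than compiled, so the sketch itself never reads an uninitialized object.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the GCC/Clang definition described above. */
#define av_uninit_sketch(x) x=x

/* Two-level stringification, so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Text the preprocessor produces for the old declaration in seek.c. */
const char *expanded_decl(void)
{
    return STR(int64_t av_uninit_sketch(pos_min));
}

/* Nonzero if the text after the '=' mentions pos_min again, i.e. the
 * initializer refers back to the variable being declared. */
int is_self_initialized(const char *decl)
{
    const char *eq = strchr(decl, '=');
    return eq != NULL && strstr(eq, "pos_min") != NULL;
}

/* The form the patch switches to: an ordinary initializer. */
int64_t patched_decl(void)
{
    int64_t pos_min = 0;

    /* Sanity check on the sketch: the macro form reads its own variable. */
    assert(is_self_initialized(expanded_decl()));
    return pos_min;
}
```

The self-referencing initializer is exactly the read of an uninitialized automatic object that the quoted C11 clause describes, which is why a plain `= 0` is the safer spelling.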
On Sat, 6 Nov 2021, Andreas Rheinhardt wrote:

> I already wanted to write that the compiler is wrong, but it seems it is
> not, as C11 differs from C99 in this respect (C11 6.3.2.1 2):
>
> "If the lvalue designates an object of automatic storage duration that
> could have been declared with the register storage class (never had its
> address taken), and that object is uninitialized (not declared with an
> initializer and no assignment to it has been performed prior to use),
> the behavior is undefined."
>
> For GCC and Clang av_uninit(x) is defined as x=x. And that is a problem:
> In case this macro is used to declare an automatic variable that
> could be declared with the register storage class the
> pseudo-initialization is UB according to the above. So I think we will
> have to modify the macro to make it safe. Or just stop using it.
> (How could such a hack ever end up in a public header?)

FWIW, most of the issue here comes from the fact that the uninitialized
value is passed as a parameter - in that respect, av_uninit() isn't any
better or worse than just leaving the variable plain uninitialized. (Not
that one really should reason around where UB is ok...)

Also, I haven't tried to read the standard wrt that, but I would expect
that passing an uninitialized value as parameter was UB even before C11?

I haven't tracked the new feature upstream that closely, the change in
clang/llvm that regressed the old code here might have been reverted
and/or reapplied (for other reasons) though, not sure what the current
state is.

// Martin
Martin Storsjö:
> On Sat, 6 Nov 2021, Andreas Rheinhardt wrote:
>
>> I already wanted to write that the compiler is wrong, but it seems it is
>> not, as C11 differs from C99 in this respect (C11 6.3.2.1 2):
>>
>> "If the lvalue designates an object of automatic storage duration that
>> could have been declared with the register storage class (never had its
>> address taken), and that object is uninitialized (not declared with an
>> initializer and no assignment to it has been performed prior to use),
>> the behavior is undefined."
>>
>> For GCC and Clang av_uninit(x) is defined as x=x. And that is a problem:
>> In case this macro is used to declare an automatic variable that
>> could be declared with the register storage class the
>> pseudo-initialization is UB according to the above. So I think we will
>> have to modify the macro to make it safe. Or just stop using it.
>> (How could such a hack ever end up in a public header?)
>
> FWIW, most of the issue here comes from the fact that the uninitialized
> value is passed as a parameter - in that respect, av_uninit() isn't any
> better or worse than just leaving the variable plain uninitialized. (Not
> that one really should reason around where UB is ok...)
>

It is UB even when it is not used as a function argument at all. While
there seems to be no current compiler that uses this against us, there
is no guarantee that it will stay this way. After all, this patch is
about an unpleasant surprise and there might be more of that in the future.

> Also, I haven't tried to read the standard wrt that, but I would expect
> that passing an uninitialized value as parameter was UB even before C11?
>

The whole clause which I cited does not exist in old standards. C99's
non-normative annex J.2 contains the following item in the list of
undefined behaviour: "The value of an object with automatic storage
duration is used while it is indeterminate (6.2.4, 6.7.8, 6.8)." Yet I
have not found a normative statement to back this up. It is explicitly
mentioned that using a trap representation is UB. If the type in question
has a trap representation, then the compiler may of course always presume
that every indeterminate object is a trap representation; but if it is
known that the type in question does not permit trap representations
(like the uintX_t types), then I don't see why this should be UB. But in
any case it can be inferred from the annex entry that it was always the
intention of the standards committee for it to be UB. (And I don't know
whether the LLVM patch actually checked for the C version; I'd be
surprised if it did.)

> I haven't tracked the new feature upstream that closely, the change in
> clang/llvm that regressed the old code here might have been reverted
> and/or reapplied (for other reasons) though, not sure what the current
> state is.
>

I don't even see a reason to track said patch.

- Andreas
On Sat, 6 Nov 2021, Andreas Rheinhardt wrote:

> Martin Storsjö:
>> On Sat, 6 Nov 2021, Andreas Rheinhardt wrote:
>>
>>> For GCC and Clang av_uninit(x) is defined as x=x. And that is a problem:
>>> In case this macro is used to declare an automatic variable that
>>> could be declared with the register storage class the
>>> pseudo-initialization is UB according to the above. So I think we will
>>> have to modify the macro to make it safe. Or just stop using it.
>>> (How could such a hack ever end up in a public header?)
>>
>> FWIW, most of the issue here comes from the fact that the uninitialized
>> value is passed as a parameter - in that respect, av_uninit() isn't any
>> better or worse than just leaving the variable plain uninitialized. (Not
>> that one really should reason around where UB is ok...)
>>
>
> It is UB even when it is not used as a function argument at all. While
> there seems to be no current compiler that uses this against us, there
> is no guarantee that it will stay this way. After all, this patch is
> about an unpleasant surprise and there might be more of that in the future.

Yes, that's true: leaving a variable uninitialized and then initializing
it before actual use is fine, as long as it's not used uninitialized -
while here we have the UB already in the initialization.

However, in the case of the Clang optimization breaking our code, there's
no difference between leaving it entirely uninitialized and using the x=x
trick: in both cases, the compiler concluded that, as the values passed
to the function call must be initialized at that point, the codepaths
that initialize the values must have been taken, and thus optimized out
the conditions.

// Martin
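The distinction drawn in this reply (a declaration without an initializer is fine if every path assigns before the first read, whereas the x=x form already reads the variable in its own declaration) can be sketched as below. The function `pick_start` and its parameters are hypothetical and are not taken from libavformat; this is only an illustration of the well-defined pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: choose a byte position to start searching from. */
int64_t pick_start(int have_index, int64_t indexed_pos)
{
    int64_t pos;               /* no initializer: nothing is read yet */

    /* By contrast, "int64_t pos = pos;" (what the x=x trick expands to)
     * would read pos right here, before it has ever been given a value. */

    assert(indexed_pos >= 0);  /* assumption of this sketch: positions are non-negative */

    if (have_index)
        pos = indexed_pos;     /* assigned on this path ... */
    else
        pos = 0;               /* ... and on this one */

    return pos;                /* every path assigned pos before this read */
}
```

The code in seek.c differed from this in that some paths left pos_min and pos_max unassigned before they were passed on, which is the case the patch addresses.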
Andreas Rheinhardt:
> Martin Storsjö:
>> Also, I haven't tried to read the standard wrt that, but I would expect
>> that passing an uninitialized value as parameter was UB even before C11?
>>
>
> The whole clause which I cited does not exist in old standards. C99's
> non-normative annex J.2 contains the following item in the list of
> undefined behaviour: "The value of an object with automatic storage
> duration is used while it is indeterminate (6.2.4, 6.7.8, 6.8)." Yet I
> have not found a normative statement to back this up. It is explicitly
> mentioned that using a trap representation is UB. If the type in question
> has a trap representation, then the compiler may of course always presume
> that every indeterminate object is a trap representation; but if it is
> known that the type in question does not permit trap representations
> (like the uintX_t types), then I don't see why this should be UB.

Addition: Using indeterminate values was definitely UB in C90 by the very
definition of UB: "undefined behavior: Behavior, upon use of a
non-portable or erroneous program construct, of erroneous data, or of
indeterminately valued objects, for which the standard imposes no
requirements". C99 and C11 define UB as "behavior, upon use of a
nonportable or erroneous program construct or of erroneous data, for
which this International Standard imposes no requirements". So it was
always intended that using indeterminate values is UB. This seems to be
just an oversight on the part of the authors of C99.

- Andreas
Passing an uninitialized variable as argument to a function is
undefined behaviour (UB). The compiler can assume that UB does not
happen.

Hence, the compiler can assume that the variables are never
uninitialized when passed as argument, which means that the codepaths
that initialize them must be taken.

In ff_seek_frame_binary, this means that the compiler can assume
that the codepaths that initialize pos_min and pos_max are taken,
which means that the conditions "if (sti->index_entries)" and
"if (index >= 0)" can be optimized out.

Current Clang git versions (upcoming Clang 14) enabled an optimization
that does this, which broke the current version of this function
(which intentionally left the variables uninitialized, but silencing
warnings about being uninitialized). See [1] for discussion on
the matter.

[1] https://reviews.llvm.org/D105169#3069555

Signed-off-by: Martin Storsjö <martin@martin.st>
---
 libavformat/seek.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavformat/seek.c b/libavformat/seek.c
index 40169736df..405ca316b3 100644
--- a/libavformat/seek.c
+++ b/libavformat/seek.c
@@ -283,7 +283,7 @@ int ff_seek_frame_binary(AVFormatContext *s, int stream_index,
                          int64_t target_ts, int flags)
 {
     const AVInputFormat *const avif = s->iformat;
-    int64_t av_uninit(pos_min), av_uninit(pos_max), pos, pos_limit;
+    int64_t pos_min = 0, pos_max = 0, pos, pos_limit;
     int64_t ts_min, ts_max, ts;
     int index;
     int64_t ret;
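The control-flow shape described in the commit message can be modelled in a few lines. This is a simplified sketch, not the code from libavformat/seek.c: `seek_sketch`, `search_range`, the `entries` array and the `have_bounds` flag are all hypothetical names introduced here, and the assumption that the callee ignores the positions when no bounds were found is made only for the illustration. It shows the patched form, where both variables carry a real initializer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callee: only consults the bounds when told they are valid. */
int64_t search_range(int64_t pos_min, int64_t pos_max, int have_bounds)
{
    return have_bounds ? pos_max - pos_min : -1;
}

/* Simplified model of the pattern: the bounds are assigned only when an
 * index exists and a matching entry was found (index >= 0). */
int64_t seek_sketch(const int64_t *entries, int nb_entries, int index)
{
    /* Patched form: ordinary initializers instead of av_uninit(). */
    int64_t pos_min = 0, pos_max = 0;
    int have_bounds = 0;

    if (entries) {
        if (index >= 0) {
            assert(index + 1 < nb_entries);
            pos_min     = entries[index];
            pos_max     = entries[index + 1];
            have_bounds = 1;
        }
    }

    /* Both arguments hold a defined value on every path, so the compiler
     * cannot infer from this call that the branches above were taken. */
    return search_range(pos_min, pos_max, have_bounds);
}
```

With the variables left uninitialized, the call at the end would be the point where, per the reasoning above, the compiler may conclude that both conditions held; with the initializers in place that inference is no longer available.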