diff mbox series

[FFmpeg-devel,1/5] lavu/common.h: Fix UB in av_clipl_int32_c()

Message ID 27e5f614496e729f68326a31c3ac70d5923cfa97.camel@haerdin.se
State New
Headers show
Series [FFmpeg-devel,1/5] lavu/common.h: Fix UB in av_clipl_int32_c() | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Tomas Härdin May 29, 2024, 10:13 p.m. UTC
The entire patchset passes FATE

/Tomas

Comments

Andreas Rheinhardt May 29, 2024, 10:31 p.m. UTC | #1
Tomas Härdin:
>   */
>  static av_always_inline av_const int32_t av_clipl_int32_c(int64_t a)
>  {
> -    if ((a+0x80000000u) & ~UINT64_C(0xFFFFFFFF)) return (int32_t)((a>>63) ^ 0x7FFFFFFF);
> -    else                                         return (int32_t)a;
> +    if ((a+UINT64_C(0x80000000)) & ~UINT64_C(0xFFFFFFFF)) return (int32_t)((a>>63) ^ 0x7FFFFFFF);
> +    else                                                  return (int32_t)a;

IMO (uint64_t)a + 0x80000000 is more readable. (Maybe it would even be
good to use >> 32 instead of ~UINT64_C(0xFFFFFFFF)?)

- Andreas
Rémi Denis-Courmont May 30, 2024, 6:41 a.m. UTC | #2
Hi,

Le 30 mai 2024 01:13:14 GMT+03:00, "Tomas Härdin" <git@haerdin.se> a écrit :
>The entire patchset passes FATE

Is the version in riscv/intmath.h safe? It looks to me that the GCC codegen for not only RV64 but also AArch{32,64} and x86-64 is better than this.
Tomas Härdin May 30, 2024, 9:40 a.m. UTC | #3
tor 2024-05-30 klockan 09:41 +0300 skrev Rémi Denis-Courmont:
> Hi,
> 
> Le 30 mai 2024 01:13:14 GMT+03:00, "Tomas Härdin" <git@haerdin.se> a
> écrit :
> > The entire patchset passes FATE
> 
> Is the version in riscv/intmath.h safe? It looks to me that the GCC
> codegen for not only RV64 but also AArch{32,64} and x86-64 is better
> than this.

I haven't checked. It seems weird to me to have two different C
versions. We shouldn't rely on type punning. The standard compliant way
is to use memcpy()

/Tomas
Tomas Härdin May 30, 2024, 9:48 a.m. UTC | #4
tor 2024-05-30 klockan 00:31 +0200 skrev Andreas Rheinhardt:
> Tomas Härdin:
> >   */
> >  static av_always_inline av_const int32_t av_clipl_int32_c(int64_t
> > a)
> >  {
> > -    if ((a+0x80000000u) & ~UINT64_C(0xFFFFFFFF)) return
> > (int32_t)((a>>63) ^ 0x7FFFFFFF);
> > -    else                                         return
> > (int32_t)a;
> > +    if ((a+UINT64_C(0x80000000)) & ~UINT64_C(0xFFFFFFFF)) return
> > (int32_t)((a>>63) ^ 0x7FFFFFFF);
> > +    else                                                  return
> > (int32_t)a;
> 
> IMO (uint64_t)a + 0x80000000 is more readable. (Maybe it would even
> be
> good to use >> 32 instead of ~UINT64_C(0xFFFFFFFF)?)

It already uses UINT64_C, hence why I used it.

>> 32 would work also. Does it make any difference performance wise?

/Tomas
Rémi Denis-Courmont May 30, 2024, 11:50 a.m. UTC | #5
Le 30 mai 2024 12:40:20 GMT+03:00, "Tomas Härdin" <git@haerdin.se> a écrit :
>tor 2024-05-30 klockan 09:41 +0300 skrev Rémi Denis-Courmont:
>> Hi,
>> 
>> Le 30 mai 2024 01:13:14 GMT+03:00, "Tomas Härdin" <git@haerdin.se> a
>> écrit :
>> > The entire patchset passes FATE
>> 
>> Is the version in riscv/intmath.h safe? It looks to me that the GCC
>> codegen for not only RV64 but also AArch{32,64} and x86-64 is better
>> than this.
>
>I haven't checked. It seems weird to me to have two different C
>versions.

The common one ends up horrendously bad on RV, and presumably on MIPS and some other RISC ISA.

> We shouldn't rely on type punning.

Because?

We should depend on punning as long as it conforms to the standard.

> The standard compliant way
>is to use memcpy()

That's way worse than union in terms of how proactively the compiler needs to optimise, and both approaches are as confirming.
Tomas Härdin May 30, 2024, 2:07 p.m. UTC | #6
tor 2024-05-30 klockan 14:50 +0300 skrev Rémi Denis-Courmont:
> 
> 
> Le 30 mai 2024 12:40:20 GMT+03:00, "Tomas Härdin" <git@haerdin.se> a
> écrit :
> > tor 2024-05-30 klockan 09:41 +0300 skrev Rémi Denis-Courmont:
> > > Hi,
> > > 
> > > Le 30 mai 2024 01:13:14 GMT+03:00, "Tomas Härdin"
> > > <git@haerdin.se> a
> > > écrit :
> > > > The entire patchset passes FATE
> > > 
> > > Is the version in riscv/intmath.h safe? It looks to me that the
> > > GCC
> > > codegen for not only RV64 but also AArch{32,64} and x86-64 is
> > > better
> > > than this.
> > 
> > I haven't checked. It seems weird to me to have two different C
> > versions.
> 
> The common one ends up horrendously bad on RV, and presumably on MIPS
> and some other RISC ISA.
> 
> > We shouldn't rely on type punning.
> 
> Because?
> 
> We should depend on punning as long as it conforms to the standard.

My mistake, I forgot type punning is allowed in C. It's UB in C++

> > The standard compliant way
> > is to use memcpy()
> 
> That's way worse than union in terms of how proactively the compiler
> needs to optimise, and both approaches are as confirming.

A good compiler will do the same thing

Maybe I can get the riscv version covered by Eva as well. That's beyond
the scope of this patchset

/Tomas
Rémi Denis-Courmont May 30, 2024, 2:28 p.m. UTC | #7
Le 30 mai 2024 17:07:21 GMT+03:00, "Tomas Härdin" <git@haerdin.se> a écrit :
>> We should depend on punning as long as it conforms to the standard.
>
>My mistake, I forgot type punning is allowed in C. It's UB in C++
>
>> > The standard compliant way
>> > is to use memcpy()
>> 
>> That's way worse than union in terms of how proactively the compiler
>> needs to optimise, and both approaches are as confirming.
>
>A good compiler will do the same thing

True, and I don't care very much about memcpy vs union, as they both rely on matching representation. AFAIR, FFmpeg tends to use unions though.

>
>Maybe I can get the riscv version covered by Eva as well. That's beyond
>the scope of this patchset

IMHO, this specific patch (and the following one) are beating dead horses. Sure there may be theoretical UB in the current code, but if there is a *better* implementation, better switch to that than bike shedding the fix for the UB.
Tomas Härdin May 30, 2024, 3:32 p.m. UTC | #8
tor 2024-05-30 klockan 17:28 +0300 skrev Rémi Denis-Courmont:
> 
> 
> Le 30 mai 2024 17:07:21 GMT+03:00, "Tomas Härdin" <git@haerdin.se> a
> écrit :
> > > We should depend on punning as long as it conforms to the
> > > standard.
> > 
> > My mistake, I forgot type punning is allowed in C. It's UB in C++
> > 
> > > > The standard compliant way
> > > > is to use memcpy()
> > > 
> > > That's way worse than union in terms of how proactively the
> > > compiler
> > > needs to optimise, and both approaches are as confirming.
> > 
> > A good compiler will do the same thing
> 
> True, and I don't care very much about memcpy vs union, as they both
> rely on matching representation. AFAIR, FFmpeg tends to use unions
> though.
> 
> > 
> > Maybe I can get the riscv version covered by Eva as well. That's
> > beyond
> > the scope of this patchset
> 
> IMHO, this specific patch (and the following one) are beating dead
> horses. Sure there may be theoretical UB in the current code, but if
> there is a *better* implementation, better switch to that than bike
> shedding the fix for the UB.

Are you saying that UB is acceptable? You know the compiler is free to
assume signed arithmetic doesn't overflow, right? If so then what other
UB might we accept?

/Tomas
Rémi Denis-Courmont May 30, 2024, 3:38 p.m. UTC | #9
Le torstaina 30. toukokuuta 2024, 18.32.19 EEST Tomas Härdin a écrit :
> Are you saying that UB is acceptable?

Are you imitating Thilo and grand-standing by putting words in my mouth?

Yes and so -1 for you.
James Almer May 30, 2024, 3:42 p.m. UTC | #10
On 5/30/2024 12:32 PM, Tomas Härdin wrote:
> tor 2024-05-30 klockan 17:28 +0300 skrev Rémi Denis-Courmont:
>>
>>
>> Le 30 mai 2024 17:07:21 GMT+03:00, "Tomas Härdin" <git@haerdin.se> a
>> écrit :
>>>> We should depend on punning as long as it conforms to the
>>>> standard.
>>>
>>> My mistake, I forgot type punning is allowed in C. It's UB in C++
>>>
>>>>> The standard compliant way
>>>>> is to use memcpy()
>>>>
>>>> That's way worse than union in terms of how proactively the
>>>> compiler
>>>> needs to optimise, and both approaches are as confirming.
>>>
>>> A good compiler will do the same thing
>>
>> True, and I don't care very much about memcpy vs union, as they both
>> rely on matching representation. AFAIR, FFmpeg tends to use unions
>> though.
>>
>>>
>>> Maybe I can get the riscv version covered by Eva as well. That's
>>> beyond
>>> the scope of this patchset
>>
>> IMHO, this specific patch (and the following one) are beating dead
>> horses. Sure there may be theoretical UB in the current code, but if
>> there is a *better* implementation, better switch to that than bike
>> shedding the fix for the UB.
> 
> Are you saying that UB is acceptable? You know the compiler is free to
> assume signed arithmetic doesn't overflow, right? If so then what other
> UB might we accept?

He did not say that... He said we should switch to a better 
implementation rather than trying to fix the existing potentially buggy one.
Tomas Härdin May 30, 2024, 4:48 p.m. UTC | #11
tor 2024-05-30 klockan 12:42 -0300 skrev James Almer:
> On 5/30/2024 12:32 PM, Tomas Härdin wrote:
> > tor 2024-05-30 klockan 17:28 +0300 skrev Rémi Denis-Courmont:
> > > 
> > > 
> > > Le 30 mai 2024 17:07:21 GMT+03:00, "Tomas Härdin"
> > > <git@haerdin.se> a
> > > écrit :
> > > > > We should depend on punning as long as it conforms to the
> > > > > standard.
> > > > 
> > > > My mistake, I forgot type punning is allowed in C. It's UB in
> > > > C++
> > > > 
> > > > > > The standard compliant way
> > > > > > is to use memcpy()
> > > > > 
> > > > > That's way worse than union in terms of how proactively the
> > > > > compiler
> > > > > needs to optimise, and both approaches are as confirming.
> > > > 
> > > > A good compiler will do the same thing
> > > 
> > > True, and I don't care very much about memcpy vs union, as they
> > > both
> > > rely on matching representation. AFAIR, FFmpeg tends to use
> > > unions
> > > though.
> > > 
> > > > 
> > > > Maybe I can get the riscv version covered by Eva as well.
> > > > That's
> > > > beyond
> > > > the scope of this patchset
> > > 
> > > IMHO, this specific patch (and the following one) are beating
> > > dead
> > > horses. Sure there may be theoretical UB in the current code, but
> > > if
> > > there is a *better* implementation, better switch to that than
> > > bike
> > > shedding the fix for the UB.
> > 
> > Are you saying that UB is acceptable? You know the compiler is free
> > to
> > assume signed arithmetic doesn't overflow, right? If so then what
> > other
> > UB might we accept?
> 
> He did not say that... He said we should switch to a better 
> implementation rather than trying to fix the existing potentially
> buggy one.

I have a fix for demonstrable UB and Rémi is problematizing it. It is
not a "theoretical" UB - that's not how UB works. Any compiler doing
basic value analysis will find it, and is therefore free to do whatever
it wants, for example deleting all calls to av_clipl_int32_c().

We could certainly replace some of these functions with intrinsics, but
that's not what this patchset is about. I don't know what set of
compilers we support. I don't know what intrinsics they support. Am I
to be compelled to figure that out, and provide the necessary
intrinsics for all of them?

This may all seem trivial, and it is, but this patchset is also a test
balloon. Line struggle is important. What I see is the stalling of
fixes of *known broken code*. That is not encouraging.

/Tomas
Rémi Denis-Courmont May 30, 2024, 5:49 p.m. UTC | #12
Le torstaina 30. toukokuuta 2024, 19.48.13 EEST Tomas Härdin a écrit :
> > > Are you saying that UB is acceptable? You know the compiler is free
> > > to
> > > assume signed arithmetic doesn't overflow, right? If so then what
> > > other
> > > UB might we accept?
> > 
> > He did not say that... He said we should switch to a better
> > implementation rather than trying to fix the existing potentially
> > buggy one.
> 
> I have a fix for demonstrable UB and Rémi is problematizing it.

Andreas made cosmetic arguments against this patch before I had even seen the 
patch, forget comment on it.

> It is not a "theoretical" UB - that's not how UB works.

It is a *theoretical* UB if you can not prove that it leads to misbehaviour in 
any *practical* use. In theory, all UB is *potentially* fatal. Emphasis on 
potentially.

So yes, while all UB instances are bad and deserve fixing, they are not all 
equally bad nor urgent. UB that is proven to lead to remote code execution is 
way worse than theoretical UB that has only been proven in literature, and is 
not known or even seriously suspected to lead to broken optimisations.

> Any compiler doing
> basic value analysis will find it, and is therefore free to do whatever
> it wants, for example deleting all calls to av_clipl_int32_c().

That is formally true. But it is also formally true that, by that same logic, 
since there is most certainly some UB instance left elsewhere in the codebase, 
the entirety of libavutil could be elided by the compiler. In other words, in 
theory, FFmpeg does not work at all. Does that mean that we should give up on 
the project here and now?

> We could certainly replace some of these functions with intrinsics, but
> that's not what this patchset is about.

I am not sure what is your point because nobody said that av_clipl_int32_c() 
should be replaced by intrinsics.

> I don't know what set of compilers we support.

That is irrelevant since all C99, C11 and C23 compilers support the proposed 
substitute code as long as <stdint.h> defines int32_t.

> I don't know what intrinsics they support.

Also irrelevant.

> Am I to be compelled to figure that out, and provide the necessary
> intrinsics for all of them?

No, and you are the only person to have made an implication to the contrary as 
far as *this* patch is concerned.
Michael Niedermayer May 30, 2024, 7:07 p.m. UTC | #13
On Thu, May 30, 2024 at 08:49:12PM +0300, Rémi Denis-Courmont wrote:
> Le torstaina 30. toukokuuta 2024, 19.48.13 EEST Tomas Härdin a écrit :
> > > > Are you saying that UB is acceptable? You know the compiler is free
> > > > to
> > > > assume signed arithmetic doesn't overflow, right? If so then what
> > > > other
> > > > UB might we accept?
> > > 
> > > He did not say that... He said we should switch to a better
> > > implementation rather than trying to fix the existing potentially
> > > buggy one.
> > 
> > I have a fix for demonstrable UB and Rémi is problematizing it.
> 
> Andreas made cosmetic arguments against this patch before I had even seen the 
> patch, forget comment on it.
> 

> > It is not a "theoretical" UB - that's not how UB works.
> 
> It is a *theoretical* UB if you can not prove that it leads to misbehaviour in 

If the function doesnt get called with values triggering UB then its not UB.

If the function gets called with values triggering the signed overflow then its UB
And its a bug unless the applications intended behavior is undefined.

also i would not bet on that the function produces the correct output for
input values that trigger UB on every platform

The case where this really could be a problem is if its used with compile
time constants that would trigger the overflow because in these cases
the optimizer can assume the whole codepath leading to it can be
removed.

IMHO we should simply fix UB instead of arguing over how bad it could be
or when.

thx

[...]
Rémi Denis-Courmont May 30, 2024, 7:20 p.m. UTC | #14
Le torstaina 30. toukokuuta 2024, 22.07.13 EEST Michael Niedermayer a écrit :
> If the function doesnt get called with values triggering UB then its not UB.

As Tomas pointed out, that statement is actually false. Specifically, if the 
compiler can prove that the function can be called with values triggering UB, 
then the code is UB, even if those offending values do not actually occur in a 
given instance of the program.

The C specification is known to contradict causality.

For instance, if you have pass an uninitialised value to av_clipl_int32_c(), 
then the code is UB, even if the actual value in the register or stack slot is 
never one that could trigger UB. Of course, usage of uninitialised values is a 
bad practice, but it is not, per se, UB.
Michael Niedermayer May 31, 2024, 1:03 a.m. UTC | #15
On Thu, May 30, 2024 at 06:38:42PM +0300, Rémi Denis-Courmont wrote:
> Le torstaina 30. toukokuuta 2024, 18.32.19 EEST Tomas Härdin a écrit :
> > Are you saying that UB is acceptable?
> 

no, thats not what Remi meant IIUC


> Are you imitating Thilo and grand-standing by putting words in my mouth?

That seems unneccesarily insulting towards Thilo

also i think you quickly feel attacked and in some cases the other person
may have had no intend to attack you. And things can be just plain
misunderstandings

I certainly have missunderstood people many times and i certainly was
missunderstood by people also many times.

thx


[...]
Tomas Härdin May 31, 2024, 3:23 p.m. UTC | #16
tor 2024-05-30 klockan 20:49 +0300 skrev Rémi Denis-Courmont:
> Le torstaina 30. toukokuuta 2024, 19.48.13 EEST Tomas Härdin a écrit 
> > It is not a "theoretical" UB - that's not how UB works.
> 
> It is a *theoretical* UB if you can not prove that it leads to
> misbehaviour in 
> any *practical* use. In theory, all UB is *potentially* fatal.
> Emphasis on 
> potentially.

The issue is that compilers can change without notice

> > Any compiler doing
> > basic value analysis will find it, and is therefore free to do
> > whatever
> > it wants, for example deleting all calls to av_clipl_int32_c().
> 
> That is formally true. But it is also formally true that, by that
> same logic, 
> since there is most certainly some UB instance left elsewhere in the
> codebase, 
> the entirety of libavutil could be elided by the compiler.

I mean, part of what I'm doing with my little value analysis experiment
is finding these instances of UB throughout lavu and fixing them, so
that no compiler will perform what we might consider dubious or
unexpected optimizations. It is far more powerful than fuzzing when it
comes to discovering bugs

I'm looking at rational.c at the moment, and there are definitely some
dubious codepaths. There have also been fixes for signed overflow in
rational.c already. I wouldn't be surprised if there are more corner
cases that can be triggered by some demuxer or decoder that fuzzing
hasn't discovered yet

Maybe in some glorious future we'll have a version of C where signed
overflow is defined behavior

/Tomas
Rémi Denis-Courmont June 3, 2024, 7:32 a.m. UTC | #17
Le 31 mai 2024 04:03:24 GMT+03:00, Michael Niedermayer <michael@niedermayer.cc> a écrit :
>> Are you imitating Thilo and grand-standing by putting words in my mouth?
>
>That seems unneccesarily insulting towards Thilo

This is not an insult in the first place, and besides Thilo did put words in mouth to discredit me as shown by the mailing mist archives.

And you had no problems with that. You also had no problems with Thilo calling Kieran, Derek and myself trolls on this very mailing list, and that was an actual insult. So care to explain why you have a problem with this and not that?

Because it sure looks like blatant double standards to me, and against me.
Michael Niedermayer June 3, 2024, 9:21 p.m. UTC | #18
On Mon, Jun 03, 2024 at 10:32:17AM +0300, Rémi Denis-Courmont wrote:
> 
> 
> Le 31 mai 2024 04:03:24 GMT+03:00, Michael Niedermayer <michael@niedermayer.cc> a écrit :
> >> Are you imitating Thilo and grand-standing by putting words in my mouth?
> >
> >That seems unneccesarily insulting towards Thilo
> 
> This is not an insult in the first place, and besides Thilo did put words in mouth to discredit me as shown by the mailing mist archives.
> 
> And you had no problems with that. You also had no problems with Thilo calling Kieran, Derek and myself trolls on this very mailing list, and that was an actual insult. So care to explain why you have a problem with this and not that?

about "you had no problems with that"
First you dont know and cannot know what i have a problem with and what not.
(so please dont claim such a thing)

Second, i do not remember thilo calling you a troll. What i remember (and thats from
my memory) thilo preceived one of kierans replies as trolling and then did not
reply to that. There are 2 things here.
Preceiving a reply as trolling is not the same as calling someone a troll.
And the 2nd thing, i was not happy that the NAB 2024 invitation was not
resend or pinged to ensure people knew they could go there and help if
they wanted.
So in fact i was not ok with that.

and third, iam a human being, and i will certainly miss some things, in fact
iam not even trying to catch every statement thats abbrasive and point
it out. But when i do conciously notice something that seems that it could
hurt someone or be against the team spirit then i will try to point it out.
My goal is not to "warn" anyone its really meant to just bring it to the speakers
attention as (s)he may even be unware

thx

[...]
Tomas Härdin June 14, 2024, 12:31 p.m. UTC | #19
Pushed patches 1-4

/Tomas
diff mbox series

Patch

From c000b8a5e90883f28ce6c58960227e5825ac20d1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tomas=20H=C3=A4rdin?= <git@haerdin.se>
Date: Wed, 15 May 2024 21:03:47 +0200
Subject: [PATCH 1/5] lavu/common.h: Fix UB in av_clipl_int32_c()

Found by value analysis
---
 libavutil/common.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavutil/common.h b/libavutil/common.h
index 3e4c339893..ac68c0cfff 100644
--- a/libavutil/common.h
+++ b/libavutil/common.h
@@ -252,8 +252,8 @@  static av_always_inline av_const int16_t av_clip_int16_c(int a)
  */
 static av_always_inline av_const int32_t av_clipl_int32_c(int64_t a)
 {
-    if ((a+0x80000000u) & ~UINT64_C(0xFFFFFFFF)) return (int32_t)((a>>63) ^ 0x7FFFFFFF);
-    else                                         return (int32_t)a;
+    if ((a+UINT64_C(0x80000000)) & ~UINT64_C(0xFFFFFFFF)) return (int32_t)((a>>63) ^ 0x7FFFFFFF);
+    else                                                  return (int32_t)a;
 }
 
 /**
-- 
2.39.2