diff mbox series

[FFmpeg-devel] avutil/mem: always align by at least 32 bytes

Message ID 20231203201027.2255-1-timo@rothenpieler.org
State New
Headers show
Series [FFmpeg-devel] avutil/mem: always align by at least 32 bytes | expand

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Timo Rothenpieler Dec. 3, 2023, 8:10 p.m. UTC
FFmpeg has instances of DECLARE_ALIGNED(32, ...) in a lot of structs,
which then end up heap-allocated.
By declaring any variable in a struct, or tree of structs, to be 32 byte
aligned, it allows the compiler to safely assume the entire struct
itself is also 32 byte aligned.

This might make the compiler emit code which straight up crashes or
misbehaves in other ways, and at least in one instances is now
documented to actually do (see ticket 10549 on trac).
The issue there is that an unrelated variable in SingleChannelElement is
declared to have an alignment of 32 bytes. So if the compiler does a copy
in decode_cpe() with avx instructions, but ffmpeg is built with
--disable-avx, this results in a crash, since the memory is only 16 byte
aligned.
Mind you, even if the compiler does not emit avx instructions, the code
is still invalid and could misbehave. It just happens not to. Declaring
any variable in a struct with a 32 byte alignment promises 32 byte
alignment of the whole struct to the compiler.

Instead of now going through all instances of variables in structs
being declared as 32 byte aligned, this patch bumps the minimum alignment
to 32 bytes.
---
 libavutil/mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Timo Rothenpieler Dec. 6, 2023, 12:27 p.m. UTC | #1
On 03/12/2023 21:10, Timo Rothenpieler wrote:
> FFmpeg has instances of DECLARE_ALIGNED(32, ...) in a lot of structs,
> which then end up heap-allocated.
> By declaring any variable in a struct, or tree of structs, to be 32 byte
> aligned, it allows the compiler to safely assume the entire struct
> itself is also 32 byte aligned.
> 
> This might make the compiler emit code which straight up crashes or
> misbehaves in other ways, and at least in one instances is now
> documented to actually do (see ticket 10549 on trac).
> The issue there is that an unrelated variable in SingleChannelElement is
> declared to have an alignment of 32 bytes. So if the compiler does a copy
> in decode_cpe() with avx instructions, but ffmpeg is built with
> --disable-avx, this results in a crash, since the memory is only 16 byte
> aligned.
> Mind you, even if the compiler does not emit avx instructions, the code
> is still invalid and could misbehave. It just happens not to. Declaring
> any variable in a struct with a 32 byte alignment promises 32 byte
> alignment of the whole struct to the compiler.
> 
> Instead of now going through all instances of variables in structs
> being declared as 32 byte aligned, this patch bumps the minimum alignment
> to 32 bytes.
> ---
>   libavutil/mem.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libavutil/mem.c b/libavutil/mem.c
> index 36b8940a0c..26a9b9753b 100644
> --- a/libavutil/mem.c
> +++ b/libavutil/mem.c
> @@ -62,7 +62,7 @@ void  free(void *ptr);
>   
>   #endif /* MALLOC_PREFIX */
>   
> -#define ALIGN (HAVE_AVX512 ? 64 : (HAVE_AVX ? 32 : 16))
> +#define ALIGN (HAVE_AVX512 ? 64 : 32)
>   
>   /* NOTE: if you want to override these functions with your own
>    * implementations (not recommended) you have to link libav* as

ping
James Almer Dec. 6, 2023, 12:31 p.m. UTC | #2
On 12/3/2023 5:10 PM, Timo Rothenpieler wrote:
> FFmpeg has instances of DECLARE_ALIGNED(32, ...) in a lot of structs,
> which then end up heap-allocated.
> By declaring any variable in a struct, or tree of structs, to be 32 byte
> aligned, it allows the compiler to safely assume the entire struct
> itself is also 32 byte aligned.
> 
> This might make the compiler emit code which straight up crashes or
> misbehaves in other ways, and at least in one instances is now
> documented to actually do (see ticket 10549 on trac).
> The issue there is that an unrelated variable in SingleChannelElement is
> declared to have an alignment of 32 bytes. So if the compiler does a copy
> in decode_cpe() with avx instructions, but ffmpeg is built with
> --disable-avx, this results in a crash, since the memory is only 16 byte
> aligned.

Wont we run into similar issues with avx512 eventually?

> Mind you, even if the compiler does not emit avx instructions, the code
> is still invalid and could misbehave. It just happens not to. Declaring
> any variable in a struct with a 32 byte alignment promises 32 byte
> alignment of the whole struct to the compiler.
> 
> Instead of now going through all instances of variables in structs
> being declared as 32 byte aligned, this patch bumps the minimum alignment
> to 32 bytes.
> ---
>   libavutil/mem.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libavutil/mem.c b/libavutil/mem.c
> index 36b8940a0c..26a9b9753b 100644
> --- a/libavutil/mem.c
> +++ b/libavutil/mem.c
> @@ -62,7 +62,7 @@ void  free(void *ptr);
>   
>   #endif /* MALLOC_PREFIX */
>   
> -#define ALIGN (HAVE_AVX512 ? 64 : (HAVE_AVX ? 32 : 16))
> +#define ALIGN (HAVE_AVX512 ? 64 : 32)
>   
>   /* NOTE: if you want to override these functions with your own
>    * implementations (not recommended) you have to link libav* as
Ronald S. Bultje Dec. 6, 2023, 12:50 p.m. UTC | #3
Hi,

On Sun, Dec 3, 2023 at 3:10 PM Timo Rothenpieler <timo@rothenpieler.org>
wrote:

> So if the compiler does a copy
> in decode_cpe() with avx instructions, but ffmpeg is built with
> --disable-avx
>

Ehm... What? That seems like the core bug then?

Ronald
James Almer Dec. 6, 2023, 12:54 p.m. UTC | #4
On 12/6/2023 9:50 AM, Ronald S. Bultje wrote:
> Hi,
> 
> On Sun, Dec 3, 2023 at 3:10 PM Timo Rothenpieler <timo@rothenpieler.org>
> wrote:
> 
>> So if the compiler does a copy
>> in decode_cpe() with avx instructions, but ffmpeg is built with
>> --disable-avx
>>
> 
> Ehm... What? That seems like the core bug then?

--disable-avx will disable any ffmpeg specific AVX dsp function that 
depends on HAVE_AVX_EXTERNAL and similar, but will not do anything to 
the -march argument you can pass the compiler with the --cpu configure 
option.

configure --cpu=haswell --disable-avx is completely valid.

> 
> Ronald
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Timo Rothenpieler Dec. 6, 2023, 12:56 p.m. UTC | #5
On 06/12/2023 13:31, James Almer wrote:
> On 12/3/2023 5:10 PM, Timo Rothenpieler wrote:
>> FFmpeg has instances of DECLARE_ALIGNED(32, ...) in a lot of structs,
>> which then end up heap-allocated.
>> By declaring any variable in a struct, or tree of structs, to be 32 byte
>> aligned, it allows the compiler to safely assume the entire struct
>> itself is also 32 byte aligned.
>>
>> This might make the compiler emit code which straight up crashes or
>> misbehaves in other ways, and at least in one instances is now
>> documented to actually do (see ticket 10549 on trac).
>> The issue there is that an unrelated variable in SingleChannelElement is
>> declared to have an alignment of 32 bytes. So if the compiler does a copy
>> in decode_cpe() with avx instructions, but ffmpeg is built with
>> --disable-avx, this results in a crash, since the memory is only 16 byte
>> aligned.
> 
> Wont we run into similar issues with avx512 eventually?

It's only indirectly related to AVX.
The core issue is that we have structs with elements that declare an 
alignment of 32 bytes all over the codebase.
I checked all instances, and we do not have any struct members that 
declare a higher alignment requirement than 32.
(There's two instances of 64 byte alignment, but those are not on struct 
members, but on stack variables.)

This promises the compiler that the memory of the whole struct is 
aligned accordingly. So no matter if it breaks because of AVX or 
something else, the compiler could generate broken code if we heap 
allocate those structs with too small of an alignment.
It could generate other, non-AVX code, that depends on that alignment.

So we will only run into this again if someone decides to add a struct 
member with bigger alignment to a heap allocated struct somewhere.

>> Mind you, even if the compiler does not emit avx instructions, the code
>> is still invalid and could misbehave. It just happens not to. Declaring
>> any variable in a struct with a 32 byte alignment promises 32 byte
>> alignment of the whole struct to the compiler.
>>
>> Instead of now going through all instances of variables in structs
>> being declared as 32 byte aligned, this patch bumps the minimum alignment
>> to 32 bytes.
>> ---
>>   libavutil/mem.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/libavutil/mem.c b/libavutil/mem.c
>> index 36b8940a0c..26a9b9753b 100644
>> --- a/libavutil/mem.c
>> +++ b/libavutil/mem.c
>> @@ -62,7 +62,7 @@ void  free(void *ptr);
>>   #endif /* MALLOC_PREFIX */
>> -#define ALIGN (HAVE_AVX512 ? 64 : (HAVE_AVX ? 32 : 16))
>> +#define ALIGN (HAVE_AVX512 ? 64 : 32)
>>   /* NOTE: if you want to override these functions with your own
>>    * implementations (not recommended) you have to link libav* as
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Martin Storsjö Dec. 6, 2023, 1:25 p.m. UTC | #6
On Sun, 3 Dec 2023, Timo Rothenpieler wrote:

> FFmpeg has instances of DECLARE_ALIGNED(32, ...) in a lot of structs,
> which then end up heap-allocated.
> By declaring any variable in a struct, or tree of structs, to be 32 byte
> aligned, it allows the compiler to safely assume the entire struct
> itself is also 32 byte aligned.
>
> This might make the compiler emit code which straight up crashes or
> misbehaves in other ways, and at least in one instances is now
> documented to actually do (see ticket 10549 on trac).
> The issue there is that an unrelated variable in SingleChannelElement is
> declared to have an alignment of 32 bytes. So if the compiler does a copy
> in decode_cpe() with avx instructions, but ffmpeg is built with
> --disable-avx, this results in a crash, since the memory is only 16 byte
> aligned.
> Mind you, even if the compiler does not emit avx instructions, the code
> is still invalid and could misbehave. It just happens not to. Declaring
> any variable in a struct with a 32 byte alignment promises 32 byte
> alignment of the whole struct to the compiler.
>
> Instead of now going through all instances of variables in structs
> being declared as 32 byte aligned, this patch bumps the minimum alignment
> to 32 bytes.
> ---
> libavutil/mem.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libavutil/mem.c b/libavutil/mem.c
> index 36b8940a0c..26a9b9753b 100644
> --- a/libavutil/mem.c
> +++ b/libavutil/mem.c
> @@ -62,7 +62,7 @@ void  free(void *ptr);
>
> #endif /* MALLOC_PREFIX */
>
> -#define ALIGN (HAVE_AVX512 ? 64 : (HAVE_AVX ? 32 : 16))
> +#define ALIGN (HAVE_AVX512 ? 64 : 32)

LGTM

It could be good to add a comment here, to indicate how this value relates 
to the alignemnts used in structs.

For others who commented in this thread, it all boils down to something 
like this:

struct MyData {
     uint8_t __attribute__((aligned(32))) aligned_data[1024];
};

void func(void) {
     struct MyData *obj = malloc(sizeof(*obj)); // Uusally only aligned to 8 bytes
     // operate on obj->aligned_data[]
}

Due to how aligned_data is declared, we promise to the compiler that it is 
aligned to 32 bytes, and that the compiler can assume this wherever. 
Depending on -march or whatever, this can be to access it with 
instructions that assume 32 byte alignment.

// Martin
Timo Rothenpieler Dec. 6, 2023, 1:27 p.m. UTC | #7
On 06/12/2023 14:25, Martin Storsjö wrote:
> On Sun, 3 Dec 2023, Timo Rothenpieler wrote:
> 
>> FFmpeg has instances of DECLARE_ALIGNED(32, ...) in a lot of structs,
>> which then end up heap-allocated.
>> By declaring any variable in a struct, or tree of structs, to be 32 byte
>> aligned, it allows the compiler to safely assume the entire struct
>> itself is also 32 byte aligned.
>>
>> This might make the compiler emit code which straight up crashes or
>> misbehaves in other ways, and at least in one instances is now
>> documented to actually do (see ticket 10549 on trac).
>> The issue there is that an unrelated variable in SingleChannelElement is
>> declared to have an alignment of 32 bytes. So if the compiler does a copy
>> in decode_cpe() with avx instructions, but ffmpeg is built with
>> --disable-avx, this results in a crash, since the memory is only 16 byte
>> aligned.
>> Mind you, even if the compiler does not emit avx instructions, the code
>> is still invalid and could misbehave. It just happens not to. Declaring
>> any variable in a struct with a 32 byte alignment promises 32 byte
>> alignment of the whole struct to the compiler.
>>
>> Instead of now going through all instances of variables in structs
>> being declared as 32 byte aligned, this patch bumps the minimum alignment
>> to 32 bytes.
>> ---
>> libavutil/mem.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/libavutil/mem.c b/libavutil/mem.c
>> index 36b8940a0c..26a9b9753b 100644
>> --- a/libavutil/mem.c
>> +++ b/libavutil/mem.c
>> @@ -62,7 +62,7 @@ void  free(void *ptr);
>>
>> #endif /* MALLOC_PREFIX */
>>
>> -#define ALIGN (HAVE_AVX512 ? 64 : (HAVE_AVX ? 32 : 16))
>> +#define ALIGN (HAVE_AVX512 ? 64 : 32)
> 
> LGTM
> 
> It could be good to add a comment here, to indicate how this value 
> relates to the alignemnts used in structs.
> 
> For others who commented in this thread, it all boils down to something 
> like this:
> 
> struct MyData {
>      uint8_t __attribute__((aligned(32))) aligned_data[1024];
> };

It's even a bit more complex than that.
The case that's crashing right now is a member that has no alignment 
declared on itself at all.
But another member of the same struct does, and so the compiler assumes 
the whole struct to be aligned.

> void func(void) {
>      struct MyData *obj = malloc(sizeof(*obj)); // Uusally only aligned 
> to 8 bytes
>      // operate on obj->aligned_data[]
> }
> 
> Due to how aligned_data is declared, we promise to the compiler that it 
> is aligned to 32 bytes, and that the compiler can assume this wherever. 
> Depending on -march or whatever, this can be to access it with 
> instructions that assume 32 byte alignment.
> 
> // Martin
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Martin Storsjö Dec. 6, 2023, 1:29 p.m. UTC | #8
On Wed, 6 Dec 2023, Timo Rothenpieler wrote:

>
>
> On 06/12/2023 14:25, Martin Storsjö wrote:
>> On Sun, 3 Dec 2023, Timo Rothenpieler wrote:
>> 
>>> FFmpeg has instances of DECLARE_ALIGNED(32, ...) in a lot of structs,
>>> which then end up heap-allocated.
>>> By declaring any variable in a struct, or tree of structs, to be 32 byte
>>> aligned, it allows the compiler to safely assume the entire struct
>>> itself is also 32 byte aligned.
>>>
>>> This might make the compiler emit code which straight up crashes or
>>> misbehaves in other ways, and at least in one instances is now
>>> documented to actually do (see ticket 10549 on trac).
>>> The issue there is that an unrelated variable in SingleChannelElement is
>>> declared to have an alignment of 32 bytes. So if the compiler does a copy
>>> in decode_cpe() with avx instructions, but ffmpeg is built with
>>> --disable-avx, this results in a crash, since the memory is only 16 byte
>>> aligned.
>>> Mind you, even if the compiler does not emit avx instructions, the code
>>> is still invalid and could misbehave. It just happens not to. Declaring
>>> any variable in a struct with a 32 byte alignment promises 32 byte
>>> alignment of the whole struct to the compiler.
>>>
>>> Instead of now going through all instances of variables in structs
>>> being declared as 32 byte aligned, this patch bumps the minimum alignment
>>> to 32 bytes.
>>> ---
>>> libavutil/mem.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/libavutil/mem.c b/libavutil/mem.c
>>> index 36b8940a0c..26a9b9753b 100644
>>> --- a/libavutil/mem.c
>>> +++ b/libavutil/mem.c
>>> @@ -62,7 +62,7 @@ void  free(void *ptr);
>>>
>>> #endif /* MALLOC_PREFIX */
>>>
>>> -#define ALIGN (HAVE_AVX512 ? 64 : (HAVE_AVX ? 32 : 16))
>>> +#define ALIGN (HAVE_AVX512 ? 64 : 32)
>> 
>> LGTM
>> 
>> It could be good to add a comment here, to indicate how this value 
>> relates to the alignemnts used in structs.
>> 
>> For others who commented in this thread, it all boils down to something 
>> like this:
>> 
>> struct MyData {
>>      uint8_t __attribute__((aligned(32))) aligned_data[1024];
>> };
>
> It's even a bit more complex than that.
> The case that's crashing right now is a member that has no alignment 
> declared on itself at all.
> But another member of the same struct does, and so the compiler assumes 
> the whole struct to be aligned.

Ah, tricky! Yeah, that's also a valid assumption for the compiler, but 
also a rather non-obvious one.

// Martin
Timo Rothenpieler Dec. 8, 2023, 12:15 a.m. UTC | #9
On 06.12.2023 14:25, Martin Storsjö wrote:
> On Sun, 3 Dec 2023, Timo Rothenpieler wrote:
> 
>> FFmpeg has instances of DECLARE_ALIGNED(32, ...) in a lot of structs,
>> which then end up heap-allocated.
>> By declaring any variable in a struct, or tree of structs, to be 32 byte
>> aligned, it allows the compiler to safely assume the entire struct
>> itself is also 32 byte aligned.
>>
>> This might make the compiler emit code which straight up crashes or
>> misbehaves in other ways, and at least in one instances is now
>> documented to actually do (see ticket 10549 on trac).
>> The issue there is that an unrelated variable in SingleChannelElement is
>> declared to have an alignment of 32 bytes. So if the compiler does a copy
>> in decode_cpe() with avx instructions, but ffmpeg is built with
>> --disable-avx, this results in a crash, since the memory is only 16 byte
>> aligned.
>> Mind you, even if the compiler does not emit avx instructions, the code
>> is still invalid and could misbehave. It just happens not to. Declaring
>> any variable in a struct with a 32 byte alignment promises 32 byte
>> alignment of the whole struct to the compiler.
>>
>> Instead of now going through all instances of variables in structs
>> being declared as 32 byte aligned, this patch bumps the minimum alignment
>> to 32 bytes.
>> ---
>> libavutil/mem.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/libavutil/mem.c b/libavutil/mem.c
>> index 36b8940a0c..26a9b9753b 100644
>> --- a/libavutil/mem.c
>> +++ b/libavutil/mem.c
>> @@ -62,7 +62,7 @@ void  free(void *ptr);
>>
>> #endif /* MALLOC_PREFIX */
>>
>> -#define ALIGN (HAVE_AVX512 ? 64 : (HAVE_AVX ? 32 : 16))
>> +#define ALIGN (HAVE_AVX512 ? 64 : 32)
> 
> LGTM
> 
> It could be good to add a comment here, to indicate how this value 
> relates to the alignemnts used in structs.

Thinking about this, I don't think a comment _here_ is a good idea.
I'd be more inclined to add it to the DECLARE_ALIGNED macro.
People defining a new aligned member variable are at least a little more 
likely to read that, compared to something next to this random define 
that's not even in a header.

Will add that and push soon.
I'll also check how far this will need backported. Likely to almost all 
versions ever.

> For others who commented in this thread, it all boils down to something 
> like this:
> 
> struct MyData {
>      uint8_t __attribute__((aligned(32))) aligned_data[1024];
> };
> 
> void func(void) {
>      struct MyData *obj = malloc(sizeof(*obj)); // Uusally only aligned 
> to 8 bytes
>      // operate on obj->aligned_data[]
> }
> 
> Due to how aligned_data is declared, we promise to the compiler that it 
> is aligned to 32 bytes, and that the compiler can assume this wherever. 
> Depending on -march or whatever, this can be to access it with 
> instructions that assume 32 byte alignment.
> 
> // Martin
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Martin Storsjö Dec. 8, 2023, 5:57 a.m. UTC | #10
On Fri, 8 Dec 2023, Timo Rothenpieler wrote:

> On 06.12.2023 14:25, Martin Storsjö wrote:
>> On Sun, 3 Dec 2023, Timo Rothenpieler wrote:
>> 
>>> FFmpeg has instances of DECLARE_ALIGNED(32, ...) in a lot of structs,
>>> which then end up heap-allocated.
>>> By declaring any variable in a struct, or tree of structs, to be 32 
> byte
>>> aligned, it allows the compiler to safely assume the entire struct
>>> itself is also 32 byte aligned.
>>>
>>> This might make the compiler emit code which straight up crashes or
>>> misbehaves in other ways, and at least in one instances is now
>>> documented to actually do (see ticket 10549 on trac).
>>> The issue there is that an unrelated variable in SingleChannelElement 
> is
>>> declared to have an alignment of 32 bytes. So if the compiler does a 
> copy
>>> in decode_cpe() with avx instructions, but ffmpeg is built with
>>> --disable-avx, this results in a crash, since the memory is only 16 
> byte
>>> aligned.
>>> Mind you, even if the compiler does not emit avx instructions, the 
> code
>>> is still invalid and could misbehave. It just happens not to. 
> Declaring
>>> any variable in a struct with a 32 byte alignment promises 32 byte
>>> alignment of the whole struct to the compiler.
>>>
>>> Instead of now going through all instances of variables in structs
>>> being declared as 32 byte aligned, this patch bumps the minimum 
> alignment
>>> to 32 bytes.
>>> ---
>>> libavutil/mem.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/libavutil/mem.c b/libavutil/mem.c
>>> index 36b8940a0c..26a9b9753b 100644
>>> --- a/libavutil/mem.c
>>> +++ b/libavutil/mem.c
>>> @@ -62,7 +62,7 @@ void  free(void *ptr);
>>>
>>> #endif /* MALLOC_PREFIX */
>>>
>>> -#define ALIGN (HAVE_AVX512 ? 64 : (HAVE_AVX ? 32 : 16))
>>> +#define ALIGN (HAVE_AVX512 ? 64 : 32)
>> 
>> LGTM
>> 
>> It could be good to add a comment here, to indicate how this value 
>> relates to the alignemnts used in structs.
>
> Thinking about this, I don't think a comment _here_ is a good idea.
> I'd be more inclined to add it to the DECLARE_ALIGNED macro.
> People defining a new aligned member variable are at least a little more 
> likely to read that, compared to something next to this random define 
> that's not even in a header.

I still think it'd be good to have a comment here, too, to explain this 
code to whoever is looking at it (why 32 or 64, why not 16/32/64 etc). I 
agree that it can be good to mention it near DECLARE_ALIGNED as well, but 
these comments serve different readers.

// Martin
Andreas Rheinhardt Dec. 8, 2023, 10:01 a.m. UTC | #11
Timo Rothenpieler:
> FFmpeg has instances of DECLARE_ALIGNED(32, ...) in a lot of structs,
> which then end up heap-allocated.
> By declaring any variable in a struct, or tree of structs, to be 32 byte
> aligned, it allows the compiler to safely assume the entire struct
> itself is also 32 byte aligned.
> 
> This might make the compiler emit code which straight up crashes or
> misbehaves in other ways, and at least in one instances is now
> documented to actually do (see ticket 10549 on trac).
> The issue there is that an unrelated variable in SingleChannelElement is
> declared to have an alignment of 32 bytes. So if the compiler does a copy
> in decode_cpe() with avx instructions, but ffmpeg is built with
> --disable-avx, this results in a crash, since the memory is only 16 byte
> aligned.
> Mind you, even if the compiler does not emit avx instructions, the code
> is still invalid and could misbehave. It just happens not to. Declaring
> any variable in a struct with a 32 byte alignment promises 32 byte
> alignment of the whole struct to the compiler.
> 
> Instead of now going through all instances of variables in structs
> being declared as 32 byte aligned, this patch bumps the minimum alignment
> to 32 bytes.
> ---
>  libavutil/mem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libavutil/mem.c b/libavutil/mem.c
> index 36b8940a0c..26a9b9753b 100644
> --- a/libavutil/mem.c
> +++ b/libavutil/mem.c
> @@ -62,7 +62,7 @@ void  free(void *ptr);
>  
>  #endif /* MALLOC_PREFIX */
>  
> -#define ALIGN (HAVE_AVX512 ? 64 : (HAVE_AVX ? 32 : 16))
> +#define ALIGN (HAVE_AVX512 ? 64 : 32)
>  
>  /* NOTE: if you want to override these functions with your own
>   * implementations (not recommended) you have to link libav* as

1. There is another way in which this can be triggered: Namely if one
uses a build with AVX, but combines it with a lavu built without it; it
is also triggerable on non-x86 (having an insufficiently aligned pointer
is always UB even if the CPU does not have instructions that would
benefit from the additional alignment). You should mention this in the
commit message.

2. This topic gave me headaches when creating RefStruct. I "solved" it
by (ab)using STRIDE_ALIGN which mimicks the alignment of av_malloc(),
thereby ensuring that RefStruct does not break lavc builds built with
the avx dsp functions enabled (but it does not guard against using a
lavu whose av_malloc() only provides less alignment).

3. There is a downside to your patch: It bumps alignment for non-x86
arches which wastes memory (and may make allocators slower). We could
fix this by modifying the 32-byte-alignment macros to only provide 16
byte alignment if the ARCH_ (and potentially the HAVE_) defines indicate
that no alignment bigger than 16 is needed.

- Andreas
Timo Rothenpieler Dec. 8, 2023, 5:56 p.m. UTC | #12
On 08.12.2023 11:01, Andreas Rheinhardt wrote:
> Timo Rothenpieler:
>> FFmpeg has instances of DECLARE_ALIGNED(32, ...) in a lot of structs,
>> which then end up heap-allocated.
>> By declaring any variable in a struct, or tree of structs, to be 32 byte
>> aligned, it allows the compiler to safely assume the entire struct
>> itself is also 32 byte aligned.
>>
>> This might make the compiler emit code which straight up crashes or
>> misbehaves in other ways, and at least in one instances is now
>> documented to actually do (see ticket 10549 on trac).
>> The issue there is that an unrelated variable in SingleChannelElement is
>> declared to have an alignment of 32 bytes. So if the compiler does a copy
>> in decode_cpe() with avx instructions, but ffmpeg is built with
>> --disable-avx, this results in a crash, since the memory is only 16 byte
>> aligned.
>> Mind you, even if the compiler does not emit avx instructions, the code
>> is still invalid and could misbehave. It just happens not to. Declaring
>> any variable in a struct with a 32 byte alignment promises 32 byte
>> alignment of the whole struct to the compiler.
>>
>> Instead of now going through all instances of variables in structs
>> being declared as 32 byte aligned, this patch bumps the minimum alignment
>> to 32 bytes.
>> ---
>>   libavutil/mem.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/libavutil/mem.c b/libavutil/mem.c
>> index 36b8940a0c..26a9b9753b 100644
>> --- a/libavutil/mem.c
>> +++ b/libavutil/mem.c
>> @@ -62,7 +62,7 @@ void  free(void *ptr);
>>   
>>   #endif /* MALLOC_PREFIX */
>>   
>> -#define ALIGN (HAVE_AVX512 ? 64 : (HAVE_AVX ? 32 : 16))
>> +#define ALIGN (HAVE_AVX512 ? 64 : 32)
>>   
>>   /* NOTE: if you want to override these functions with your own
>>    * implementations (not recommended) you have to link libav* as
> 
> 1. There is another way in which this can be triggered: Namely if one
> uses a build with AVX, but combines it with a lavu built without it; it
> is also triggerable on non-x86 (having an insufficiently aligned pointer
> is always UB even if the CPU does not have instructions that would
> benefit from the additional alignment). You should mention this in the
> commit message.

Is mixing the libraries really a scenario we need to care about/support?

And yeah, this is only marginally related to AVX, in that it's what 
triggers a crash in this scenario.
But as stated in the commit message, it's not a valid thing to do in any 
case, on any arch. It just happens not to crash.

> 2. This topic gave me headaches when creating RefStruct. I "solved" it
> by (ab)using STRIDE_ALIGN which mimicks the alignment of av_malloc(),
> thereby ensuring that RefStruct does not break lavc builds built with
> the avx dsp functions enabled (but it does not guard against using a
> lavu whose av_malloc() only provides less alignment).
> 
> 3. There is a downside to your patch: It bumps alignment for non-x86
> arches which wastes memory (and may make allocators slower). We could
> fix this by modifying the 32-byte-alignment macros to only provide 16
> byte alignment if the ARCH_ (and potentially the HAVE_) defines indicate
> that no alignment bigger than 16 is needed.

But it's invalid on any other arch as well, just hasn't bitten us yet.
I'm not sure if I'd want to start maintaining a growingly complex list 
of CPU extensions and make the DECLARE_ALIGNED macro lie if we think it 
doesn't need 32 byte alignment.
We don't really know why someone wants a variable aligned after all.


> - Andreas
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Nicolas George Dec. 8, 2023, 6:11 p.m. UTC | #13
Timo Rothenpieler (12023-12-08):
> Is mixing the libraries really a scenario we need to care about/support?

No. We should merge all the libraries into a single libffmpeg.so. Having
separate libraries brings us no end of hassle and drawbacks, starting
with all the avpriv symbols and backward compatibility layers, and the
benefits it brings could be reached in simpler and more efficient ways.

But anytime I brought it up, the same naysayers would object, but when I
ask what precise benefit they think the current situation brings, with
the intent of explaining how it can be done better differently (I do not
bring that half-backed, I have thought about it beforehand) or in some
case explaining that no, this is not a benefit because linking does not
work like that. And then the naysayers would whine that I am making too
much a fuss.

Barring merging all libraries into a single libffmpeg.so, have configure
compute AV_LIBRARY_SIGNATURE as a 64 bits hash of the version and
configuration, then have in version.h of each library:

#define avsmth_version_check_signature() \
  avsmth_version_check_signature_ ## AV_LIBRARY_SIGNATURE()

then have avsmth_version_check_signature() in each library call the ones
in the library it depends on, and core functions like *register_all()
call it. Then if the libraries are mixed at run time it will produce an
error message by the linker that can be searched on the web.

Regards,
Andreas Rheinhardt Dec. 9, 2023, 5:23 a.m. UTC | #14
Timo Rothenpieler:
> On 08.12.2023 11:01, Andreas Rheinhardt wrote:
>> Timo Rothenpieler:
>>> FFmpeg has instances of DECLARE_ALIGNED(32, ...) in a lot of structs,
>>> which then end up heap-allocated.
>>> By declaring any variable in a struct, or tree of structs, to be 32 byte
>>> aligned, it allows the compiler to safely assume the entire struct
>>> itself is also 32 byte aligned.
>>>
>>> This might make the compiler emit code which straight up crashes or
>>> misbehaves in other ways, and at least in one instances is now
>>> documented to actually do (see ticket 10549 on trac).
>>> The issue there is that an unrelated variable in SingleChannelElement is
>>> declared to have an alignment of 32 bytes. So if the compiler does a
>>> copy
>>> in decode_cpe() with avx instructions, but ffmpeg is built with
>>> --disable-avx, this results in a crash, since the memory is only 16 byte
>>> aligned.
>>> Mind you, even if the compiler does not emit avx instructions, the code
>>> is still invalid and could misbehave. It just happens not to. Declaring
>>> any variable in a struct with a 32 byte alignment promises 32 byte
>>> alignment of the whole struct to the compiler.
>>>
>>> Instead of now going through all instances of variables in structs
>>> being declared as 32 byte aligned, this patch bumps the minimum
>>> alignment
>>> to 32 bytes.
>>> ---
>>>   libavutil/mem.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/libavutil/mem.c b/libavutil/mem.c
>>> index 36b8940a0c..26a9b9753b 100644
>>> --- a/libavutil/mem.c
>>> +++ b/libavutil/mem.c
>>> @@ -62,7 +62,7 @@ void  free(void *ptr);
>>>     #endif /* MALLOC_PREFIX */
>>>   -#define ALIGN (HAVE_AVX512 ? 64 : (HAVE_AVX ? 32 : 16))
>>> +#define ALIGN (HAVE_AVX512 ? 64 : 32)
>>>     /* NOTE: if you want to override these functions with your own
>>>    * implementations (not recommended) you have to link libav* as
>>
>> 1. There is another way in which this can be triggered: Namely if one
>> uses a build with AVX, but combines it with a lavu built without it; it
>> is also triggerable on non-x86 (having an insufficiently aligned pointer
>> is always UB even if the CPU does not have instructions that would
>> benefit from the additional alignment). You should mention this in the
>> commit message.
> 
> Is mixing the libraries really a scenario we need to care about/support?
> 

IMO, no, but Anton cares about it a lot.

> And yeah, this is only marginally related to AVX, in that it's what
> triggers a crash in this scenario.
> But as stated in the commit message, it's not a valid thing to do in any
> case, on any arch. It just happens not to crash.
> 

I know, but this patch also happens to fix this (at least to some
degree), so this should be mentioned in the commit message.

>> 2. This topic gave me headaches when creating RefStruct. I "solved" it
>> by (ab)using STRIDE_ALIGN which mimicks the alignment of av_malloc(),
>> thereby ensuring that RefStruct does not break lavc builds built with
>> the avx dsp functions enabled (but it does not guard against using a
>> lavu whose av_malloc() only provides less alignment).
>>
>> 3. There is a downside to your patch: It bumps alignment for non-x86
>> arches which wastes memory (and may make allocators slower). We could
>> fix this by modifying the 32-byte-alignment macros to only provide 16
>> byte alignment if the ARCH_ (and potentially the HAVE_) defines indicate
>> that no alignment bigger than 16 is needed.
> 
> But it's invalid on any other arch as well, just hasn't bitten us yet.

It is not invalid if we modify DECLARE_ALIGNED to never use more
alignment than 16 on non-x86 arches. Then all the other arches can
continue to use 16.

> I'm not sure if I'd want to start maintaining a growingly complex list
> of CPU extensions and make the DECLARE_ALIGNED macro lie if we think it
> doesn't need 32 byte alignment.
> We don't really know why someone wants a variable aligned after all.

I am fine with that point. Although I don't think it would be that
complicated if it is done at one point (namely in configure) and if all
the other places would just use a macro for max alignment (that would be
placed in config.h).

- Andreas
Timo Rothenpieler Jan. 12, 2024, 11:10 p.m. UTC | #15
On 09.12.2023 06:23, Andreas Rheinhardt wrote:
> Timo Rothenpieler:
>> On 08.12.2023 11:01, Andreas Rheinhardt wrote:
>>> Timo Rothenpieler:
>>>> FFmpeg has instances of DECLARE_ALIGNED(32, ...) in a lot of structs,
>>>> which then end up heap-allocated.
>>>> By declaring any variable in a struct, or tree of structs, to be 32 byte
>>>> aligned, it allows the compiler to safely assume the entire struct
>>>> itself is also 32 byte aligned.
>>>>
>>>> This might make the compiler emit code which straight up crashes or
>>>> misbehaves in other ways, and at least in one instances is now
>>>> documented to actually do (see ticket 10549 on trac).
>>>> The issue there is that an unrelated variable in SingleChannelElement is
>>>> declared to have an alignment of 32 bytes. So if the compiler does a
>>>> copy
>>>> in decode_cpe() with avx instructions, but ffmpeg is built with
>>>> --disable-avx, this results in a crash, since the memory is only 16 byte
>>>> aligned.
>>>> Mind you, even if the compiler does not emit avx instructions, the code
>>>> is still invalid and could misbehave. It just happens not to. Declaring
>>>> any variable in a struct with a 32 byte alignment promises 32 byte
>>>> alignment of the whole struct to the compiler.
>>>>
>>>> Instead of now going through all instances of variables in structs
>>>> being declared as 32 byte aligned, this patch bumps the minimum
>>>> alignment
>>>> to 32 bytes.
>>>> ---
>>>>    libavutil/mem.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/libavutil/mem.c b/libavutil/mem.c
>>>> index 36b8940a0c..26a9b9753b 100644
>>>> --- a/libavutil/mem.c
>>>> +++ b/libavutil/mem.c
>>>> @@ -62,7 +62,7 @@ void  free(void *ptr);
>>>>      #endif /* MALLOC_PREFIX */
>>>>    -#define ALIGN (HAVE_AVX512 ? 64 : (HAVE_AVX ? 32 : 16))
>>>> +#define ALIGN (HAVE_AVX512 ? 64 : 32)
>>>>      /* NOTE: if you want to override these functions with your own
>>>>     * implementations (not recommended) you have to link libav* as
>>>
>>> 1. There is another way in which this can be triggered: Namely if one
>>> uses a build with AVX, but combines it with a lavu built without it; it
>>> is also triggerable on non-x86 (having an insufficiently aligned pointer
>>> is always UB even if the CPU does not have instructions that would
>>> benefit from the additional alignment). You should mention this in the
>>> commit message.
>>
>> Is mixing the libraries really a scenario we need to care about/support?
>>
> 
> IMO, no, but Anton cares about it a lot.
> 
>> And yeah, this is only marginally related to AVX, in that it's what
>> triggers a crash in this scenario.
>> But as stated in the commit message, it's not a valid thing to do in any
>> case, on any arch. It just happens not to crash.
>>
> 
> I know, but this patch also happens to fix this (at least to some
> degree), so this should be mentioned in the commit message.
> 
>>> 2. This topic gave me headaches when creating RefStruct. I "solved" it
>>> by (ab)using STRIDE_ALIGN which mimicks the alignment of av_malloc(),
>>> thereby ensuring that RefStruct does not break lavc builds built with
>>> the avx dsp functions enabled (but it does not guard against using a
>>> lavu whose av_malloc() only provides less alignment).
>>>
>>> 3. There is a downside to your patch: It bumps alignment for non-x86
>>> arches which wastes memory (and may make allocators slower). We could
>>> fix this by modifying the 32-byte-alignment macros to only provide 16
>>> byte alignment if the ARCH_ (and potentially the HAVE_) defines indicate
>>> that no alignment bigger than 16 is needed.
>>
>> But it's invalid on any other arch as well, just hasn't bitten us yet.
> 
> It is not invalid if we modify DECLARE_ALIGNED to never use more
> alignment than 16 on non-x86 arches. Then all the other arches can
> continue to use 16.
> 
>> I'm not sure if I'd want to start maintaining a growingly complex list
>> of CPU extensions and make the DECLARE_ALIGNED macro lie if we think it
>> doesn't need 32 byte alignment.
>> We don't really know why someone wants a variable aligned after all.
> 
> I am fine with that point. Although I don't think it would be that
> complicated if it is done at one point (namely in configure) and if all
> the other places would just use a macro for max alignment (that would be
> placed in config.h).

ping about this.
I'm still not sure about the correct way forward here.

Aside from the complexity of figuring out the reasonable max align, I'm 
also a not sure on how to modify the macro to make use of it at all.
You can't put any kind of MIN/MAX construct into the body of the 
alignment macro after all.
diff mbox series

Patch

diff --git a/libavutil/mem.c b/libavutil/mem.c
index 36b8940a0c..26a9b9753b 100644
--- a/libavutil/mem.c
+++ b/libavutil/mem.c
@@ -62,7 +62,7 @@  void  free(void *ptr);
 
 #endif /* MALLOC_PREFIX */
 
-#define ALIGN (HAVE_AVX512 ? 64 : (HAVE_AVX ? 32 : 16))
+#define ALIGN (HAVE_AVX512 ? 64 : 32)
 
 /* NOTE: if you want to override these functions with your own
  * implementations (not recommended) you have to link libav* as