diff mbox series

[FFmpeg-devel] cpu: Limit the number of auto threads in 32 bit builds

Message ID 20220905093027.4090687-1-martin@martin.st
State Superseded
Headers show
Series [FFmpeg-devel] cpu: Limit the number of auto threads in 32 bit builds | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Martin Storsjö Sept. 5, 2022, 9:30 a.m. UTC
Limit the returned value from av_cpu_count to sensible amounts
in 32 bit builds.

This chosen limit, 64, is somewhat arbitrary - a 32 bit process
is capable of creating much more than 64 threads. But in many
cases, multiple parts of the encoding pipeline (decoder, filters,
encoders) all create a pool of threads, auto sized according to the
number of cores.

In one failing test, the process had managed to create 506 threads
before a pthread_create call failed.

In the current set of fate tests, the filter-lavd-scalenorm test
seems to be the limiting factor; in a 32 bit build (arm linux,
running on an aarch64 kernel), it starts failing with an auto
thread count somewhere around 85. Therefore, pick the maximum
with some margin below this.

This fixes running fate without any manually set number of threads
in 32 bit builds on machines with huge numbers of cores.
---
 libavutil/cpu.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Andreas Rheinhardt Sept. 5, 2022, 11:45 a.m. UTC | #1
Martin Storsjö:
> Limit the returned value from av_cpu_count to sensible amounts
> in 32 bit builds.
> 
> This chosen limit, 64, is somewhat arbitrary - a 32 bit process
> is capable of creating much more than 64 threads. But in many
> cases, multiple parts of the encoding pipeline (decoder, filters,
> encoders) all create a pool of threads, auto sized according to the
> number of cores.
> 
> In one failing test, the process had managed to create 506 threads
> before a pthread_create call failed.
> 
> In the current set of fate tests, the filter-lavd-scalenorm test
> seems to be the limiting factor; in a 32 bit build (arm linux,
> running on an aarch64 kernel), it starts failing with an auto
> thread count somewhere around 85. Therefore, pick the maximum
> with some margin below this.
> 
> This fixes running fate without any manually set number of threads
> in 32 bit builds on machines with huge numbers of cores.
> ---
>  libavutil/cpu.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/libavutil/cpu.c b/libavutil/cpu.c
> index 0035e927a5..094bd71d3d 100644
> --- a/libavutil/cpu.c
> +++ b/libavutil/cpu.c
> @@ -233,6 +233,12 @@ int av_cpu_count(void)
>      nb_cpus = sysinfo.dwNumberOfProcessors;
>  #endif
>  
> +#if SIZE_MAX <= UINT32_MAX
> +    // Avoid running out of memory/address space in 32 bit builds, by
> +    // limiting the number of auto threads.
> +    nb_cpus = FFMIN(nb_cpus, 64);
> +#endif
> +
>      if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed))
>          av_log(NULL, AV_LOG_DEBUG, "detected %d logical cores\n", nb_cpus);
>  

I don't think we should be lying in libavutil/cpu.c. We should instead
limit the number of threads in the functions that actually create
threads based upon the return value of av_cpu_count(); e.g. both
frame_thread_encoder.c (limit 64) as well as pthread_frame.c and
pthread_slice.c (limit 16) already limit these numbers.
lavu/slicethread.c doesn't seem to do so.

- Andreas
Martin Storsjö Sept. 5, 2022, 11:50 a.m. UTC | #2
On Mon, 5 Sep 2022, Andreas Rheinhardt wrote:

> Martin Storsjö:
>> Limit the returned value from av_cpu_count to sensible amounts
>> in 32 bit builds.
>> 
>> This chosen limit, 64, is somewhat arbitrary - a 32 bit process
>> is capable of creating much more than 64 threads. But in many
>> cases, multiple parts of the encoding pipeline (decoder, filters,
>> encoders) all create a pool of threads, auto sized according to the
>> number of cores.
>> 
>> In one failing test, the process had managed to create 506 threads
>> before a pthread_create call failed.
>> 
>> In the current set of fate tests, the filter-lavd-scalenorm test
>> seems to be the limiting factor; in a 32 bit build (arm linux,
>> running on an aarch64 kernel), it starts failing with an auto
>> thread count somewhere around 85. Therefore, pick the maximum
>> with some margin below this.
>> 
>> This fixes running fate without any manually set number of threads
>> in 32 bit builds on machines with huge numbers of cores.
>> ---
>>  libavutil/cpu.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>> 
>> diff --git a/libavutil/cpu.c b/libavutil/cpu.c
>> index 0035e927a5..094bd71d3d 100644
>> --- a/libavutil/cpu.c
>> +++ b/libavutil/cpu.c
>> @@ -233,6 +233,12 @@ int av_cpu_count(void)
>>      nb_cpus = sysinfo.dwNumberOfProcessors;
>>  #endif
>> 
>> +#if SIZE_MAX <= UINT32_MAX
>> +    // Avoid running out of memory/address space in 32 bit builds, by
>> +    // limiting the number of auto threads.
>> +    nb_cpus = FFMIN(nb_cpus, 64);
>> +#endif
>> +
>>      if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed))
>>          av_log(NULL, AV_LOG_DEBUG, "detected %d logical cores\n", nb_cpus);
>> 
>
> I don't think we should be lying in libavutil/cpu.c.

> We should instead limit the number of threads in the functions that 
> actually create threads based upon the return value of av_cpu_count(); 
> e.g. both frame_thread_encoder.c (limit 64) as well as pthread_frame.c 
> and pthread_slice.c (limit 16) already limit these numbers. 
> lavu/slicethread.c doesn't seem to do so.

Right, that's probably the more sensible thing to do.

For something like lavu/slicethread.c, it probably makes sense with a 
low-ish limit, like 16 or 32 - splitting up an individual frame in more 
slices than that probably doesn't make sense as automatic default, right?

// Martin
Andreas Rheinhardt Sept. 5, 2022, 11:54 a.m. UTC | #3
Martin Storsjö:
> On Mon, 5 Sep 2022, Andreas Rheinhardt wrote:
> 
>> Martin Storsjö:
>>> Limit the returned value from av_cpu_count to sensible amounts
>>> in 32 bit builds.
>>>
>>> This chosen limit, 64, is somewhat arbitrary - a 32 bit process
>>> is capable of creating much more than 64 threads. But in many
>>> cases, multiple parts of the encoding pipeline (decoder, filters,
>>> encoders) all create a pool of threads, auto sized according to the
>>> number of cores.
>>>
>>> In one failing test, the process had managed to create 506 threads
>>> before a pthread_create call failed.
>>>
>>> In the current set of fate tests, the filter-lavd-scalenorm test
>>> seems to be the limiting factor; in a 32 bit build (arm linux,
>>> running on an aarch64 kernel), it starts failing with an auto
>>> thread count somewhere around 85. Therefore, pick the maximum
>>> with some margin below this.
>>>
>>> This fixes running fate without any manually set number of threads
>>> in 32 bit builds on machines with huge numbers of cores.
>>> ---
>>>  libavutil/cpu.c | 6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/libavutil/cpu.c b/libavutil/cpu.c
>>> index 0035e927a5..094bd71d3d 100644
>>> --- a/libavutil/cpu.c
>>> +++ b/libavutil/cpu.c
>>> @@ -233,6 +233,12 @@ int av_cpu_count(void)
>>>      nb_cpus = sysinfo.dwNumberOfProcessors;
>>>  #endif
>>>
>>> +#if SIZE_MAX <= UINT32_MAX
>>> +    // Avoid running out of memory/address space in 32 bit builds, by
>>> +    // limiting the number of auto threads.
>>> +    nb_cpus = FFMIN(nb_cpus, 64);
>>> +#endif
>>> +
>>>      if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed))
>>>          av_log(NULL, AV_LOG_DEBUG, "detected %d logical cores\n",
>>> nb_cpus);
>>>
>>
>> I don't think we should be lying in libavutil/cpu.c.
> 
>> We should instead limit the number of threads in the functions that
>> actually create threads based upon the return value of av_cpu_count();
>> e.g. both frame_thread_encoder.c (limit 64) as well as pthread_frame.c
>> and pthread_slice.c (limit 16) already limit these numbers.
>> lavu/slicethread.c doesn't seem to do so.
> 
> Right, that's probably the more sensible thing to do.
> 
> For something like lavu/slicethread.c, it probably makes sense with a
> low-ish limit, like 16 or 32 - splitting up an individual frame in more
> slices than that probably doesn't make sense as automatic default, right?
> 

Yes. E.g. MAX_AUTO_THREADS in pthread_slice.c is 16.

- Andreas
diff mbox series

Patch

diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index 0035e927a5..094bd71d3d 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -233,6 +233,12 @@  int av_cpu_count(void)
     nb_cpus = sysinfo.dwNumberOfProcessors;
 #endif
 
+#if SIZE_MAX <= UINT32_MAX
+    // Avoid running out of memory/address space in 32 bit builds, by
+    // limiting the number of auto threads.
+    nb_cpus = FFMIN(nb_cpus, 64);
+#endif
+
     if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed))
         av_log(NULL, AV_LOG_DEBUG, "detected %d logical cores\n", nb_cpus);