Message ID | 20220905093027.4090687-1-martin@martin.st |
---|---|
State | Superseded |
Headers | show |
Series | [FFmpeg-devel] cpu: Limit the number of auto threads in 32 bit builds | expand |
Context | Check | Description |
---|---|---|
yinshiyou/make_loongarch64 | success | Make finished |
yinshiyou/make_fate_loongarch64 | success | Make fate finished |
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
Martin Storsjö: > Limit the returned value from av_cpu_count to sensible amounts > in 32 bit builds. > > This chosen limit, 64, is somewhat arbitrary - a 32 bit process > is capable of creating much more than 64 threads. But in many > cases, multiple parts of the encoding pipeline (decoder, filters, > encoders) all create a pool of threads, auto sized according to the > number of cores. > > In one failing test, the process had managed to create 506 threads > before a pthread_create call failed. > > In the current set of fate tests, the filter-lavd-scalenorm test > seems to be the limiting factor; in a 32 bit build (arm linux, > running on an aarch64 kernel), it starts failing with an auto > thread count somewhere around 85. Therefore, pick the maximum > with some margin below this. > > This fixes running fate without any manually set number of threads > in 32 bit builds on machines with huge numbers of cores. > --- > libavutil/cpu.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/libavutil/cpu.c b/libavutil/cpu.c > index 0035e927a5..094bd71d3d 100644 > --- a/libavutil/cpu.c > +++ b/libavutil/cpu.c > @@ -233,6 +233,12 @@ int av_cpu_count(void) > nb_cpus = sysinfo.dwNumberOfProcessors; > #endif > > +#if SIZE_MAX <= UINT32_MAX > + // Avoid running out of memory/address space in 32 bit builds, by > + // limiting the number of auto threads. > + nb_cpus = FFMIN(nb_cpus, 64); > +#endif > + > if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed)) > av_log(NULL, AV_LOG_DEBUG, "detected %d logical cores\n", nb_cpus); > I don't think we should be lying in libavutil/cpu.c. We should instead limit the number of threads in the functions that actually create threads based upon the return value of av_cpu_count(); e.g. both frame_thread_encoder.c (limit 64) as well as pthread_frame.c and pthread_slice.c (limit 16) already limit these numbers. lavu/slicethread.c doesn't seem to do so. - Andreas
On Mon, 5 Sep 2022, Andreas Rheinhardt wrote: > Martin Storsjö: >> Limit the returned value from av_cpu_count to sensible amounts >> in 32 bit builds. >> >> This chosen limit, 64, is somewhat arbitrary - a 32 bit process >> is capable of creating much more than 64 threads. But in many >> cases, multiple parts of the encoding pipeline (decoder, filters, >> encoders) all create a pool of threads, auto sized according to the >> number of cores. >> >> In one failing test, the process had managed to create 506 threads >> before a pthread_create call failed. >> >> In the current set of fate tests, the filter-lavd-scalenorm test >> seems to be the limiting factor; in a 32 bit build (arm linux, >> running on an aarch64 kernel), it starts failing with an auto >> thread count somewhere around 85. Therefore, pick the maximum >> with some margin below this. >> >> This fixes running fate without any manually set number of threads >> in 32 bit builds on machines with huge numbers of cores. >> --- >> libavutil/cpu.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/libavutil/cpu.c b/libavutil/cpu.c >> index 0035e927a5..094bd71d3d 100644 >> --- a/libavutil/cpu.c >> +++ b/libavutil/cpu.c >> @@ -233,6 +233,12 @@ int av_cpu_count(void) >> nb_cpus = sysinfo.dwNumberOfProcessors; >> #endif >> >> +#if SIZE_MAX <= UINT32_MAX >> + // Avoid running out of memory/address space in 32 bit builds, by >> + // limiting the number of auto threads. >> + nb_cpus = FFMIN(nb_cpus, 64); >> +#endif >> + >> if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed)) >> av_log(NULL, AV_LOG_DEBUG, "detected %d logical cores\n", nb_cpus); >> > > I don't think we should be lying in libavutil/cpu.c. > We should instead limit the number of threads in the functions that > actually create threads based upon the return value of av_cpu_count(); > e.g. both frame_thread_encoder.c (limit 64) as well as pthread_frame.c > and pthread_slice.c (limit 16) already limit these numbers. > lavu/slicethread.c doesn't seem to do so. Right, that's probably the more sensible thing to do. For something like lavu/slicethread.c, it probably makes sense with a low-ish limit, like 16 or 32 - splitting up an individual frame in more slices than that probably doesn't make sense as automatic default, right? // Martin
Martin Storsjö: > On Mon, 5 Sep 2022, Andreas Rheinhardt wrote: > >> Martin Storsjö: >>> Limit the returned value from av_cpu_count to sensible amounts >>> in 32 bit builds. >>> >>> This chosen limit, 64, is somewhat arbitrary - a 32 bit process >>> is capable of creating much more than 64 threads. But in many >>> cases, multiple parts of the encoding pipeline (decoder, filters, >>> encoders) all create a pool of threads, auto sized according to the >>> number of cores. >>> >>> In one failing test, the process had managed to create 506 threads >>> before a pthread_create call failed. >>> >>> In the current set of fate tests, the filter-lavd-scalenorm test >>> seems to be the limiting factor; in a 32 bit build (arm linux, >>> running on an aarch64 kernel), it starts failing with an auto >>> thread count somewhere around 85. Therefore, pick the maximum >>> with some margin below this. >>> >>> This fixes running fate without any manually set number of threads >>> in 32 bit builds on machines with huge numbers of cores. >>> --- >>> libavutil/cpu.c | 6 ++++++ >>> 1 file changed, 6 insertions(+) >>> >>> diff --git a/libavutil/cpu.c b/libavutil/cpu.c >>> index 0035e927a5..094bd71d3d 100644 >>> --- a/libavutil/cpu.c >>> +++ b/libavutil/cpu.c >>> @@ -233,6 +233,12 @@ int av_cpu_count(void) >>> nb_cpus = sysinfo.dwNumberOfProcessors; >>> #endif >>> >>> +#if SIZE_MAX <= UINT32_MAX >>> + // Avoid running out of memory/address space in 32 bit builds, by >>> + // limiting the number of auto threads. >>> + nb_cpus = FFMIN(nb_cpus, 64); >>> +#endif >>> + >>> if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed)) >>> av_log(NULL, AV_LOG_DEBUG, "detected %d logical cores\n", >>> nb_cpus); >>> >> >> I don't think we should be lying in libavutil/cpu.c. > >> We should instead limit the number of threads in the functions that >> actually create threads based upon the return value of av_cpu_count(); >> e.g. both frame_thread_encoder.c (limit 64) as well as pthread_frame.c >> and pthread_slice.c (limit 16) already limit these numbers. >> lavu/slicethread.c doesn't seem to do so. > > Right, that's probably the more sensible thing to do. > > For something like lavu/slicethread.c, it probably makes sense with a > low-ish limit, like 16 or 32 - splitting up an individual frame in more > slices than that probably doesn't make sense as automatic default, right? > Yes. E.g. MAX_AUTO_THREADS in pthread_slice.c is 16. - Andreas
diff --git a/libavutil/cpu.c b/libavutil/cpu.c index 0035e927a5..094bd71d3d 100644 --- a/libavutil/cpu.c +++ b/libavutil/cpu.c @@ -233,6 +233,12 @@ int av_cpu_count(void) nb_cpus = sysinfo.dwNumberOfProcessors; #endif +#if SIZE_MAX <= UINT32_MAX + // Avoid running out of memory/address space in 32 bit builds, by + // limiting the number of auto threads. + nb_cpus = FFMIN(nb_cpus, 64); +#endif + if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed)) av_log(NULL, AV_LOG_DEBUG, "detected %d logical cores\n", nb_cpus);