
[FFmpeg-devel,V2] lavc/libvpx: increase thread limit to 64

Message ID 20221006134959.771-1-ovchinnikov.dmitrii@gmail.com
State New
Series [FFmpeg-devel,V2] lavc/libvpx: increase thread limit to 64

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Dmitrii Ovchinnikov Oct. 6, 2022, 1:49 p.m. UTC
This change improves the performance and multicore scalability of the VP9
codec for streaming single-pass encoded videos by taking advantage of up
to 64 cores in the system. The current thread limit for ffmpeg codecs is 16
(MAX_AUTO_THREADS in pthread_internal.h) due to a limitation in the H.264
codec that prevents more than 16 threads from being used.

Experiments show that raising the thread limit to 64 for VP9 improves
the performance of encoding 4K raw videos for streaming by up to 47%
compared to 16 threads, and by 20-30% with 32 threads, with the same
quality as measured by the VMAF score.

Rationale for this change:
VP9 uses tiling to split the video frame into multiple columns; tiles must
be at least 256 pixels wide, so there is a limit to how many tiles can be
used. The tiles can be processed in parallel, and more tiles mean more CPU
threads can be used. 4K videos can make use of 16 threads, and 8K videos
can use 32. Row-based multithreading (row-mt) can double the number of
threads, so up to 64 threads can be used.
---
 libavcodec/libvpx.h    | 2 ++
 libavcodec/libvpxdec.c | 2 +-
 libavcodec/libvpxenc.c | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

Comments

Dmitrii Ovchinnikov Oct. 6, 2022, 2:03 p.m. UTC | #1
Removed unnecessary comment as requested by Ronald S. Bultje.

Some more thoughts about the patch:

This is my deduction from what I could find: there was a hard limit of 16
threads in libvpx because there was no benefit to adding more threads given
the parallelism available in the codec. This was based on resolutions up
to 3840x2160 (more details below). For 4K/8K videos, this restriction does
not allow enough parallelism to be exploited, and adding more threads
improves performance for higher-resolution videos.

Details:
Tiling and Threading Recommendations
Tiling splits the video frame into multiple columns, which slightly reduces
quality but speeds up encoding. Tiles must be at least 256
pixels wide, so there is a limit to how many tiles can be used. Depending
upon the number of tiles and the resolution of the output frame, more CPU
threads may be useful. There is limited value to multiple threads when the
output frame size is very small.

The following settings are recommended for tiling and threading at various
resolutions. The number of threads is doubled because the row-mt option,
when set, enables row-based multithreading within the tiles.

Frame Size | Number of tile-columns | Number of threads
320x240 | 1 (-tile-columns 0) | 2
640x360 | 2 (-tile-columns 1) | 4
640x480 | 2 (-tile-columns 1) | 4
1280x720 | 4 (-tile-columns 2) | 8
1920x1080 | 4 (-tile-columns 2) | 8
2560x1440 | 8 (-tile-columns 3) | 16
3840x2160 | 8 (-tile-columns 3) | 16

In ffmpeg, the number of tiles is controlled with the -tile-columns
parameter, which takes the base-2 logarithm of the tile-column count, and
the number of threads with -threads. For example, a 640x480 encode would
use the command-line -tile-columns 1 -threads 4.
Andreas Rheinhardt Oct. 6, 2022, 2:12 p.m. UTC | #2
OvchinnikovDmitrii:
> This change improves the performance and multicore scalability of the vp9
> codec for streaming single-pass encoded videos by taking advantage of up
> to 64 cores in the system. The current thread limit for ffmpeg codecs is 16
> (MAX_AUTO_THREADS in pthread_internal.h) due to a limitation in H.264 codec
> that prevents more than 16 threads being used.
> 
> Experiments show that increasing the thread limit to 64 for vp9 improves
> the performance for encoding 4K raw videos for streaming by up to 47%
> compared to 16 threads, and from 20-30% for 32 threads, with the same quality
> as measured by the VMAF score.
> 
> Rationale for this change:
> Vp9 uses tiling to split the video frame into multiple columns; tiles must
> be at least 256 pixels wide, so there is a limit to how many tiles can be
> used. The tiles can be processed in parallel, and more tiles mean more CPU
> threads can be used. 4K videos can make use of 16 threads, and 8K videos
> can use 32. Row-mt can double the number of threads so 64 threads can be used.
> ---
>  libavcodec/libvpx.h    | 2 ++
>  libavcodec/libvpxdec.c | 2 +-
>  libavcodec/libvpxenc.c | 2 +-
>  3 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/libavcodec/libvpx.h b/libavcodec/libvpx.h
> index 0caed8cdcb..331feb8745 100644
> --- a/libavcodec/libvpx.h
> +++ b/libavcodec/libvpx.h
> @@ -25,6 +25,8 @@
>  
>  #include "codec_internal.h"
>  
> +#define MAX_VPX_THREADS 64
> +
>  void ff_vp9_init_static(FFCodec *codec);
>  #if 0
>  enum AVPixelFormat ff_vpx_imgfmt_to_pixfmt(vpx_img_fmt_t img);
> diff --git a/libavcodec/libvpxdec.c b/libavcodec/libvpxdec.c
> index 9cd2c56caf..0ae19c3f72 100644
> --- a/libavcodec/libvpxdec.c
> +++ b/libavcodec/libvpxdec.c
> @@ -88,7 +88,7 @@ static av_cold int vpx_init(AVCodecContext *avctx,
>                              const struct vpx_codec_iface *iface)
>  {
>      struct vpx_codec_dec_cfg deccfg = {
> -        .threads = FFMIN(avctx->thread_count ? avctx->thread_count : av_cpu_count(), 16)
> +        .threads = FFMIN(avctx->thread_count ? avctx->thread_count : av_cpu_count(), MAX_VPX_THREADS)
>      };
>  
>      av_log(avctx, AV_LOG_INFO, "%s\n", vpx_codec_version_str());
> diff --git a/libavcodec/libvpxenc.c b/libavcodec/libvpxenc.c
> index 667cffc200..3ff86ad08d 100644
> --- a/libavcodec/libvpxenc.c
> +++ b/libavcodec/libvpxenc.c
> @@ -942,7 +942,7 @@ static av_cold int vpx_init(AVCodecContext *avctx,
>      enccfg.g_timebase.num = avctx->time_base.num;
>      enccfg.g_timebase.den = avctx->time_base.den;
>      enccfg.g_threads      =
> -        FFMIN(avctx->thread_count ? avctx->thread_count : av_cpu_count(), 16);
> +        FFMIN(avctx->thread_count ? avctx->thread_count : av_cpu_count(), MAX_VPX_THREADS);
>      enccfg.g_lag_in_frames= ctx->lag_in_frames;
>  
>      if (avctx->flags & AV_CODEC_FLAG_PASS1)

1. Why do you still impose an upper limit unconditionally even if the
user has set his preferred number of threads?
2. According to
https://ffmpeg.org/pipermail/ffmpeg-devel/2018-November/236406.html the
maximum of 16 has not been chosen because of H.264, but because there
was some form of restriction in libvpx. Or at least there was belief in
the existence of such a restriction.
3. This code potentially calls av_cpu_count() twice.

- Andreas
Dmitrii Ovchinnikov Oct. 25, 2022, 3:55 p.m. UTC | #3
>> Why do you still impose an upper limit unconditionally even if the
>> user has set his preferred number of threads?
libvpx-vp9 does not support more than 64 threads, so we impose an upper
limit of 64.
For example, if we set it any higher, we get the following error:
[libvpx-vp9 @ 0x2f631c0] Failed to initialize encoder: Invalid parameter
[libvpx-vp9 @ 0x2f631c0]   Additional information: g_threads out of range [..64]
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height

>> According to
>> https://ffmpeg.org/pipermail/ffmpeg-devel/2018-November/236406.html the
>> maximum of 16 has not been chosen because of H.264, but because there
>> was some form of restriction in libvpx. Or at least there was belief in
>> the existence of such a restriction.
There is a restriction of maximum 64 threads, not 16.

>> This code potentially calls av_cpu_count() twice.
Can you please clarify how it calls it twice? Thanks.
James Zern Nov. 21, 2022, 5:34 p.m. UTC | #4
On Tue, Oct 25, 2022 at 8:56 AM Dmitrii Ovchinnikov
<ovchinnikov.dmitrii@gmail.com> wrote:
>
> >> Why do you still impose an upper limit unconditionally even if the
> >>user has set his preferred number of threads?
> Libvpx-vp9 does not support number of threads greater than 64, so we impose
> an upper limit of 64.
> E.g. if we set it any higher we get the following execution error:
> [libvpx-vp9 @ 0x2f631c0] Failed to initialize encoder: Invalid parameter
> [libvpx-vp9 @ 0x2f631c0]   Additional information: g_threads out of range
> [..64]
> Error initializing output stream 0:0 -- Error while opening encoder for
> output stream #0:0 - maybe incorrect parameters such as bit_rate, rate,
> width or height
>

Deferring the check to libvpx should be fine and would mean less
maintenance of this wrapper if there are any changes made there.

> >>According to
> https://ffmpeg.org/pipermail/ffmpeg-devel/2018-November/236406.html the
> >>maximum of 16 has not been chosen because of H.264, but because there
> >>was some form of restriction in libvpx. Or at least there was belief in
> >>the existence of such a restriction.
> There is a restriction of maximum 64 threads, not 16.
>
> >>This code potentially calls av_cpu_count() twice.
> Can you please clarify how it calls it twice? Thanks.
>
Dmitrii Ovchinnikov Nov. 23, 2022, 2:25 p.m. UTC | #5
Thanks for the answer and comment! I will think about how best to
rework the patch and then send a new version.

Patch

diff --git a/libavcodec/libvpx.h b/libavcodec/libvpx.h
index 0caed8cdcb..331feb8745 100644
--- a/libavcodec/libvpx.h
+++ b/libavcodec/libvpx.h
@@ -25,6 +25,8 @@ 
 
 #include "codec_internal.h"
 
+#define MAX_VPX_THREADS 64
+
 void ff_vp9_init_static(FFCodec *codec);
 #if 0
 enum AVPixelFormat ff_vpx_imgfmt_to_pixfmt(vpx_img_fmt_t img);
diff --git a/libavcodec/libvpxdec.c b/libavcodec/libvpxdec.c
index 9cd2c56caf..0ae19c3f72 100644
--- a/libavcodec/libvpxdec.c
+++ b/libavcodec/libvpxdec.c
@@ -88,7 +88,7 @@ static av_cold int vpx_init(AVCodecContext *avctx,
                             const struct vpx_codec_iface *iface)
 {
     struct vpx_codec_dec_cfg deccfg = {
-        .threads = FFMIN(avctx->thread_count ? avctx->thread_count : av_cpu_count(), 16)
+        .threads = FFMIN(avctx->thread_count ? avctx->thread_count : av_cpu_count(), MAX_VPX_THREADS)
     };
 
     av_log(avctx, AV_LOG_INFO, "%s\n", vpx_codec_version_str());
diff --git a/libavcodec/libvpxenc.c b/libavcodec/libvpxenc.c
index 667cffc200..3ff86ad08d 100644
--- a/libavcodec/libvpxenc.c
+++ b/libavcodec/libvpxenc.c
@@ -942,7 +942,7 @@ static av_cold int vpx_init(AVCodecContext *avctx,
     enccfg.g_timebase.num = avctx->time_base.num;
     enccfg.g_timebase.den = avctx->time_base.den;
     enccfg.g_threads      =
-        FFMIN(avctx->thread_count ? avctx->thread_count : av_cpu_count(), 16);
+        FFMIN(avctx->thread_count ? avctx->thread_count : av_cpu_count(), MAX_VPX_THREADS);
     enccfg.g_lag_in_frames= ctx->lag_in_frames;
 
     if (avctx->flags & AV_CODEC_FLAG_PASS1)