diff mbox series

[FFmpeg-devel,RFC] ffmpeg: default to single thread when hwaccel is active

Message ID 20210929170134.29381-1-timo@rothenpieler.org
State New
Headers show
Series [FFmpeg-devel,RFC] ffmpeg: default to single thread when hwaccel is active | expand

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished
andriy/make_ppc success Make finished
andriy/make_fate_ppc success Make fate finished

Commit Message

Timo Rothenpieler Sept. 29, 2021, 5:01 p.m. UTC
---
 fftools/ffmpeg.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

Not sure if there is any downside to this.
Threading for hwaccel does not make a whole lot of sense, and at least
in case of nvdec wastes a lot of VRAM for no performance gain, and
specially on high core count systems by default can even exhaust the
maximum frame pool size.

Comments

Dennis Mungai Sept. 29, 2021, 5:17 p.m. UTC | #1
On Wed, 29 Sept 2021 at 20:02, Timo Rothenpieler <timo@rothenpieler.org>
wrote:

> ---
>  fftools/ffmpeg.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> Not sure if there is any downside to this.
> Threading for hwaccel does not make a whole lot of sense, and at least
> in case of nvdec wastes a lot of VRAM for no performance gain, and
> specially on high core count systems by default can even exhaust the
> maximum frame pool size.
>
> diff --git a/fftools/ffmpeg.c b/fftools/ffmpeg.c
> index 98c2421938..d007d55173 100644
> --- a/fftools/ffmpeg.c
> +++ b/fftools/ffmpeg.c
> @@ -2995,12 +2995,6 @@ FF_ENABLE_DEPRECATION_WARNINGS
>           * audio, and video decoders such as cuvid or mediacodec */
>          ist->dec_ctx->pkt_timebase = ist->st->time_base;
>
> -        if (!av_dict_get(ist->decoder_opts, "threads", NULL, 0))
> -            av_dict_set(&ist->decoder_opts, "threads", "auto", 0);
> -        /* Attached pics are sparse, therefore we would not want to delay
> their decoding till EOF. */
> -        if (ist->st->disposition & AV_DISPOSITION_ATTACHED_PIC)
> -            av_dict_set(&ist->decoder_opts, "threads", "1", 0);
> -
>          ret = hw_device_setup_for_decode(ist);
>          if (ret < 0) {
>              snprintf(error, error_len, "Device setup failed for "
> @@ -3009,6 +3003,16 @@ FF_ENABLE_DEPRECATION_WARNINGS
>              return ret;
>          }
>
> +        if (!av_dict_get(ist->decoder_opts, "threads", NULL, 0)) {
> +            if (ist->dec_ctx->hw_device_ctx)
> +                av_dict_set(&ist->decoder_opts, "threads", "1", 0);
> +            else
> +                av_dict_set(&ist->decoder_opts, "threads", "auto", 0);
> +        }
> +        /* Attached pics are sparse, therefore we would not want to delay
> their decoding till EOF. */
> +        if (ist->st->disposition & AV_DISPOSITION_ATTACHED_PIC)
> +            av_dict_set(&ist->decoder_opts, "threads", "1", 0);
> +
>          if ((ret = avcodec_open2(ist->dec_ctx, codec,
> &ist->decoder_opts)) < 0) {
>              if (ret == AVERROR_EXPERIMENTAL)
>                  abort_codec_experimental(codec, 0);
> --
> 2.25.1
>
>
>
A potential downside to this would be on QSV's side, which needs at least
2  threads to prevent dead-locking, see
https://github.com/Intel-Media-SDK/MediaSDK/blob/master/_studio/mfx_lib/scheduler/linux/src/mfx_scheduler_core_ischeduler.cpp#L90-L93

Regards,

Dennis.
Timo Rothenpieler Sept. 29, 2021, 5:22 p.m. UTC | #2
On 29.09.2021 19:17, Dennis Mungai wrote:
> A potential downside to this would be on QSV's side, which needs at least
> 2  threads to prevent dead-locking, see
> https://github.com/Intel-Media-SDK/MediaSDK/blob/master/_studio/mfx_lib/scheduler/linux/src/mfx_scheduler_core_ischeduler.cpp#L90-L93

Is that even controlled by the avcodec threads parameter?
I can't find any occurrence of "numberOfThreads" anywhere in the entire 
codebase.
Dennis Mungai Sept. 29, 2021, 5:31 p.m. UTC | #3
On Wed, 29 Sept 2021 at 20:22, Timo Rothenpieler <timo@rothenpieler.org>
wrote:

> On 29.09.2021 19:17, Dennis Mungai wrote:
> > A potential downside to this would be on QSV's side, which needs at least
> > 2  threads to prevent dead-locking, see
> >
> https://github.com/Intel-Media-SDK/MediaSDK/blob/master/_studio/mfx_lib/scheduler/linux/src/mfx_scheduler_core_ischeduler.cpp#L90-L93
>
> Is that even controlled by the avcodec threads parameter?
> I can't find any occurrence of "numberOfThreads" anywhere in the entire
> codebase.
>
>
See this patchwork:
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20210408070929.860244-1-haihao.xiang@intel.com/
Submitted  by Haihao Xiang, CC'ed herein.
Timo Rothenpieler Sept. 29, 2021, 5:37 p.m. UTC | #4
On 29.09.2021 19:31, Dennis Mungai wrote:
> On Wed, 29 Sept 2021 at 20:22, Timo Rothenpieler <timo@rothenpieler.org>
> wrote:
> 
>> On 29.09.2021 19:17, Dennis Mungai wrote:
>>> A potential downside to this would be on QSV's side, which needs at least
>>> 2  threads to prevent dead-locking, see
>>>
>> https://github.com/Intel-Media-SDK/MediaSDK/blob/master/_studio/mfx_lib/scheduler/linux/src/mfx_scheduler_core_ischeduler.cpp#L90-L93
>>
>> Is that even controlled by the avcodec threads parameter?
>> I can't find any occurrence of "numberOfThreads" anywhere in the entire
>> codebase.
>>
>>
> See this patchwork:
> https://patchwork.ffmpeg.org/project/ffmpeg/patch/20210408070929.860244-1-haihao.xiang@intel.com/
> Submitted  by Haihao Xiang, CC'ed herein.

That almost makes me think we'd need a per-hwaccel default of the thread 
count.
Preferably not hacked as hard coded list into ffmpeg.c
Dennis Mungai Sept. 29, 2021, 5:39 p.m. UTC | #5
On Wed, 29 Sept 2021 at 20:37, Timo Rothenpieler <timo@rothenpieler.org>
wrote:

> On 29.09.2021 19:31, Dennis Mungai wrote:
> > On Wed, 29 Sept 2021 at 20:22, Timo Rothenpieler <timo@rothenpieler.org>
> > wrote:
> >
> >> On 29.09.2021 19:17, Dennis Mungai wrote:
> >>> A potential downside to this would be on QSV's side, which needs at
> least
> >>> 2  threads to prevent dead-locking, see
> >>>
> >>
> https://github.com/Intel-Media-SDK/MediaSDK/blob/master/_studio/mfx_lib/scheduler/linux/src/mfx_scheduler_core_ischeduler.cpp#L90-L93
> >>
> >> Is that even controlled by the avcodec threads parameter?
> >> I can't find any occurrence of "numberOfThreads" anywhere in the entire
> >> codebase.
> >>
> >>
> > See this patchwork:
> >
> https://patchwork.ffmpeg.org/project/ffmpeg/patch/20210408070929.860244-1-haihao.xiang@intel.com/
> > Submitted  by Haihao Xiang, CC'ed herein.
>
> That almost makes me think we'd need a per-hwaccel default of the thread
> count.
> Preferably not hacked as hard coded list into ffmpeg.c
>
>
Exactly.
Andreas Rheinhardt Sept. 30, 2021, 12:02 a.m. UTC | #6
Timo Rothenpieler:
> On 29.09.2021 19:31, Dennis Mungai wrote:
>> On Wed, 29 Sept 2021 at 20:22, Timo Rothenpieler <timo@rothenpieler.org>
>> wrote:
>>
>>> On 29.09.2021 19:17, Dennis Mungai wrote:
>>>> A potential downside to this would be on QSV's side, which needs at
>>>> least
>>>> 2  threads to prevent dead-locking, see
>>>>
>>> https://github.com/Intel-Media-SDK/MediaSDK/blob/master/_studio/mfx_lib/scheduler/linux/src/mfx_scheduler_core_ischeduler.cpp#L90-L93
>>>
>>>
>>> Is that even controlled by the avcodec threads parameter?
>>> I can't find any occurrence of "numberOfThreads" anywhere in the entire
>>> codebase.
>>>
>>>
>> See this patchwork:
>> https://patchwork.ffmpeg.org/project/ffmpeg/patch/20210408070929.860244-1-haihao.xiang@intel.com/
>>
>> Submitted  by Haihao Xiang, CC'ed herein.
> 
> That almost makes me think we'd need a per-hwaccel default of the thread
> count.
> Preferably not hacked as hard coded list into ffmpeg.c
> 

This can be done via AVCodecDefaults: The qsv codecs should use a thread
default of "auto" and ffmpeg.c should check whether the threads
parameter has been overridden by an AVCodecDefault (by checking whether
it is set to its default value) before overriding it in case the user
did not explicitly set something.

- Andreas

PS: The qsv patch also needs to check the AV_CODEC_CAP_OTHER_THREADS cap
(so that user know that setting the threads option makes sense).
Timo Rothenpieler Sept. 30, 2021, 11:51 a.m. UTC | #7
On 30.09.2021 02:02, Andreas Rheinhardt wrote:
> Timo Rothenpieler:
>> On 29.09.2021 19:31, Dennis Mungai wrote:
>>> On Wed, 29 Sept 2021 at 20:22, Timo Rothenpieler <timo@rothenpieler.org>
>>> wrote:
>>>
>>>> On 29.09.2021 19:17, Dennis Mungai wrote:
>>>>> A potential downside to this would be on QSV's side, which needs at
>>>>> least
>>>>> 2  threads to prevent dead-locking, see
>>>>>
>>>> https://github.com/Intel-Media-SDK/MediaSDK/blob/master/_studio/mfx_lib/scheduler/linux/src/mfx_scheduler_core_ischeduler.cpp#L90-L93
>>>>
>>>>
>>>> Is that even controlled by the avcodec threads parameter?
>>>> I can't find any occurrence of "numberOfThreads" anywhere in the entire
>>>> codebase.
>>>>
>>>>
>>> See this patchwork:
>>> https://patchwork.ffmpeg.org/project/ffmpeg/patch/20210408070929.860244-1-haihao.xiang@intel.com/
>>>
>>> Submitted  by Haihao Xiang, CC'ed herein.
>>
>> That almost makes me think we'd need a per-hwaccel default of the thread
>> count.
>> Preferably not hacked as hard coded list into ffmpeg.c
>>
> 
> This can be done via AVCodecDefaults: The qsv codecs should use a thread
> default of "auto" and ffmpeg.c should check whether the threads
> parameter has been overridden by an AVCodecDefault (by checking whether
> it is set to its default value) before overriding it in case the user
> did not explicitly set something.

Does that also work for hwaccels though?
For the most part, the codec is always the native hevc/h264/... decoder.

> - Andreas
> 
> PS: The qsv patch also needs to check the AV_CODEC_CAP_OTHER_THREADS cap
> (so that user know that setting the threads option makes sense).
Andreas Rheinhardt Sept. 30, 2021, 12:09 p.m. UTC | #8
Timo Rothenpieler:
> On 30.09.2021 02:02, Andreas Rheinhardt wrote:
>> Timo Rothenpieler:
>>> On 29.09.2021 19:31, Dennis Mungai wrote:
>>>> On Wed, 29 Sept 2021 at 20:22, Timo Rothenpieler
>>>> <timo@rothenpieler.org>
>>>> wrote:
>>>>
>>>>> On 29.09.2021 19:17, Dennis Mungai wrote:
>>>>>> A potential downside to this would be on QSV's side, which needs at
>>>>>> least
>>>>>> 2  threads to prevent dead-locking, see
>>>>>>
>>>>> https://github.com/Intel-Media-SDK/MediaSDK/blob/master/_studio/mfx_lib/scheduler/linux/src/mfx_scheduler_core_ischeduler.cpp#L90-L93
>>>>>
>>>>>
>>>>>
>>>>> Is that even controlled by the avcodec threads parameter?
>>>>> I can't find any occurrence of "numberOfThreads" anywhere in the
>>>>> entire
>>>>> codebase.
>>>>>
>>>>>
>>>> See this patchwork:
>>>> https://patchwork.ffmpeg.org/project/ffmpeg/patch/20210408070929.860244-1-haihao.xiang@intel.com/
>>>>
>>>>
>>>> Submitted  by Haihao Xiang, CC'ed herein.
>>>
>>> That almost makes me think we'd need a per-hwaccel default of the thread
>>> count.
>>> Preferably not hacked as hard coded list into ffmpeg.c
>>>
>>
>> This can be done via AVCodecDefaults: The qsv codecs should use a thread
>> default of "auto" and ffmpeg.c should check whether the threads
>> parameter has been overridden by an AVCodecDefault (by checking whether
>> it is set to its default value) before overriding it in case the user
>> did not explicitly set something.
> 
> Does that also work for hwaccels though?
> For the most part, the codec is always the native hevc/h264/... decoder.
> 

It works on a per-codec basis. It doesn't work for those hwaccels that
are in AVCodec.hw_config; but the QSV stuff isn't available through
AVCodec.hw_config if I am not mistaken (is there actually a reason for
this?), but only via dedicated codecs, so it works to distinguish QSV
from the rest.

- Andreas
diff mbox series

Patch

diff --git a/fftools/ffmpeg.c b/fftools/ffmpeg.c
index 98c2421938..d007d55173 100644
--- a/fftools/ffmpeg.c
+++ b/fftools/ffmpeg.c
@@ -2995,12 +2995,6 @@  FF_ENABLE_DEPRECATION_WARNINGS
          * audio, and video decoders such as cuvid or mediacodec */
         ist->dec_ctx->pkt_timebase = ist->st->time_base;
 
-        if (!av_dict_get(ist->decoder_opts, "threads", NULL, 0))
-            av_dict_set(&ist->decoder_opts, "threads", "auto", 0);
-        /* Attached pics are sparse, therefore we would not want to delay their decoding till EOF. */
-        if (ist->st->disposition & AV_DISPOSITION_ATTACHED_PIC)
-            av_dict_set(&ist->decoder_opts, "threads", "1", 0);
-
         ret = hw_device_setup_for_decode(ist);
         if (ret < 0) {
             snprintf(error, error_len, "Device setup failed for "
@@ -3009,6 +3003,16 @@  FF_ENABLE_DEPRECATION_WARNINGS
             return ret;
         }
 
+        if (!av_dict_get(ist->decoder_opts, "threads", NULL, 0)) {
+            if (ist->dec_ctx->hw_device_ctx)
+                av_dict_set(&ist->decoder_opts, "threads", "1", 0);
+            else
+                av_dict_set(&ist->decoder_opts, "threads", "auto", 0);
+        }
+        /* Attached pics are sparse, therefore we would not want to delay their decoding till EOF. */
+        if (ist->st->disposition & AV_DISPOSITION_ATTACHED_PIC)
+            av_dict_set(&ist->decoder_opts, "threads", "1", 0);
+
         if ((ret = avcodec_open2(ist->dec_ctx, codec, &ist->decoder_opts)) < 0) {
             if (ret == AVERROR_EXPERIMENTAL)
                 abort_codec_experimental(codec, 0);