diff mbox

[FFmpeg-devel] Fwd: [PATCH] avcodec: allow hardware decoding with multithread for FFmpeg

Message ID HE1PR10MB04090F764D544A7155007E9BF90F0@HE1PR10MB0409.EURPRD10.PROD.OUTLOOK.COM
State Not Applicable
Headers show

Commit Message

Stève Lhomme July 27, 2016, 4:43 p.m. UTC
Hello fellow FFmpegers,

Is there still an issue with hardware decoding when combined with
multithread ? It seems to work fine on our Windows build. Although we
have a mutex in place in the D3D11 variant of the code that may help.
It mostly protects the video context...

If necessary we can have the same trick for DXVA2 if there are still
known issues.

Steve


---------- Forwarded message ----------
From: Steve Lhomme <robux4@videolabs.io>

Date: Wed, Jul 27, 2016 at 4:43 PM
Subject: [PATCH] avcodec: allow hardware decoding with multithread for FFmpeg
To: vlc-devel@videolan.org


The context is protected by a mutex.
---
 modules/codec/avcodec/video.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

         msg_Warn(p_dec, "thread type %d: disabling hardware acceleration",
--
2.8.2

Comments

James Almer July 27, 2016, 4:51 p.m. UTC | #1
On 7/27/2016 1:43 PM, Stève Lhomme wrote:
> Hello fellow FFmpegers,
> 
> Is there still an issue with hardware decoding when combined with
> multithread ? It seems to work fine on our Windows build. Although we
> have a mutex in place in the D3D11 variant of the code that may help.
> It mostly protects the video context...
> 
> If necessary we can have the same trick for DXVA2 if there are still
> known issues.
> 
> Steve
> 
> 
> ---------- Forwarded message ----------
> From: Steve Lhomme <robux4@videolabs.io>
> Date: Wed, Jul 27, 2016 at 4:43 PM
> Subject: [PATCH] avcodec: allow hardware decoding with multithread for FFmpeg
> To: vlc-devel@videolan.org
> 
> 
> The context is protected by a mutex.
> ---
>  modules/codec/avcodec/video.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/modules/codec/avcodec/video.c b/modules/codec/avcodec/video.c
> index 0c10f9e..bbf3ebb 100644
> --- a/modules/codec/avcodec/video.c
> +++ b/modules/codec/avcodec/video.c

I'm not sure what project this belongs to, but it's certainly not
ffmpeg...

> @@ -1233,7 +1233,7 @@ static enum PixelFormat ffmpeg_GetFormat(
> AVCodecContext *p_context,
>      if (!can_hwaccel)
>          return swfmt;
> 
> -#if (LIBAVCODEC_VERSION_MICRO >= 100) /* FFmpeg only */
> +#if (LIBAVCODEC_VERSION_MICRO >= 100) && !defined(_WIN32) /* FFmpeg only */
>      if (p_context->active_thread_type)
>      {
>          msg_Warn(p_dec, "thread type %d: disabling hardware acceleration",
> --
> 2.8.2
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
James Almer July 27, 2016, 5:41 p.m. UTC | #2
On 7/27/2016 1:51 PM, James Almer wrote:
> On 7/27/2016 1:43 PM, Stève Lhomme wrote:
>> Hello fellow FFmpegers,
>>
>> Is there still an issue with hardware decoding when combined with
>> multithread ? It seems to work fine on our Windows build. Although we
>> have a mutex in place in the D3D11 variant of the code that may help.
>> It mostly protects the video context...
>>
>> If necessary we can have the same trick for DXVA2 if there are still
>> known issues.
>>
>> Steve
>>
>>
>> ---------- Forwarded message ----------
>> From: Steve Lhomme <robux4@videolabs.io>
>> Date: Wed, Jul 27, 2016 at 4:43 PM
>> Subject: [PATCH] avcodec: allow hardware decoding with multithread for FFmpeg
>> To: vlc-devel@videolan.org
>>
>>
>> The context is protected by a mutex.
>> ---
>>  modules/codec/avcodec/video.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/modules/codec/avcodec/video.c b/modules/codec/avcodec/video.c
>> index 0c10f9e..bbf3ebb 100644
>> --- a/modules/codec/avcodec/video.c
>> +++ b/modules/codec/avcodec/video.c
> 
> I'm not sure what project this belongs to, but it's certainly not
> ffmpeg...

Nevermind, I assumed you were sending this patch here thinking it was for
this project. Didn't notice the "to" line in the patch or the fact this is
an fwd.

Sorry for the misunderstanding.
Hendrik Leppkes July 27, 2016, 5:58 p.m. UTC | #3
On Wed, Jul 27, 2016 at 6:43 PM, Stève Lhomme <robux4@videolabs.io> wrote:
> Hello fellow FFmpegers,
>
> Is there still an issue with hardware decoding when combined with
> multithread ? It seems to work fine on our Windows build. Although we
> have a mutex in place in the D3D11 variant of the code that may help.
> It mostly protects the video context...
>
> If necessary we can have the same trick for DXVA2 if there are still
> known issues.
>

ffmpeg.c just produces corrupt decoding when you try to use that, so
use at your own peril.
Not to mention that it doesn't offer any performance benefits.

- Hendrik
Stève Lhomme July 27, 2016, 6:29 p.m. UTC | #4
On Wed, Jul 27, 2016 at 7:58 PM, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
> On Wed, Jul 27, 2016 at 6:43 PM, Stève Lhomme <robux4@videolabs.io> wrote:

>> Hello fellow FFmpegers,

>>

>> Is there still an issue with hardware decoding when combined with

>> multithread ? It seems to work fine on our Windows build. Although we

>> have a mutex in place in the D3D11 variant of the code that may help.

>> It mostly protects the video context...

>>

>> If necessary we can have the same trick for DXVA2 if there are still

>> known issues.

>>

>

> ffmpeg.c just produces corrupt decoding when you try to use that, so

> use at your own peril.

> Not to mention that it doesn't offer any performance benefits.


Thanks.

> - Hendrik

> _______________________________________________

> ffmpeg-devel mailing list

> ffmpeg-devel@ffmpeg.org

> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Stève Lhomme July 28, 2016, 7:39 a.m. UTC | #5
On Wed, Jul 27, 2016 at 7:58 PM, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
> On Wed, Jul 27, 2016 at 6:43 PM, Stève Lhomme <robux4@videolabs.io> wrote:

>> Hello fellow FFmpegers,

>>

>> Is there still an issue with hardware decoding when combined with

>> multithread ? It seems to work fine on our Windows build. Although we

>> have a mutex in place in the D3D11 variant of the code that may help.

>> It mostly protects the video context...

>>

>> If necessary we can have the same trick for DXVA2 if there are still

>> known issues.

>>

>

> ffmpeg.c just produces corrupt decoding when you try to use that, so

> use at your own peril.


Is it random or pretty easy to get the issue ? I'd certainly like to
understand the issue better.

With D3D11 we were using the decoder context in the decoder threads
and the vout thread. In debug builds Windows was complaining with
various warnings about that. On some phones it was even crashing.
After protecting the context in the libavcodec and the vout everything
seems to work fine. Is there anything else that would explain issues ?
Maybe Linux or Mac don't detect such cross accesses and that's how
corruption happens ?

> Not to mention that it doesn't offer any performance benefits.


libavcodec can decode multiple frames in parallel why couldn't the GPU
do it as well ? It would be odd that GPU based solutions are inferior
to CPU based ones.

Last but not least, why is it considered ok to use multiple threads in
libav and not in FFmpeg ? The code is pretty much the same for GPU
decoding.

> - Hendrik

> _______________________________________________

> ffmpeg-devel mailing list

> ffmpeg-devel@ffmpeg.org

> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Hendrik Leppkes July 28, 2016, 8:02 a.m. UTC | #6
On Thu, Jul 28, 2016 at 9:39 AM, Stève Lhomme <robux4@videolabs.io> wrote:
> On Wed, Jul 27, 2016 at 7:58 PM, Hendrik Leppkes <h.leppkes@gmail.com> wrote:
>> On Wed, Jul 27, 2016 at 6:43 PM, Stève Lhomme <robux4@videolabs.io> wrote:
>>> Hello fellow FFmpegers,
>>>
>>> Is there still an issue with hardware decoding when combined with
>>> multithread ? It seems to work fine on our Windows build. Although we
>>> have a mutex in place in the D3D11 variant of the code that may help.
>>> It mostly protects the video context...
>>>
>>> If necessary we can have the same trick for DXVA2 if there are still
>>> known issues.
>>>
>>
>> ffmpeg.c just produces corrupt decoding when you try to use that, so
>> use at your own peril.
>
> Is it random or pretty easy to get the issue ? I'd certainly like to
> understand the issue better.

Its really simple: If you lock a frame for read-back while its being
used as a reference frame by the GPU, the decoding on the GPU fails.
You could try to protect every access with a global lock, I guess, but
thats really a rather ugly solution and doesn't really give any
benefits over just not threading (see below for why its not faster
anyway)

This can be instantly reproduced on DXVA2 at least with both Intel and
NVIDIA GPUs (do not know about AMD, don't have any handy).
I'm not sure what happens on Linux, some people said its safer there
due to different ways to copy the data off the surfaces not being a
full lock, at least in the way ffmpeg.c implements it.

>
> With D3D11 we were using the decoder context in the decoder threads
> and the vout thread. In debug builds Windows was complaining with
> various warnings about that. On some phones it was even crashing.
> After protecting the context in the libavcodec and the vout everything
> seems to work fine. Is there anything else that would explain issues ?
> Maybe Linux or Mac don't detect such cross accesses and that's how
> corruption happens ?
>
>> Not to mention that it doesn't offer any performance benefits.
>
> libavcodec can decode multiple frames in parallel why couldn't the GPU
> do it as well ? It would be odd that GPU based solutions are inferior
> to CPU based ones.

GPUs also decode multiple frames at once, which is why performance
suffers when you instantly try to lock them after getting them out of
the decoder (because it needs to sync first), however the GPU does
this silently behind the back in an async fashion, you don't need CPU
threads to feed it simultaneously.

There is a bit of misconception because threaded decoding through
ffmpeg.c *looks* faster right now on some systems, but that comes back
to the forced sync mentioned above. Threads add an inherent delay
until the frame is returned to the caller, and as such gives the GPU
more time to asynchroniously finish the frame. You can fix that
without threads by simply delaying locking the frames by about 2
frames, and get full speed then.

On top of that threading requires allocating more frame surfaces just
for the way threading works in avcodec (one per thread at least, in
extreme cases 16 per thread), so it even costs resources without
gaining anything.
Not to mention that handling threaded hwaccel in avcodec has caused a
variety of bugs in the past, which are hopefully fixed now, but its
still rather ugly due to that.

>
> Last but not least, why is it considered ok to use multiple threads in
> libav and not in FFmpeg ? The code is pretty much the same for GPU
> decoding.
>

Libav runs into the same corruption issues, they just never cared to
do anything about that.

- Hendrik
diff mbox

Patch

diff --git a/modules/codec/avcodec/video.c b/modules/codec/avcodec/video.c
index 0c10f9e..bbf3ebb 100644
--- a/modules/codec/avcodec/video.c
+++ b/modules/codec/avcodec/video.c
@@ -1233,7 +1233,7 @@  static enum PixelFormat ffmpeg_GetFormat(
AVCodecContext *p_context,
     if (!can_hwaccel)
         return swfmt;

-#if (LIBAVCODEC_VERSION_MICRO >= 100) /* FFmpeg only */
+#if (LIBAVCODEC_VERSION_MICRO >= 100) && !defined(_WIN32) /* FFmpeg only */
     if (p_context->active_thread_type)
     {