diff mbox series

[FFmpeg-devel,1/3] avutils/hwcontext: add derive-device function which searches for existing devices in both directions

Message ID 85ff784b6ff80d4f2e0f724d0b93472e50d2c256.1651349262.git.ffmpegagent@gmail.com
State New
Headers show
Series Add derive-device function which searches for existing devices in both directions | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Aman Karmani April 30, 2022, 8:07 p.m. UTC
From: softworkz <softworkz@hotmail.com>

The test /libavutil/tests/hwdevice checks that when deriving a device
from a source device and then deriving back to the type of the source
device, the result is matching the original source device, i.e. the
derivation mechanism doesn't create a new device in this case.

Previously, this test was usually passed, but only due to two different
kind of flaws:

1. The test covers only a single level of derivation (and back)

It derives device Y from device X and then Y back to the type of X and
checks whether the result matches X.

What it doesn't check for, are longer chains of derivation like:

CUDA1 > OpenCL2 > CUDA3 and then back to OpenCL4

In that case, the second derivation returns the first device (CUDA3 ==
CUDA1), but when deriving OpenCL4, hwcontext.c was creating a new
OpenCL4 context instead of returning OpenCL2, because there was no link
from CUDA1 to OpenCL2 (only backwards from OpenCL2 to CUDA1)

If the test would check for two levels of derivation, it would have
failed.

This patch fixes those (yet untested) cases by introducing forward
references (derived_device) in addition to the existing back references
(source_device).

2. hwcontext_qsv didn't properly set the source_device

In case of QSV, hwcontext_qsv creates a source context internally
(vaapi, dxva2 or d3d11va) without calling av_hwdevice_ctx_create_derived
and without setting source_device.

This way, the hwcontext test ran successful, but what practically
happened, was that - for example - deriving vaapi from qsv didn't return
the original underlying vaapi device and a new one was created instead:
Exactly what the test is intended to detect and prevent. It just
couldn't do so, because the original device was hidden (= not set as the
source_device of the QSV device).

This patch properly makes these setting and fixes all derivation
scenarios.

(at a later stage, /libavutil/tests/hwdevice should be extended to check
longer derivation chains as well)

Signed-off-by: softworkz <softworkz@hotmail.com>
---
 libavutil/hwcontext.c          | 72 +++++++++++++++++++++++++++++++---
 libavutil/hwcontext.h          | 20 ++++++++++
 libavutil/hwcontext_internal.h |  6 +++
 libavutil/hwcontext_qsv.c      | 13 ++++--
 4 files changed, 102 insertions(+), 9 deletions(-)

Comments

Mark Thompson April 30, 2022, 9:38 p.m. UTC | #1
On 30/04/2022 21:07, softworkz wrote:
> From: softworkz <softworkz@hotmail.com>
> 
> The test /libavutil/tests/hwdevice checks that when deriving a device
> from a source device and then deriving back to the type of the source
> device, the result is matching the original source device, i.e. the
> derivation mechanism doesn't create a new device in this case.
> 
> Previously, this test was usually passed, but only due to two different
> kind of flaws:
> 
> 1. The test covers only a single level of derivation (and back)
> 
> It derives device Y from device X and then Y back to the type of X and
> checks whether the result matches X.
> 
> What it doesn't check for, are longer chains of derivation like:
> 
> CUDA1 > OpenCL2 > CUDA3 and then back to OpenCL4
> 
> In that case, the second derivation returns the first device (CUDA3 ==
> CUDA1), but when deriving OpenCL4, hwcontext.c was creating a new
> OpenCL4 context instead of returning OpenCL2, because there was no link
> from CUDA1 to OpenCL2 (only backwards from OpenCL2 to CUDA1)
> 
> If the test would check for two levels of derivation, it would have
> failed.
> 
> This patch fixes those (yet untested) cases by introducing forward
> references (derived_device) in addition to the existing back references
> (source_device).
> 
> 2. hwcontext_qsv didn't properly set the source_device
> 
> In case of QSV, hwcontext_qsv creates a source context internally
> (vaapi, dxva2 or d3d11va) without calling av_hwdevice_ctx_create_derived
> and without setting source_device.
> 
> This way, the hwcontext test ran successful, but what practically
> happened, was that - for example - deriving vaapi from qsv didn't return
> the original underlying vaapi device and a new one was created instead:
> Exactly what the test is intended to detect and prevent. It just
> couldn't do so, because the original device was hidden (= not set as the
> source_device of the QSV device).
> 
> This patch properly makes these setting and fixes all derivation
> scenarios.
> 
> (at a later stage, /libavutil/tests/hwdevice should be extended to check
> longer derivation chains as well)
> 
> Signed-off-by: softworkz <softworkz@hotmail.com>
> ---
>   libavutil/hwcontext.c          | 72 +++++++++++++++++++++++++++++++---
>   libavutil/hwcontext.h          | 20 ++++++++++
>   libavutil/hwcontext_internal.h |  6 +++
>   libavutil/hwcontext_qsv.c      | 13 ++++--
>   4 files changed, 102 insertions(+), 9 deletions(-)

Yeah, something like this seems fair.

Some general comments:

* Whenever you use derivation it creates a circular reference, so the instances can never be freed in the current implementation.

* The thread-safety properties of the hwcontext API have been lost - you can no longer operate on devices independently across threads (insofar as the underlying API allows that).
   Maybe there is an argument that derivation is something which should happen early on and therefore documenting it as thread-unsafe is ok, but when hwupload/hwmap can use it inside filtergraphs that just isn't going to happen (and will be violated in the FFmpeg utility if filters get threaded, as is being worked on).

* I'm not sure that it is reasonable to ignore options.  If an unrelated component derived a device before you with special options, you might get that device even if you have incompatible different options.

> diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c
> index ab9ad3703e..1aea7dd5c3 100644
> --- a/libavutil/hwcontext.c
> +++ b/libavutil/hwcontext.c
> @@ -123,6 +123,7 @@ static const AVClass hwdevice_ctx_class = {
>   static void hwdevice_ctx_free(void *opaque, uint8_t *data)
>   {
>       AVHWDeviceContext *ctx = (AVHWDeviceContext*)data;
> +    int i;
>   
>       /* uninit might still want access the hw context and the user
>        * free() callback might destroy it, so uninit has to be called first */
> @@ -133,6 +134,8 @@ static void hwdevice_ctx_free(void *opaque, uint8_t *data)
>           ctx->free(ctx);
>   
>       av_buffer_unref(&ctx->internal->source_device);
> +    for (i = 0; i < AV_HWDEVICE_TYPE_NB; i++)
> +        av_buffer_unref(&ctx->internal->derived_devices[i]);
>   
>       av_freep(&ctx->hwctx);
>       av_freep(&ctx->internal->priv);
> @@ -644,10 +647,31 @@ fail:
>       return ret;
>   }
>   
> -int av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
> -                                        enum AVHWDeviceType type,
> -                                        AVBufferRef *src_ref,
> -                                        AVDictionary *options, int flags)
> +static AVBufferRef* find_derived_hwdevice_ctx(AVBufferRef *src_ref, enum AVHWDeviceType type)
> +{
> +    AVBufferRef *tmp_ref;
> +    AVHWDeviceContext *src_ctx;
> +    int i;
> +
> +    src_ctx = (AVHWDeviceContext*)src_ref->data;
> +    if (src_ctx->type == type)
> +        return src_ref;
> +
> +    for (i = 0; i < AV_HWDEVICE_TYPE_NB; i++)
> +        if (src_ctx->internal->derived_devices[i]) {
> +            tmp_ref = find_derived_hwdevice_ctx(src_ctx->internal->derived_devices[i], type);
> +            if (tmp_ref)
> +                return tmp_ref;
> +        }
> +
> +    return NULL;
> +}
> +
> +static int hwdevice_ctx_create_derived(AVBufferRef **dst_ref_ptr,
> +                                       enum AVHWDeviceType type,
> +                                       AVBufferRef *src_ref,
> +                                       AVDictionary *options, int flags,
> +                                       int get_existing)
>   {
>       AVBufferRef *dst_ref = NULL, *tmp_ref;
>       AVHWDeviceContext *dst_ctx, *tmp_ctx;
> @@ -667,6 +691,18 @@ int av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
>           tmp_ref = tmp_ctx->internal->source_device;
>       }
>   
> +    if (get_existing) {
> +        tmp_ref = find_derived_hwdevice_ctx(src_ref, type);
> +        if (tmp_ref) {
> +            dst_ref = av_buffer_ref(tmp_ref);
> +            if (!dst_ref) {
> +                ret = AVERROR(ENOMEM);
> +                goto fail;
> +            }
> +            goto done;
> +        }
> +    }
> +
>       dst_ref = av_hwdevice_ctx_alloc(type);
>       if (!dst_ref) {
>           ret = AVERROR(ENOMEM);
> @@ -688,6 +724,13 @@ int av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
>                       ret = AVERROR(ENOMEM);
>                       goto fail;
>                   }
> +                if (!tmp_ctx->internal->derived_devices[type]) {

I wonder whether you only want to do this when the user made the new call, not the old one?

The semantics would perhaps feel clearer as "get or create the shared derived device" rather than "get the first device derived or create a new one if not".

> +                    tmp_ctx->internal->derived_devices[type] = av_buffer_ref(dst_ref);
> +                    if (!tmp_ctx->internal->derived_devices[type]) {
> +                        ret = AVERROR(ENOMEM);
> +                        goto fail;
> +                    }
> +                }
>                   ret = av_hwdevice_ctx_init(dst_ref);
>                   if (ret < 0)
>                       goto fail;
> @@ -712,12 +755,29 @@ fail:
>       return ret;
>   }
>   
> +int av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
> +                                        enum AVHWDeviceType type,
> +                                        AVBufferRef *src_ref,
> +                                        AVDictionary *options, int flags)
> +{
> +    return hwdevice_ctx_create_derived(dst_ref_ptr, type, src_ref,
> +                                       options, flags, 0);
> +}
> +
> +int av_hwdevice_ctx_get_or_create_derived(AVBufferRef **dst_ref_ptr,
> +                                          enum AVHWDeviceType type,
> +                                          AVBufferRef *src_ref, int flags)
> +{
> +    return hwdevice_ctx_create_derived(dst_ref_ptr, type, src_ref,
> +                                       NULL, flags, 1);
> +}
> +
>   int av_hwdevice_ctx_create_derived(AVBufferRef **dst_ref_ptr,
>                                      enum AVHWDeviceType type,
>                                      AVBufferRef *src_ref, int flags)
>   {
> -    return av_hwdevice_ctx_create_derived_opts(dst_ref_ptr, type, src_ref,
> -                                               NULL, flags);
> +    return hwdevice_ctx_create_derived(dst_ref_ptr, type, src_ref,
> +                                       NULL, flags, 0);
>   }
>   
>   static void ff_hwframe_unmap(void *opaque, uint8_t *data)
> diff --git a/libavutil/hwcontext.h b/libavutil/hwcontext.h
> index c18b7e1e8b..3785811f98 100644
> --- a/libavutil/hwcontext.h
> +++ b/libavutil/hwcontext.h
> @@ -37,6 +37,7 @@ enum AVHWDeviceType {
>       AV_HWDEVICE_TYPE_OPENCL,
>       AV_HWDEVICE_TYPE_MEDIACODEC,
>       AV_HWDEVICE_TYPE_VULKAN,
> +    AV_HWDEVICE_TYPE_NB,          ///< number of hw device types

Can we avoid adding a non-constant constant to the user API?

av_hwdevice_iterate_types() exists for this purpose.

>   };
>   
>   typedef struct AVHWDeviceInternal AVHWDeviceInternal;
> @@ -328,6 +329,25 @@ int av_hwdevice_ctx_create_derived(AVBufferRef **dst_ctx,
>                                      enum AVHWDeviceType type,
>                                      AVBufferRef *src_ctx, int flags);
>   
> +/**
> + * Create a new device of the specified type from an existing device.
> + *
> + * This function performs the same action as av_hwdevice_ctx_create_derived,
> + * however, if a derived device of the specified type already exists,
> + * it returns the existing instance.
> + *
> + * @param dst_ctx On success, a reference to the newly-created
> + *                AVHWDeviceContext.
> + * @param type    The type of the new device to create.
> + * @param src_ctx A reference to an existing AVHWDeviceContext which will be
> + *                used to create the new device.
> + * @param flags   Currently unused; should be set to zero.
> + * @return        Zero on success, a negative AVERROR code on failure.
> + */
> +int av_hwdevice_ctx_get_or_create_derived(AVBufferRef **dst_ctx,
> +                                          enum AVHWDeviceType type,
> +                                          AVBufferRef *src_ctx, int flags);

Include the options here?  Not having them in the original call was an unfortunate omission, I think it would be better to include them here as well even if you don't use them immediately.

> +
>   /**
>    * Create a new device of the specified type from an existing device.
>    *
> diff --git a/libavutil/hwcontext_internal.h b/libavutil/hwcontext_internal.h
> index e6266494ac..f6fb67c491 100644
> --- a/libavutil/hwcontext_internal.h
> +++ b/libavutil/hwcontext_internal.h
> @@ -109,6 +109,12 @@ struct AVHWDeviceInternal {
>        * context it was derived from.
>        */
>       AVBufferRef *source_device;
> +
> +    /**
> +     * An array of reference to device contexts which
> +     * were derived from this device.
> +     */
> +    AVBufferRef *derived_devices[AV_HWDEVICE_TYPE_NB];
>   };
>   
>   struct AVHWFramesInternal {

- Mark
Soft Works April 30, 2022, 10:42 p.m. UTC | #2
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark
> Thompson
> Sent: Saturday, April 30, 2022 11:39 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add derive-
> device function which searches for existing devices in both directions
> 
> On 30/04/2022 21:07, softworkz wrote:
> > From: softworkz <softworkz@hotmail.com>
> >
> > The test /libavutil/tests/hwdevice checks that when deriving a
> device
> > from a source device and then deriving back to the type of the
> source
> > device, the result is matching the original source device, i.e. the
> > derivation mechanism doesn't create a new device in this case.
> >
> > Previously, this test was usually passed, but only due to two
> different
> > kind of flaws:
> >
> > 1. The test covers only a single level of derivation (and back)
> >
> > It derives device Y from device X and then Y back to the type of X
> and
> > checks whether the result matches X.
> >
> > What it doesn't check for, are longer chains of derivation like:
> >
> > CUDA1 > OpenCL2 > CUDA3 and then back to OpenCL4
> >
> > In that case, the second derivation returns the first device (CUDA3
> ==
> > CUDA1), but when deriving OpenCL4, hwcontext.c was creating a new
> > OpenCL4 context instead of returning OpenCL2, because there was no
> link
> > from CUDA1 to OpenCL2 (only backwards from OpenCL2 to CUDA1)
> >
> > If the test would check for two levels of derivation, it would have
> > failed.
> >
> > This patch fixes those (yet untested) cases by introducing forward
> > references (derived_device) in addition to the existing back
> references
> > (source_device).
> >
> > 2. hwcontext_qsv didn't properly set the source_device
> >
> > In case of QSV, hwcontext_qsv creates a source context internally
> > (vaapi, dxva2 or d3d11va) without calling
> av_hwdevice_ctx_create_derived
> > and without setting source_device.
> >
> > This way, the hwcontext test ran successful, but what practically
> > happened, was that - for example - deriving vaapi from qsv didn't
> return
> > the original underlying vaapi device and a new one was created
> instead:
> > Exactly what the test is intended to detect and prevent. It just
> > couldn't do so, because the original device was hidden (= not set as
> the
> > source_device of the QSV device).
> >
> > This patch properly makes these setting and fixes all derivation
> > scenarios.
> >
> > (at a later stage, /libavutil/tests/hwdevice should be extended to
> check
> > longer derivation chains as well)
> >
> > Signed-off-by: softworkz <softworkz@hotmail.com>
> > ---
> >   libavutil/hwcontext.c          | 72
> +++++++++++++++++++++++++++++++---
> >   libavutil/hwcontext.h          | 20 ++++++++++
> >   libavutil/hwcontext_internal.h |  6 +++
> >   libavutil/hwcontext_qsv.c      | 13 ++++--
> >   4 files changed, 102 insertions(+), 9 deletions(-)
> 
> Yeah, something like this seems fair.

:-)

> Some general comments:
> 
> * Whenever you use derivation it creates a circular reference, so the
> instances can never be freed in the current implementation.

It's been a while...I thought there wasn't, but looking at it now,
it seems you are right.

How would you solve it?


> * The thread-safety properties of the hwcontext API have been lost -
> you can no longer operate on devices independently across threads
> (insofar as the underlying API allows that).
>    Maybe there is an argument that derivation is something which
> should happen early on and therefore documenting it as thread-unsafe
> is ok, but when hwupload/hwmap can use it inside filtergraphs that
> just isn't going to happen (and will be violated in the FFmpeg utility
> if filters get threaded, as is being worked on).

From my understanding there will be a single separate thread which
handles all filtergraph operations.
I don't think it would even be possible (without massive changes)
to arbitrate filter processing in parallel.
But even if this would be implemented: the filtergraph setup (init,
uninit, query_formats, etc.) would surely happen on a single thread.


> * I'm not sure that it is reasonable to ignore options.  If an
> unrelated component derived a device before you with special options,
> you might get that device even if you have incompatible different
> options.

I understand what you mean, but this is outside the scope of 
this patchset, because when you would want to do this, it
would need to be implemented for derivation in general, not
in this patchset which only adds reverse-search to the
existing derivation functionality.


> > diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c
> > index ab9ad3703e..1aea7dd5c3 100644
> > --- a/libavutil/hwcontext.c
> > +++ b/libavutil/hwcontext.c
> > @@ -123,6 +123,7 @@ static const AVClass hwdevice_ctx_class = {
> >   static void hwdevice_ctx_free(void *opaque, uint8_t *data)
> >   {
> >       AVHWDeviceContext *ctx = (AVHWDeviceContext*)data;
> > +    int i;
> >
> >       /* uninit might still want access the hw context and the user
> >        * free() callback might destroy it, so uninit has to be
> called first */
> > @@ -133,6 +134,8 @@ static void hwdevice_ctx_free(void *opaque,
> uint8_t *data)
> >           ctx->free(ctx);
> >
> >       av_buffer_unref(&ctx->internal->source_device);
> > +    for (i = 0; i < AV_HWDEVICE_TYPE_NB; i++)
> > +        av_buffer_unref(&ctx->internal->derived_devices[i]);
> >
> >       av_freep(&ctx->hwctx);
> >       av_freep(&ctx->internal->priv);
> > @@ -644,10 +647,31 @@ fail:
> >       return ret;
> >   }
> >
> > -int av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
> > -                                        enum AVHWDeviceType type,
> > -                                        AVBufferRef *src_ref,
> > -                                        AVDictionary *options, int
> flags)
> > +static AVBufferRef* find_derived_hwdevice_ctx(AVBufferRef *src_ref,
> enum AVHWDeviceType type)
> > +{
> > +    AVBufferRef *tmp_ref;
> > +    AVHWDeviceContext *src_ctx;
> > +    int i;
> > +
> > +    src_ctx = (AVHWDeviceContext*)src_ref->data;
> > +    if (src_ctx->type == type)
> > +        return src_ref;
> > +
> > +    for (i = 0; i < AV_HWDEVICE_TYPE_NB; i++)
> > +        if (src_ctx->internal->derived_devices[i]) {
> > +            tmp_ref = find_derived_hwdevice_ctx(src_ctx->internal-
> >derived_devices[i], type);
> > +            if (tmp_ref)
> > +                return tmp_ref;
> > +        }
> > +
> > +    return NULL;
> > +}
> > +
> > +static int hwdevice_ctx_create_derived(AVBufferRef **dst_ref_ptr,
> > +                                       enum AVHWDeviceType type,
> > +                                       AVBufferRef *src_ref,
> > +                                       AVDictionary *options, int
> flags,
> > +                                       int get_existing)
> >   {
> >       AVBufferRef *dst_ref = NULL, *tmp_ref;
> >       AVHWDeviceContext *dst_ctx, *tmp_ctx;
> > @@ -667,6 +691,18 @@ int
> av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
> >           tmp_ref = tmp_ctx->internal->source_device;
> >       }
> >
> > +    if (get_existing) {
> > +        tmp_ref = find_derived_hwdevice_ctx(src_ref, type);
> > +        if (tmp_ref) {
> > +            dst_ref = av_buffer_ref(tmp_ref);
> > +            if (!dst_ref) {
> > +                ret = AVERROR(ENOMEM);
> > +                goto fail;
> > +            }
> > +            goto done;
> > +        }
> > +    }
> > +
> >       dst_ref = av_hwdevice_ctx_alloc(type);
> >       if (!dst_ref) {
> >           ret = AVERROR(ENOMEM);
> > @@ -688,6 +724,13 @@ int
> av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
> >                       ret = AVERROR(ENOMEM);
> >                       goto fail;
> >                   }
> > +                if (!tmp_ctx->internal->derived_devices[type]) {
> 
> I wonder whether you only want to do this when the user made the new
> call, not the old one?
> 
> The semantics would perhaps feel clearer as "get or create the shared
> derived device" rather than "get the first device derived or create a
> new one if not".

I've been there for a moment, and then I thought that when the API
consumer would mix API calls, e.g. first without 'get' and second
with 'get', then the second call would not produce the expected 
result.

Let me know what you think, I have no strong opinion about this.


> > +                    tmp_ctx->internal->derived_devices[type] =
> av_buffer_ref(dst_ref);
> > +                    if (!tmp_ctx->internal->derived_devices[type])
> {
> > +                        ret = AVERROR(ENOMEM);
> > +                        goto fail;
> > +                    }
> > +                }
> >                   ret = av_hwdevice_ctx_init(dst_ref);
> >                   if (ret < 0)
> >                       goto fail;
> > @@ -712,12 +755,29 @@ fail:
> >       return ret;
> >   }
> >
> > +int av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
> > +                                        enum AVHWDeviceType type,
> > +                                        AVBufferRef *src_ref,
> > +                                        AVDictionary *options, int
> flags)
> > +{
> > +    return hwdevice_ctx_create_derived(dst_ref_ptr, type, src_ref,
> > +                                       options, flags, 0);
> > +}
> > +
> > +int av_hwdevice_ctx_get_or_create_derived(AVBufferRef
> **dst_ref_ptr,
> > +                                          enum AVHWDeviceType type,
> > +                                          AVBufferRef *src_ref, int
> flags)
> > +{
> > +    return hwdevice_ctx_create_derived(dst_ref_ptr, type, src_ref,
> > +                                       NULL, flags, 1);
> > +}
> > +
> >   int av_hwdevice_ctx_create_derived(AVBufferRef **dst_ref_ptr,
> >                                      enum AVHWDeviceType type,
> >                                      AVBufferRef *src_ref, int
> flags)
> >   {
> > -    return av_hwdevice_ctx_create_derived_opts(dst_ref_ptr, type,
> src_ref,
> > -                                               NULL, flags);
> > +    return hwdevice_ctx_create_derived(dst_ref_ptr, type, src_ref,
> > +                                       NULL, flags, 0);
> >   }
> >
> >   static void ff_hwframe_unmap(void *opaque, uint8_t *data)
> > diff --git a/libavutil/hwcontext.h b/libavutil/hwcontext.h
> > index c18b7e1e8b..3785811f98 100644
> > --- a/libavutil/hwcontext.h
> > +++ b/libavutil/hwcontext.h
> > @@ -37,6 +37,7 @@ enum AVHWDeviceType {
> >       AV_HWDEVICE_TYPE_OPENCL,
> >       AV_HWDEVICE_TYPE_MEDIACODEC,
> >       AV_HWDEVICE_TYPE_VULKAN,
> > +    AV_HWDEVICE_TYPE_NB,          ///< number of hw device types
> 
> Can we avoid adding a non-constant constant to the user API?
> 
> av_hwdevice_iterate_types() exists for this purpose.

There was a reason why this can't be used. IIRC, it was that the
device count needs to be known at some other place where 
av_hwdevice_iterate_types() is not available.

Please see the previous discussion with Hendrik about this.


> >   };
> >
> >   typedef struct AVHWDeviceInternal AVHWDeviceInternal;
> > @@ -328,6 +329,25 @@ int av_hwdevice_ctx_create_derived(AVBufferRef
> **dst_ctx,
> >                                      enum AVHWDeviceType type,
> >                                      AVBufferRef *src_ctx, int
> flags);
> >
> > +/**
> > + * Create a new device of the specified type from an existing
> device.
> > + *
> > + * This function performs the same action as
> av_hwdevice_ctx_create_derived,
> > + * however, if a derived device of the specified type already
> exists,
> > + * it returns the existing instance.
> > + *
> > + * @param dst_ctx On success, a reference to the newly-created
> > + *                AVHWDeviceContext.
> > + * @param type    The type of the new device to create.
> > + * @param src_ctx A reference to an existing AVHWDeviceContext
> which will be
> > + *                used to create the new device.
> > + * @param flags   Currently unused; should be set to zero.
> > + * @return        Zero on success, a negative AVERROR code on
> failure.
> > + */
> > +int av_hwdevice_ctx_get_or_create_derived(AVBufferRef **dst_ctx,
> > +                                          enum AVHWDeviceType type,
> > +                                          AVBufferRef *src_ctx, int
> flags);
> 
> Include the options here?  Not having them in the original call was an
> unfortunate omission, I think it would be better to include them here
> as well even if you don't use them immediately.

I didn't see the options being used anywhere, that's why I thought
it would be the other way round (options param overload exists 
for legacy/compatibility reasons).

But sure, I'll change it then to include options.


Thanks for the quick review!


Kind regards,
softworkz
Mark Thompson May 1, 2022, 10 p.m. UTC | #3
On 30/04/2022 23:42, Soft Works wrote:
> 
> 
>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark
>> Thompson
>> Sent: Saturday, April 30, 2022 11:39 PM
>> To: ffmpeg-devel@ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add derive-
>> device function which searches for existing devices in both directions
>>
>> On 30/04/2022 21:07, softworkz wrote:
>>> From: softworkz <softworkz@hotmail.com>
>>>
>>> The test /libavutil/tests/hwdevice checks that when deriving a
>> device
>>> from a source device and then deriving back to the type of the
>> source
>>> device, the result is matching the original source device, i.e. the
>>> derivation mechanism doesn't create a new device in this case.
>>>
>>> Previously, this test was usually passed, but only due to two
>> different
>>> kind of flaws:
>>>
>>> 1. The test covers only a single level of derivation (and back)
>>>
>>> It derives device Y from device X and then Y back to the type of X
>> and
>>> checks whether the result matches X.
>>>
>>> What it doesn't check for, are longer chains of derivation like:
>>>
>>> CUDA1 > OpenCL2 > CUDA3 and then back to OpenCL4
>>>
>>> In that case, the second derivation returns the first device (CUDA3
>> ==
>>> CUDA1), but when deriving OpenCL4, hwcontext.c was creating a new
>>> OpenCL4 context instead of returning OpenCL2, because there was no
>> link
>>> from CUDA1 to OpenCL2 (only backwards from OpenCL2 to CUDA1)
>>>
>>> If the test would check for two levels of derivation, it would have
>>> failed.
>>>
>>> This patch fixes those (yet untested) cases by introducing forward
>>> references (derived_device) in addition to the existing back
>> references
>>> (source_device).
>>>
>>> 2. hwcontext_qsv didn't properly set the source_device
>>>
>>> In case of QSV, hwcontext_qsv creates a source context internally
>>> (vaapi, dxva2 or d3d11va) without calling
>> av_hwdevice_ctx_create_derived
>>> and without setting source_device.
>>>
>>> This way, the hwcontext test ran successful, but what practically
>>> happened, was that - for example - deriving vaapi from qsv didn't
>> return
>>> the original underlying vaapi device and a new one was created
>> instead:
>>> Exactly what the test is intended to detect and prevent. It just
>>> couldn't do so, because the original device was hidden (= not set as
>> the
>>> source_device of the QSV device).
>>>
>>> This patch properly makes these setting and fixes all derivation
>>> scenarios.
>>>
>>> (at a later stage, /libavutil/tests/hwdevice should be extended to
>> check
>>> longer derivation chains as well)
>>>
>>> Signed-off-by: softworkz <softworkz@hotmail.com>
>>> ---
>>>    libavutil/hwcontext.c          | 72
>> +++++++++++++++++++++++++++++++---
>>>    libavutil/hwcontext.h          | 20 ++++++++++
>>>    libavutil/hwcontext_internal.h |  6 +++
>>>    libavutil/hwcontext_qsv.c      | 13 ++++--
>>>    4 files changed, 102 insertions(+), 9 deletions(-)
>>
>> Yeah, something like this seems fair.
> 
> :-)
> 
>> Some general comments:
>>
>> * Whenever you use derivation it creates a circular reference, so the
>> instances can never be freed in the current implementation.
> 
> It's been a while...I thought there wasn't, but looking at it now,
> it seems you are right.
> 
> How would you solve it?

Hmm.  You do need both the source and derived device to be able to keep the other alive with this form, so the strict reference-counting structure isn't going to work.  Given that, I guess it's got to do something else but I've no idea what.

>> * The thread-safety properties of the hwcontext API have been lost -
>> you can no longer operate on devices independently across threads
>> (insofar as the underlying API allows that).
>>     Maybe there is an argument that derivation is something which
>> should happen early on and therefore documenting it as thread-unsafe
>> is ok, but when hwupload/hwmap can use it inside filtergraphs that
>> just isn't going to happen (and will be violated in the FFmpeg utility
>> if filters get threaded, as is being worked on).
> 
>  From my understanding there will be a single separate thread which
> handles all filtergraph operations.
> I don't think it would even be possible (without massive changes)
> to arbitrate filter processing in parallel.
> But even if this would be implemented: the filtergraph setup (init,
> uninit, query_formats, etc.) would surely happen on a single thread.

The ffmpeg utility creates filtergraphs dynamically when the first frame is available from their inputs, so I don't see why you wouldn't allow multiple of them to be created in parallel in that case.

If you create all devices at the beginning and then give references to them to the various filters which need them (so never manipulate devices dynamically within the graph) then it would be ok, but I think you've already rejected that approach.

>> * I'm not sure that it is reasonable to ignore options.  If an
>> unrelated component derived a device before you with special options,
>> you might get that device even if you have incompatible different
>> options.
> 
> I understand what you mean, but this is outside the scope of
> this patchset, because when you would want to do this, it
> would need to be implemented for derivation in general, not
> in this patchset which only adds reverse-search to the
> existing derivation functionality.

I'm not sure what you mean by that?  The feature already exists; here is a concrete example of where you would get the wrong result:

Start with VAAPI device A.

Component P derives Vulkan device B with some extension options X.

Component Q wants the same device as P, so it derives again with extension options X and gets B.

Everything works fine for a while.

Later, unrelated component R is inserted before P and Q.  It wants a Vulkan device C with extension options Y, so it derives that.

Now component Q is broken because it gets C instead of B and has the wrong extensions enabled.

>>> diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c
>>> index ab9ad3703e..1aea7dd5c3 100644
>>> --- a/libavutil/hwcontext.c
>>> +++ b/libavutil/hwcontext.c
>>> @@ -123,6 +123,7 @@ static const AVClass hwdevice_ctx_class = {
>>>    static void hwdevice_ctx_free(void *opaque, uint8_t *data)
>>>    {
>>>        AVHWDeviceContext *ctx = (AVHWDeviceContext*)data;
>>> +    int i;
>>>
>>>        /* uninit might still want access the hw context and the user
>>>         * free() callback might destroy it, so uninit has to be
>> called first */
>>> @@ -133,6 +134,8 @@ static void hwdevice_ctx_free(void *opaque,
>> uint8_t *data)
>>>            ctx->free(ctx);
>>>
>>>        av_buffer_unref(&ctx->internal->source_device);
>>> +    for (i = 0; i < AV_HWDEVICE_TYPE_NB; i++)
>>> +        av_buffer_unref(&ctx->internal->derived_devices[i]);
>>>
>>>        av_freep(&ctx->hwctx);
>>>        av_freep(&ctx->internal->priv);
>>> @@ -644,10 +647,31 @@ fail:
>>>        return ret;
>>>    }
>>>
>>> -int av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
>>> -                                        enum AVHWDeviceType type,
>>> -                                        AVBufferRef *src_ref,
>>> -                                        AVDictionary *options, int
>> flags)
>>> +static AVBufferRef* find_derived_hwdevice_ctx(AVBufferRef *src_ref,
>> enum AVHWDeviceType type)
>>> +{
>>> +    AVBufferRef *tmp_ref;
>>> +    AVHWDeviceContext *src_ctx;
>>> +    int i;
>>> +
>>> +    src_ctx = (AVHWDeviceContext*)src_ref->data;
>>> +    if (src_ctx->type == type)
>>> +        return src_ref;
>>> +
>>> +    for (i = 0; i < AV_HWDEVICE_TYPE_NB; i++)
>>> +        if (src_ctx->internal->derived_devices[i]) {
>>> +            tmp_ref = find_derived_hwdevice_ctx(src_ctx->internal-
>>> derived_devices[i], type);
>>> +            if (tmp_ref)
>>> +                return tmp_ref;
>>> +        }
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +static int hwdevice_ctx_create_derived(AVBufferRef **dst_ref_ptr,
>>> +                                       enum AVHWDeviceType type,
>>> +                                       AVBufferRef *src_ref,
>>> +                                       AVDictionary *options, int
>> flags,
>>> +                                       int get_existing)
>>>    {
>>>        AVBufferRef *dst_ref = NULL, *tmp_ref;
>>>        AVHWDeviceContext *dst_ctx, *tmp_ctx;
>>> @@ -667,6 +691,18 @@ int
>> av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
>>>            tmp_ref = tmp_ctx->internal->source_device;
>>>        }
>>>
>>> +    if (get_existing) {
>>> +        tmp_ref = find_derived_hwdevice_ctx(src_ref, type);
>>> +        if (tmp_ref) {
>>> +            dst_ref = av_buffer_ref(tmp_ref);
>>> +            if (!dst_ref) {
>>> +                ret = AVERROR(ENOMEM);
>>> +                goto fail;
>>> +            }
>>> +            goto done;
>>> +        }
>>> +    }
>>> +
>>>        dst_ref = av_hwdevice_ctx_alloc(type);
>>>        if (!dst_ref) {
>>>            ret = AVERROR(ENOMEM);
>>> @@ -688,6 +724,13 @@ int
>> av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
>>>                        ret = AVERROR(ENOMEM);
>>>                        goto fail;
>>>                    }
>>> +                if (!tmp_ctx->internal->derived_devices[type]) {
>>
>> I wonder whether you only want to do this when the user made the new
>> call, not the old one?
>>
>> The semantics would perhaps feel clearer as "get or create the shared
>> derived device" rather than "get the first device derived or create a
>> new one if not".
> 
> I've been there for a moment, and then I thought that when the API
> consumer would mix API calls, e.g. first without 'get' and second
> with 'get', then the second call would not produce the expected
> result.
> 
> Let me know what you think, I have no strong opinion about this.

Can you explain your example further?

Making the shared device always opt-in seems better to me to avoid unexpected interactions.  (Like in the above example where a non-sharing component is added before everything else - when sharing is implicit this ends up being the first device derived and gets shared with others.)

>>> +                    tmp_ctx->internal->derived_devices[type] =
>> av_buffer_ref(dst_ref);
>>> +                    if (!tmp_ctx->internal->derived_devices[type])
>> {
>>> +                        ret = AVERROR(ENOMEM);
>>> +                        goto fail;
>>> +                    }
>>> +                }
>>>                    ret = av_hwdevice_ctx_init(dst_ref);
>>>                    if (ret < 0)
>>>                        goto fail;
>>> @@ -712,12 +755,29 @@ fail:
>>>        return ret;
>>>    }
>>>
>>> +int av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
>>> +                                        enum AVHWDeviceType type,
>>> +                                        AVBufferRef *src_ref,
>>> +                                        AVDictionary *options, int
>> flags)
>>> +{
>>> +    return hwdevice_ctx_create_derived(dst_ref_ptr, type, src_ref,
>>> +                                       options, flags, 0);
>>> +}
>>> +
>>> +int av_hwdevice_ctx_get_or_create_derived(AVBufferRef
>> **dst_ref_ptr,
>>> +                                          enum AVHWDeviceType type,
>>> +                                          AVBufferRef *src_ref, int
>> flags)
>>> +{
>>> +    return hwdevice_ctx_create_derived(dst_ref_ptr, type, src_ref,
>>> +                                       NULL, flags, 1);
>>> +}
>>> +
>>>    int av_hwdevice_ctx_create_derived(AVBufferRef **dst_ref_ptr,
>>>                                       enum AVHWDeviceType type,
>>>                                       AVBufferRef *src_ref, int
>> flags)
>>>    {
>>> -    return av_hwdevice_ctx_create_derived_opts(dst_ref_ptr, type,
>> src_ref,
>>> -                                               NULL, flags);
>>> +    return hwdevice_ctx_create_derived(dst_ref_ptr, type, src_ref,
>>> +                                       NULL, flags, 0);
>>>    }
>>>
>>>    static void ff_hwframe_unmap(void *opaque, uint8_t *data)
>>> diff --git a/libavutil/hwcontext.h b/libavutil/hwcontext.h
>>> index c18b7e1e8b..3785811f98 100644
>>> --- a/libavutil/hwcontext.h
>>> +++ b/libavutil/hwcontext.h
>>> @@ -37,6 +37,7 @@ enum AVHWDeviceType {
>>>        AV_HWDEVICE_TYPE_OPENCL,
>>>        AV_HWDEVICE_TYPE_MEDIACODEC,
>>>        AV_HWDEVICE_TYPE_VULKAN,
>>> +    AV_HWDEVICE_TYPE_NB,          ///< number of hw device types
>>
>> Can we avoid adding a non-constant constant to the user API?
>>
>> av_hwdevice_iterate_types() exists for this purpose.
> 
> There was a reason why this can't be used. IIRC, it was that the
> device count needs to be known at some other place where
> av_hwdevice_iterate_types() is not available.
> 
> Please see the previous discussion with Hendrik about this.

Where is that?  The only place I see this used is the array of derived devices.

Two alternative implementations without the constant spring to mind:

* A shorter array indexed by av_hwdevice_iterate_types() which would not have empty entries for devices not compatible with the current platform.

* An array of type/reference pairs.

>>>    };
>>>
>>>    typedef struct AVHWDeviceInternal AVHWDeviceInternal;
>>> @@ -328,6 +329,25 @@ int av_hwdevice_ctx_create_derived(AVBufferRef
>> **dst_ctx,
>>>                                       enum AVHWDeviceType type,
>>>                                       AVBufferRef *src_ctx, int
>> flags);
>>>
>>> +/**
>>> + * Create a new device of the specified type from an existing
>> device.
>>> + *
>>> + * This function performs the same action as
>> av_hwdevice_ctx_create_derived,
>>> + * however, if a derived device of the specified type already
>> exists,
>>> + * it returns the existing instance.
>>> + *
>>> + * @param dst_ctx On success, a reference to the newly-created
>>> + *                AVHWDeviceContext.
>>> + * @param type    The type of the new device to create.
>>> + * @param src_ctx A reference to an existing AVHWDeviceContext
>> which will be
>>> + *                used to create the new device.
>>> + * @param flags   Currently unused; should be set to zero.
>>> + * @return        Zero on success, a negative AVERROR code on
>> failure.
>>> + */
>>> +int av_hwdevice_ctx_get_or_create_derived(AVBufferRef **dst_ctx,
>>> +                                          enum AVHWDeviceType type,
>>> +                                          AVBufferRef *src_ctx, int
>> flags);
>>
>> Include the options here?  Not having them in the original call was an
>> unfortunate omission, I think it would be better to include them here
>> as well even if you don't use them immediately.
> 
> I didn't see the options being used anywhere, that's why I thought
> it would be the other way round (options param overload exists
> for legacy/compatibility reasons).
> 
> But sure, I'll change it then to include options.

- Mark
Soft Works May 2, 2022, 6:49 a.m. UTC | #4
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark
> Thompson
> Sent: Monday, May 2, 2022 12:01 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add derive-
> device function which searches for existing devices in both directions

[..]

> >> * Whenever you use derivation it creates a circular reference, so
> the
> >> instances can never be freed in the current implementation.
> >
> > It's been a while...I thought there wasn't, but looking at it now,
> > it seems you are right.
> >
> > How would you solve it?
> 
> Hmm.  You do need both the source and derived device to be able to
> keep the other alive with this form, so the strict reference-counting
> structure isn't going to work.  Given that, I guess it's got to do
> something else but I've no idea what.

Ok, before I come up with something that might find objection, I'd like
to ask all around:

Do we have some pattern for weak/shared referencing via AVBufferRef?

Thanks
softworkz
Soft Works May 2, 2022, 8:14 a.m. UTC | #5
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark
> Thompson
> Sent: Monday, May 2, 2022 12:01 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add derive-
> device function which searches for existing devices in both directions

[..]

> >> * The thread-safety properties of the hwcontext API have been lost
> -
> >> you can no longer operate on devices independently across threads
> >> (insofar as the underlying API allows that).
> >>     Maybe there is an argument that derivation is something which
> >> should happen early on and therefore documenting it as thread-
> unsafe
> >> is ok, but when hwupload/hwmap can use it inside filtergraphs that
> >> just isn't going to happen (and will be violated in the FFmpeg
> utility
> >> if filters get threaded, as is being worked on).
> >
> >  From my understanding there will be a single separate thread which
> > handles all filtergraph operations.
> > I don't think it would even be possible (without massive changes)
> > to arbitrate filter processing in parallel.
> > But even if this would be implemented: the filtergraph setup (init,
> > uninit, query_formats, etc.) would surely happen on a single thread.
> 
> The ffmpeg utility creates filtergraphs dynamically when the first
> frame is available from their inputs, so I don't see why you wouldn't
> allow multiple of them to be created in parallel in that case.
> 
> If you create all devices at the beginning and then give references to
> them to the various filters which need them (so never manipulate
> devices dynamically within the graph) then it would be ok, but I think
> you've already rejected that approach.

Please let's not break out of the scope of this patchset.
This patchset is not about re-doing device derivation. The only
small change that it does is that it returns an existing device
instead of creating a new one when such device already exists
in the same(!) chain.

The change it makes has really nothing to do with threading or 
anything around it.


> >> * I'm not sure that it is reasonable to ignore options.  If an
> >> unrelated component derived a device before you with special
> options,
> >> you might get that device even if you have incompatible different
> >> options.
> >
> > I understand what you mean, but this is outside the scope of
> > this patchset, because when you would want to do this, it
> > would need to be implemented for derivation in general, not
> > in this patchset which only adds reverse-search to the
> > existing derivation functionality.
> 
> I'm not sure what you mean by that?  The feature already exists; here
> is a concrete example of where you would get the wrong result:
> 
> Start with VAAPI device A.
> 
> Component P derives Vulkan device B with some extension options X.
> 
> Component Q wants the same device as P, so it derives again with
> extension options X and gets B.
> 
> Everything works fine for a while.
> 
> Later, unrelated component R is inserted before P and Q.  It wants a
> Vulkan device C with extension options Y, so it derives that.
> 
> Now component Q is broken because it gets C instead of B and has the
> wrong extensions enabled.

As per your request, this patchset's changes are now limited to
use ffmpeg cli. And there is currently no support for "extension
options" when deriving a device.

What I meant above is this: 

Assume this patchset wouldn't be applied, but a patchset would
be applied that allows to specify "extension options".
Then, even without this patchset, I could construct a similar 
example, where you would get the same device when deriving 
two times from the same device but with different extension
options.

That's why I said: yes, I understand the case you are talking
about. But it would require two separate patches, one for
enabling extension options and another one for matching 
extension options when deriving a device. This would make
sense, but:

It has nothing to do with this patchset.
(it could be done before or afterwards)

I'm open to discuss this (separately), because it opens up 
a range of questions how it could be done.

> >>> diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c
> >>> index ab9ad3703e..1aea7dd5c3 100644
> >>> --- a/libavutil/hwcontext.c
> >>> +++ b/libavutil/hwcontext.c
> >>> @@ -123,6 +123,7 @@ static const AVClass hwdevice_ctx_class = {
> >>>    static void hwdevice_ctx_free(void *opaque, uint8_t *data)
> >>>    {
> >>>        AVHWDeviceContext *ctx = (AVHWDeviceContext*)data;
> >>> +    int i;
> >>>
> >>>        /* uninit might still want access the hw context and the
> user
> >>>         * free() callback might destroy it, so uninit has to be
> >> called first */
> >>> @@ -133,6 +134,8 @@ static void hwdevice_ctx_free(void *opaque,
> >> uint8_t *data)
> >>>            ctx->free(ctx);
> >>>
> >>>        av_buffer_unref(&ctx->internal->source_device);
> >>> +    for (i = 0; i < AV_HWDEVICE_TYPE_NB; i++)
> >>> +        av_buffer_unref(&ctx->internal->derived_devices[i]);
> >>>
> >>>        av_freep(&ctx->hwctx);
> >>>        av_freep(&ctx->internal->priv);
> >>> @@ -644,10 +647,31 @@ fail:
> >>>        return ret;
> >>>    }
> >>>
> >>> -int av_hwdevice_ctx_create_derived_opts(AVBufferRef
> **dst_ref_ptr,
> >>> -                                        enum AVHWDeviceType type,
> >>> -                                        AVBufferRef *src_ref,
> >>> -                                        AVDictionary *options,
> int
> >> flags)
> >>> +static AVBufferRef* find_derived_hwdevice_ctx(AVBufferRef
> *src_ref,
> >> enum AVHWDeviceType type)
> >>> +{
> >>> +    AVBufferRef *tmp_ref;
> >>> +    AVHWDeviceContext *src_ctx;
> >>> +    int i;
> >>> +
> >>> +    src_ctx = (AVHWDeviceContext*)src_ref->data;
> >>> +    if (src_ctx->type == type)
> >>> +        return src_ref;
> >>> +
> >>> +    for (i = 0; i < AV_HWDEVICE_TYPE_NB; i++)
> >>> +        if (src_ctx->internal->derived_devices[i]) {
> >>> +            tmp_ref = find_derived_hwdevice_ctx(src_ctx-
> >internal-
> >>> derived_devices[i], type);
> >>> +            if (tmp_ref)
> >>> +                return tmp_ref;
> >>> +        }
> >>> +
> >>> +    return NULL;
> >>> +}
> >>> +
> >>> +static int hwdevice_ctx_create_derived(AVBufferRef **dst_ref_ptr,
> >>> +                                       enum AVHWDeviceType type,
> >>> +                                       AVBufferRef *src_ref,
> >>> +                                       AVDictionary *options, int
> >> flags,
> >>> +                                       int get_existing)
> >>>    {
> >>>        AVBufferRef *dst_ref = NULL, *tmp_ref;
> >>>        AVHWDeviceContext *dst_ctx, *tmp_ctx;
> >>> @@ -667,6 +691,18 @@ int
> >> av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
> >>>            tmp_ref = tmp_ctx->internal->source_device;
> >>>        }
> >>>
> >>> +    if (get_existing) {
> >>> +        tmp_ref = find_derived_hwdevice_ctx(src_ref, type);
> >>> +        if (tmp_ref) {
> >>> +            dst_ref = av_buffer_ref(tmp_ref);
> >>> +            if (!dst_ref) {
> >>> +                ret = AVERROR(ENOMEM);
> >>> +                goto fail;
> >>> +            }
> >>> +            goto done;
> >>> +        }
> >>> +    }
> >>> +
> >>>        dst_ref = av_hwdevice_ctx_alloc(type);
> >>>        if (!dst_ref) {
> >>>            ret = AVERROR(ENOMEM);
> >>> @@ -688,6 +724,13 @@ int
> >> av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
> >>>                        ret = AVERROR(ENOMEM);
> >>>                        goto fail;
> >>>                    }
> >>> +                if (!tmp_ctx->internal->derived_devices[type]) {
> >>
> >> I wonder whether you only want to do this when the user made the
> new
> >> call, not the old one?
> >>
> >> The semantics would perhaps feel clearer as "get or create the
> shared
> >> derived device" rather than "get the first device derived or create
> a
> >> new one if not".
> >
> > I've been there for a moment, and then I thought that when the API
> > consumer would mix API calls, e.g. first without 'get' and second
> > with 'get', then the second call would not produce the expected
> > result.
> >
> > Let me know what you think, I have no strong opinion about this.
> 
> Can you explain your example further?

Maybe we should get clear about what this patchset does exactly.
Let's look at the following derivation chain of devices:

A
├─ X
│  └─ Y
├─ B
│  └─ C
└─ V
   └─ W

The meaning is: 

- Y is derived from X, X is derived from A
- C is derived from B, B is derived from A
- W is derived from V, V is derived from A

In the existing implementation, each device "knows" its parent
(via the 'source_device' field).

When you call av_hwdevice_ctx_create_derived() and specify "C"
as the source device, then it will iterate the tree upwards,
so when B is of the requested type, it will return B or if
A is of the requested type, it will return A.
Otherwise, it will create a new device of the requested type
and sets C as its parent.

But it doesn't return X, Y, V or W (when any would match the
requested type).

This is the current behavior.


What this patchset does is that we also store the derived 
children for each device (derived_devices array).

In the example above, it means hat A has references to 
X, B and V. X to Y, B to C and V to W.

The behavior of the new function is as follows:

When you call av_hwdevice_ctx_get_or_create_derived() and specify "C"
as the source device, then it will iterate the tree upwards,
so when B is of the requested type, it will return B or if
A is of the requested type, it will return A (like before).

Additionally, it will also iterate all through other children 
of B and other children of A. Which means that if X, Y, V or W
matches the requested type, it would return it.

Otherwise, it will create a new device of the requested type
and sets C as its parent.

This is the behavior of the new function.
All that it changes is that it searches the full tree instead
of the parents only.


Now back to your original question:

>>> +                if (!tmp_ctx->internal->derived_devices[type]) {
>>
>> I wonder whether you only want to do this when the user made the new
>> call, not the old one?
>>
>> The semantics would perhaps feel clearer as "get or create the shared
>> derived device" rather than "get the first device derived or create a
>> new one if not".

The line of code you commented on, is about adding a newly created
derived device to the child collection of its parent.

>> I wonder whether you only want to do this when the user made the new
>> call, not the old one?

The difference between old and new is that old searches only parents
and new searches the full tree.
But should calling the old function also control whether a newly created
derived device is added to the child collection of its parent?

It might be confusing behavior when you first call the old function
and later call the new function but you would get the device only
when it's a parent but not when it's a down-tree-child of any parent.

And that's where I said:

> Let me know what you think, I have no strong opinion about this.

Best regards,
softworkz
Soft Works May 2, 2022, 8:28 a.m. UTC | #6
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark
> Thompson
> Sent: Monday, May 2, 2022 12:01 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add derive-
> device function which searches for existing devices in both directions

[..]

> > There was a reason why this can't be used. IIRC, it was that the
> > device count needs to be known at some other place where
> > av_hwdevice_iterate_types() is not available.
> >
> > Please see the previous discussion with Hendrik about this.
> 
> Where is that?  The only place I see this used is the array of derived
> devices.

Sorry, it was with Lynne, not Hendrik:

https://www.mail-archive.com/ffmpeg-devel@ffmpeg.org/msg127768.html

> Two alternative implementations without the constant spring to mind:
> 
> * A shorter array indexed by av_hwdevice_iterate_types() which would
> not have empty entries for devices not compatible with the current
> platform.
> 
> * An array of type/reference pairs.

I would prefer not to over-engineer this. Everybody else seemed to
be OK with the current approach.

Best regards,
softworkz
Mark Thompson May 2, 2022, 10:11 p.m. UTC | #7
On 02/05/2022 09:14, Soft Works wrote:
>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark
>> Thompson
>> Sent: Monday, May 2, 2022 12:01 AM
>> To: ffmpeg-devel@ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add derive-
>> device function which searches for existing devices in both directions
> 
> [..]
> 
>>>> * The thread-safety properties of the hwcontext API have been lost
>> -
>>>> you can no longer operate on devices independently across threads
>>>> (insofar as the underlying API allows that).
>>>>      Maybe there is an argument that derivation is something which
>>>> should happen early on and therefore documenting it as thread-
>> unsafe
>>>> is ok, but when hwupload/hwmap can use it inside filtergraphs that
>>>> just isn't going to happen (and will be violated in the FFmpeg
>> utility
>>>> if filters get threaded, as is being worked on).
>>>
>>>   From my understanding there will be a single separate thread which
>>> handles all filtergraph operations.
>>> I don't think it would even be possible (without massive changes)
>>> to arbitrate filter processing in parallel.
>>> But even if this would be implemented: the filtergraph setup (init,
>>> uninit, query_formats, etc.) would surely happen on a single thread.
>>
>> The ffmpeg utility creates filtergraphs dynamically when the first
>> frame is available from their inputs, so I don't see why you wouldn't
>> allow multiple of them to be created in parallel in that case.
>>
>> If you create all devices at the beginning and then give references to
>> them to the various filters which need them (so never manipulate
>> devices dynamically within the graph) then it would be ok, but I think
>> you've already rejected that approach.
> 
> Please let's not break out of the scope of this patchset.
> This patchset is not about re-doing device derivation. The only
> small change that it does is that it returns an existing device
> instead of creating a new one when such device already exists
> in the same(!) chain.
> 
> The change it makes has really nothing to do with threading or
> anything around it.

The change makes existing thread-safe hwcontext APIs unsafe.  That is definitely not "nothing to do with threading", and has to be resolved since they can currently be called from contexts which expect thread-safety (such as making filtergraphs).

>>>> * I'm not sure that it is reasonable to ignore options.  If an
>>>> unrelated component derived a device before you with special
>> options,
>>>> you might get that device even if you have incompatible different
>>>> options.
>>>
>>> I understand what you mean, but this is outside the scope of
>>> this patchset, because when you would want to do this, it
>>> would need to be implemented for derivation in general, not
>>> in this patchset which only adds reverse-search to the
>>> existing derivation functionality.
>>
>> I'm not sure what you mean by that?  The feature already exists; here
>> is a concrete example of where you would get the wrong result:
>>
>> Start with VAAPI device A.
>>
>> Component P derives Vulkan device B with some extension options X.
>>
>> Component Q wants the same device as P, so it derives again with
>> extension options X and gets B.
>>
>> Everything works fine for a while.
>>
>> Later, unrelated component R is inserted before P and Q.  It wants a
>> Vulkan device C with extension options Y, so it derives that.
>>
>> Now component Q is broken because it gets C instead of B and has the
>> wrong extensions enabled.
> 
> As per your request, this patchset's changes are now limited to
> use ffmpeg cli. And there is currently no support for "extension
> options" when deriving a device.

Yes there is - see the "instance_extensions" and "device_extensions" options to Vulkan device derivation.  (Hence the VAAPI+Vulkan example.)

> What I meant above is this:
> 
> Assume this patchset wouldn't be applied, but a patchset would
> be applied that allows to specify "extension options".
> Then, even without this patchset, I could construct a similar
> example, where you would get the same device when deriving
> two times from the same device but with different extension
> options.

How?

The existing derivation setup always makes a new device, so you can't accidentally get an old one.

The existing way of reusing devices is to keep the reference and reuse it directly, which means you need to pass the reference around explicitly and there is no problem.

> That's why I said: yes, I understand the case you are talking
> about. But it would require two separate patches, one for
> enabling extension options and another one for matching
> extension options when deriving a device. This would make
> sense, but:
> 
> It has nothing to do with this patchset.
> (it could be done before or afterwards)
> 
> I'm open to discuss this (separately), because it opens up
> a range of questions how it could be done.
> 
>>>>> diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c
>>>>> index ab9ad3703e..1aea7dd5c3 100644
>>>>> --- a/libavutil/hwcontext.c
>>>>> +++ b/libavutil/hwcontext.c
>>>>> @@ -123,6 +123,7 @@ static const AVClass hwdevice_ctx_class = {
>>>>>     static void hwdevice_ctx_free(void *opaque, uint8_t *data)
>>>>>     {
>>>>>         AVHWDeviceContext *ctx = (AVHWDeviceContext*)data;
>>>>> +    int i;
>>>>>
>>>>>         /* uninit might still want access the hw context and the
>> user
>>>>>          * free() callback might destroy it, so uninit has to be
>>>> called first */
>>>>> @@ -133,6 +134,8 @@ static void hwdevice_ctx_free(void *opaque,
>>>> uint8_t *data)
>>>>>             ctx->free(ctx);
>>>>>
>>>>>         av_buffer_unref(&ctx->internal->source_device);
>>>>> +    for (i = 0; i < AV_HWDEVICE_TYPE_NB; i++)
>>>>> +        av_buffer_unref(&ctx->internal->derived_devices[i]);
>>>>>
>>>>>         av_freep(&ctx->hwctx);
>>>>>         av_freep(&ctx->internal->priv);
>>>>> @@ -644,10 +647,31 @@ fail:
>>>>>         return ret;
>>>>>     }
>>>>>
>>>>> -int av_hwdevice_ctx_create_derived_opts(AVBufferRef
>> **dst_ref_ptr,
>>>>> -                                        enum AVHWDeviceType type,
>>>>> -                                        AVBufferRef *src_ref,
>>>>> -                                        AVDictionary *options,
>> int
>>>> flags)
>>>>> +static AVBufferRef* find_derived_hwdevice_ctx(AVBufferRef
>> *src_ref,
>>>> enum AVHWDeviceType type)
>>>>> +{
>>>>> +    AVBufferRef *tmp_ref;
>>>>> +    AVHWDeviceContext *src_ctx;
>>>>> +    int i;
>>>>> +
>>>>> +    src_ctx = (AVHWDeviceContext*)src_ref->data;
>>>>> +    if (src_ctx->type == type)
>>>>> +        return src_ref;
>>>>> +
>>>>> +    for (i = 0; i < AV_HWDEVICE_TYPE_NB; i++)
>>>>> +        if (src_ctx->internal->derived_devices[i]) {
>>>>> +            tmp_ref = find_derived_hwdevice_ctx(src_ctx-
>>> internal-
>>>>> derived_devices[i], type);
>>>>> +            if (tmp_ref)
>>>>> +                return tmp_ref;
>>>>> +        }
>>>>> +
>>>>> +    return NULL;
>>>>> +}
>>>>> +
>>>>> +static int hwdevice_ctx_create_derived(AVBufferRef **dst_ref_ptr,
>>>>> +                                       enum AVHWDeviceType type,
>>>>> +                                       AVBufferRef *src_ref,
>>>>> +                                       AVDictionary *options, int
>>>> flags,
>>>>> +                                       int get_existing)
>>>>>     {
>>>>>         AVBufferRef *dst_ref = NULL, *tmp_ref;
>>>>>         AVHWDeviceContext *dst_ctx, *tmp_ctx;
>>>>> @@ -667,6 +691,18 @@ int
>>>> av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
>>>>>             tmp_ref = tmp_ctx->internal->source_device;
>>>>>         }
>>>>>
>>>>> +    if (get_existing) {
>>>>> +        tmp_ref = find_derived_hwdevice_ctx(src_ref, type);
>>>>> +        if (tmp_ref) {
>>>>> +            dst_ref = av_buffer_ref(tmp_ref);
>>>>> +            if (!dst_ref) {
>>>>> +                ret = AVERROR(ENOMEM);
>>>>> +                goto fail;
>>>>> +            }
>>>>> +            goto done;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>>         dst_ref = av_hwdevice_ctx_alloc(type);
>>>>>         if (!dst_ref) {
>>>>>             ret = AVERROR(ENOMEM);
>>>>> @@ -688,6 +724,13 @@ int
>>>> av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
>>>>>                         ret = AVERROR(ENOMEM);
>>>>>                         goto fail;
>>>>>                     }
>>>>> +                if (!tmp_ctx->internal->derived_devices[type]) {
>>>>
>>>> I wonder whether you only want to do this when the user made the
>> new
>>>> call, not the old one?
>>>>
>>>> The semantics would perhaps feel clearer as "get or create the
>> shared
>>>> derived device" rather than "get the first device derived or create
>> a
>>>> new one if not".
>>>
>>> I've been there for a moment, and then I thought that when the API
>>> consumer would mix API calls, e.g. first without 'get' and second
>>> with 'get', then the second call would not produce the expected
>>> result.
>>>
>>> Let me know what you think, I have no strong opinion about this.
>>
>> Can you explain your example further?
> 
> Maybe we should get clear about what this patchset does exactly.
> Let's look at the following derivation chain of devices:
> 
> A
> ├─ X
> │  └─ Y
> ├─ B
> │  └─ C
> └─ V
>     └─ W
> 
> The meaning is:
> 
> - Y is derived from X, X is derived from A
> - C is derived from B, B is derived from A
> - W is derived from V, V is derived from A
> 
> In the existing implementation, each device "knows" its parent
> (via the 'source_device' field).
> 
> When you call av_hwdevice_ctx_create_derived() and specify "C"
> as the source device, then it will iterate the tree upwards,
> so when B is of the requested type, it will return B or if
> A is of the requested type, it will return A.
> Otherwise, it will create a new device of the requested type
> and sets C as its parent.
> 
> But it doesn't return X, Y, V or W (when any would match the
> requested type).
> 
> This is the current behavior.
> 
> 
> What this patchset does is that we also store the derived
> children for each device (derived_devices array).
> 
> In the example above, it means hat A has references to
> X, B and V. X to Y, B to C and V to W.
> 
> The behavior of the new function is as follows:
> 
> When you call av_hwdevice_ctx_get_or_create_derived() and specify "C"
> as the source device, then it will iterate the tree upwards,
> so when B is of the requested type, it will return B or if
> A is of the requested type, it will return A (like before).
> 
> Additionally, it will also iterate all through other children
> of B and other children of A. Which means that if X, Y, V or W
> matches the requested type, it would return it.

No it doesn't?  Your new function find_derived_hwdevice_ctx() is called only for the original source device, and recurses into its (transitive) children.  It can't return any of X, Y, V or W when starting from C.

> Otherwise, it will create a new device of the requested type
> and sets C as its parent.
> 
> This is the behavior of the new function.
> All that it changes is that it searches the full tree instead
> of the parents only.
 >
> Now back to your original question:
> 
>>>> +                if (!tmp_ctx->internal->derived_devices[type]) {
>>>
>>> I wonder whether you only want to do this when the user made the new
>>> call, not the old one?
>>>
>>> The semantics would perhaps feel clearer as "get or create the shared
>>> derived device" rather than "get the first device derived or create a
>>> new one if not".
> 
> The line of code you commented on, is about adding a newly created
> derived device to the child collection of its parent.

Yes, exactly - I'm suggesting only sharing the device when asked to share devices.

>>> I wonder whether you only want to do this when the user made the new
>>> call, not the old one?
> 
> The difference between old and new is that old searches only parents
> and new searches the full tree.
> But should calling the old function also control whether a newly created
> derived device is added to the child collection of its parent?
> 
> It might be confusing behavior when you first call the old function
> and later call the new function but you would get the device only
> when it's a parent but not when it's a down-tree-child of any parent.
> 
> And that's where I said:
> 
>> Let me know what you think, I have no strong opinion about this.
> 
> Best regards,
> softworkz
- Mark
Soft Works May 2, 2022, 10:59 p.m. UTC | #8
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark
> Thompson
> Sent: Tuesday, May 3, 2022 12:12 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add derive-
> device function which searches for existing devices in both directions
> 
> On 02/05/2022 09:14, Soft Works wrote:
> >> -----Original Message-----
> >> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Mark
> >> Thompson
> >> Sent: Monday, May 2, 2022 12:01 AM
> >> To: ffmpeg-devel@ffmpeg.org
> >> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
> derive-
> >> device function which searches for existing devices in both
> directions
> >
> > [..]
> >
> >>>> * The thread-safety properties of the hwcontext API have been
> lost
> >> -
> >>>> you can no longer operate on devices independently across threads
> >>>> (insofar as the underlying API allows that).
> >>>>      Maybe there is an argument that derivation is something
> which
> >>>> should happen early on and therefore documenting it as thread-
> >> unsafe
> >>>> is ok, but when hwupload/hwmap can use it inside filtergraphs
> that
> >>>> just isn't going to happen (and will be violated in the FFmpeg
> >> utility
> >>>> if filters get threaded, as is being worked on).
> >>>
> >>>   From my understanding there will be a single separate thread
> which
> >>> handles all filtergraph operations.
> >>> I don't think it would even be possible (without massive changes)
> >>> to arbitrate filter processing in parallel.
> >>> But even if this would be implemented: the filtergraph setup
> (init,
> >>> uninit, query_formats, etc.) would surely happen on a single
> thread.
> >>
> >> The ffmpeg utility creates filtergraphs dynamically when the first
> >> frame is available from their inputs, so I don't see why you
> wouldn't
> >> allow multiple of them to be created in parallel in that case.
> >>
> >> If you create all devices at the beginning and then give references
> to
> >> them to the various filters which need them (so never manipulate
> >> devices dynamically within the graph) then it would be ok, but I
> think
> >> you've already rejected that approach.
> >
> > Please let's not break out of the scope of this patchset.
> > This patchset is not about re-doing device derivation. The only
> > small change that it does is that it returns an existing device
> > instead of creating a new one when such device already exists
> > in the same(!) chain.
> >
> > The change it makes has really nothing to do with threading or
> > anything around it.
> 
> The change makes existing thread-safe hwcontext APIs unsafe.  That is
> definitely not "nothing to do with threading", and has to be resolved
> since they can currently be called from contexts which expect thread-
> safety (such as making filtergraphs).

As I said, I'm not aware that filtergraph creation would be a context 
which requires thread-safety, even when considering frames which get 
pushed into a filtergraph (like from a decoder).

Could you please show a command line which causes a situation where
device_derive is being called from multiple threads?


> >>>> * I'm not sure that it is reasonable to ignore options.  If an
> >>>> unrelated component derived a device before you with special
> >> options,
> >>>> you might get that device even if you have incompatible different
> >>>> options.
> >>>
> >>> I understand what you mean, but this is outside the scope of
> >>> this patchset, because when you would want to do this, it
> >>> would need to be implemented for derivation in general, not
> >>> in this patchset which only adds reverse-search to the
> >>> existing derivation functionality.
> >>
> >> I'm not sure what you mean by that?  The feature already exists;
> here
> >> is a concrete example of where you would get the wrong result:
> >>
> >> Start with VAAPI device A.
> >>
> >> Component P derives Vulkan device B with some extension options X.
> >>
> >> Component Q wants the same device as P, so it derives again with
> >> extension options X and gets B.
> >>
> >> Everything works fine for a while.
> >>
> >> Later, unrelated component R is inserted before P and Q.  It wants
> a
> >> Vulkan device C with extension options Y, so it derives that.
> >>
> >> Now component Q is broken because it gets C instead of B and has
> the
> >> wrong extensions enabled.
> >
> > As per your request, this patchset's changes are now limited to
> > use ffmpeg cli. And there is currently no support for "extension
> > options" when deriving a device.
> 
> Yes there is - see the "instance_extensions" and "device_extensions"
> options to Vulkan device derivation.  (Hence the VAAPI+Vulkan
> example.)

OK, but when deriving a device via command line, it doesn't consider
such parameters in the current device_derive function, and hence it's
not something that can be burdened onto this patchset.


> > What I meant above is this:
> >
> > Assume this patchset wouldn't be applied, but a patchset would
> > be applied that allows to specify "extension options".
> > Then, even without this patchset, I could construct a similar
> > example, where you would get the same device when deriving
> > two times from the same device but with different extension
> > options.
> 
> How?

VAAPI >> QSV(Param1) >> OpenCL

Now, from OpenCL, you want to derive QSV but with different parameters
(Param2). You won't get a new device, you get the existing QSV device.


> The existing derivation setup always makes a new device, so you can't
> accidentally get an old one.

No, not always, see above.

> The existing way of reusing devices is to keep the reference and reuse
> it directly, which means you need to pass the reference around
> explicitly and there is no problem.

You can do this as API user, but this patch is about ffmpeg cli and as
per your request limited to ffmpeg cli usage.


> >> Can you explain your example further?
> >
> > Maybe we should get clear about what this patchset does exactly.
> > Let's look at the following derivation chain of devices:
> >
> > A
> > ├─ X
> > │  └─ Y
> > ├─ B
> > │  └─ C
> > └─ V
> >     └─ W
> >
> > The meaning is:
> >
> > - Y is derived from X, X is derived from A
> > - C is derived from B, B is derived from A
> > - W is derived from V, V is derived from A
> >
> > In the existing implementation, each device "knows" its parent
> > (via the 'source_device' field).
> >
> > When you call av_hwdevice_ctx_create_derived() and specify "C"
> > as the source device, then it will iterate the tree upwards,
> > so when B is of the requested type, it will return B or if
> > A is of the requested type, it will return A.
> > Otherwise, it will create a new device of the requested type
> > and sets C as its parent.
> >
> > But it doesn't return X, Y, V or W (when any would match the
> > requested type).
> >
> > This is the current behavior.
> >
> >
> > What this patchset does is that we also store the derived
> > children for each device (derived_devices array).
> >
> > In the example above, it means hat A has references to
> > X, B and V. X to Y, B to C and V to W.
> >
> > The behavior of the new function is as follows:
> >
> > When you call av_hwdevice_ctx_get_or_create_derived() and specify
> "C"
> > as the source device, then it will iterate the tree upwards,
> > so when B is of the requested type, it will return B or if
> > A is of the requested type, it will return A (like before).
> >
> > Additionally, it will also iterate all through other children
> > of B and other children of A. Which means that if X, Y, V or W
> > matches the requested type, it would return it.
> 
> No it doesn't?  Your new function find_derived_hwdevice_ctx() is
> called only for the original source device, and recurses into its
> (transitive) children.  It can't return any of X, Y, V or W when
> starting from C.

OK, that's my mistake. It's been a while...
What I described is the original behavior. There were reasons
to limit it this way.

Best regards
softworkz
Mark Thompson May 2, 2022, 11:57 p.m. UTC | #9
On 02/05/2022 23:59, Soft Works wrote:
>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark
>> Thompson
>> Sent: Tuesday, May 3, 2022 12:12 AM
>> To: ffmpeg-devel@ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add derive-
>> device function which searches for existing devices in both directions
>>
>> On 02/05/2022 09:14, Soft Works wrote:
>>>> -----Original Message-----
>>>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
>> Mark
>>>> Thompson
>>>> Sent: Monday, May 2, 2022 12:01 AM
>>>> To: ffmpeg-devel@ffmpeg.org
>>>> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
>> derive-
>>>> device function which searches for existing devices in both
>> directions
>>>
>>> [..]
>>>
>>>>>> * The thread-safety properties of the hwcontext API have been
>> lost
>>>> -
>>>>>> you can no longer operate on devices independently across threads
>>>>>> (insofar as the underlying API allows that).
>>>>>>       Maybe there is an argument that derivation is something
>> which
>>>>>> should happen early on and therefore documenting it as thread-
>>>> unsafe
>>>>>> is ok, but when hwupload/hwmap can use it inside filtergraphs
>> that
>>>>>> just isn't going to happen (and will be violated in the FFmpeg
>>>> utility
>>>>>> if filters get threaded, as is being worked on).
>>>>>
>>>>>    From my understanding there will be a single separate thread
>> which
>>>>> handles all filtergraph operations.
>>>>> I don't think it would even be possible (without massive changes)
>>>>> to arbitrate filter processing in parallel.
>>>>> But even if this would be implemented: the filtergraph setup
>> (init,
>>>>> uninit, query_formats, etc.) would surely happen on a single
>> thread.
>>>>
>>>> The ffmpeg utility creates filtergraphs dynamically when the first
>>>> frame is available from their inputs, so I don't see why you
>> wouldn't
>>>> allow multiple of them to be created in parallel in that case.
>>>>
>>>> If you create all devices at the beginning and then give references
>> to
>>>> them to the various filters which need them (so never manipulate
>>>> devices dynamically within the graph) then it would be ok, but I
>> think
>>>> you've already rejected that approach.
>>>
>>> Please let's not break out of the scope of this patchset.
>>> This patchset is not about re-doing device derivation. The only
>>> small change that it does is that it returns an existing device
>>> instead of creating a new one when such device already exists
>>> in the same(!) chain.
>>>
>>> The change it makes has really nothing to do with threading or
>>> anything around it.
>>
>> The change makes existing thread-safe hwcontext APIs unsafe.  That is
>> definitely not "nothing to do with threading", and has to be resolved
>> since they can currently be called from contexts which expect thread-
>> safety (such as making filtergraphs).
> 
> As I said, I'm not aware that filtergraph creation would be a context
> which requires thread-safety, even when considering frames which get
> pushed into a filtergraph (like from a decoder).

User programs can do whatever they like within the API constraints.

> Could you please show a command line which causes a situation where
> device_derive is being called from multiple threads?

Given that the ffmpeg utility is currently entirely single-threaded, this will only happen once the intended plans for adding threading to it are complete.

With that, it will be true for something which makes two filtergraphs and uses derivation in both of them.  For example:, this currently makes two independent VAAPI devices, but equally could reuse the same device:

# ffmpeg -y -f kmsgrab -i /dev/dri/card0 -vf hwmap=derive_device=vaapi,scale_vaapi=format=nv12 -c:v h264_vaapi out1.mp4 -vf hwmap=derive_device=vaapi,scale_vaapi=w=1280:h=720:format=nv12 -c:v h264_vaapi out2.mp4

>>>>>> * I'm not sure that it is reasonable to ignore options.  If an
>>>>>> unrelated component derived a device before you with special
>>>> options,
>>>>>> you might get that device even if you have incompatible different
>>>>>> options.
>>>>>
>>>>> I understand what you mean, but this is outside the scope of
>>>>> this patchset, because when you would want to do this, it
>>>>> would need to be implemented for derivation in general, not
>>>>> in this patchset which only adds reverse-search to the
>>>>> existing derivation functionality.
>>>>
>>>> I'm not sure what you mean by that?  The feature already exists;
>> here
>>>> is a concrete example of where you would get the wrong result:
>>>>
>>>> Start with VAAPI device A.
>>>>
>>>> Component P derives Vulkan device B with some extension options X.
>>>>
>>>> Component Q wants the same device as P, so it derives again with
>>>> extension options X and gets B.
>>>>
>>>> Everything works fine for a while.
>>>>
>>>> Later, unrelated component R is inserted before P and Q.  It wants
>> a
>>>> Vulkan device C with extension options Y, so it derives that.
>>>>
>>>> Now component Q is broken because it gets C instead of B and has
>> the
>>>> wrong extensions enabled.
>>>
>>> As per your request, this patchset's changes are now limited to
>>> use ffmpeg cli. And there is currently no support for "extension
>>> options" when deriving a device.
>>
>> Yes there is - see the "instance_extensions" and "device_extensions"
>> options to Vulkan device derivation.  (Hence the VAAPI+Vulkan
>> example.)
> 
> OK, but when deriving a device via command line, it doesn't consider
> such parameters in the current device_derive function, and hence it's
> not something that can be burdened onto this patchset.

Then it sounds like you want this function to be part of the ffmpeg utility, not inside one of the libraries which have other users.

>>> What I meant above is this:
>>>
>>> Assume this patchset wouldn't be applied, but a patchset would
>>> be applied that allows to specify "extension options".
>>> Then, even without this patchset, I could construct a similar
>>> example, where you would get the same device when deriving
>>> two times from the same device but with different extension
>>> options.
>>
>> How?
> 
> VAAPI >> QSV(Param1) >> OpenCL
> 
> Now, from OpenCL, you want to derive QSV but with different parameters
> (Param2). You won't get a new device, you get the existing QSV device.

This doesn't make sense - remember that device derivation is unidirectional, and OpenCL is a leaf API which can only derive from other things.  The "derive" operation there has to be interpreted as "return the device this was derived from", in which case options don't make sense.

(Indeed, it seems like a good idea to disallow options in that case.  I will prepare a patch to that effect.)

>> The existing derivation setup always makes a new device, so you can't
>> accidentally get an old one.
> 
> No, not always, see above.
> 
>> The existing way of reusing devices is to keep the reference and reuse
>> it directly, which means you need to pass the reference around
>> explicitly and there is no problem.
> 
> You can do this as API user, but this patch is about ffmpeg cli and as
> per your request limited to ffmpeg cli usage.
> 
> 
>>>> Can you explain your example further?
>>>
>>> Maybe we should get clear about what this patchset does exactly.
>>> Let's look at the following derivation chain of devices:
>>>
>>> A
>>> ├─ X
>>> │  └─ Y
>>> ├─ B
>>> │  └─ C
>>> └─ V
>>>      └─ W
>>>
>>> The meaning is:
>>>
>>> - Y is derived from X, X is derived from A
>>> - C is derived from B, B is derived from A
>>> - W is derived from V, V is derived from A
>>>
>>> In the existing implementation, each device "knows" its parent
>>> (via the 'source_device' field).
>>>
>>> When you call av_hwdevice_ctx_create_derived() and specify "C"
>>> as the source device, then it will iterate the tree upwards,
>>> so when B is of the requested type, it will return B or if
>>> A is of the requested type, it will return A.
>>> Otherwise, it will create a new device of the requested type
>>> and sets C as its parent.
>>>
>>> But it doesn't return X, Y, V or W (when any would match the
>>> requested type).
>>>
>>> This is the current behavior.
>>>
>>>
>>> What this patchset does is that we also store the derived
>>> children for each device (derived_devices array).
>>>
>>> In the example above, it means hat A has references to
>>> X, B and V. X to Y, B to C and V to W.
>>>
>>> The behavior of the new function is as follows:
>>>
>>> When you call av_hwdevice_ctx_get_or_create_derived() and specify
>> "C"
>>> as the source device, then it will iterate the tree upwards,
>>> so when B is of the requested type, it will return B or if
>>> A is of the requested type, it will return A (like before).
>>>
>>> Additionally, it will also iterate all through other children
>>> of B and other children of A. Which means that if X, Y, V or W
>>> matches the requested type, it would return it.
>>
>> No it doesn't?  Your new function find_derived_hwdevice_ctx() is
>> called only for the original source device, and recurses into its
>> (transitive) children.  It can't return any of X, Y, V or W when
>> starting from C.
> 
> OK, that's my mistake. It's been a while...
> What I described is the original behavior. There were reasons
> to limit it this way.

Well, which one is is actually intended then?

- Mark
Soft Works May 3, 2022, 12:11 a.m. UTC | #10
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark
> Thompson
> Sent: Tuesday, May 3, 2022 1:57 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add derive-
> device function which searches for existing devices in both directions
> 
> On 02/05/2022 23:59, Soft Works wrote:
> >> -----Original Message-----
> >> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Mark
> >> Thompson
> >> Sent: Tuesday, May 3, 2022 12:12 AM
> >> To: ffmpeg-devel@ffmpeg.org
> >> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
> derive-
> >> device function which searches for existing devices in both
> directions
> >>
> >> On 02/05/2022 09:14, Soft Works wrote:
> >>>> -----Original Message-----
> >>>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> >> Mark
> >>>> Thompson
> >>>> Sent: Monday, May 2, 2022 12:01 AM
> >>>> To: ffmpeg-devel@ffmpeg.org
> >>>> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
> >> derive-
> >>>> device function which searches for existing devices in both
> >> directions
> >>>
> >>> [..]
> >>>
> >>>>>> * The thread-safety properties of the hwcontext API have been
> >> lost
> >>>> -
> >>>>>> you can no longer operate on devices independently across
> threads
> >>>>>> (insofar as the underlying API allows that).
> >>>>>>       Maybe there is an argument that derivation is something
> >> which
> >>>>>> should happen early on and therefore documenting it as thread-
> >>>> unsafe
> >>>>>> is ok, but when hwupload/hwmap can use it inside filtergraphs
> >> that
> >>>>>> just isn't going to happen (and will be violated in the FFmpeg
> >>>> utility
> >>>>>> if filters get threaded, as is being worked on).
> >>>>>
> >>>>>    From my understanding there will be a single separate thread
> >> which
> >>>>> handles all filtergraph operations.
> >>>>> I don't think it would even be possible (without massive
> changes)
> >>>>> to arbitrate filter processing in parallel.
> >>>>> But even if this would be implemented: the filtergraph setup
> >> (init,
> >>>>> uninit, query_formats, etc.) would surely happen on a single
> >> thread.
> >>>>
> >>>> The ffmpeg utility creates filtergraphs dynamically when the
> first
> >>>> frame is available from their inputs, so I don't see why you
> >> wouldn't
> >>>> allow multiple of them to be created in parallel in that case.
> >>>>
> >>>> If you create all devices at the beginning and then give
> references
> >> to
> >>>> them to the various filters which need them (so never manipulate
> >>>> devices dynamically within the graph) then it would be ok, but I
> >> think
> >>>> you've already rejected that approach.
> >>>
> >>> Please let's not break out of the scope of this patchset.
> >>> This patchset is not about re-doing device derivation. The only
> >>> small change that it does is that it returns an existing device
> >>> instead of creating a new one when such device already exists
> >>> in the same(!) chain.
> >>>
> >>> The change it makes has really nothing to do with threading or
> >>> anything around it.
> >>
> >> The change makes existing thread-safe hwcontext APIs unsafe.  That
> is
> >> definitely not "nothing to do with threading", and has to be
> resolved
> >> since they can currently be called from contexts which expect
> thread-
> >> safety (such as making filtergraphs).
> >
> > As I said, I'm not aware that filtergraph creation would be a
> context
> > which requires thread-safety, even when considering frames which get
> > pushed into a filtergraph (like from a decoder).
> 
> User programs can do whatever they like within the API constraints.
> 
> > Could you please show a command line which causes a situation where
> > device_derive is being called from multiple threads?
> 
> Given that the ffmpeg utility is currently entirely single-threaded,
> this will only happen once the intended plans for adding threading to
> it are complete.

As mentioned, I don't think that would be possible easily, and from 
what I have understood, the plan is to have separate threads for decoding,
encoding and filtering but not multiple threads for filtering.


> With that, it will be true for something which makes two filtergraphs
> and uses derivation in both of them.  For example:, this currently
> makes two independent VAAPI devices, but equally could reuse the same
> device:
> 
> # ffmpeg -y -f kmsgrab -i /dev/dri/card0 -vf
> hwmap=derive_device=vaapi,scale_vaapi=format=nv12 -c:v h264_vaapi
> out1.mp4 -vf
> hwmap=derive_device=vaapi,scale_vaapi=w=1280:h=720:format=nv12 -c:v
> h264_vaapi out2.mp4

Well, multi-threading is not an issue right now, and I don't expect it
to be then.

But on a more practical take: what do you want me to do?

Guard that function with a lock? I can do this, no problem. But
none of any of the device control function does any synchronization,
so why would exactly this - and only this function need synchronization?


> >>>>>> * I'm not sure that it is reasonable to ignore options.  If an
> >>>>>> unrelated component derived a device before you with special
> >>>> options,
> >>>>>> you might get that device even if you have incompatible
> different
> >>>>>> options.
> >>>>>
> >>>>> I understand what you mean, but this is outside the scope of
> >>>>> this patchset, because when you would want to do this, it
> >>>>> would need to be implemented for derivation in general, not
> >>>>> in this patchset which only adds reverse-search to the
> >>>>> existing derivation functionality.
> >>>>
> >>>> I'm not sure what you mean by that?  The feature already exists;
> >> here
> >>>> is a concrete example of where you would get the wrong result:
> >>>>
> >>>> Start with VAAPI device A.
> >>>>
> >>>> Component P derives Vulkan device B with some extension options
> X.
> >>>>
> >>>> Component Q wants the same device as P, so it derives again with
> >>>> extension options X and gets B.
> >>>>
> >>>> Everything works fine for a while.
> >>>>
> >>>> Later, unrelated component R is inserted before P and Q.  It
> wants
> >> a
> >>>> Vulkan device C with extension options Y, so it derives that.
> >>>>
> >>>> Now component Q is broken because it gets C instead of B and has
> >> the
> >>>> wrong extensions enabled.
> >>>
> >>> As per your request, this patchset's changes are now limited to
> >>> use ffmpeg cli. And there is currently no support for "extension
> >>> options" when deriving a device.
> >>
> >> Yes there is - see the "instance_extensions" and
> "device_extensions"
> >> options to Vulkan device derivation.  (Hence the VAAPI+Vulkan
> >> example.)
> >
> > OK, but when deriving a device via command line, it doesn't consider
> > such parameters in the current device_derive function, and hence
> it's
> > not something that can be burdened onto this patchset.
> 
> Then it sounds like you want this function to be part of the ffmpeg
> utility, not inside one of the libraries which have other users.

No. I say, matching device parameters when searching for an existing 
device isn't done right now, and it's outside the scope of this patchset.


> >>> What I meant above is this:
> >>>
> >>> Assume this patchset wouldn't be applied, but a patchset would
> >>> be applied that allows to specify "extension options".
> >>> Then, even without this patchset, I could construct a similar
> >>> example, where you would get the same device when deriving
> >>> two times from the same device but with different extension
> >>> options.
> >>
> >> How?
> >
> > VAAPI >> QSV(Param1) >> OpenCL
> >
> > Now, from OpenCL, you want to derive QSV but with different
> parameters
> > (Param2). You won't get a new device, you get the existing QSV
> device.
> 
> This doesn't make sense - remember that device derivation is
> unidirectional, and OpenCL is a leaf API which can only derive from
> other things.  The "derive" operation there has to be interpreted as
> "return the device this was derived from", in which case options don't
> make sense.
> 
> (Indeed, it seems like a good idea to disallow options in that case.
> I will prepare a patch to that effect.)

OK.

> >> No it doesn't?  Your new function find_derived_hwdevice_ctx() is
> >> called only for the original source device, and recurses into its
> >> (transitive) children.  It can't return any of X, Y, V or W when
> >> starting from C.
> >
> > OK, that's my mistake. It's been a while...
> > What I described is the original behavior. There were reasons
> > to limit it this way.
> 
> Well, which one is is actually intended then?

This patchset. It is used by us and by Intel for quite a while already.

Thanks,
softworkz
Mark Thompson May 3, 2022, 8:22 p.m. UTC | #11
On 03/05/2022 01:11, Soft Works wrote:
>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark
>> Thompson
>> Sent: Tuesday, May 3, 2022 1:57 AM
>> To: ffmpeg-devel@ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add derive-
>> device function which searches for existing devices in both directions
>>
>> On 02/05/2022 23:59, Soft Works wrote:
>>>> -----Original Message-----
>>>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
>> Mark
>>>> Thompson
>>>> Sent: Tuesday, May 3, 2022 12:12 AM
>>>> To: ffmpeg-devel@ffmpeg.org
>>>> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
>> derive-
>>>> device function which searches for existing devices in both
>> directions
>>>>
>>>> On 02/05/2022 09:14, Soft Works wrote:
>>>>>> -----Original Message-----
>>>>>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
>>>> Mark
>>>>>> Thompson
>>>>>> Sent: Monday, May 2, 2022 12:01 AM
>>>>>> To: ffmpeg-devel@ffmpeg.org
>>>>>> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
>>>> derive-
>>>>>> device function which searches for existing devices in both
>>>> directions
>>>>>
>>>>> [..]
>>>>>
>>>>>>>> * The thread-safety properties of the hwcontext API have been
>>>> lost
>>>>>> -
>>>>>>>> you can no longer operate on devices independently across
>> threads
>>>>>>>> (insofar as the underlying API allows that).
>>>>>>>>        Maybe there is an argument that derivation is something
>>>> which
>>>>>>>> should happen early on and therefore documenting it as thread-
>>>>>> unsafe
>>>>>>>> is ok, but when hwupload/hwmap can use it inside filtergraphs
>>>> that
>>>>>>>> just isn't going to happen (and will be violated in the FFmpeg
>>>>>> utility
>>>>>>>> if filters get threaded, as is being worked on).
>>>>>>>
>>>>>>>     From my understanding there will be a single separate thread
>>>> which
>>>>>>> handles all filtergraph operations.
>>>>>>> I don't think it would even be possible (without massive
>> changes)
>>>>>>> to arbitrate filter processing in parallel.
>>>>>>> But even if this would be implemented: the filtergraph setup
>>>> (init,
>>>>>>> uninit, query_formats, etc.) would surely happen on a single
>>>> thread.
>>>>>>
>>>>>> The ffmpeg utility creates filtergraphs dynamically when the
>> first
>>>>>> frame is available from their inputs, so I don't see why you
>>>> wouldn't
>>>>>> allow multiple of them to be created in parallel in that case.
>>>>>>
>>>>>> If you create all devices at the beginning and then give
>> references
>>>> to
>>>>>> them to the various filters which need them (so never manipulate
>>>>>> devices dynamically within the graph) then it would be ok, but I
>>>> think
>>>>>> you've already rejected that approach.
>>>>>
>>>>> Please let's not break out of the scope of this patchset.
>>>>> This patchset is not about re-doing device derivation. The only
>>>>> small change that it does is that it returns an existing device
>>>>> instead of creating a new one when such device already exists
>>>>> in the same(!) chain.
>>>>>
>>>>> The change it makes has really nothing to do with threading or
>>>>> anything around it.
>>>>
>>>> The change makes existing thread-safe hwcontext APIs unsafe.  That
>> is
>>>> definitely not "nothing to do with threading", and has to be
>> resolved
>>>> since they can currently be called from contexts which expect
>> thread-
>>>> safety (such as making filtergraphs).
>>>
>>> As I said, I'm not aware that filtergraph creation would be a
>> context
>>> which requires thread-safety, even when considering frames which get
>>> pushed into a filtergraph (like from a decoder).
>>
>> User programs can do whatever they like within the API constraints.
>>
>>> Could you please show a command line which causes a situation where
>>> device_derive is being called from multiple threads?
>>
>> Given that the ffmpeg utility is currently entirely single-threaded,
>> this will only happen once the intended plans for adding threading to
>> it are complete.
> 
> As mentioned, I don't think that would be possible easily, and from
> what I have understood, the plan is to have separate threads for decoding,
> encoding and filtering but not multiple threads for filtering.

As the summary in <https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2022-April/294940.html> states, the intent is a separate thread for each instance of those things, including filtergraphs.

>> With that, it will be true for something which makes two filtergraphs
>> and uses derivation in both of them.  For example:, this currently
>> makes two independent VAAPI devices, but equally could reuse the same
>> device:
>>
>> # ffmpeg -y -f kmsgrab -i /dev/dri/card0 -vf
>> hwmap=derive_device=vaapi,scale_vaapi=format=nv12 -c:v h264_vaapi
>> out1.mp4 -vf
>> hwmap=derive_device=vaapi,scale_vaapi=w=1280:h=720:format=nv12 -c:v
>> h264_vaapi out2.mp4
> 
> Well, multi-threading is not an issue right now, and I don't expect it
> to be then.
> 
> But on a more practical take: what do you want me to do?
> 
> Guard that function with a lock? I can do this, no problem.

I'd like it to be designed in a way that avoids this sort of problem, as all the existing functions are.

>                                                             But
> none of any of the device control function does any synchronization,
> so why would exactly this - and only this function need synchronization?

The derived_devices field is modified post-creation, which gives you data races if there is no other synchronisation.

(Consider simultaneous derivation calls: currently that's completely fine - it makes no changes to the parent device and the derived devices are independent.  With your proposed patch there is undefined behaviour because of the simultaneous reading and writing of the derived_devices field.)

The only other mutable field is the buffer pool in a frames context, which has its own internal synchronisation.

Having thought about this a bit further, I suspect you are going to need an additional shared-devices object of some kind in order to get around the circular references and fix the memory leak problem.

That additional object will have to be mutable and accessible from multiple threads, so will probably need an internal lock (or careful lock-free design).

- Mark
Soft Works May 3, 2022, 9:04 p.m. UTC | #12
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark
> Thompson
> Sent: Tuesday, May 3, 2022 10:23 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add derive-
> device function which searches for existing devices in both directions
> 
> On 03/05/2022 01:11, Soft Works wrote:
> >> -----Original Message-----
> >> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Mark
> >> Thompson
> >> Sent: Tuesday, May 3, 2022 1:57 AM
> >> To: ffmpeg-devel@ffmpeg.org
> >> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
> derive-
> >> device function which searches for existing devices in both
> directions
> >>
> >> On 02/05/2022 23:59, Soft Works wrote:
> >>>> -----Original Message-----
> >>>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> >> Mark
> >>>> Thompson
> >>>> Sent: Tuesday, May 3, 2022 12:12 AM
> >>>> To: ffmpeg-devel@ffmpeg.org
> >>>> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
> >> derive-
> >>>> device function which searches for existing devices in both
> >> directions
> >>>>
> >>>> On 02/05/2022 09:14, Soft Works wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf
> Of
> >>>> Mark
> >>>>>> Thompson
> >>>>>> Sent: Monday, May 2, 2022 12:01 AM
> >>>>>> To: ffmpeg-devel@ffmpeg.org
> >>>>>> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
> >>>> derive-
> >>>>>> device function which searches for existing devices in both
> >>>> directions
> >>>>>
> >>>>> [..]
> >>>>>
> >>>>>>>> * The thread-safety properties of the hwcontext API have been
> >>>> lost
> >>>>>> -
> >>>>>>>> you can no longer operate on devices independently across
> >> threads
> >>>>>>>> (insofar as the underlying API allows that).
> >>>>>>>>        Maybe there is an argument that derivation is
> something
> >>>> which
> >>>>>>>> should happen early on and therefore documenting it as
> thread-
> >>>>>> unsafe
> >>>>>>>> is ok, but when hwupload/hwmap can use it inside filtergraphs
> >>>> that
> >>>>>>>> just isn't going to happen (and will be violated in the
> FFmpeg
> >>>>>> utility
> >>>>>>>> if filters get threaded, as is being worked on).
> >>>>>>>
> >>>>>>>     From my understanding there will be a single separate
> thread
> >>>> which
> >>>>>>> handles all filtergraph operations.
> >>>>>>> I don't think it would even be possible (without massive
> >> changes)
> >>>>>>> to arbitrate filter processing in parallel.
> >>>>>>> But even if this would be implemented: the filtergraph setup
> >>>> (init,
> >>>>>>> uninit, query_formats, etc.) would surely happen on a single
> >>>> thread.
> >>>>>>
> >>>>>> The ffmpeg utility creates filtergraphs dynamically when the
> >> first
> >>>>>> frame is available from their inputs, so I don't see why you
> >>>> wouldn't
> >>>>>> allow multiple of them to be created in parallel in that case.
> >>>>>>
> >>>>>> If you create all devices at the beginning and then give
> >> references
> >>>> to
> >>>>>> them to the various filters which need them (so never
> manipulate
> >>>>>> devices dynamically within the graph) then it would be ok, but
> I
> >>>> think
> >>>>>> you've already rejected that approach.
> >>>>>
> >>>>> Please let's not break out of the scope of this patchset.
> >>>>> This patchset is not about re-doing device derivation. The only
> >>>>> small change that it does is that it returns an existing device
> >>>>> instead of creating a new one when such device already exists
> >>>>> in the same(!) chain.
> >>>>>
> >>>>> The change it makes has really nothing to do with threading or
> >>>>> anything around it.
> >>>>
> >>>> The change makes existing thread-safe hwcontext APIs unsafe.
> That
> >> is
> >>>> definitely not "nothing to do with threading", and has to be
> >> resolved
> >>>> since they can currently be called from contexts which expect
> >> thread-
> >>>> safety (such as making filtergraphs).
> >>>
> >>> As I said, I'm not aware that filtergraph creation would be a
> >> context
> >>> which requires thread-safety, even when considering frames which
> get
> >>> pushed into a filtergraph (like from a decoder).
> >>
> >> User programs can do whatever they like within the API constraints.
> >>
> >>> Could you please show a command line which causes a situation
> where
> >>> device_derive is being called from multiple threads?
> >>
> >> Given that the ffmpeg utility is currently entirely single-
> threaded,
> >> this will only happen once the intended plans for adding threading
> to
> >> it are complete.
> >
> > As mentioned, I don't think that would be possible easily, and from
> > what I have understood, the plan is to have separate threads for
> decoding,
> > encoding and filtering but not multiple threads for filtering.
> 
> As the summary in <https://lists.ffmpeg.org/pipermail/ffmpeg-
> devel/2022-April/294940.html> states, the intent is a separate thread
> for each instance of those things, including filtergraphs.
> 
> >> With that, it will be true for something which makes two
> filtergraphs
> >> and uses derivation in both of them.  For example:, this currently
> >> makes two independent VAAPI devices, but equally could reuse the
> same
> >> device:
> >>
> >> # ffmpeg -y -f kmsgrab -i /dev/dri/card0 -vf
> >> hwmap=derive_device=vaapi,scale_vaapi=format=nv12 -c:v h264_vaapi
> >> out1.mp4 -vf
> >> hwmap=derive_device=vaapi,scale_vaapi=w=1280:h=720:format=nv12 -c:v
> >> h264_vaapi out2.mp4
> >
> > Well, multi-threading is not an issue right now, and I don't expect
> it
> > to be then.
> >
> > But on a more practical take: what do you want me to do?
> >
> > Guard that function with a lock? I can do this, no problem.
> 
> I'd like it to be designed in a way that avoids this sort of problem,
> as all the existing functions are.
> 
> >                                                             But
> > none of any of the device control function does any synchronization,
> > so why would exactly this - and only this function need
> synchronization?
> 
> The derived_devices field is modified post-creation, which gives you
> data races if there is no other synchronisation.
> 
> (Consider simultaneous derivation calls: currently that's completely
> fine - it makes no changes to the parent device and the derived
> devices are independent.  With your proposed patch there is undefined
> behaviour because of the simultaneous reading and writing of the
> derived_devices field.)
> 
> The only other mutable field is the buffer pool in a frames context,
> which has its own internal synchronisation.
> 
> Having thought about this a bit further, I suspect you are going to
> need an additional shared-devices object of some kind in order to get
> around the circular references and fix the memory leak problem.
> 
> That additional object will have to be mutable and accessible from
> multiple threads, so will probably need an internal lock (or careful
> lock-free design).

Yup, that sounds quite reasonable to me.
Very helpful comment, thanks!
Soft Works May 27, 2022, 4:12 p.m. UTC | #13
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Mark Thompson
> Sent: Tuesday, May 3, 2022 10:23 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
> derive-device function which searches for existing devices in both
> directions
> 
> On 03/05/2022 01:11, Soft Works wrote:
> >> -----Original Message-----
> >> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Mark
> >> Thompson
> >> Sent: Tuesday, May 3, 2022 1:57 AM
> >> To: ffmpeg-devel@ffmpeg.org
> >> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
> derive-
> >> device function which searches for existing devices in both
> directions
> >>
> >> On 02/05/2022 23:59, Soft Works wrote:
> >>>> -----Original Message-----
> >>>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf
> Of
> >> Mark
> >>>> Thompson
> >>>> Sent: Tuesday, May 3, 2022 12:12 AM
> >>>> To: ffmpeg-devel@ffmpeg.org
> >>>> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
> >> derive-
> >>>> device function which searches for existing devices in both
> >> directions
> >>>>
> >>>> On 02/05/2022 09:14, Soft Works wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf
> Of
> >>>> Mark
> >>>>>> Thompson
> >>>>>> Sent: Monday, May 2, 2022 12:01 AM
> >>>>>> To: ffmpeg-devel@ffmpeg.org
> >>>>>> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
> >>>> derive-
> >>>>>> device function which searches for existing devices in both
> >>>> directions
> >>>>>
> >>>>> [..]
> >>>>>
> >>>>>>>> * The thread-safety properties of the hwcontext API have
> been
> >>>> lost
> >>>>>> -
> >>>>>>>> you can no longer operate on devices independently across
> >> threads
> >>>>>>>> (insofar as the underlying API allows that).
> >>>>>>>>        Maybe there is an argument that derivation is
> something
> >>>> which
> >>>>>>>> should happen early on and therefore documenting it as
> thread-
> >>>>>> unsafe
> >>>>>>>> is ok, but when hwupload/hwmap can use it inside
> filtergraphs
> >>>> that
> >>>>>>>> just isn't going to happen (and will be violated in the
> FFmpeg
> >>>>>> utility
> >>>>>>>> if filters get threaded, as is being worked on).
> >>>>>>>
> >>>>>>>     From my understanding there will be a single separate
> thread
> >>>> which
> >>>>>>> handles all filtergraph operations.
> >>>>>>> I don't think it would even be possible (without massive
> >> changes)
> >>>>>>> to arbitrate filter processing in parallel.
> >>>>>>> But even if this would be implemented: the filtergraph setup
> >>>> (init,
> >>>>>>> uninit, query_formats, etc.) would surely happen on a single
> >>>> thread.
> >>>>>>
> >>>>>> The ffmpeg utility creates filtergraphs dynamically when the
> >> first
> >>>>>> frame is available from their inputs, so I don't see why you
> >>>> wouldn't
> >>>>>> allow multiple of them to be created in parallel in that case.
> >>>>>>
> >>>>>> If you create all devices at the beginning and then give
> >> references
> >>>> to
> >>>>>> them to the various filters which need them (so never
> manipulate
> >>>>>> devices dynamically within the graph) then it would be ok, but
> I
> >>>> think
> >>>>>> you've already rejected that approach.
> >>>>>
> >>>>> Please let's not break out of the scope of this patchset.
> >>>>> This patchset is not about re-doing device derivation. The only
> >>>>> small change that it does is that it returns an existing device
> >>>>> instead of creating a new one when such device already exists
> >>>>> in the same(!) chain.
> >>>>>
> >>>>> The change it makes has really nothing to do with threading or
> >>>>> anything around it.
> >>>>
> >>>> The change makes existing thread-safe hwcontext APIs unsafe.
> That
> >> is
> >>>> definitely not "nothing to do with threading", and has to be
> >> resolved
> >>>> since they can currently be called from contexts which expect
> >> thread-
> >>>> safety (such as making filtergraphs).
> >>>
> >>> As I said, I'm not aware that filtergraph creation would be a
> >> context
> >>> which requires thread-safety, even when considering frames which
> get
> >>> pushed into a filtergraph (like from a decoder).
> >>
> >> User programs can do whatever they like within the API
> constraints.
> >>
> >>> Could you please show a command line which causes a situation
> where
> >>> device_derive is being called from multiple threads?
> >>
> >> Given that the ffmpeg utility is currently entirely single-
> threaded,
> >> this will only happen once the intended plans for adding threading
> to
> >> it are complete.
> >
> > As mentioned, I don't think that would be possible easily, and from
> > what I have understood, the plan is to have separate threads for
> decoding,
> > encoding and filtering but not multiple threads for filtering.
> 
> As the summary in <https://lists.ffmpeg.org/pipermail/ffmpeg-
> devel/2022-April/294940.html> states, the intent is a separate thread
> for each instance of those things, including filtergraphs.
> 
> >> With that, it will be true for something which makes two
> filtergraphs
> >> and uses derivation in both of them.  For example:, this currently
> >> makes two independent VAAPI devices, but equally could reuse the
> same
> >> device:
> >>
> >> # ffmpeg -y -f kmsgrab -i /dev/dri/card0 -vf
> >> hwmap=derive_device=vaapi,scale_vaapi=format=nv12 -c:v h264_vaapi
> >> out1.mp4 -vf
> >> hwmap=derive_device=vaapi,scale_vaapi=w=1280:h=720:format=nv12 -
> c:v
> >> h264_vaapi out2.mp4
> >
> > Well, multi-threading is not an issue right now, and I don't expect
> it
> > to be then.
> >
> > But on a more practical take: what do you want me to do?
> >
> > Guard that function with a lock? I can do this, no problem.


Hi Mark,

I have submitted a new patchset based on your suggestions:
https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=6655


> I'd like it to be designed in a way that avoids this sort of problem,
> as all the existing functions are.

Understood and accepted.


> > none of any of the device control function does any
> synchronization,
> > so why would exactly this - and only this function need
> synchronization?
> 
> The derived_devices field is modified post-creation, which gives you
> data races if there is no other synchronisation.
> 
> (Consider simultaneous derivation calls: currently that's completely
> fine - it makes no changes to the parent device and the derived
> devices are independent.  With your proposed patch there is undefined
> behaviour because of the simultaneous reading and writing of the
> derived_devices field.)
> 
> The only other mutable field is the buffer pool in a frames context,
> which has its own internal synchronisation.

I had thought there must be other cases of race conditions because
none of the other functions are employing any synchronization patterns,
but - at least with regards to device ctx - I realized that you are right
in that there aren't any races (in "regular" use cases at least).


> Having thought about this a bit further, I suspect you are going to
> need an additional shared-devices object of some kind in order to get
> around the circular references and fix the memory leak problem.
> 
> That additional object will have to be mutable and accessible from
> multiple threads, so will probably need an internal lock (or careful
> lock-free design).


I think I have found a safe and simple solution to both requirements:
weak-references and synchronization.


Weak References
===============

The use of AVBuffer and AVBufferRef is a well-established pattern, so it's
surely not an option to make any fundamental changes or start using a 
different pattern for a specific use case. Unfortunately, its  implementation
doesn't leave room for adapting it as a weak-reference pattern.

So came to the following pattern:

- When a device context is created it is registered in a global array
  (with thread-safe access) and gets a registration id (1, 2, 3, ...)
  => see register_hw_device()

- The registration id is the "weak reference"

- When a device context is freed, it gets unregistered 
  (removed from the array)
  => see unregister_hw_device()

- Now, a device context can reference other device contexts just 
  through that number (of course not knowing whether the corresponding
  device is still "alive")

- But this can be be determined by querying from the array which can return an
  AVBufferRef to that device on success´
  => see get_registered_hw_device()


Synchronization
===============

- All register and unregister calls are locked by a mutex
- Querying for a device by registration id is locked in the same way
  and returns an AVBufferRef to the device context, making sure
  that the returned device context will remain valid and not be
  freed


Specifics
=========

I had one problem to make it work as described above: How to return
an AVBufferRef to the device context from get_registered_hw_device()?

Currently, an AVBufferRef can only be created from another AVBufferRef.
But neither the device context nor the registration array can store
an AVBufferRef to the context, because that additional reference would
cause the device context to never be freed at all. 
Neither could we use the AVBufferRef that is supplied for 
av_hwdevice_ctx_init(), because it could be released at some time while
another AVBufferRef is still holding a reference.

That's why I chose to use and store AVBuffer itself. AVBuffer itself 
is not suitable for use as a weak reference, because it's not possible
to determine from the pointer once and whether it is av_free'd or not.

Though, in the context of hwdevice we don't have that problem due
to the use of hwdevice_ctx_free() as a custom callback for freeing
the AVBuffer, so we can safely use AVBuffer to store as a reference
in the global device context registration array.

The reference count is stored in AVBuffer anyway, and that leaves us
with just one requirement:

Being able to create an AVBufferRef from an AVBuffer directly instead
of from another AVBufferRef.
This is done in PATCH 1/4: add av_ref_from_buffer() function



I hope this makes sense to you and fulfills the requirements you 
are asking for.
Please let me know what you think.

Thanks,
sw
diff mbox series

Patch

diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c
index ab9ad3703e..1aea7dd5c3 100644
--- a/libavutil/hwcontext.c
+++ b/libavutil/hwcontext.c
@@ -123,6 +123,7 @@  static const AVClass hwdevice_ctx_class = {
 static void hwdevice_ctx_free(void *opaque, uint8_t *data)
 {
     AVHWDeviceContext *ctx = (AVHWDeviceContext*)data;
+    int i;
 
     /* uninit might still want access the hw context and the user
      * free() callback might destroy it, so uninit has to be called first */
@@ -133,6 +134,8 @@  static void hwdevice_ctx_free(void *opaque, uint8_t *data)
         ctx->free(ctx);
 
     av_buffer_unref(&ctx->internal->source_device);
+    for (i = 0; i < AV_HWDEVICE_TYPE_NB; i++)
+        av_buffer_unref(&ctx->internal->derived_devices[i]);
 
     av_freep(&ctx->hwctx);
     av_freep(&ctx->internal->priv);
@@ -644,10 +647,31 @@  fail:
     return ret;
 }
 
-int av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
-                                        enum AVHWDeviceType type,
-                                        AVBufferRef *src_ref,
-                                        AVDictionary *options, int flags)
+static AVBufferRef* find_derived_hwdevice_ctx(AVBufferRef *src_ref, enum AVHWDeviceType type)
+{
+    AVBufferRef *tmp_ref;
+    AVHWDeviceContext *src_ctx;
+    int i;
+
+    src_ctx = (AVHWDeviceContext*)src_ref->data;
+    if (src_ctx->type == type)
+        return src_ref;
+
+    for (i = 0; i < AV_HWDEVICE_TYPE_NB; i++)
+        if (src_ctx->internal->derived_devices[i]) {
+            tmp_ref = find_derived_hwdevice_ctx(src_ctx->internal->derived_devices[i], type);
+            if (tmp_ref)
+                return tmp_ref;
+        }
+
+    return NULL;
+}
+
+static int hwdevice_ctx_create_derived(AVBufferRef **dst_ref_ptr,
+                                       enum AVHWDeviceType type,
+                                       AVBufferRef *src_ref,
+                                       AVDictionary *options, int flags,
+                                       int get_existing)
 {
     AVBufferRef *dst_ref = NULL, *tmp_ref;
     AVHWDeviceContext *dst_ctx, *tmp_ctx;
@@ -667,6 +691,18 @@  int av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
         tmp_ref = tmp_ctx->internal->source_device;
     }
 
+    if (get_existing) {
+        tmp_ref = find_derived_hwdevice_ctx(src_ref, type);
+        if (tmp_ref) {
+            dst_ref = av_buffer_ref(tmp_ref);
+            if (!dst_ref) {
+                ret = AVERROR(ENOMEM);
+                goto fail;
+            }
+            goto done;
+        }
+    }
+
     dst_ref = av_hwdevice_ctx_alloc(type);
     if (!dst_ref) {
         ret = AVERROR(ENOMEM);
@@ -688,6 +724,13 @@  int av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
                     ret = AVERROR(ENOMEM);
                     goto fail;
                 }
+                if (!tmp_ctx->internal->derived_devices[type]) {
+                    tmp_ctx->internal->derived_devices[type] = av_buffer_ref(dst_ref);
+                    if (!tmp_ctx->internal->derived_devices[type]) {
+                        ret = AVERROR(ENOMEM);
+                        goto fail;
+                    }
+                }
                 ret = av_hwdevice_ctx_init(dst_ref);
                 if (ret < 0)
                     goto fail;
@@ -712,12 +755,29 @@  fail:
     return ret;
 }
 
+int av_hwdevice_ctx_create_derived_opts(AVBufferRef **dst_ref_ptr,
+                                        enum AVHWDeviceType type,
+                                        AVBufferRef *src_ref,
+                                        AVDictionary *options, int flags)
+{
+    return hwdevice_ctx_create_derived(dst_ref_ptr, type, src_ref,
+                                       options, flags, 0);
+}
+
+int av_hwdevice_ctx_get_or_create_derived(AVBufferRef **dst_ref_ptr,
+                                          enum AVHWDeviceType type,
+                                          AVBufferRef *src_ref, int flags)
+{
+    return hwdevice_ctx_create_derived(dst_ref_ptr, type, src_ref,
+                                       NULL, flags, 1);
+}
+
 int av_hwdevice_ctx_create_derived(AVBufferRef **dst_ref_ptr,
                                    enum AVHWDeviceType type,
                                    AVBufferRef *src_ref, int flags)
 {
-    return av_hwdevice_ctx_create_derived_opts(dst_ref_ptr, type, src_ref,
-                                               NULL, flags);
+    return hwdevice_ctx_create_derived(dst_ref_ptr, type, src_ref,
+                                       NULL, flags, 0);
 }
 
 static void ff_hwframe_unmap(void *opaque, uint8_t *data)
diff --git a/libavutil/hwcontext.h b/libavutil/hwcontext.h
index c18b7e1e8b..3785811f98 100644
--- a/libavutil/hwcontext.h
+++ b/libavutil/hwcontext.h
@@ -37,6 +37,7 @@  enum AVHWDeviceType {
     AV_HWDEVICE_TYPE_OPENCL,
     AV_HWDEVICE_TYPE_MEDIACODEC,
     AV_HWDEVICE_TYPE_VULKAN,
+    AV_HWDEVICE_TYPE_NB,          ///< number of hw device types
 };
 
 typedef struct AVHWDeviceInternal AVHWDeviceInternal;
@@ -328,6 +329,25 @@  int av_hwdevice_ctx_create_derived(AVBufferRef **dst_ctx,
                                    enum AVHWDeviceType type,
                                    AVBufferRef *src_ctx, int flags);
 
+/**
+ * Create a new device of the specified type from an existing device.
+ *
+ * This function performs the same action as av_hwdevice_ctx_create_derived,
+ * however, if a derived device of the specified type already exists,
+ * it returns the existing instance.
+ *
+ * @param dst_ctx On success, a reference to the newly-created
+ *                AVHWDeviceContext.
+ * @param type    The type of the new device to create.
+ * @param src_ctx A reference to an existing AVHWDeviceContext which will be
+ *                used to create the new device.
+ * @param flags   Currently unused; should be set to zero.
+ * @return        Zero on success, a negative AVERROR code on failure.
+ */
+int av_hwdevice_ctx_get_or_create_derived(AVBufferRef **dst_ctx,
+                                          enum AVHWDeviceType type,
+                                          AVBufferRef *src_ctx, int flags);
+
 /**
  * Create a new device of the specified type from an existing device.
  *
diff --git a/libavutil/hwcontext_internal.h b/libavutil/hwcontext_internal.h
index e6266494ac..f6fb67c491 100644
--- a/libavutil/hwcontext_internal.h
+++ b/libavutil/hwcontext_internal.h
@@ -109,6 +109,12 @@  struct AVHWDeviceInternal {
      * context it was derived from.
      */
     AVBufferRef *source_device;
+
+    /**
+     * An array of reference to device contexts which
+     * were derived from this device.
+     */
+    AVBufferRef *derived_devices[AV_HWDEVICE_TYPE_NB];
 };
 
 struct AVHWFramesInternal {
diff --git a/libavutil/hwcontext_qsv.c b/libavutil/hwcontext_qsv.c
index 66c0e38955..f24d06fe97 100644
--- a/libavutil/hwcontext_qsv.c
+++ b/libavutil/hwcontext_qsv.c
@@ -309,7 +309,7 @@  static void qsv_frames_uninit(AVHWFramesContext *ctx)
     av_buffer_unref(&s->child_frames_ref);
 }
 
-static void qsv_pool_release_dummy(void *opaque, uint8_t *data)
+static void qsv_release_dummy(void *opaque, uint8_t *data)
 {
 }
 
@@ -322,7 +322,7 @@  static AVBufferRef *qsv_pool_alloc(void *opaque, size_t size)
     if (s->nb_surfaces_used < hwctx->nb_surfaces) {
         s->nb_surfaces_used++;
         return av_buffer_create((uint8_t*)(s->surfaces_internal + s->nb_surfaces_used - 1),
-                                sizeof(*hwctx->surfaces), qsv_pool_release_dummy, NULL, 0);
+                                sizeof(*hwctx->surfaces), qsv_release_dummy, NULL, 0);
     }
 
     return NULL;
@@ -1649,8 +1649,15 @@  static int qsv_device_create(AVHWDeviceContext *ctx, const char *device,
     child_device = (AVHWDeviceContext*)priv->child_device_ctx->data;
 
     impl = choose_implementation(device, child_device_type);
+    ret = qsv_device_derive_from_child(ctx, impl, child_device, 0);
+    if (ret >= 0) {
+        ctx->internal->source_device = av_buffer_ref(priv->child_device_ctx);
+        child_device->internal->derived_devices[ctx->type] = av_buffer_create((uint8_t*)ctx, sizeof(*ctx), qsv_release_dummy, ctx, 0);
+        if (!child_device->internal->derived_devices[ctx->type])
+            return AVERROR(ENOMEM);
+    }
 
-    return qsv_device_derive_from_child(ctx, impl, child_device, 0);
+    return ret;
 }
 
 const HWContextType ff_hwcontext_type_qsv = {