[FFmpeg-devel] lavc/amfenc: Retain a reference to D3D frames used as input during the encoding process

Submitted by Alexander Kravchenko on April 6, 2018, 10:25 a.m.

Details

Message ID 01dc01d3cd91$8b2758c0$a1760a40$@gmail.com
State New

Commit Message

Alexander Kravchenko April 6, 2018, 10:25 a.m.
> 
> This breaks the testcase described in
> https://trac.ffmpeg.org/ticket/6990 which is basically the same as the
> one you described in this patch.
> 
> I get the following spammed repeatedly:
> 
> [AVHWFramesContext @ 000000000502d340] Static surface pool size exceeded.
> [mpeg2video @ 0000000002f8d400] get_buffer() failed
> [mpeg2video @ 0000000002f8d400] thread_get_buffer() failed
> [mpeg2video @ 0000000002f8d400] get_buffer() failed (-12 0000000000000000)
> Error while decoding stream #0:1: Operation not permitted

Hi, could you please review the following updated patch?
I added a queue limit based on the initial pool size of the incoming frames context.
This solves the problem when running this pipeline without the -extra_hw_frames 16 option, but -extra_hw_frames can still be used to give the encoder a bigger queue.
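The limit described above amounts to simple backpressure on the encoder's input queue. A minimal sketch of the idea follows; the struct and function names here are invented for illustration (the patch itself tracks this in AmfContext as hwsurfaces_in_queue and hwsurfaces_in_queue_max):

```c
#include <assert.h>

/* Illustrative sketch only: a counter-based cap on how many hardware
 * surfaces the encoder may hold at once, mirroring the queue-limit
 * logic in this patch. Names are invented for the example. */
typedef struct SurfaceQueue {
    int in_queue;      /* surfaces currently referenced by the encoder */
    int in_queue_max;  /* derived from the frame pool's initial_pool_size */
} SurfaceQueue;

/* Returns 1 if the caller must drain encoder output before
 * submitting more input, freeing a surface back to the pool. */
static int must_block_and_wait(const SurfaceQueue *q)
{
    return q->in_queue >= q->in_queue_max;
}

static void on_submit(SurfaceQueue *q) { q->in_queue++; }
static void on_retire(SurfaceQueue *q) { q->in_queue--; }
```

In the patch, the equivalent check is folded into the existing block_and_wait condition in ff_amf_receive_packet().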


---
 libavcodec/amfenc.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 libavcodec/amfenc.h |  3 ++
 2 files changed, 99 insertions(+), 1 deletion(-)

Comments

James Almer April 6, 2018, 2 p.m.
On 4/6/2018 7:25 AM, Alexander Kravchenko wrote:
>>
>> This breaks the testcase described in
>> https://trac.ffmpeg.org/ticket/6990 which is basically the same as the
>> one you described in this patch.
>>
>> I get the following spammed repeatedly:
>>
>> [AVHWFramesContext @ 000000000502d340] Static surface pool size exceeded.
>> [mpeg2video @ 0000000002f8d400] get_buffer() failed
>> [mpeg2video @ 0000000002f8d400] thread_get_buffer() failed
>> [mpeg2video @ 0000000002f8d400] get_buffer() failed (-12 0000000000000000)
>> Error while decoding stream #0:1: Operation not permitted
> 
> Hi, could you please review the following updated patch?
> I added a queue limit based on the initial pool size of the incoming frames context.
> This solves the problem when running this pipeline without the -extra_hw_frames 16 option, but -extra_hw_frames can still be used to give the encoder a bigger queue.

Yes, this solves it, and the output does indeed look good now. Thanks.

I'll leave reviewing this patch to someone more familiar with hw encoding.
Alexander Kravchenko April 6, 2018, 4:02 p.m.
> >> This breaks the testcase described in
> >> https://trac.ffmpeg.org/ticket/6990 which is basically the same as the
> >> one you described in this patch.
> >>
> >> I get the following spammed repeatedly:
> >>
> >> [AVHWFramesContext @ 000000000502d340] Static surface pool size exceeded.
> >> [mpeg2video @ 0000000002f8d400] get_buffer() failed
> >> [mpeg2video @ 0000000002f8d400] thread_get_buffer() failed
> >> [mpeg2video @ 0000000002f8d400] get_buffer() failed (-12 0000000000000000)
> >> Error while decoding stream #0:1: Operation not permitted
> >
> > Hi, could you please review the following updated patch?
> > I added a queue limit based on the initial pool size of the incoming frames context.
> > This solves the problem when running this pipeline without the -extra_hw_frames 16 option, but -extra_hw_frames can still be used to give the encoder a bigger queue.
> 
> Yes, this solves it, and the output does indeed look good now. Thanks.
> 
> I'll leave reviewing this patch to someone more familiar with hw encoding.
Thanks, James.

I think the following approaches could help solve the pool-size issue in a more flexible way:
1) negotiation between the decoder and the downstream frame consumer (filter or encoder) about the extra frames required in the hw frame pool.
2) adding a function like av_buffer_pool_test(AVBufferPool *pool); which returns the current status of the pool (used/free frame counts).
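For illustration, approach 2) might reduce to something like the toy model below. To be clear, av_buffer_pool_test() does not exist in FFmpeg, and the pool struct here is a stand-in, not the real (opaque, refcounted) AVBufferPool:

```c
/* Toy stand-in for AVBufferPool, which is opaque in the real API. */
typedef struct ToyBufferPool {
    int total;   /* buffers the pool was created with */
    int in_use;  /* buffers currently handed out */
} ToyBufferPool;

typedef struct ToyBufferPoolStatus {
    int used;
    int free;
} ToyBufferPoolStatus;

/* Hypothetical query in the spirit of the proposed av_buffer_pool_test():
 * report how many buffers are used and how many remain available, so a
 * consumer could throttle itself before exhausting the pool. */
static ToyBufferPoolStatus toy_buffer_pool_test(const ToyBufferPool *pool)
{
    ToyBufferPoolStatus st;
    st.used = pool->in_use;
    st.free = pool->total - pool->in_use;
    return st;
}
```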
Mark Thompson April 8, 2018, 5:25 p.m.
On 06/04/18 11:25, Alexander Kravchenko wrote:
>>
>> This breaks the testcase described in
>> https://trac.ffmpeg.org/ticket/6990 which is basically the same as the
>> one you described in this patch.
>>
>> I get the following spammed repeatedly:
>>
>> [AVHWFramesContext @ 000000000502d340] Static surface pool size exceeded.
>> [mpeg2video @ 0000000002f8d400] get_buffer() failed
>> [mpeg2video @ 0000000002f8d400] thread_get_buffer() failed
>> [mpeg2video @ 0000000002f8d400] get_buffer() failed (-12 0000000000000000)
>> Error while decoding stream #0:1: Operation not permitted
> 
> Hi, could you please review the following updated patch?
> I added a queue limit based on the initial pool size of the incoming frames context.
> This solves the problem when running this pipeline without the -extra_hw_frames 16 option, but -extra_hw_frames can still be used to give the encoder a bigger queue.

I think you've misunderstood /why/ the decoder has the pool size allocation that it does.  The decoder expects to use all of the surfaces it has allocated in the worst case - the difference between MPEG-2 and H.264 is that MPEG-2 can store at most two reference frames (the forward and backward references for B-frames), while H.264 can store up to sixteen.  Most H.264 streams don't use all sixteen references, but in theory they could (excepting level restrictions, but they are generally quite iffy) so the decoder allocates space for all of those references even if they aren't used.

I can believe that this patch happens to work if you have a simple stream with limited references (streams rarely use more than two or three), but it will certainly fail exactly as before for complex streams.

If you want to hold onto more than one frame in the encoder then currently you need to use the -extra_hw_frames option on the source (whether decoder or filter) - that is exactly what it's there for.  Some sort of automatic negotiation is suggested (there was some discussion on libav-devel a while ago), but the requirement that it works through libavfilter is a difficult one with the current structure so nothing concrete is yet proposed.  (That was mostly considering libmfx, where it's even more of a problem because the lookahead options can make the encoder queue over a hundred frames internally.)
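The sizing argument above can be sketched as simple arithmetic. This is a rough illustration only; the real decoders compute the pool size per codec, and the breakdown into these particular terms is an assumption of the example, not FFmpeg code:

```c
/* Rough worst-case count of surfaces a hardware decoder may hold live
 * at once: the maximum reference frames the codec allows (2 for
 * MPEG-2, up to 16 for H.264), plus the frame currently being decoded,
 * plus whatever the pipeline keeps in flight. Frames retained by a
 * downstream consumer (e.g. an encoder holding its input) must be
 * covered by extra_hw_frames, because the decoder assumes it can
 * reuse everything else in the pool. */
static int pool_size_needed(int max_refs, int pipeline_depth, int extra_hw_frames)
{
    return max_refs + 1 + pipeline_depth + extra_hw_frames;
}
```

Under this model, an H.264 stream at the codec's worst case already needs 17 surfaces before the encoder retains a single one, which is why a limit derived from the decoder's pool size cannot be relied on.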

> ---
>  libavcodec/amfenc.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  libavcodec/amfenc.h |  3 ++
>  2 files changed, 99 insertions(+), 1 deletion(-)
> 
> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
> index 89a10ff253..eb7b20c252 100644
> --- a/libavcodec/amfenc.c
> +++ b/libavcodec/amfenc.c
> @@ -157,6 +157,9 @@ static int amf_init_context(AVCodecContext *avctx)
>      AmfContext         *ctx = avctx->priv_data;
>      AMF_RESULT          res = AMF_OK;
>  
> +    ctx->hwsurfaces_in_queue = 0;
> +    ctx->hwsurfaces_in_queue_max = 16;
> +
>      // configure AMF logger
>      // the return of these functions indicates old state and do not affect behaviour
>      ctx->trace->pVtbl->EnableWriter(ctx->trace, AMF_TRACE_WRITER_DEBUG_OUTPUT, ctx->log_to_dbg != 0 );
> @@ -187,6 +190,7 @@ static int amf_init_context(AVCodecContext *avctx)
>                          if (!ctx->hw_frames_ctx) {
>                              return AVERROR(ENOMEM);
>                          }
> +                        ctx->hwsurfaces_in_queue_max = device_ctx->initial_pool_size - 1;
>                      } else {
>                          if(res == AMF_NOT_SUPPORTED)
>                              av_log(avctx, AV_LOG_INFO, "avctx->hw_frames_ctx has D3D11 device which doesn't have D3D11VA interface, switching to default\n");
> @@ -443,6 +447,73 @@ int ff_amf_encode_init(AVCodecContext *avctx)
>      return ret;
>  }
>  
> +#define AMF_AV_QUERY_INTERFACE(res, from, InterfaceTypeTo, to) \
> +    { \
> +        AMFGuid guid_##InterfaceTypeTo = IID_##InterfaceTypeTo(); \
> +        res = from->pVtbl->QueryInterface(from, &guid_##InterfaceTypeTo, (void**)&to); \
> +    }
> +
> +#define AMF_AV_ASSIGN_PROPERTY_INTERFACE(res, pThis, name, val) \
> +    { \
> +        AMFInterface *amf_interface; \
> +        AMFVariantStruct var; \
> +        res = AMFVariantInit(&var); \
> +        if (res == AMF_OK) { \
> +            AMF_AV_QUERY_INTERFACE(res, val, AMFInterface, amf_interface)\
> +            if (res == AMF_OK) { \
> +                res = AMFVariantAssignInterface(&var, amf_interface); \
> +                amf_interface->pVtbl->Release(amf_interface); \
> +            } \
> +            if (res == AMF_OK) { \
> +                res = pThis->pVtbl->SetProperty(pThis, name, var); \
> +            } \
> +            AMFVariantClear(&var); \
> +        }\
> +    }
> +
> +#define AMF_AV_GET_PROPERTY_INTERFACE(res, pThis, name, TargetType, val) \
> +    { \
> +        AMFVariantStruct var; \
> +        res = AMFVariantInit(&var); \
> +        if (res == AMF_OK) { \
> +            res = pThis->pVtbl->GetProperty(pThis, name, &var); \
> +            if (res == AMF_OK) { \
> +                if (var.type == AMF_VARIANT_INTERFACE && AMFVariantInterface(&var)) { \
> +                    AMF_AV_QUERY_INTERFACE(res, AMFVariantInterface(&var), TargetType, val); \
> +                } else { \
> +                    res = AMF_INVALID_DATA_TYPE; \
> +                } \
> +            } \
> +            AMFVariantClear(&var); \
> +        }\
> +    }

These macros are only expanded once each and with a single type.  Do you intend to use them again in future patches with different types?  If not, it might be easier not to have them at all, or turn them into functions.

> +
> +static AMFBuffer* amf_create_buffer_with_frame_ref(const AVFrame* frame, AMFContext *context)
> +{
> +    AVFrame *frame_ref;
> +    AMFBuffer *frame_ref_storage_buffer = NULL;
> +    AMF_RESULT res;
> +
> +    res = context->pVtbl->AllocBuffer(context, AMF_MEMORY_HOST, sizeof(frame_ref), &frame_ref_storage_buffer);
> +    if (res == AMF_OK) {
> +        frame_ref = av_frame_clone(frame);
> +        if (frame_ref) {
> +            memcpy(frame_ref_storage_buffer->pVtbl->GetNative(frame_ref_storage_buffer), &frame_ref, sizeof(frame_ref));
> +        } else {
> +            frame_ref_storage_buffer->pVtbl->Release(frame_ref_storage_buffer);
> +            frame_ref_storage_buffer = NULL;
> +        }
> +    }
> +    return frame_ref_storage_buffer;
> +}
> +
> +static void amf_release_buffer_with_frame_ref(AMFBuffer *frame_ref_storage_buffer)
> +{
> +    AVFrame *av_frame_ref;
> +    memcpy(&av_frame_ref, frame_ref_storage_buffer->pVtbl->GetNative(frame_ref_storage_buffer), sizeof(av_frame_ref));
> +    av_frame_free(&av_frame_ref);
> +    frame_ref_storage_buffer->pVtbl->Release(frame_ref_storage_buffer);
> +} 
>  
>  int ff_amf_send_frame(AVCodecContext *avctx, const AVFrame *frame)
>  {
> @@ -484,6 +555,7 @@ int ff_amf_send_frame(AVCodecContext *avctx, const AVFrame *frame)
>              (ctx->hw_device_ctx && ((AVHWFramesContext*)frame->hw_frames_ctx->data)->device_ctx ==
>              (AVHWDeviceContext*)ctx->hw_device_ctx->data)
>          )) {
> +            AMFBuffer *frame_ref_storage_buffer;
>  #if CONFIG_D3D11VA
>              static const GUID AMFTextureArrayIndexGUID = { 0x28115527, 0xe7c3, 0x4b66, { 0x99, 0xd3, 0x4f, 0x2a, 0xe6, 0xb4, 0x7f, 0xaf } };
>              ID3D11Texture2D *texture = (ID3D11Texture2D*)frame->data[0]; // actual texture
> @@ -496,6 +568,17 @@ int ff_amf_send_frame(AVCodecContext *avctx, const AVFrame *frame)
>              // input HW surfaces can be vertically aligned by 16; tell AMF the real size
>              surface->pVtbl->SetCrop(surface, 0, 0, frame->width, frame->height);
>  #endif
> +
> +            frame_ref_storage_buffer = amf_create_buffer_with_frame_ref(frame, ctx->context);
> +            AMF_RETURN_IF_FALSE(ctx, frame_ref_storage_buffer != NULL, AVERROR(ENOMEM), "create_buffer_with_frame_ref() returned NULL\n");
> +
> +            AMF_AV_ASSIGN_PROPERTY_INTERFACE(res, surface, L"av_frame_ref", frame_ref_storage_buffer);
> +            if (res != AMF_OK) {
> +                surface->pVtbl->Release(surface);
> +            }
> +            ctx->hwsurfaces_in_queue++;
> +            frame_ref_storage_buffer->pVtbl->Release(frame_ref_storage_buffer);
> +            AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR(ENOMEM), "SetProperty failed for \"av_frame_ref\" with error %d\n", res);
>          } else {
>              res = ctx->context->pVtbl->AllocSurface(ctx->context, AMF_MEMORY_HOST, ctx->format, avctx->width, avctx->height, &surface);
>              AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR(ENOMEM), "AllocSurface() failed  with error %d\n", res);
> @@ -560,6 +643,18 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
>              ret = amf_copy_buffer(avctx, avpkt, buffer);
>  
>              buffer->pVtbl->Release(buffer);
> +
> +            if (data->pVtbl->HasProperty(data, L"av_frame_ref")) {
> +                AMFBuffer *frame_ref_storage_buffer;
> +                AMF_AV_GET_PROPERTY_INTERFACE(res, data, L"av_frame_ref", AMFBuffer, frame_ref_storage_buffer);
> +                if (res == AMF_OK) {
> +                    amf_release_buffer_with_frame_ref(frame_ref_storage_buffer);
> +                    ctx->hwsurfaces_in_queue--;
> +                } else {
> +                    av_log(avctx, AV_LOG_WARNING, "GetProperty failed for \"av_frame_ref\" with error %d\n", res);

Can you get this warning in normal use?  It seems like it should be fatal, since a frame reference has been completely lost.

> +                }
> +            }
> +
>              data->pVtbl->Release(data);
>  
>              AMF_RETURN_IF_FALSE(ctx, ret >= 0, ret, "amf_copy_buffer() failed with error %d\n", ret);
> @@ -589,7 +684,7 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
>                      av_log(avctx, AV_LOG_WARNING, "Data acquired but delayed drain submission got AMF_INPUT_FULL- should not happen\n");
>                  }
>              }
> -        } else if (ctx->delayed_surface != NULL || ctx->delayed_drain || (ctx->eof && res_query != AMF_EOF)) {
> +        } else if (ctx->delayed_surface != NULL || ctx->delayed_drain || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= ctx->hwsurfaces_in_queue_max)) {
>              block_and_wait = 1;
>              av_usleep(1000); // wait and poll again
>          }
> diff --git a/libavcodec/amfenc.h b/libavcodec/amfenc.h
> index 84f0aad2fa..b1361842bd 100644
> --- a/libavcodec/amfenc.h
> +++ b/libavcodec/amfenc.h
> @@ -62,6 +62,9 @@ typedef struct AmfContext {
>      AVBufferRef        *hw_device_ctx; ///< pointer to HW accelerator (decoder)
>      AVBufferRef        *hw_frames_ctx; ///< pointer to HW accelerator (frame allocator)
>  
> +    int                 hwsurfaces_in_queue;
> +    int                 hwsurfaces_in_queue_max;
> +
>      // helpers to handle async calls
>      int                 delayed_drain;
>      AMFSurface         *delayed_surface;
>
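The retain/release scheme under review can be sketched independently of AMF. In the patch, an AVFrame pointer is cloned and memcpy'd into the native storage of an AMFBuffer attached to the surface, then freed when the corresponding packet is produced. The toy version below replaces the AMF buffer with a plain malloc'd blob and AVFrame with an invented refcounted struct:

```c
#include <stdlib.h>
#include <string.h>

/* Toy refcounted frame standing in for AVFrame. */
typedef struct ToyFrame { int refcount; } ToyFrame;

static ToyFrame *toy_frame_clone(ToyFrame *f) { f->refcount++; return f; }
static void toy_frame_unref(ToyFrame *f)      { f->refcount--; }

/* Like amf_create_buffer_with_frame_ref(): take a new reference and
 * stash the pointer in opaque storage that travels with the surface. */
static void *store_frame_ref(ToyFrame *f)
{
    void *storage = malloc(sizeof(ToyFrame *));
    ToyFrame *ref = toy_frame_clone(f);
    memcpy(storage, &ref, sizeof(ref));
    return storage;
}

/* Like amf_release_buffer_with_frame_ref(): recover the pointer and
 * drop the reference once the encoder has emitted the packet. */
static void release_frame_ref(void *storage)
{
    ToyFrame *ref;
    memcpy(&ref, storage, sizeof(ref));
    toy_frame_unref(ref);
    free(storage);
}
```

The memcpy is needed in both directions because the storage is untyped bytes; only the pointer value is stored, never the frame contents.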
Alexander Kravchenko April 8, 2018, 7:13 p.m.
> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf
> Of Mark Thompson
> Sent: Sunday, April 8, 2018 8:25 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH] lavc/amfenc: Retain a reference to D3D
> frames used as input during the encoding process
> 
> On 06/04/18 11:25, Alexander Kravchenko wrote:
> >>
> >> This breaks the testcase described in
> >> https://trac.ffmpeg.org/ticket/6990 which is basically the same as
> >> the one you described in this patch.
> >>
> >> I get the following spammed repeatedly:
> >>
> >> [AVHWFramesContext @ 000000000502d340] Static surface pool size
> exceeded.
> >> [mpeg2video @ 0000000002f8d400] get_buffer() failed [mpeg2video @
> >> 0000000002f8d400] thread_get_buffer() failed [mpeg2video @
> >> 0000000002f8d400] get_buffer() failed (-12 0000000000000000) Error
> >> while decoding stream #0:1: Operation not permitted
> >
> > Hi, could you please review the following updated patch I added queue
> > limit according initial pool size of incoming frame context.
> > This solves the problem running this pipeline without -extra_hw_frames 16
> option, but -extra_hw_frames option can be used to have bigger queue for
> encoder.
> 
> I think you've misunderstood /why/ the decoder has the pool size allocation
> that it does.  The decoder expects to use all of the surfaces it has allocated in
> the worst case - the difference between MPEG-2 and H.264 is that MPEG-2
> can store at most two reference frames (the forward and backward
> references for B-frames), while H.264 can store up to sixteen.  Most H.264
> streams don't use all sixteen references, but in theory they could (excepting
> level restrictions, but they are generally quite iffy) so the decoder allocates
> space for all of those references even if they aren't used.
> 
> I can believe that this patch happens to work if you have a simple stream
> with limited references (streams rarely use more than two or three), but it
> will certainly fail exactly as before for complex streams.
> 
> If you want to hold onto more than one frame in the encoder then currently
> you need to use the -extra_hw_frames option on the source (whether
> decoder or filter) - that is exactly what it's there for.  Some sort of automatic
> negotiation is suggested (there was some discussion on libav-devel a while
> ago), but the requirement that it works through libavfilter is a difficult one
> with the current structure so nothing concrete is yet proposed.  (That was
> mostly considering libmfx, where it's even more of a problem because the
> lookahead options can make the encoder queue over a hundred frames
> internally.)
> 
> > ---
> >  libavcodec/amfenc.c | 97
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  libavcodec/amfenc.h |  3 ++
> >  2 files changed, 99 insertions(+), 1 deletion(-)
> >
> > diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c index
> > 89a10ff253..eb7b20c252 100644
> > --- a/libavcodec/amfenc.c
> > +++ b/libavcodec/amfenc.c
> > @@ -157,6 +157,9 @@ static int amf_init_context(AVCodecContext *avctx)
> >      AmfContext         *ctx = avctx->priv_data;
> >      AMF_RESULT          res = AMF_OK;
> >
> > +    ctx->hwsurfaces_in_queue = 0;
> > +    ctx->hwsurfaces_in_queue_max = 16;
> > +
> >      // configure AMF logger
> >      // the return of these functions indicates old state and do not affect
> behaviour
> >      ctx->trace->pVtbl->EnableWriter(ctx->trace,
> > AMF_TRACE_WRITER_DEBUG_OUTPUT, ctx->log_to_dbg != 0 ); @@ -187,6
> +190,7 @@ static int amf_init_context(AVCodecContext *avctx)
> >                          if (!ctx->hw_frames_ctx) {
> >                              return AVERROR(ENOMEM);
> >                          }
> > +                        ctx->hwsurfaces_in_queue_max =
> > + device_ctx->initial_pool_size - 1;
> >                      } else {
> >                          if(res == AMF_NOT_SUPPORTED)
> >                              av_log(avctx, AV_LOG_INFO,
> > "avctx->hw_frames_ctx has D3D11 device which doesn't have D3D11VA
> interface, switching to default\n"); @@ -443,6 +447,73 @@ int
> ff_amf_encode_init(AVCodecContext *avctx)
> >      return ret;
> >  }
> >
> > +#define AMF_AV_QUERY_INTERFACE(res, from, InterfaceTypeTo, to) \
> > +    { \
> > +        AMFGuid guid_##InterfaceTypeTo = IID_##InterfaceTypeTo(); \
> > +        res = from->pVtbl->QueryInterface(from, &guid_##InterfaceTypeTo,
> (void**)&to); \
> > +    }
> > +
> > +#define AMF_AV_ASSIGN_PROPERTY_INTERFACE(res, pThis, name, val) \
> > +    { \
> > +        AMFInterface *amf_interface; \
> > +        AMFVariantStruct var; \
> > +        res = AMFVariantInit(&var); \
> > +        if (res == AMF_OK) { \
> > +            AMF_AV_QUERY_INTERFACE(res, val, AMFInterface,
> amf_interface)\
> > +            if (res == AMF_OK) { \
> > +                res = AMFVariantAssignInterface(&var, amf_interface); \
> > +                amf_interface->pVtbl->Release(amf_interface); \
> > +            } \
> > +            if (res == AMF_OK) { \
> > +                res = pThis->pVtbl->SetProperty(pThis, name, var); \
> > +            } \
> > +            AMFVariantClear(&var); \
> > +        }\
> > +    }
> > +
> > +#define AMF_AV_GET_PROPERTY_INTERFACE(res, pThis, name,
> TargetType, val) \
> > +    { \
> > +        AMFVariantStruct var; \
> > +        res = AMFVariantInit(&var); \
> > +        if (res == AMF_OK) { \
> > +            res = pThis->pVtbl->GetProperty(pThis, name, &var); \
> > +            if (res == AMF_OK) { \
> > +                if (var.type == AMF_VARIANT_INTERFACE &&
> AMFVariantInterface(&var)) { \
> > +                    AMF_AV_QUERY_INTERFACE(res, AMFVariantInterface(&var),
> TargetType, val); \
> > +                } else { \
> > +                    res = AMF_INVALID_DATA_TYPE; \
> > +                } \
> > +            } \
> > +            AMFVariantClear(&var); \
> > +        }\
> > +    }
> 
> These macros are only expanded once each and with a single type.  Do you
> intend to use them again in future patches with different types?  If not, it
> might be easier not to have them at all, or turn them into functions.
> 
> > +
> > +static AMFBuffer* amf_create_buffer_with_frame_ref(const AVFrame*
> > +frame, AMFContext *context) {
> > +    AVFrame *frame_ref;
> > +    AMFBuffer *frame_ref_storage_buffer = NULL;
> > +    AMF_RESULT res;
> > +
> > +    res = context->pVtbl->AllocBuffer(context, AMF_MEMORY_HOST,
> sizeof(frame_ref), &frame_ref_storage_buffer);
> > +    if (res == AMF_OK) {
> > +        frame_ref = av_frame_clone(frame);
> > +        if (frame_ref) {
> > +            memcpy(frame_ref_storage_buffer->pVtbl-
> >GetNative(frame_ref_storage_buffer), &frame_ref, sizeof(frame_ref));
> > +        } else {
> > +            frame_ref_storage_buffer->pVtbl-
> >Release(frame_ref_storage_buffer);
> > +            frame_ref_storage_buffer = NULL;
> > +        }
> > +    }
> > +    return frame_ref_storage_buffer;
> > +}
> > +
> > +static void amf_release_buffer_with_frame_ref(AMFBuffer
> > +*frame_ref_storage_buffer) {
> > +    AVFrame *av_frame_ref;
> > +    memcpy(&av_frame_ref, frame_ref_storage_buffer->pVtbl-
> >GetNative(frame_ref_storage_buffer), sizeof(av_frame_ref));
> > +    av_frame_free(&av_frame_ref);
> > +
> > +frame_ref_storage_buffer->pVtbl->Release(frame_ref_storage_buffer);
> > +}
> >
> >  int ff_amf_send_frame(AVCodecContext *avctx, const AVFrame *frame)
> {
> > @@ -484,6 +555,7 @@ int ff_amf_send_frame(AVCodecContext *avctx,
> const AVFrame *frame)
> >              (ctx->hw_device_ctx && ((AVHWFramesContext*)frame-
> >hw_frames_ctx->data)->device_ctx ==
> >              (AVHWDeviceContext*)ctx->hw_device_ctx->data)
> >          )) {
> > +            AMFBuffer *frame_ref_storage_buffer;
> >  #if CONFIG_D3D11VA
> >              static const GUID AMFTextureArrayIndexGUID = { 0x28115527,
> 0xe7c3, 0x4b66, { 0x99, 0xd3, 0x4f, 0x2a, 0xe6, 0xb4, 0x7f, 0xaf } };
> >              ID3D11Texture2D *texture =
> > (ID3D11Texture2D*)frame->data[0]; // actual texture @@ -496,6 +568,17
> @@ int ff_amf_send_frame(AVCodecContext *avctx, const AVFrame
> *frame)
> >              // input HW surfaces can be vertically aligned by 16; tell AMF the real
> size
> >              surface->pVtbl->SetCrop(surface, 0, 0, frame->width,
> > frame->height);  #endif
> > +
> > +            frame_ref_storage_buffer =
> amf_create_buffer_with_frame_ref(frame, ctx->context);
> > +            AMF_RETURN_IF_FALSE(ctx, frame_ref_storage_buffer !=
> > + NULL, AVERROR(ENOMEM), "create_buffer_with_frame_ref() returned
> > + NULL\n");
> > +
> > +            AMF_AV_ASSIGN_PROPERTY_INTERFACE(res, surface,
> L"av_frame_ref", frame_ref_storage_buffer);
> > +            if (res != AMF_OK) {
> > +                surface->pVtbl->Release(surface);
> > +            }
> > +            ctx->hwsurfaces_in_queue++;
> > +            frame_ref_storage_buffer->pVtbl-
> >Release(frame_ref_storage_buffer);
> > +            AMF_RETURN_IF_FALSE(ctx, res == AMF_OK,
> AVERROR(ENOMEM),
> > + "SetProperty failed for \"av_frame_ref\" with error %d\n", res);
> >          } else {
> >              res = ctx->context->pVtbl->AllocSurface(ctx->context,
> AMF_MEMORY_HOST, ctx->format, avctx->width, avctx->height, &surface);
> >              AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR(ENOMEM),
> > "AllocSurface() failed  with error %d\n", res); @@ -560,6 +643,18 @@ int
> ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
> >              ret = amf_copy_buffer(avctx, avpkt, buffer);
> >
> >              buffer->pVtbl->Release(buffer);
> > +
> > +            if (data->pVtbl->HasProperty(data, L"av_frame_ref")) {
> > +                AMFBuffer *frame_ref_storage_buffer;
> > +                AMF_AV_GET_PROPERTY_INTERFACE(res, data,
> L"av_frame_ref", AMFBuffer, frame_ref_storage_buffer);
> > +                if (res == AMF_OK) {
> > +
> amf_release_buffer_with_frame_ref(frame_ref_storage_buffer);
> > +                    ctx->hwsurfaces_in_queue--;
> > +                } else {
> > +                    av_log(avctx, AV_LOG_WARNING, "GetProperty failed
> > + for \"av_frame_ref\" with error %d\n", res);
> 
> Can you get this warning in normal use?  It seems like it should be fatal, since
> a frame reference has been completely lost.
> 
> > +                }
> > +            }
> > +
> >              data->pVtbl->Release(data);
> >
> >              AMF_RETURN_IF_FALSE(ctx, ret >= 0, ret,
> > "amf_copy_buffer() failed with error %d\n", ret); @@ -589,7 +684,7 @@
> int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
> >                      av_log(avctx, AV_LOG_WARNING, "Data acquired but delayed
> drain submission got AMF_INPUT_FULL- should not happen\n");
> >                  }
> >              }
> > -        } else if (ctx->delayed_surface != NULL || ctx->delayed_drain || (ctx-
> >eof && res_query != AMF_EOF)) {
> > +        } else if (ctx->delayed_surface != NULL || ctx->delayed_drain
> > + || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue
> > + >= ctx->hwsurfaces_in_queue_max)) {
> >              block_and_wait = 1;
> >              av_usleep(1000); // wait and poll again
> >          }
> > diff --git a/libavcodec/amfenc.h b/libavcodec/amfenc.h index
> > 84f0aad2fa..b1361842bd 100644
> > --- a/libavcodec/amfenc.h
> > +++ b/libavcodec/amfenc.h
> > @@ -62,6 +62,9 @@ typedef struct AmfContext {
> >      AVBufferRef        *hw_device_ctx; ///< pointer to HW accelerator
> (decoder)
> >      AVBufferRef        *hw_frames_ctx; ///< pointer to HW accelerator (frame
> allocator)
> >
> > +    int                 hwsurfaces_in_queue;
> > +    int                 hwsurfaces_in_queue_max;
> > +
> >      // helpers to handle async calls
> >      int                 delayed_drain;
> >      AMFSurface         *delayed_surface;
> >
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel



Hi Mark, thanks for your feedback.
I was just trying to propose a solution that solves the problem in many cases (as a short-term solution, to keep this patch simple).

What would you recommend for this fix (patch)?
1) Check whether the "extra_hw_frames" value is set and is sufficient for encoding; if so, use the incoming frames as encoder input directly, otherwise fall back to the copy routine.
2) Check whether the "extra_hw_frames" value is set and sufficient; otherwise report an error.
3) Try to find a decoder/encoder negotiation solution.
4) ?
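Option 2), for instance, might reduce to a fail-fast check at init time. This is a sketch only; the threshold and the reserve of one surface for the decoder are assumptions of the example, not established FFmpeg behaviour:

```c
#include <errno.h>

/* Sketch of option 2: refuse to start when the incoming frame pool is
 * too small for the encoder to retain frames, instead of silently
 * copying or stalling the pipeline at runtime. */
static int check_pool_size(int initial_pool_size, int frames_encoder_keeps)
{
    /* assume at least one surface must stay free for the decoder */
    if (initial_pool_size - 1 < frames_encoder_keeps)
        return -(EINVAL);  /* caller would log: "add -extra_hw_frames N" */
    return 0;
}
```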

> These macros are only expanded once each and with a single type.  Do you
> intend to use them again in future patches with different types?  If not, it
> might be easier not to have them at all, or turn them into functions.
I will change them to functions, no problem. In the near future, macros like these will appear in AMF itself, so I will simplify this again later in a separate patch.
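Converted to a function, the QueryInterface helper might look roughly like the following. The AMF types are mocked here so the example is self-contained; the real AMF C headers define these shapes themselves:

```c
/* Mocked minimal shapes of the AMF C interface, for illustration only. */
typedef struct AMFGuid { unsigned data1; } AMFGuid;
typedef enum AMF_RESULT { AMF_OK = 0, AMF_FAIL = 1 } AMF_RESULT;

typedef struct AMFInterface AMFInterface;
typedef struct AMFInterfaceVtbl {
    AMF_RESULT (*QueryInterface)(AMFInterface *self, const AMFGuid *guid, void **out);
} AMFInterfaceVtbl;
struct AMFInterface { const AMFInterfaceVtbl *pVtbl; };

/* The AMF_AV_QUERY_INTERFACE macro as a plain function: the caller
 * passes the GUID explicitly, so no token pasting is needed and the
 * compiler type-checks each call site. */
static AMF_RESULT amf_query_interface(AMFInterface *from, const AMFGuid *guid, void **out)
{
    return from->pVtbl->QueryInterface(from, guid, out);
}

/* Fake implementation so the helper can be exercised standalone:
 * pretend the object itself implements every requested interface. */
static AMF_RESULT fake_qi(AMFInterface *self, const AMFGuid *guid, void **out)
{
    (void)guid;
    *out = self;
    return AMF_OK;
}
static const AMFInterfaceVtbl fake_vtbl = { fake_qi };
```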

> Can you get this warning in normal use?  It seems like it should be fatal, since
> a frame reference has been completely lost.
Agreed, this will be fixed.
Mark Thompson April 8, 2018, 8:13 p.m.
On 08/04/18 20:13, Alexander Kravchenko wrote:
> 
> 
>> -----Original Message-----
>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf
>> Of Mark Thompson
>> Sent: Sunday, April 8, 2018 8:25 PM
>> To: ffmpeg-devel@ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH] lavc/amfenc: Retain a reference to D3D
>> frames used as input during the encoding process
>>
>> On 06/04/18 11:25, Alexander Kravchenko wrote:
>>>>
>>>> This breaks the testcase described in
>>>> https://trac.ffmpeg.org/ticket/6990 which is basically the same as
>>>> the one you described in this patch.
>>>>
>>>> I get the following spammed repeatedly:
>>>>
>>>> [AVHWFramesContext @ 000000000502d340] Static surface pool size
>> exceeded.
>>>> [mpeg2video @ 0000000002f8d400] get_buffer() failed [mpeg2video @
>>>> 0000000002f8d400] thread_get_buffer() failed [mpeg2video @
>>>> 0000000002f8d400] get_buffer() failed (-12 0000000000000000) Error
>>>> while decoding stream #0:1: Operation not permitted
>>>
>>> Hi, could you please review the following updated patch I added queue
>>> limit according initial pool size of incoming frame context.
>>> This solves the problem running this pipeline without -extra_hw_frames 16
>> option, but -extra_hw_frames option can be used to have bigger queue for
>> encoder.
>>
>> I think you've misunderstood /why/ the decoder has the pool size allocation
>> that it does.  The decoder expects to use all of the surfaces it has allocated in
>> the worst case - the difference between MPEG-2 and H.264 is that MPEG-2
>> can store at most two reference frames (the forward and backward
>> references for B-frames), while H.264 can store up to sixteen.  Most H.264
>> streams don't use all sixteen references, but in theory they could (excepting
>> level restrictions, but they are generally quite iffy) so the decoder allocates
>> space for all of those references even if they aren't used.
>>
>> I can believe that this patch happens to work if you have a simple stream
>> with limited references (streams rarely use more than two or three), but it
>> will certainly fail exactly as before for complex streams.
>>
>> If you want to hold onto more than one frame in the encoder then currently
>> you need to use the -extra_hw_frames option on the source (whether
>> decoder or filter) - that is exactly what it's there for.  Some sort of automatic
>> negotiation is suggested (there was some discussion on libav-devel a while
>> ago), but the requirement that it works through libavfilter is a difficult one
>> with the current structure so nothing concrete is yet proposed.  (That was
>> mostly considering libmfx, where it's even more of a problem because the
>> lookahead options can make the encoder queue over a hundred frames
>> internally.)
>>
>>> ---
>>>  libavcodec/amfenc.c | 97
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>  libavcodec/amfenc.h |  3 ++
>>>  2 files changed, 99 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c index
>>> 89a10ff253..eb7b20c252 100644
>>> --- a/libavcodec/amfenc.c
>>> +++ b/libavcodec/amfenc.c
>>> @@ -157,6 +157,9 @@ static int amf_init_context(AVCodecContext *avctx)
>>>      AmfContext         *ctx = avctx->priv_data;
>>>      AMF_RESULT          res = AMF_OK;
>>>
>>> +    ctx->hwsurfaces_in_queue = 0;
>>> +    ctx->hwsurfaces_in_queue_max = 16;
>>> +
>>>      // configure AMF logger
>>>      // the return of these functions indicates old state and do not affect
>> behaviour
>>>      ctx->trace->pVtbl->EnableWriter(ctx->trace,
>>> AMF_TRACE_WRITER_DEBUG_OUTPUT, ctx->log_to_dbg != 0 ); @@ -187,6
>> +190,7 @@ static int amf_init_context(AVCodecContext *avctx)
>>>                          if (!ctx->hw_frames_ctx) {
>>>                              return AVERROR(ENOMEM);
>>>                          }
>>> +                        ctx->hwsurfaces_in_queue_max =
>>> + device_ctx->initial_pool_size - 1;
>>>                      } else {
>>>                          if(res == AMF_NOT_SUPPORTED)
>>>                              av_log(avctx, AV_LOG_INFO,
>>> "avctx->hw_frames_ctx has D3D11 device which doesn't have D3D11VA
>> interface, switching to default\n"); @@ -443,6 +447,73 @@ int
>> ff_amf_encode_init(AVCodecContext *avctx)
>>>      return ret;
>>>  }
>>>
>>> +#define AMF_AV_QUERY_INTERFACE(res, from, InterfaceTypeTo, to) \
>>> +    { \
>>> +        AMFGuid guid_##InterfaceTypeTo = IID_##InterfaceTypeTo(); \
>>> +        res = from->pVtbl->QueryInterface(from, &guid_##InterfaceTypeTo,
>> (void**)&to); \
>>> +    }
>>> +
>>> +#define AMF_AV_ASSIGN_PROPERTY_INTERFACE(res, pThis, name, val) \
>>> +    { \
>>> +        AMFInterface *amf_interface; \
>>> +        AMFVariantStruct var; \
>>> +        res = AMFVariantInit(&var); \
>>> +        if (res == AMF_OK) { \
>>> +            AMF_AV_QUERY_INTERFACE(res, val, AMFInterface,
>> amf_interface)\
>>> +            if (res == AMF_OK) { \
>>> +                res = AMFVariantAssignInterface(&var, amf_interface); \
>>> +                amf_interface->pVtbl->Release(amf_interface); \
>>> +            } \
>>> +            if (res == AMF_OK) { \
>>> +                res = pThis->pVtbl->SetProperty(pThis, name, var); \
>>> +            } \
>>> +            AMFVariantClear(&var); \
>>> +        }\
>>> +    }
>>> +
>>> +#define AMF_AV_GET_PROPERTY_INTERFACE(res, pThis, name,
>> TargetType, val) \
>>> +    { \
>>> +        AMFVariantStruct var; \
>>> +        res = AMFVariantInit(&var); \
>>> +        if (res == AMF_OK) { \
>>> +            res = pThis->pVtbl->GetProperty(pThis, name, &var); \
>>> +            if (res == AMF_OK) { \
>>> +                if (var.type == AMF_VARIANT_INTERFACE &&
>> AMFVariantInterface(&var)) { \
>>> +                    AMF_AV_QUERY_INTERFACE(res, AMFVariantInterface(&var),
>> TargetType, val); \
>>> +                } else { \
>>> +                    res = AMF_INVALID_DATA_TYPE; \
>>> +                } \
>>> +            } \
>>> +            AMFVariantClear(&var); \
>>> +        }\
>>> +    }
>>
>> These macros are only expanded once each and with a single type.  Do you
>> intend to use them again in future patches with different types?  If not, it
>> might be easier not to have them at all, or turn them into functions.
>>
>>> +
>>> +static AMFBuffer* amf_create_buffer_with_frame_ref(const AVFrame*
>>> +frame, AMFContext *context) {
>>> +    AVFrame *frame_ref;
>>> +    AMFBuffer *frame_ref_storage_buffer = NULL;
>>> +    AMF_RESULT res;
>>> +
>>> +    res = context->pVtbl->AllocBuffer(context, AMF_MEMORY_HOST,
>> sizeof(frame_ref), &frame_ref_storage_buffer);
>>> +    if (res == AMF_OK) {
>>> +        frame_ref = av_frame_clone(frame);
>>> +        if (frame_ref) {
>>> +            memcpy(frame_ref_storage_buffer->pVtbl-
>>> GetNative(frame_ref_storage_buffer), &frame_ref, sizeof(frame_ref));
>>> +        } else {
>>> +            frame_ref_storage_buffer->pVtbl-
>>> Release(frame_ref_storage_buffer);
>>> +            frame_ref_storage_buffer = NULL;
>>> +        }
>>> +    }
>>> +    return frame_ref_storage_buffer;
>>> +}
>>> +
>>> +static void amf_release_buffer_with_frame_ref(AMFBuffer
>>> +*frame_ref_storage_buffer) {
>>> +    AVFrame *av_frame_ref;
>>> +    memcpy(&av_frame_ref, frame_ref_storage_buffer->pVtbl-
>>> GetNative(frame_ref_storage_buffer), sizeof(av_frame_ref));
>>> +    av_frame_free(&av_frame_ref);
>>> +
>>> +frame_ref_storage_buffer->pVtbl->Release(frame_ref_storage_buffer);
>>> +}
>>>
>>>  int ff_amf_send_frame(AVCodecContext *avctx, const AVFrame *frame)
>> {
>>> @@ -484,6 +555,7 @@ int ff_amf_send_frame(AVCodecContext *avctx,
>> const AVFrame *frame)
>>>              (ctx->hw_device_ctx && ((AVHWFramesContext*)frame-
>>> hw_frames_ctx->data)->device_ctx ==
>>>              (AVHWDeviceContext*)ctx->hw_device_ctx->data)
>>>          )) {
>>> +            AMFBuffer *frame_ref_storage_buffer;
>>>  #if CONFIG_D3D11VA
>>>              static const GUID AMFTextureArrayIndexGUID = { 0x28115527,
>> 0xe7c3, 0x4b66, { 0x99, 0xd3, 0x4f, 0x2a, 0xe6, 0xb4, 0x7f, 0xaf } };
>>>              ID3D11Texture2D *texture =
>>> (ID3D11Texture2D*)frame->data[0]; // actual texture @@ -496,6 +568,17
>> @@ int ff_amf_send_frame(AVCodecContext *avctx, const AVFrame
>> *frame)
>>>              // input HW surfaces can be vertically aligned by 16; tell AMF the real
>> size
>>>              surface->pVtbl->SetCrop(surface, 0, 0, frame->width,
>>> frame->height);  #endif
>>> +
>>> +            frame_ref_storage_buffer =
>> amf_create_buffer_with_frame_ref(frame, ctx->context);
>>> +            AMF_RETURN_IF_FALSE(ctx, frame_ref_storage_buffer !=
>>> + NULL, AVERROR(ENOMEM), "create_buffer_with_frame_ref() returned
>>> + NULL\n");
>>> +
>>> +            AMF_AV_ASSIGN_PROPERTY_INTERFACE(res, surface,
>> L"av_frame_ref", frame_ref_storage_buffer);
>>> +            if (res != AMF_OK) {
>>> +                surface->pVtbl->Release(surface);
>>> +            }
>>> +            ctx->hwsurfaces_in_queue++;
>>> +            frame_ref_storage_buffer->pVtbl-
>>> Release(frame_ref_storage_buffer);
>>> +            AMF_RETURN_IF_FALSE(ctx, res == AMF_OK,
>> AVERROR(ENOMEM),
>>> + "SetProperty failed for \"av_frame_ref\" with error %d\n", res);
>>>          } else {
>>>              res = ctx->context->pVtbl->AllocSurface(ctx->context,
>> AMF_MEMORY_HOST, ctx->format, avctx->width, avctx->height, &surface);
>>>              AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR(ENOMEM),
>>> "AllocSurface() failed  with error %d\n", res); @@ -560,6 +643,18 @@ int
>> ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
>>>              ret = amf_copy_buffer(avctx, avpkt, buffer);
>>>
>>>              buffer->pVtbl->Release(buffer);
>>> +
>>> +            if (data->pVtbl->HasProperty(data, L"av_frame_ref")) {
>>> +                AMFBuffer *frame_ref_storage_buffer;
>>> +                AMF_AV_GET_PROPERTY_INTERFACE(res, data,
>> L"av_frame_ref", AMFBuffer, frame_ref_storage_buffer);
>>> +                if (res == AMF_OK) {
>>> +
>> amf_release_buffer_with_frame_ref(frame_ref_storage_buffer);
>>> +                    ctx->hwsurfaces_in_queue--;
>>> +                } else {
>>> +                    av_log(avctx, AV_LOG_WARNING, "GetProperty failed
>>> + for \"av_frame_ref\" with error %d\n", res);
>>
>> Can you get this warning in normal use?  It seems like it should be fatal, since
>> a frame reference has been completely lost.
>>
>>> +                }
>>> +            }
>>> +
>>>              data->pVtbl->Release(data);
>>>
>>>              AMF_RETURN_IF_FALSE(ctx, ret >= 0, ret,
>>> "amf_copy_buffer() failed with error %d\n", ret); @@ -589,7 +684,7 @@
>> int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
>>>                      av_log(avctx, AV_LOG_WARNING, "Data acquired but delayed
>> drain submission got AMF_INPUT_FULL- should not happen\n");
>>>                  }
>>>              }
>>> -        } else if (ctx->delayed_surface != NULL || ctx->delayed_drain || (ctx-
>>> eof && res_query != AMF_EOF)) {
>>> +        } else if (ctx->delayed_surface != NULL || ctx->delayed_drain
>>> + || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue
>>> + >= ctx->hwsurfaces_in_queue_max)) {
>>>              block_and_wait = 1;
>>>              av_usleep(1000); // wait and poll again
>>>          }
>>> diff --git a/libavcodec/amfenc.h b/libavcodec/amfenc.h index
>>> 84f0aad2fa..b1361842bd 100644
>>> --- a/libavcodec/amfenc.h
>>> +++ b/libavcodec/amfenc.h
>>> @@ -62,6 +62,9 @@ typedef struct AmfContext {
>>>      AVBufferRef        *hw_device_ctx; ///< pointer to HW accelerator
>> (decoder)
>>>      AVBufferRef        *hw_frames_ctx; ///< pointer to HW accelerator (frame
>> allocator)
>>>
>>> +    int                 hwsurfaces_in_queue;
>>> +    int                 hwsurfaces_in_queue_max;
>>> +
>>>      // helpers to handle async calls
>>>      int                 delayed_drain;
>>>      AMFSurface         *delayed_surface;
>>>
> 
> 
> Hi Mark, thanks for your feedback.
> I just tried to propose a solution which solves the problem in many cases (as a short-term solution, to keep this patch simple).
> 
> What could you recommend for this fix (patch)?
> 1) Check if the "extra_hw_frames" value is set and is large enough for encoding; if so, use the incoming frames as input for the encoder, otherwise use the copy routine
> 2) Check if the "extra_hw_frames" value is set and is large enough, otherwise show an error

These values are not directly visible to the encoder, nor is it feasible in general to infer anything about what else might be using the same pool.

Consider, for example:

ffmpeg -hwaccel d3d11va -hwaccel_output_format d3d11 -i input.mkv -an -filter_complex '[0:v]split=2[out1][out2]' -map '[out1]' -c:v h264_amf -b:v 2M out1.mkv -map '[out2]' -c:v h264_amf -b:v 10M out2.mkv

> 3) Try to find decoder/encoder negotiation solution

Ideally yes, but this is not something which can easily be solved (it will require a new API and nontrivial changes to link negotiation inside libavfilter), nor is it really within the scope of this fix.

So:

> 4) ?

4) Do nothing.

IMO this is the right answer here for now - it is a known problem which affects other encoders as well, with a straightforward solution from the user point of view (if not a completely obvious one).

Maybe a note could be added to the AMF encoder documentation giving a hint about the problem and how to solve it for the user, though since such documentation doesn't exist yet that won't block this patch.
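For reference, the user-side workaround referred to in this thread looks roughly like the following (the file names and the frame count of 16 are only illustrative; `-extra_hw_frames` is an input option and so must appear before `-i`):

```shell
# Pad the decoder's fixed-size D3D11 surface pool with extra frames so the
# AMF encoder can keep references to input surfaces without starving the
# decoder's own allocation:
ffmpeg -hwaccel d3d11va -hwaccel_output_format d3d11 -extra_hw_frames 16 \
       -i input.mkv -c:v h264_amf -b:v 5M out.mkv
```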

>> These macros are only expanded once each and with a single type.  Do you
>> intend to use them again in future patches with different types?  If not, it
>> might be easier not to have them at all, or turn them into functions.
> I will change them to functions, no problem. In the near future macros like this will appear in AMF itself, so I will simplify this later in a separate patch.

Sounds good.

>> Can you get this warning in normal use?  It seems like it should be fatal, since
>> a frame reference has been completely lost.
> Agreed, it will be fixed

Ok.  If you want to clean these bits up then I'd be happy to apply it.

Thanks,

- Mark


diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
index 89a10ff253..eb7b20c252 100644
--- a/libavcodec/amfenc.c
+++ b/libavcodec/amfenc.c
@@ -157,6 +157,9 @@  static int amf_init_context(AVCodecContext *avctx)
     AmfContext         *ctx = avctx->priv_data;
     AMF_RESULT          res = AMF_OK;
 
+    ctx->hwsurfaces_in_queue = 0;
+    ctx->hwsurfaces_in_queue_max = 16;
+
     // configure AMF logger
     // the return of these functions indicates old state and do not affect behaviour
     ctx->trace->pVtbl->EnableWriter(ctx->trace, AMF_TRACE_WRITER_DEBUG_OUTPUT, ctx->log_to_dbg != 0 );
@@ -187,6 +190,7 @@  static int amf_init_context(AVCodecContext *avctx)
                         if (!ctx->hw_frames_ctx) {
                             return AVERROR(ENOMEM);
                         }
+                        ctx->hwsurfaces_in_queue_max = device_ctx->initial_pool_size - 1;
                     } else {
                         if(res == AMF_NOT_SUPPORTED)
                             av_log(avctx, AV_LOG_INFO, "avctx->hw_frames_ctx has D3D11 device which doesn't have D3D11VA interface, switching to default\n");
@@ -443,6 +447,73 @@  int ff_amf_encode_init(AVCodecContext *avctx)
     return ret;
 }
 
+#define AMF_AV_QUERY_INTERFACE(res, from, InterfaceTypeTo, to) \
+    { \
+        AMFGuid guid_##InterfaceTypeTo = IID_##InterfaceTypeTo(); \
+        res = from->pVtbl->QueryInterface(from, &guid_##InterfaceTypeTo, (void**)&to); \
+    }
+
+#define AMF_AV_ASSIGN_PROPERTY_INTERFACE(res, pThis, name, val) \
+    { \
+        AMFInterface *amf_interface; \
+        AMFVariantStruct var; \
+        res = AMFVariantInit(&var); \
+        if (res == AMF_OK) { \
+            AMF_AV_QUERY_INTERFACE(res, val, AMFInterface, amf_interface)\
+            if (res == AMF_OK) { \
+                res = AMFVariantAssignInterface(&var, amf_interface); \
+                amf_interface->pVtbl->Release(amf_interface); \
+            } \
+            if (res == AMF_OK) { \
+                res = pThis->pVtbl->SetProperty(pThis, name, var); \
+            } \
+            AMFVariantClear(&var); \
+        }\
+    }
+
+#define AMF_AV_GET_PROPERTY_INTERFACE(res, pThis, name, TargetType, val) \
+    { \
+        AMFVariantStruct var; \
+        res = AMFVariantInit(&var); \
+        if (res == AMF_OK) { \
+            res = pThis->pVtbl->GetProperty(pThis, name, &var); \
+            if (res == AMF_OK) { \
+                if (var.type == AMF_VARIANT_INTERFACE && AMFVariantInterface(&var)) { \
+                    AMF_AV_QUERY_INTERFACE(res, AMFVariantInterface(&var), TargetType, val); \
+                } else { \
+                    res = AMF_INVALID_DATA_TYPE; \
+                } \
+            } \
+            AMFVariantClear(&var); \
+        }\
+    }
+
+static AMFBuffer* amf_create_buffer_with_frame_ref(const AVFrame* frame, AMFContext *context)
+{
+    AVFrame *frame_ref;
+    AMFBuffer *frame_ref_storage_buffer = NULL;
+    AMF_RESULT res;
+
+    res = context->pVtbl->AllocBuffer(context, AMF_MEMORY_HOST, sizeof(frame_ref), &frame_ref_storage_buffer);
+    if (res == AMF_OK) {
+        frame_ref = av_frame_clone(frame);
+        if (frame_ref) {
+            memcpy(frame_ref_storage_buffer->pVtbl->GetNative(frame_ref_storage_buffer), &frame_ref, sizeof(frame_ref));
+        } else {
+            frame_ref_storage_buffer->pVtbl->Release(frame_ref_storage_buffer);
+            frame_ref_storage_buffer = NULL;
+        }
+    }
+    return frame_ref_storage_buffer;
+}
+
+static void amf_release_buffer_with_frame_ref(AMFBuffer *frame_ref_storage_buffer)
+{
+    AVFrame *av_frame_ref;
+    memcpy(&av_frame_ref, frame_ref_storage_buffer->pVtbl->GetNative(frame_ref_storage_buffer), sizeof(av_frame_ref));
+    av_frame_free(&av_frame_ref);
+    frame_ref_storage_buffer->pVtbl->Release(frame_ref_storage_buffer);
+} 
 
 int ff_amf_send_frame(AVCodecContext *avctx, const AVFrame *frame)
 {
@@ -484,6 +555,7 @@  int ff_amf_send_frame(AVCodecContext *avctx, const AVFrame *frame)
             (ctx->hw_device_ctx && ((AVHWFramesContext*)frame->hw_frames_ctx->data)->device_ctx ==
             (AVHWDeviceContext*)ctx->hw_device_ctx->data)
         )) {
+            AMFBuffer *frame_ref_storage_buffer;
 #if CONFIG_D3D11VA
             static const GUID AMFTextureArrayIndexGUID = { 0x28115527, 0xe7c3, 0x4b66, { 0x99, 0xd3, 0x4f, 0x2a, 0xe6, 0xb4, 0x7f, 0xaf } };
             ID3D11Texture2D *texture = (ID3D11Texture2D*)frame->data[0]; // actual texture
@@ -496,6 +568,17 @@  int ff_amf_send_frame(AVCodecContext *avctx, const AVFrame *frame)
             // input HW surfaces can be vertically aligned by 16; tell AMF the real size
             surface->pVtbl->SetCrop(surface, 0, 0, frame->width, frame->height);
 #endif
+
+            frame_ref_storage_buffer = amf_create_buffer_with_frame_ref(frame, ctx->context);
+            AMF_RETURN_IF_FALSE(ctx, frame_ref_storage_buffer != NULL, AVERROR(ENOMEM), "create_buffer_with_frame_ref() returned NULL\n");
+
+            AMF_AV_ASSIGN_PROPERTY_INTERFACE(res, surface, L"av_frame_ref", frame_ref_storage_buffer);
+            if (res != AMF_OK) {
+                surface->pVtbl->Release(surface);
+            }
+            ctx->hwsurfaces_in_queue++;
+            frame_ref_storage_buffer->pVtbl->Release(frame_ref_storage_buffer);
+            AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR(ENOMEM), "SetProperty failed for \"av_frame_ref\" with error %d\n", res);
         } else {
             res = ctx->context->pVtbl->AllocSurface(ctx->context, AMF_MEMORY_HOST, ctx->format, avctx->width, avctx->height, &surface);
             AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR(ENOMEM), "AllocSurface() failed  with error %d\n", res);
@@ -560,6 +643,18 @@  int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
             ret = amf_copy_buffer(avctx, avpkt, buffer);
 
             buffer->pVtbl->Release(buffer);
+
+            if (data->pVtbl->HasProperty(data, L"av_frame_ref")) {
+                AMFBuffer *frame_ref_storage_buffer;
+                AMF_AV_GET_PROPERTY_INTERFACE(res, data, L"av_frame_ref", AMFBuffer, frame_ref_storage_buffer);
+                if (res == AMF_OK) {
+                    amf_release_buffer_with_frame_ref(frame_ref_storage_buffer);
+                    ctx->hwsurfaces_in_queue--;
+                } else {
+                    av_log(avctx, AV_LOG_WARNING, "GetProperty failed for \"av_frame_ref\" with error %d\n", res);
+                }
+            }
+
             data->pVtbl->Release(data);
 
             AMF_RETURN_IF_FALSE(ctx, ret >= 0, ret, "amf_copy_buffer() failed with error %d\n", ret);
@@ -589,7 +684,7 @@  int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
                     av_log(avctx, AV_LOG_WARNING, "Data acquired but delayed drain submission got AMF_INPUT_FULL- should not happen\n");
                 }
             }
-        } else if (ctx->delayed_surface != NULL || ctx->delayed_drain || (ctx->eof && res_query != AMF_EOF)) {
+        } else if (ctx->delayed_surface != NULL || ctx->delayed_drain || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= ctx->hwsurfaces_in_queue_max)) {
             block_and_wait = 1;
             av_usleep(1000); // wait and poll again
         }
diff --git a/libavcodec/amfenc.h b/libavcodec/amfenc.h
index 84f0aad2fa..b1361842bd 100644
--- a/libavcodec/amfenc.h
+++ b/libavcodec/amfenc.h
@@ -62,6 +62,9 @@  typedef struct AmfContext {
     AVBufferRef        *hw_device_ctx; ///< pointer to HW accelerator (decoder)
     AVBufferRef        *hw_frames_ctx; ///< pointer to HW accelerator (frame allocator)
 
+    int                 hwsurfaces_in_queue;
+    int                 hwsurfaces_in_queue_max;
+
     // helpers to handle async calls
     int                 delayed_drain;
     AMFSurface         *delayed_surface;