diff mbox series

[FFmpeg-devel,v11,07/14] avcodec/vaapi_encode: extract the init and close function to base layer

Message ID 20240525103022.1390-7-tong1.wu@intel.com
State New
Headers show
Series [FFmpeg-devel,v11,01/14] avcodec/vaapi_encode: introduce a base layer for vaapi encode | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Wu, Tong1 May 25, 2024, 10:30 a.m. UTC
From: Tong Wu <tong1.wu@intel.com>

Related parameters such as device context, frame context are also moved
to base layer.

Signed-off-by: Tong Wu <tong1.wu@intel.com>
---
 libavcodec/hw_base_encode.c     | 49 ++++++++++++++++++
 libavcodec/hw_base_encode.h     | 17 +++++++
 libavcodec/vaapi_encode.c       | 90 +++++++++++----------------------
 libavcodec/vaapi_encode.h       | 10 ----
 libavcodec/vaapi_encode_av1.c   |  2 +-
 libavcodec/vaapi_encode_h264.c  |  2 +-
 libavcodec/vaapi_encode_h265.c  |  2 +-
 libavcodec/vaapi_encode_mjpeg.c |  6 ++-
 8 files changed, 102 insertions(+), 76 deletions(-)

Comments

Lynne May 25, 2024, 1:07 p.m. UTC | #1
On 25/05/2024 12:30, tong1.wu-at-intel.com@ffmpeg.org wrote:
> From: Tong Wu <tong1.wu@intel.com>
> 
> Related parameters such as device context, frame context are also moved
> to base layer.
> 
> Signed-off-by: Tong Wu <tong1.wu@intel.com>
> ---
>   libavcodec/hw_base_encode.c     | 49 ++++++++++++++++++
>   libavcodec/hw_base_encode.h     | 17 +++++++
>   libavcodec/vaapi_encode.c       | 90 +++++++++++----------------------
>   libavcodec/vaapi_encode.h       | 10 ----
>   libavcodec/vaapi_encode_av1.c   |  2 +-
>   libavcodec/vaapi_encode_h264.c  |  2 +-
>   libavcodec/vaapi_encode_h265.c  |  2 +-
>   libavcodec/vaapi_encode_mjpeg.c |  6 ++-
>   8 files changed, 102 insertions(+), 76 deletions(-)
> 
> diff --git a/libavcodec/hw_base_encode.c b/libavcodec/hw_base_encode.c
> index 16afaa37be..c4789380b6 100644
> --- a/libavcodec/hw_base_encode.c
> +++ b/libavcodec/hw_base_encode.c
> @@ -592,3 +592,52 @@ end:
>   
>       return 0;
>   }
> +
> +int ff_hw_base_encode_init(AVCodecContext *avctx)
> +{
> +    FFHWBaseEncodeContext *ctx = avctx->priv_data;

This is the issue I was talking about, this requires that 
FFHWBaseEncodeContext is always the main context.

Could you change it so everything takes FFHWBaseEncodeContext as an 
argument, rather than AVCodecContext (apart from where the function 
absolutely must read some data from it)?
Sean McGovern May 25, 2024, 2:18 p.m. UTC | #2
Hi,


On Sat, May 25, 2024, 09:07 Lynne via ffmpeg-devel <ffmpeg-devel@ffmpeg.org>
wrote:

> On 25/05/2024 12:30, tong1.wu-at-intel.com@ffmpeg.org wrote:
> > From: Tong Wu <tong1.wu@intel.com>
> >
> > Related parameters such as device context, frame context are also moved
> > to base layer.
> >
> > Signed-off-by: Tong Wu <tong1.wu@intel.com>
> > ---
> >   libavcodec/hw_base_encode.c     | 49 ++++++++++++++++++
> >   libavcodec/hw_base_encode.h     | 17 +++++++
> >   libavcodec/vaapi_encode.c       | 90 +++++++++++----------------------
> >   libavcodec/vaapi_encode.h       | 10 ----
> >   libavcodec/vaapi_encode_av1.c   |  2 +-
> >   libavcodec/vaapi_encode_h264.c  |  2 +-
> >   libavcodec/vaapi_encode_h265.c  |  2 +-
> >   libavcodec/vaapi_encode_mjpeg.c |  6 ++-
> >   8 files changed, 102 insertions(+), 76 deletions(-)
> >
> > diff --git a/libavcodec/hw_base_encode.c b/libavcodec/hw_base_encode.c
> > index 16afaa37be..c4789380b6 100644
> > --- a/libavcodec/hw_base_encode.c
> > +++ b/libavcodec/hw_base_encode.c
> > @@ -592,3 +592,52 @@ end:
> >
> >       return 0;
> >   }
> > +
> > +int ff_hw_base_encode_init(AVCodecContext *avctx)
> > +{
> > +    FFHWBaseEncodeContext *ctx = avctx->priv_data;
>
> This is the issue I was talking about, this requires that
> FFHWBaseEncodeContext is always the main context.
>
> Could you change it so everything takes FFHWBaseEncodeContext as an
> argument, rather than AVCodecContext (apart from where the function
> absolutely must read some data from it)?
>

Might this suggestion involve having to do some ugly down-casting?

-- Sean McGovern

>
Lynne May 25, 2024, 2:25 p.m. UTC | #3
On 25/05/2024 16:18, Sean McGovern wrote:
> Hi,
> 
> 
> On Sat, May 25, 2024, 09:07 Lynne via ffmpeg-devel <ffmpeg-devel@ffmpeg.org>
> wrote:
> 
>> On 25/05/2024 12:30, tong1.wu-at-intel.com@ffmpeg.org wrote:
>>> From: Tong Wu <tong1.wu@intel.com>
>>>
>>> Related parameters such as device context, frame context are also moved
>>> to base layer.
>>>
>>> Signed-off-by: Tong Wu <tong1.wu@intel.com>
>>> ---
>>>    libavcodec/hw_base_encode.c     | 49 ++++++++++++++++++
>>>    libavcodec/hw_base_encode.h     | 17 +++++++
>>>    libavcodec/vaapi_encode.c       | 90 +++++++++++----------------------
>>>    libavcodec/vaapi_encode.h       | 10 ----
>>>    libavcodec/vaapi_encode_av1.c   |  2 +-
>>>    libavcodec/vaapi_encode_h264.c  |  2 +-
>>>    libavcodec/vaapi_encode_h265.c  |  2 +-
>>>    libavcodec/vaapi_encode_mjpeg.c |  6 ++-
>>>    8 files changed, 102 insertions(+), 76 deletions(-)
>>>
>>> diff --git a/libavcodec/hw_base_encode.c b/libavcodec/hw_base_encode.c
>>> index 16afaa37be..c4789380b6 100644
>>> --- a/libavcodec/hw_base_encode.c
>>> +++ b/libavcodec/hw_base_encode.c
>>> @@ -592,3 +592,52 @@ end:
>>>
>>>        return 0;
>>>    }
>>> +
>>> +int ff_hw_base_encode_init(AVCodecContext *avctx)
>>> +{
>>> +    FFHWBaseEncodeContext *ctx = avctx->priv_data;
>>
>> This is the issue I was talking about, this requires that
>> FFHWBaseEncodeContext is always the main context.
>>
>> Could you change it so everything takes FFHWBaseEncodeContext as an
>> argument, rather than AVCodecContext (apart from where the function
>> absolutely must read some data from it)?
>>
> 
> Might this suggestion involve having to do some ugly down-casting?

Not at all. Instead of having this context as the main context, just 
make each encoder have its own context, where this structure is the only 
element.
Vulkan requires extra state, such as an execution context, function 
pointers from the loader, memory the encoder needs, which, rather than 
adding to this structure, would be better off being in its own context.
Wu, Tong1 May 27, 2024, 12:35 a.m. UTC | #4
>From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Lynne
>via ffmpeg-devel
>Sent: Saturday, May 25, 2024 10:07 PM
>To: ffmpeg-devel@ffmpeg.org
>Cc: Lynne <dev@lynne.ee>
>Subject: Re: [FFmpeg-devel] [PATCH v11 07/14] avcodec/vaapi_encode: extract
>the init and close function to base layer
>
>On 25/05/2024 12:30, tong1.wu-at-intel.com@ffmpeg.org wrote:
>> From: Tong Wu <tong1.wu@intel.com>
>>
>> Related parameters such as device context, frame context are also moved
>> to base layer.
>>
>> Signed-off-by: Tong Wu <tong1.wu@intel.com>
>> ---
>>   libavcodec/hw_base_encode.c     | 49 ++++++++++++++++++
>>   libavcodec/hw_base_encode.h     | 17 +++++++
>>   libavcodec/vaapi_encode.c       | 90 +++++++++++----------------------
>>   libavcodec/vaapi_encode.h       | 10 ----
>>   libavcodec/vaapi_encode_av1.c   |  2 +-
>>   libavcodec/vaapi_encode_h264.c  |  2 +-
>>   libavcodec/vaapi_encode_h265.c  |  2 +-
>>   libavcodec/vaapi_encode_mjpeg.c |  6 ++-
>>   8 files changed, 102 insertions(+), 76 deletions(-)
>>
>> diff --git a/libavcodec/hw_base_encode.c b/libavcodec/hw_base_encode.c
>> index 16afaa37be..c4789380b6 100644
>> --- a/libavcodec/hw_base_encode.c
>> +++ b/libavcodec/hw_base_encode.c
>> @@ -592,3 +592,52 @@ end:
>>
>>       return 0;
>>   }
>> +
>> +int ff_hw_base_encode_init(AVCodecContext *avctx)
>> +{
>> +    FFHWBaseEncodeContext *ctx = avctx->priv_data;
>
>This is the issue I was talking about, this requires that
>FFHWBaseEncodeContext is always the main context.
>
>Could you change it so everything takes FFHWBaseEncodeContext as an
>argument, rather than AVCodecContext (apart from where the function
>absolutely must read some data from it)?

I'm trying to understand it more.

In ff_hw_base_encode_init we also have the following code:

    if (!avctx->hw_frames_ctx) {
        av_log(avctx, AV_LOG_ERROR, "A hardware frames reference is "
               "required to associate the encoding device.\n");
        return AVERROR(EINVAL);
    }

In this scenario we still need avctx right? So maybe this is the "must read data from it" situation and we keep AVCodecContext as the main context? 

Plus I have this av_log concern which is there's indeed some function that the only use of avctx is its av_log's context. Do you think I should instead use FFHWBaseEncodeContext as the main context and pass it to av_log for this situation while keeping AVCodecContext as the main context for other functions which actually read from AVCodecContext. That might lead to different av_log context in the same file.

Do you think the callbacks in FFHWEncodePictureOperation should be changed too? Or only all the functions in hw_base_encode.c should be concerned. Thank you.

-Tong
Lynne May 27, 2024, 1:04 a.m. UTC | #5
On 27/05/2024 02:35, Wu, Tong1 wrote:
>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Lynne
>> via ffmpeg-devel
>> Sent: Saturday, May 25, 2024 10:07 PM
>> To: ffmpeg-devel@ffmpeg.org
>> Cc: Lynne <dev@lynne.ee>
>> Subject: Re: [FFmpeg-devel] [PATCH v11 07/14] avcodec/vaapi_encode: extract
>> the init and close function to base layer
>>
>> On 25/05/2024 12:30, tong1.wu-at-intel.com@ffmpeg.org wrote:
>>> From: Tong Wu <tong1.wu@intel.com>
>>>
>>> Related parameters such as device context, frame context are also moved
>>> to base layer.
>>>
>>> Signed-off-by: Tong Wu <tong1.wu@intel.com>
>>> ---
>>>    libavcodec/hw_base_encode.c     | 49 ++++++++++++++++++
>>>    libavcodec/hw_base_encode.h     | 17 +++++++
>>>    libavcodec/vaapi_encode.c       | 90 +++++++++++----------------------
>>>    libavcodec/vaapi_encode.h       | 10 ----
>>>    libavcodec/vaapi_encode_av1.c   |  2 +-
>>>    libavcodec/vaapi_encode_h264.c  |  2 +-
>>>    libavcodec/vaapi_encode_h265.c  |  2 +-
>>>    libavcodec/vaapi_encode_mjpeg.c |  6 ++-
>>>    8 files changed, 102 insertions(+), 76 deletions(-)
>>>
>>> diff --git a/libavcodec/hw_base_encode.c b/libavcodec/hw_base_encode.c
>>> index 16afaa37be..c4789380b6 100644
>>> --- a/libavcodec/hw_base_encode.c
>>> +++ b/libavcodec/hw_base_encode.c
>>> @@ -592,3 +592,52 @@ end:
>>>
>>>        return 0;
>>>    }
>>> +
>>> +int ff_hw_base_encode_init(AVCodecContext *avctx)
>>> +{
>>> +    FFHWBaseEncodeContext *ctx = avctx->priv_data;
>>
>> This is the issue I was talking about, this requires that
>> FFHWBaseEncodeContext is always the main context.
>>
>> Could you change it so everything takes FFHWBaseEncodeContext as an
>> argument, rather than AVCodecContext (apart from where the function
>> absolutely must read some data from it)?
> 
> I'm trying to understand it more.
> 
> In ff_hw_base_encode_init we also have the following code:
> 
>      if (!avctx->hw_frames_ctx) {
>          av_log(avctx, AV_LOG_ERROR, "A hardware frames reference is "
>                 "required to associate the encoding device.\n");
>          return AVERROR(EINVAL);
>      }
> 
> In this scenario we still need avctx right? So maybe this is the "must read data from it" situation and we keep AVCodecContext as the main context?

Yup. My point is that FFHWBaseEncodeContext doesn't become the primary 
context that must be used, but a separate component other users can use 
standalone.

> Plus I have this av_log concern which is there's indeed some function that the only use of avctx is its av_log's context. Do you think I should instead use FFHWBaseEncodeContext as the main context and pass it to av_log for this situation while keeping AVCodecContext as the main context for other functions which actually read from AVCodecContext. That might lead to different av_log context in the same file.

Just save a pointer to avctx on init in FFHWBaseEncodeContext. That's 
what most of our code does.

> Do you think the callbacks in FFHWEncodePictureOperation should be changed too? Or only all the functions in hw_base_encode.c should be concerned. Thank you.

No, the callbacks should stay as-is. Users need to retrieve their own 
context via avctx. That's also a reason why you should keep a pointer to 
avctx on init.

Thanks
Wu, Tong1 May 28, 2024, 3:53 p.m. UTC | #6
>From: Lynne <dev@lynne.ee>
>Sent: Monday, May 27, 2024 10:04 AM
>To: Wu, Tong1 <tong1.wu@intel.com>; FFmpeg development discussions and
>patches <ffmpeg-devel@ffmpeg.org>
>Subject: Re: [FFmpeg-devel] [PATCH v11 07/14] avcodec/vaapi_encode: extract
>the init and close function to base layer
>
>On 27/05/2024 02:35, Wu, Tong1 wrote:
>>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
>Lynne
>>> via ffmpeg-devel
>>> Sent: Saturday, May 25, 2024 10:07 PM
>>> To: ffmpeg-devel@ffmpeg.org
>>> Cc: Lynne <dev@lynne.ee>
>>> Subject: Re: [FFmpeg-devel] [PATCH v11 07/14] avcodec/vaapi_encode:
>extract
>>> the init and close function to base layer
>>>
>>> On 25/05/2024 12:30, tong1.wu-at-intel.com@ffmpeg.org wrote:
>>>> From: Tong Wu <tong1.wu@intel.com>
>>>>
>>>> Related parameters such as device context, frame context are also moved
>>>> to base layer.
>>>>
>>>> Signed-off-by: Tong Wu <tong1.wu@intel.com>
>>>> ---
>>>>    libavcodec/hw_base_encode.c     | 49 ++++++++++++++++++
>>>>    libavcodec/hw_base_encode.h     | 17 +++++++
>>>>    libavcodec/vaapi_encode.c       | 90 +++++++++++----------------------
>>>>    libavcodec/vaapi_encode.h       | 10 ----
>>>>    libavcodec/vaapi_encode_av1.c   |  2 +-
>>>>    libavcodec/vaapi_encode_h264.c  |  2 +-
>>>>    libavcodec/vaapi_encode_h265.c  |  2 +-
>>>>    libavcodec/vaapi_encode_mjpeg.c |  6 ++-
>>>>    8 files changed, 102 insertions(+), 76 deletions(-)
>>>>
>>>> diff --git a/libavcodec/hw_base_encode.c b/libavcodec/hw_base_encode.c
>>>> index 16afaa37be..c4789380b6 100644
>>>> --- a/libavcodec/hw_base_encode.c
>>>> +++ b/libavcodec/hw_base_encode.c
>>>> @@ -592,3 +592,52 @@ end:
>>>>
>>>>        return 0;
>>>>    }
>>>> +
>>>> +int ff_hw_base_encode_init(AVCodecContext *avctx)
>>>> +{
>>>> +    FFHWBaseEncodeContext *ctx = avctx->priv_data;
>>>
>>> This is the issue I was talking about, this requires that
>>> FFHWBaseEncodeContext is always the main context.
>>>
>>> Could you change it so everything takes FFHWBaseEncodeContext as an
>>> argument, rather than AVCodecContext (apart from where the function
>>> absolutely must read some data from it)?
>>
>> I'm trying to understand it more.
>>
>> In ff_hw_base_encode_init we also have the following code:
>>
>>      if (!avctx->hw_frames_ctx) {
>>          av_log(avctx, AV_LOG_ERROR, "A hardware frames reference is "
>>                 "required to associate the encoding device.\n");
>>          return AVERROR(EINVAL);
>>      }
>>
>> In this scenario we still need avctx right? So maybe this is the "must read data
>from it" situation and we keep AVCodecContext as the main context?
>
>Yup. My point is that FFHWBaseEncodeContext doesn't become the primary
>context that must be used, but a separate component other users can use
>standalone.
>
>> Plus I have this av_log concern which is there's indeed some function that
>the only use of avctx is its av_log's context. Do you think I should instead use
>FFHWBaseEncodeContext as the main context and pass it to av_log for this
>situation while keeping AVCodecContext as the main context for other
>functions which actually read from AVCodecContext. That might lead to
>different av_log context in the same file.
>
>Just save a pointer to avctx on init in FFHWBaseEncodeContext. That's
>what most of our code does.
>
>> Do you think the callbacks in FFHWEncodePictureOperation should be
>changed too? Or only all the functions in hw_base_encode.c should be
>concerned. Thank you.
>
>No, the callbacks should stay as-is. Users need to retrieve their own
>context via avctx. That's also a reason why you should keep a pointer to
>avctx on init.
>

I've updated v12 and made this change with a separate(15th) patch. Please see if that is what you wanted. Thank you.

-Tong
diff mbox series

Patch

diff --git a/libavcodec/hw_base_encode.c b/libavcodec/hw_base_encode.c
index 16afaa37be..c4789380b6 100644
--- a/libavcodec/hw_base_encode.c
+++ b/libavcodec/hw_base_encode.c
@@ -592,3 +592,52 @@  end:
 
     return 0;
 }
+
+int ff_hw_base_encode_init(AVCodecContext *avctx)
+{
+    FFHWBaseEncodeContext *ctx = avctx->priv_data;
+
+    ctx->frame = av_frame_alloc();
+    if (!ctx->frame)
+        return AVERROR(ENOMEM);
+
+    if (!avctx->hw_frames_ctx) {
+        av_log(avctx, AV_LOG_ERROR, "A hardware frames reference is "
+               "required to associate the encoding device.\n");
+        return AVERROR(EINVAL);
+    }
+
+    ctx->input_frames_ref = av_buffer_ref(avctx->hw_frames_ctx);
+    if (!ctx->input_frames_ref)
+        return AVERROR(ENOMEM);
+
+    ctx->input_frames = (AVHWFramesContext *)ctx->input_frames_ref->data;
+
+    ctx->device_ref = av_buffer_ref(ctx->input_frames->device_ref);
+    if (!ctx->device_ref)
+        return AVERROR(ENOMEM);
+
+    ctx->device = (AVHWDeviceContext *)ctx->device_ref->data;
+
+    ctx->tail_pkt = av_packet_alloc();
+    if (!ctx->tail_pkt)
+        return AVERROR(ENOMEM);
+
+    return 0;
+}
+
+int ff_hw_base_encode_close(AVCodecContext *avctx)
+{
+    FFHWBaseEncodeContext *ctx = avctx->priv_data;
+
+    av_fifo_freep2(&ctx->encode_fifo);
+
+    av_frame_free(&ctx->frame);
+    av_packet_free(&ctx->tail_pkt);
+
+    av_buffer_unref(&ctx->device_ref);
+    av_buffer_unref(&ctx->input_frames_ref);
+    av_buffer_unref(&ctx->recon_frames_ref);
+
+    return 0;
+}
diff --git a/libavcodec/hw_base_encode.h b/libavcodec/hw_base_encode.h
index 81e6c87036..a062009860 100644
--- a/libavcodec/hw_base_encode.h
+++ b/libavcodec/hw_base_encode.h
@@ -19,6 +19,7 @@ 
 #ifndef AVCODEC_HW_BASE_ENCODE_H
 #define AVCODEC_HW_BASE_ENCODE_H
 
+#include "libavutil/hwcontext.h"
 #include "libavutil/fifo.h"
 
 #define MAX_DPB_SIZE 16
@@ -119,6 +120,18 @@  typedef struct FFHWBaseEncodeContext {
     // Hardware-specific hooks.
     const struct FFHWEncodePictureOperation *op;
 
+    // The hardware device context.
+    AVBufferRef    *device_ref;
+    AVHWDeviceContext *device;
+
+    // The hardware frame context containing the input frames.
+    AVBufferRef    *input_frames_ref;
+    AVHWFramesContext *input_frames;
+
+    // The hardware frame context containing the reconstructed frames.
+    AVBufferRef    *recon_frames_ref;
+    AVHWFramesContext *recon_frames;
+
     // Current encoding window, in display (input) order.
     FFHWBaseEncodePicture *pic_start, *pic_end;
     // The next picture to use as the previous reference picture in
@@ -185,6 +198,10 @@  typedef struct FFHWBaseEncodeContext {
 
 int ff_hw_base_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt);
 
+int ff_hw_base_encode_init(AVCodecContext *avctx);
+
+int ff_hw_base_encode_close(AVCodecContext *avctx);
+
 #define HW_BASE_ENCODE_COMMON_OPTIONS \
     { "async_depth", "Maximum processing parallelism. " \
       "Increase this to improve single channel performance.", \
diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
index 1055fca0b1..1b3bab2c14 100644
--- a/libavcodec/vaapi_encode.c
+++ b/libavcodec/vaapi_encode.c
@@ -314,7 +314,7 @@  static int vaapi_encode_issue(AVCodecContext *avctx,
 
     av_log(avctx, AV_LOG_DEBUG, "Input surface is %#x.\n", pic->input_surface);
 
-    err = av_hwframe_get_buffer(ctx->recon_frames_ref, base_pic->recon_image, 0);
+    err = av_hwframe_get_buffer(base_ctx->recon_frames_ref, base_pic->recon_image, 0);
     if (err < 0) {
         err = AVERROR(ENOMEM);
         goto fail;
@@ -996,9 +996,10 @@  static const VAEntrypoint vaapi_encode_entrypoints_low_power[] = {
 
 static av_cold int vaapi_encode_profile_entrypoint(AVCodecContext *avctx)
 {
-    VAAPIEncodeContext      *ctx = avctx->priv_data;
-    VAProfile    *va_profiles    = NULL;
-    VAEntrypoint *va_entrypoints = NULL;
+    FFHWBaseEncodeContext *base_ctx = avctx->priv_data;
+    VAAPIEncodeContext         *ctx = avctx->priv_data;
+    VAProfile       *va_profiles    = NULL;
+    VAEntrypoint    *va_entrypoints = NULL;
     VAStatus vas;
     const VAEntrypoint *usable_entrypoints;
     const VAAPIEncodeProfile *profile;
@@ -1021,10 +1022,10 @@  static av_cold int vaapi_encode_profile_entrypoint(AVCodecContext *avctx)
         usable_entrypoints = vaapi_encode_entrypoints_normal;
     }
 
-    desc = av_pix_fmt_desc_get(ctx->input_frames->sw_format);
+    desc = av_pix_fmt_desc_get(base_ctx->input_frames->sw_format);
     if (!desc) {
         av_log(avctx, AV_LOG_ERROR, "Invalid input pixfmt (%d).\n",
-               ctx->input_frames->sw_format);
+               base_ctx->input_frames->sw_format);
         return AVERROR(EINVAL);
     }
     depth = desc->comp[0].depth;
@@ -2131,20 +2132,21 @@  static int vaapi_encode_alloc_output_buffer(FFRefStructOpaque opaque, void *obj)
 
 static av_cold int vaapi_encode_create_recon_frames(AVCodecContext *avctx)
 {
-    VAAPIEncodeContext *ctx = avctx->priv_data;
+    FFHWBaseEncodeContext *base_ctx = avctx->priv_data;
+    VAAPIEncodeContext         *ctx = avctx->priv_data;
     AVVAAPIHWConfig *hwconfig = NULL;
     AVHWFramesConstraints *constraints = NULL;
     enum AVPixelFormat recon_format;
     int err, i;
 
-    hwconfig = av_hwdevice_hwconfig_alloc(ctx->device_ref);
+    hwconfig = av_hwdevice_hwconfig_alloc(base_ctx->device_ref);
     if (!hwconfig) {
         err = AVERROR(ENOMEM);
         goto fail;
     }
     hwconfig->config_id = ctx->va_config;
 
-    constraints = av_hwdevice_get_hwframe_constraints(ctx->device_ref,
+    constraints = av_hwdevice_get_hwframe_constraints(base_ctx->device_ref,
                                                       hwconfig);
     if (!constraints) {
         err = AVERROR(ENOMEM);
@@ -2157,9 +2159,9 @@  static av_cold int vaapi_encode_create_recon_frames(AVCodecContext *avctx)
     recon_format = AV_PIX_FMT_NONE;
     if (constraints->valid_sw_formats) {
         for (i = 0; constraints->valid_sw_formats[i] != AV_PIX_FMT_NONE; i++) {
-            if (ctx->input_frames->sw_format ==
+            if (base_ctx->input_frames->sw_format ==
                 constraints->valid_sw_formats[i]) {
-                recon_format = ctx->input_frames->sw_format;
+                recon_format = base_ctx->input_frames->sw_format;
                 break;
             }
         }
@@ -2170,7 +2172,7 @@  static av_cold int vaapi_encode_create_recon_frames(AVCodecContext *avctx)
         }
     } else {
         // No idea what to use; copy input format.
-        recon_format = ctx->input_frames->sw_format;
+        recon_format = base_ctx->input_frames->sw_format;
     }
     av_log(avctx, AV_LOG_DEBUG, "Using %s as format of "
            "reconstructed frames.\n", av_get_pix_fmt_name(recon_format));
@@ -2191,19 +2193,19 @@  static av_cold int vaapi_encode_create_recon_frames(AVCodecContext *avctx)
     av_freep(&hwconfig);
     av_hwframe_constraints_free(&constraints);
 
-    ctx->recon_frames_ref = av_hwframe_ctx_alloc(ctx->device_ref);
-    if (!ctx->recon_frames_ref) {
+    base_ctx->recon_frames_ref = av_hwframe_ctx_alloc(base_ctx->device_ref);
+    if (!base_ctx->recon_frames_ref) {
         err = AVERROR(ENOMEM);
         goto fail;
     }
-    ctx->recon_frames = (AVHWFramesContext*)ctx->recon_frames_ref->data;
+    base_ctx->recon_frames = (AVHWFramesContext*)base_ctx->recon_frames_ref->data;
 
-    ctx->recon_frames->format    = AV_PIX_FMT_VAAPI;
-    ctx->recon_frames->sw_format = recon_format;
-    ctx->recon_frames->width     = ctx->surface_width;
-    ctx->recon_frames->height    = ctx->surface_height;
+    base_ctx->recon_frames->format    = AV_PIX_FMT_VAAPI;
+    base_ctx->recon_frames->sw_format = recon_format;
+    base_ctx->recon_frames->width     = ctx->surface_width;
+    base_ctx->recon_frames->height    = ctx->surface_height;
 
-    err = av_hwframe_ctx_init(ctx->recon_frames_ref);
+    err = av_hwframe_ctx_init(base_ctx->recon_frames_ref);
     if (err < 0) {
         av_log(avctx, AV_LOG_ERROR, "Failed to initialise reconstructed "
                "frame context: %d.\n", err);
@@ -2235,44 +2237,16 @@  av_cold int ff_vaapi_encode_init(AVCodecContext *avctx)
     VAStatus vas;
     int err;
 
+    err = ff_hw_base_encode_init(avctx);
+    if (err < 0)
+        goto fail;
+
     ctx->va_config  = VA_INVALID_ID;
     ctx->va_context = VA_INVALID_ID;
 
     base_ctx->op = &vaapi_op;
 
-    /* If you add something that can fail above this av_frame_alloc(),
-     * modify ff_vaapi_encode_close() accordingly. */
-    base_ctx->frame = av_frame_alloc();
-    if (!base_ctx->frame) {
-        return AVERROR(ENOMEM);
-    }
-
-    if (!avctx->hw_frames_ctx) {
-        av_log(avctx, AV_LOG_ERROR, "A hardware frames reference is "
-               "required to associate the encoding device.\n");
-        return AVERROR(EINVAL);
-    }
-
-    ctx->input_frames_ref = av_buffer_ref(avctx->hw_frames_ctx);
-    if (!ctx->input_frames_ref) {
-        err = AVERROR(ENOMEM);
-        goto fail;
-    }
-    ctx->input_frames = (AVHWFramesContext*)ctx->input_frames_ref->data;
-
-    ctx->device_ref = av_buffer_ref(ctx->input_frames->device_ref);
-    if (!ctx->device_ref) {
-        err = AVERROR(ENOMEM);
-        goto fail;
-    }
-    ctx->device = (AVHWDeviceContext*)ctx->device_ref->data;
-    ctx->hwctx = ctx->device->hwctx;
-
-    base_ctx->tail_pkt = av_packet_alloc();
-    if (!base_ctx->tail_pkt) {
-        err = AVERROR(ENOMEM);
-        goto fail;
-    }
+    ctx->hwctx = base_ctx->device->hwctx;
 
     err = vaapi_encode_profile_entrypoint(avctx);
     if (err < 0)
@@ -2339,7 +2313,7 @@  av_cold int ff_vaapi_encode_init(AVCodecContext *avctx)
     if (err < 0)
         goto fail;
 
-    recon_hwctx = ctx->recon_frames->hwctx;
+    recon_hwctx = base_ctx->recon_frames->hwctx;
     vas = vaCreateContext(ctx->hwctx->display, ctx->va_config,
                           ctx->surface_width, ctx->surface_height,
                           VA_PROGRESSIVE,
@@ -2467,16 +2441,10 @@  av_cold int ff_vaapi_encode_close(AVCodecContext *avctx)
         ctx->va_config = VA_INVALID_ID;
     }
 
-    av_frame_free(&base_ctx->frame);
-    av_packet_free(&base_ctx->tail_pkt);
-
     av_freep(&ctx->codec_sequence_params);
     av_freep(&ctx->codec_picture_params);
-    av_fifo_freep2(&base_ctx->encode_fifo);
 
-    av_buffer_unref(&ctx->recon_frames_ref);
-    av_buffer_unref(&ctx->input_frames_ref);
-    av_buffer_unref(&ctx->device_ref);
+    ff_hw_base_encode_close(avctx);
 
     return 0;
 }
diff --git a/libavcodec/vaapi_encode.h b/libavcodec/vaapi_encode.h
index 5e442d9c23..d5d7d27a07 100644
--- a/libavcodec/vaapi_encode.h
+++ b/libavcodec/vaapi_encode.h
@@ -219,18 +219,8 @@  typedef struct VAAPIEncodeContext {
     VAConfigID      va_config;
     VAContextID     va_context;
 
-    AVBufferRef    *device_ref;
-    AVHWDeviceContext *device;
     AVVAAPIDeviceContext *hwctx;
 
-    // The hardware frame context containing the input frames.
-    AVBufferRef    *input_frames_ref;
-    AVHWFramesContext *input_frames;
-
-    // The hardware frame context containing the reconstructed frames.
-    AVBufferRef    *recon_frames_ref;
-    AVHWFramesContext *recon_frames;
-
     // Pool of (reusable) bitstream output buffers.
     struct FFRefStructPool *output_buffer_pool;
 
diff --git a/libavcodec/vaapi_encode_av1.c b/libavcodec/vaapi_encode_av1.c
index 1c9235acd9..a590f4cd8b 100644
--- a/libavcodec/vaapi_encode_av1.c
+++ b/libavcodec/vaapi_encode_av1.c
@@ -373,7 +373,7 @@  static int vaapi_encode_av1_init_sequence_params(AVCodecContext *avctx)
     memset(sh_obu, 0, sizeof(*sh_obu));
     sh_obu->header.obu_type = AV1_OBU_SEQUENCE_HEADER;
 
-    desc = av_pix_fmt_desc_get(priv->common.input_frames->sw_format);
+    desc = av_pix_fmt_desc_get(base_ctx->input_frames->sw_format);
     av_assert0(desc);
 
     sh->seq_profile  = avctx->profile;
diff --git a/libavcodec/vaapi_encode_h264.c b/libavcodec/vaapi_encode_h264.c
index 2f91655d18..eacfb12706 100644
--- a/libavcodec/vaapi_encode_h264.c
+++ b/libavcodec/vaapi_encode_h264.c
@@ -308,7 +308,7 @@  static int vaapi_encode_h264_init_sequence_params(AVCodecContext *avctx)
     memset(sps, 0, sizeof(*sps));
     memset(pps, 0, sizeof(*pps));
 
-    desc = av_pix_fmt_desc_get(priv->common.input_frames->sw_format);
+    desc = av_pix_fmt_desc_get(base_ctx->input_frames->sw_format);
     av_assert0(desc);
     if (desc->nb_components == 1 || desc->log2_chroma_w != 1 || desc->log2_chroma_h != 1) {
         av_log(avctx, AV_LOG_ERROR, "Chroma format of input pixel format "
diff --git a/libavcodec/vaapi_encode_h265.c b/libavcodec/vaapi_encode_h265.c
index 6befe3426e..509251c97a 100644
--- a/libavcodec/vaapi_encode_h265.c
+++ b/libavcodec/vaapi_encode_h265.c
@@ -278,7 +278,7 @@  static int vaapi_encode_h265_init_sequence_params(AVCodecContext *avctx)
     memset(pps, 0, sizeof(*pps));
 
 
-    desc = av_pix_fmt_desc_get(priv->common.input_frames->sw_format);
+    desc = av_pix_fmt_desc_get(base_ctx->input_frames->sw_format);
     av_assert0(desc);
     if (desc->nb_components == 1) {
         chroma_format = 0;
diff --git a/libavcodec/vaapi_encode_mjpeg.c b/libavcodec/vaapi_encode_mjpeg.c
index 4c31881e6a..48dc677bec 100644
--- a/libavcodec/vaapi_encode_mjpeg.c
+++ b/libavcodec/vaapi_encode_mjpeg.c
@@ -222,6 +222,7 @@  static int vaapi_encode_mjpeg_write_extra_buffer(AVCodecContext *avctx,
 static int vaapi_encode_mjpeg_init_picture_params(AVCodecContext *avctx,
                                                   VAAPIEncodePicture *vaapi_pic)
 {
+    FFHWBaseEncodeContext       *base_ctx = avctx->priv_data;
     VAAPIEncodeMJPEGContext         *priv = avctx->priv_data;
     const FFHWBaseEncodePicture      *pic = &vaapi_pic->base;
     JPEGRawFrameHeader                *fh = &priv->frame_header;
@@ -235,7 +236,7 @@  static int vaapi_encode_mjpeg_init_picture_params(AVCodecContext *avctx,
 
     av_assert0(pic->type == FF_HW_PICTURE_TYPE_IDR);
 
-    desc = av_pix_fmt_desc_get(priv->common.input_frames->sw_format);
+    desc = av_pix_fmt_desc_get(base_ctx->input_frames->sw_format);
     av_assert0(desc);
     if (desc->flags & AV_PIX_FMT_FLAG_RGB)
         components = components_rgb;
@@ -437,10 +438,11 @@  static int vaapi_encode_mjpeg_init_slice_params(AVCodecContext *avctx,
 
 static av_cold int vaapi_encode_mjpeg_get_encoder_caps(AVCodecContext *avctx)
 {
+    FFHWBaseEncodeContext *base_ctx = avctx->priv_data;
     VAAPIEncodeContext *ctx = avctx->priv_data;
     const AVPixFmtDescriptor *desc;
 
-    desc = av_pix_fmt_desc_get(ctx->input_frames->sw_format);
+    desc = av_pix_fmt_desc_get(base_ctx->input_frames->sw_format);
     av_assert0(desc);
 
     ctx->surface_width  = FFALIGN(avctx->width,  8 << desc->log2_chroma_w);