diff mbox series

[FFmpeg-devel] avcodec/qsvenc: make QSV encoder encode VAAPI and D3D11 frames directly

Message ID 20220607092216.405-1-tong1.wu@intel.com
State New
Series [FFmpeg-devel] avcodec/qsvenc: make QSV encoder encode VAAPI and D3D11 frames directly

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_armv7_RPi4 success Make finished
andriy/make_fate_armv7_RPi4 success Make fate finished

Commit Message

Wu, Tong1 June 7, 2022, 9:22 a.m. UTC
The QSV encoder is able to encode frames in VAAPI or D3D11 pixel formats
directly. This patch adds support for the qsv encoder to accept VAAPI and
D3D11 pixel formats as input.

Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
---
 libavcodec/qsvenc.c       | 59 ++++++++++++++++++++++++++++++++++-----
 libavcodec/qsvenc_h264.c  |  2 ++
 libavcodec/qsvenc_hevc.c  |  2 ++
 libavcodec/qsvenc_jpeg.c  |  2 ++
 libavcodec/qsvenc_mpeg2.c |  2 ++
 libavcodec/qsvenc_vp9.c   |  2 ++
 6 files changed, 62 insertions(+), 7 deletions(-)

Comments

Anton Khirnov June 7, 2022, 9:33 a.m. UTC | #1
Quoting Tong Wu (2022-06-07 11:22:16)
> QSV encoder is able to encode frames with VAAPI or D3D11 pixel format
> directly. This patch adds support for qsv encoder to accept VAAPI and
> D3D11 pixel formats as input.

This looks like an ad-hoc hack to me. Encoders should not do these kinds
of tricks.
Wu, Tong1 June 8, 2022, 4:47 a.m. UTC | #2
> 
> Quoting Tong Wu (2022-06-07 11:22:16)
> > QSV encoder is able to encode frames with VAAPI or D3D11 pixel format
> > directly. This patch adds support for qsv encoder to accept VAAPI and
> > D3D11 pixel formats as input.
> 
> This looks like an ad-hoc hack to me. Encoders should not do these kinds of
> tricks.
> 
> --
> Anton Khirnov

Thanks for the comments. An mfxSurface is backed by a VASurface on Linux and a D3D texture on Windows. Since the QSV encoder can already accept AV_PIX_FMT_QSV as input, it seems reasonable for it to accept VAAPI and D3D11 as input as well. It is not really a trick in the way that, say, making the QSV encoder accept the VULKAN format directly would be. With this patch we just want the QSV encoder to support more input formats, like nvenc does.

Plus, this patch can really help users who have hybrid transcode needs.

Kind Regards,
Tong
Xiang, Haihao June 8, 2022, 8:41 a.m. UTC | #3
On Wed, 2022-06-08 at 05:08 +0000, Soft Works wrote:
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Tong Wu
> > Sent: Tuesday, June 7, 2022 11:22 AM
> > To: ffmpeg-devel@ffmpeg.org
> > Cc: Tong Wu <tong1.wu@intel.com>; Wenbin Chen <wenbin.chen@intel.com>
> > Subject: [FFmpeg-devel] [PATCH] avcodec/qsvenc: make QSV encoder encode
> > VAAPI
> > and D3D11 frames directly
> > 
> > QSV encoder is able to encode frames with VAAPI or D3D11 pixel format
> > directly. This patch adds support for qsv encoder to accept VAAPI and
> > D3D11 pixel formats as input.
> > 
> > Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
> > Signed-off-by: Tong Wu <tong1.wu@intel.com>
> > ---
> >  libavcodec/qsvenc.c       | 59 ++++++++++++++++++++++++++++++++++-----
> >  libavcodec/qsvenc_h264.c  |  2 ++
> >  libavcodec/qsvenc_hevc.c  |  2 ++
> >  libavcodec/qsvenc_jpeg.c  |  2 ++
> >  libavcodec/qsvenc_mpeg2.c |  2 ++
> >  libavcodec/qsvenc_vp9.c   |  2 ++
> >  6 files changed, 62 insertions(+), 7 deletions(-)
> > 
> > diff --git a/libavcodec/qsvenc.c b/libavcodec/qsvenc.c
> > index 2b3b06767d..132d9ba93b 100644
> > --- a/libavcodec/qsvenc.c
> > +++ b/libavcodec/qsvenc.c
> > @@ -524,7 +524,9 @@ static int check_enc_param(AVCodecContext *avctx, QSVEncContext *q)
> > 
> >  static int init_video_param_jpeg(AVCodecContext *avctx, QSVEncContext *q)
> >  {
> > -    enum AVPixelFormat sw_format = avctx->pix_fmt == AV_PIX_FMT_QSV ?
> > +    enum AVPixelFormat sw_format = avctx->pix_fmt == AV_PIX_FMT_QSV ||
> > +                                   avctx->pix_fmt == AV_PIX_FMT_VAAPI ||
> > +                                   avctx->pix_fmt == AV_PIX_FMT_D3D11 ?
> >                                     avctx->sw_pix_fmt : avctx->pix_fmt;
> >      const AVPixFmtDescriptor *desc;
> >      int ret;
> > @@ -591,7 +593,9 @@ static int init_video_param_jpeg(AVCodecContext *avctx, QSVEncContext *q)
> > 
> >  static int init_video_param(AVCodecContext *avctx, QSVEncContext *q)
> >  {
> > -    enum AVPixelFormat sw_format = avctx->pix_fmt == AV_PIX_FMT_QSV ?
> > +    enum AVPixelFormat sw_format = avctx->pix_fmt == AV_PIX_FMT_QSV ||
> > +                                   avctx->pix_fmt == AV_PIX_FMT_VAAPI ||
> > +                                   avctx->pix_fmt == AV_PIX_FMT_D3D11 ?
> >                                     avctx->sw_pix_fmt : avctx->pix_fmt;
> >      const AVPixFmtDescriptor *desc;
> >      float quant;
> > @@ -1247,7 +1251,31 @@ int ff_qsv_enc_init(AVCodecContext *avctx, QSVEncContext *q)
> > 
> >      if (avctx->hw_frames_ctx) {
> >          AVHWFramesContext    *frames_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
> > 
> > -        AVQSVFramesContext *frames_hwctx = frames_ctx->hwctx;
> > +        AVQSVFramesContext *frames_hwctx = NULL;
> > +
> > +        if (frames_ctx->format == AV_PIX_FMT_VAAPI || frames_ctx->format == AV_PIX_FMT_D3D11) {
> > +            AVBufferRef *derive_device_ref = NULL;
> > +            AVBufferRef *derive_frames_ref = NULL;
> > +            ret = av_hwdevice_ctx_create_derived(&derive_device_ref,
> > +                                                 AV_HWDEVICE_TYPE_QSV, frames_ctx->device_ref, 0);
> > +            if (ret < 0) {
> > +                av_log(avctx, AV_LOG_ERROR, "Failed to derive QSV device context: %d.\n", ret);
> > +                return ret;
> > +            }
> > +            ret = av_hwframe_ctx_create_derived(&derive_frames_ref,
> > +                                                AV_PIX_FMT_QSV, derive_device_ref, avctx->hw_frames_ctx, 0);
> > +            if (ret < 0) {
> > +                av_log(avctx, AV_LOG_ERROR, "Failed to derive QSV frames context: %d.\n", ret);
> > +                av_buffer_unref(&derive_device_ref);
> > +                return ret;
> > +            }
> > +            av_buffer_unref(&avctx->hw_device_ctx);
> > +            avctx->hw_device_ctx = derive_device_ref;
> > +            av_buffer_unref(&avctx->hw_frames_ctx);
> > +            avctx->hw_frames_ctx = derive_frames_ref;
> > +            frames_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
> > +        }
> > +        frames_hwctx = frames_ctx->hwctx;
> > 
> >          if (!iopattern) {
> >              if (frames_hwctx->frame_type & MFX_MEMTYPE_OPAQUE_FRAME)
> > @@ -1437,10 +1465,25 @@ static int submit_frame(QSVEncContext *q, const AVFrame *frame,
> >      if (ret < 0)
> >          return ret;
> > 
> > -    if (frame->format == AV_PIX_FMT_QSV) {
> > -        ret = av_frame_ref(qf->frame, frame);
> > -        if (ret < 0)
> > -            return ret;
> > +    if (frame->format == AV_PIX_FMT_QSV || frame->format == AV_PIX_FMT_VAAPI ||
> > +        frame->format == AV_PIX_FMT_D3D11) {
> > +        if (frame->format == AV_PIX_FMT_QSV) {
> > +            ret = av_frame_ref(qf->frame, frame);
> > +            if (ret < 0)
> > +                return ret;
> > +        } else {
> > +            qf->frame->format = AV_PIX_FMT_QSV;
> > +            qf->frame->hw_frames_ctx = av_buffer_ref(q->avctx->hw_frames_ctx);
> > +            if (!qf->frame->hw_frames_ctx)
> > +                return AVERROR(ENOMEM);
> > +            ret = av_hwframe_map(qf->frame, frame, 0);
> > +            if (ret < 0) {
> > +                av_log(q->avctx, AV_LOG_ERROR, "Failed to map to QSV frames\n");
> > +                return ret;
> > +            }
> > +            ret = av_frame_copy_props(qf->frame, frame);
> > +            if (ret < 0)
> > +                return ret;
> > +        }
> > 
> >          qf->surface = *(mfxFrameSurface1*)qf->frame->data[3];
> > 
> > @@ -1735,6 +1778,8 @@ int ff_qsv_enc_close(AVCodecContext *avctx, QSVEncContext *q)
> > 
> >  const AVCodecHWConfigInternal *const ff_qsv_enc_hw_configs[] = {
> >      HW_CONFIG_ENCODER_FRAMES(QSV,  QSV),
> > +    HW_CONFIG_ENCODER_FRAMES(VAAPI,VAAPI),
> > +    HW_CONFIG_ENCODER_FRAMES(D3D11,D3D11VA),
> >      HW_CONFIG_ENCODER_DEVICE(NV12, QSV),
> >      HW_CONFIG_ENCODER_DEVICE(P010, QSV),
> >      NULL,
> > diff --git a/libavcodec/qsvenc_h264.c b/libavcodec/qsvenc_h264.c
> > index cf77ea575b..93ba8d8ded 100644
> > --- a/libavcodec/qsvenc_h264.c
> > +++ b/libavcodec/qsvenc_h264.c
> > @@ -196,6 +196,8 @@ const FFCodec ff_h264_qsv_encoder = {
> >      .p.pix_fmts     = (const enum AVPixelFormat[]){ AV_PIX_FMT_NV12,
> >                                                      AV_PIX_FMT_P010,
> >                                                      AV_PIX_FMT_QSV,
> > +                                                    AV_PIX_FMT_VAAPI,
> > +                                                    AV_PIX_FMT_D3D11,
> >                                                      AV_PIX_FMT_NONE },
> >      .p.priv_class   = &class,
> >      .defaults       = qsv_enc_defaults,
> > diff --git a/libavcodec/qsvenc_hevc.c b/libavcodec/qsvenc_hevc.c
> > index a6bf39c148..63b6ad9150 100644
> > --- a/libavcodec/qsvenc_hevc.c
> > +++ b/libavcodec/qsvenc_hevc.c
> > @@ -309,6 +309,8 @@ const FFCodec ff_hevc_qsv_encoder = {
> >                                                      AV_PIX_FMT_YUYV422,
> >                                                      AV_PIX_FMT_Y210,
> >                                                      AV_PIX_FMT_QSV,
> > +                                                    AV_PIX_FMT_VAAPI,
> > +                                                    AV_PIX_FMT_D3D11,
> >                                                      AV_PIX_FMT_BGRA,
> >                                                      AV_PIX_FMT_X2RGB10,
> >                                                      AV_PIX_FMT_NONE },
> > diff --git a/libavcodec/qsvenc_jpeg.c b/libavcodec/qsvenc_jpeg.c
> > index 825eb8dc06..5b7611bb85 100644
> > --- a/libavcodec/qsvenc_jpeg.c
> > +++ b/libavcodec/qsvenc_jpeg.c
> > @@ -91,6 +91,8 @@ const FFCodec ff_mjpeg_qsv_encoder = {
> >      .p.capabilities = AV_CODEC_CAP_DELAY | AV_CODEC_CAP_HYBRID,
> >      .p.pix_fmts     = (const enum AVPixelFormat[]){ AV_PIX_FMT_NV12,
> >                                                      AV_PIX_FMT_QSV,
> > +                                                    AV_PIX_FMT_VAAPI,
> > +                                                    AV_PIX_FMT_D3D11,
> >                                                      AV_PIX_FMT_NONE },
> >      .p.priv_class   = &class,
> >      .defaults       = qsv_enc_defaults,
> > diff --git a/libavcodec/qsvenc_mpeg2.c b/libavcodec/qsvenc_mpeg2.c
> > index 5cb12a2582..cba4001ee1 100644
> > --- a/libavcodec/qsvenc_mpeg2.c
> > +++ b/libavcodec/qsvenc_mpeg2.c
> > @@ -105,6 +105,8 @@ const FFCodec ff_mpeg2_qsv_encoder = {
> >      .p.capabilities = AV_CODEC_CAP_DELAY | AV_CODEC_CAP_HYBRID,
> >      .p.pix_fmts     = (const enum AVPixelFormat[]){ AV_PIX_FMT_NV12,
> >                                                      AV_PIX_FMT_QSV,
> > +                                                    AV_PIX_FMT_VAAPI,
> > +                                                    AV_PIX_FMT_D3D11,
> >                                                      AV_PIX_FMT_NONE },
> >      .p.priv_class   = &class,
> >      .defaults       = qsv_enc_defaults,
> > diff --git a/libavcodec/qsvenc_vp9.c b/libavcodec/qsvenc_vp9.c
> > index 4b2a6ce77f..2825b98a4a 100644
> > --- a/libavcodec/qsvenc_vp9.c
> > +++ b/libavcodec/qsvenc_vp9.c
> > @@ -115,6 +115,8 @@ const FFCodec ff_vp9_qsv_encoder = {
> >      .p.pix_fmts     = (const enum AVPixelFormat[]){ AV_PIX_FMT_NV12,
> >                                                      AV_PIX_FMT_P010,
> >                                                      AV_PIX_FMT_QSV,
> > +                                                    AV_PIX_FMT_VAAPI,
> > +                                                    AV_PIX_FMT_D3D11,
> >                                                      AV_PIX_FMT_NONE },
> >      .p.priv_class   = &class,
> >      .defaults       = qsv_enc_defaults,
> > --
> > 2.35.1.windows.2
> 
> Hi,
> 
> thanks for submitting this patch. Though I'm afraid this
> 
> - fundamentally contradicts the logic of ffmpeg's handling of hw acceleration,
>   hw device and hw frames contexts
> - adds code to an encoder, doing things an encoder is not supposed to do
> - qsv encoders and decoders have their own context => QSV

nvdec and nvenc have CUDA, but nvenc can also support D3D11va. It sounds
reasonable to me to support d3d11va/vaapi in qsvenc too, as d3d11va/vaapi are
used internally in MediaSDK.

> - is not safe/guaranteed to work always
>   there are different requirements for QSV than for other other cases
>   like VAAPI - for example: QSV requires a fixed-size frame pool
>   and encoders often need a larger frame pool than VAAPI
> 

Encoders in MediaSDK don't need a fixed pool; we can probably relax this
limitation in QSV.

Thanks
Haihao

> 
> My personal opinion on such kind of automatic handling is this:
> 
> when you are not able to build a command line in a way that you exactly 
> know at each stage of the transcoding pipeline in which hw (or sw) context
> it will be executed, then you might be lost anyway - in most cases :-)
> 
> When you really want to achieve that kind of behavior, then it would be 
> a better idea to create a mechanism for "auto-insertion" of hwmap
> filters for such cases.
> I don't think that such behavior should be active by default though, as
> it would most likely create more non-understood failures than convenience 
> moments for not having to type        ,hwmap=derive_device=qsv
>
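For reference, what `,hwmap=derive_device=qsv` achieves for qsvenc input can be approximated against the public libavutil hwcontext API. The following is an illustrative, untested sketch (the helper name `derive_qsv_frames` is hypothetical, and error handling is minimal), not the actual hwmap filter implementation:

```c
/* Hypothetical helper (sketch, untested): the explicit equivalent of
 * ",hwmap=derive_device=qsv" - derive a QSV device from the source VAAPI or
 * D3D11 device, then a QSV frames context wrapping the original surfaces. */
#include <libavutil/hwcontext.h>

static int derive_qsv_frames(AVBufferRef *src_frames_ref,
                             AVBufferRef **qsv_frames_ref)
{
    AVHWFramesContext *src = (AVHWFramesContext *)src_frames_ref->data;
    AVBufferRef *qsv_device_ref = NULL;
    int ret;

    /* Derive an MFX session/device from the underlying VAAPI/D3D11 device. */
    ret = av_hwdevice_ctx_create_derived(&qsv_device_ref, AV_HWDEVICE_TYPE_QSV,
                                         src->device_ref, 0);
    if (ret < 0)
        return ret;

    /* Set up the QSVFramesContext that wraps ("maps") the source frames;
     * the derived frames context keeps its own reference to the device. */
    ret = av_hwframe_ctx_create_derived(qsv_frames_ref, AV_PIX_FMT_QSV,
                                        qsv_device_ref, src_frames_ref, 0);
    av_buffer_unref(&qsv_device_ref);
    return ret;
}
```

This sketch cannot be compiled without the FFmpeg development headers and libraries, and is only meant to make the derivation step discussed above concrete.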
Xiang, Haihao June 9, 2022, 6:47 a.m. UTC | #4
On Wed, 2022-06-08 at 11:13 +0000, Soft Works wrote:
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Xiang,
> > Haihao
> > Sent: Wednesday, June 8, 2022 10:42 AM
> > To: ffmpeg-devel@ffmpeg.org
> > Cc: Wu, Tong1 <tong1.wu@intel.com>; Chen, Wenbin <wenbin.chen@intel.com>
> > Subject: Re: [FFmpeg-devel] [PATCH] avcodec/qsvenc: make QSV encoder encode
> > VAAPI and D3D11 frames directly
> > 
> > On Wed, 2022-06-08 at 05:08 +0000, Soft Works wrote:
> > > > -----Original Message-----
> > > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Tong
> > > > Wu
> > > > Sent: Tuesday, June 7, 2022 11:22 AM
> > > > To: ffmpeg-devel@ffmpeg.org
> > > > Cc: Tong Wu <tong1.wu@intel.com>; Wenbin Chen <wenbin.chen@intel.com>
> > > > Subject: [FFmpeg-devel] [PATCH] avcodec/qsvenc: make QSV encoder encode
> > > > VAAPI
> > > > and D3D11 frames directly
> 
> [..]
> 
> > > > 2.35.1.windows.2
> > > 
> > > Hi,
> > > 
> > > thanks for submitting this patch. Though, I'm afraid, but this
> > > 
> > > - fundamentally contradicts the logic of ffmpeg's handling of hw
> > 
> > acceleration,
> > >   hw device and hw frames contexts
> > > - adds code to an encoder, doing things an encoder is not supposed to do-
> > 
> > qsv
> > > encoders and decoders have their own context => QSV
> > 
> > nvdec and nvenc have CUDA but nvenc can also support D3D11va, it sounds make
> > sense for me to support D3D11va/vaapi in qsvenc too as d3d11va/vaapi are
> > used
> > internally in MediaSDK.
> 
> Can you please post a command line showing nvenc working with input
> from a D3D11VA decoder and without using any hwmap/hwupload/hwdownload
> filters?
> 

According to the code below, nvenc may accept d3d11 frames directly,

https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/nvenc.c#L46-L72

so the command below should work

$> ffmpeg -y -hwaccel_output_format d3d11 -hwaccel d3d11va -i input.mp4 -c:v hevc_nvenc out.mp4

> 
> > 
> > > - is not safe/guaranteed to work always
> > >   there are different requirements for QSV than for other other cases
> > >   like VAAPI - for example: QSV requires a fixed-size frame pool
> > >   and encoders often need a larger frame pool than VAAPI
> > > 
> > 
> > Encoders in MediaSDK don't need a fixed pool, probably we may relax this
> > limitation in QSV.
> 
> Well - I think they do:
> 
> Common
> The QSV hw frames context implementation is using a fixed pool size and
> changing that would be anything but trivial. 
> 
> D3D11
> The decoders are using and allocating D3D11 array textures, which cannot
> be resized. When the surface count is too low, QSV encoding will fail.
> 
> VAAPI
> See vaapi_frames_init(): there is support for dynamically sized pools, but
> not as render targets.
> 
> 
> The actual problem though is not about fixed vs dynamic pool size - it's
> about whether the number of frames will be sufficient for encoding.
> And when you try to use the (d3d11va) decoder's output frames and supply them
> to the encoder, they may very likely be insufficient.
> 
> 
> QSV encoding always requires quite a number of surfaces.
> You might remember all the extra_hw_frames parameters in the sample
> command lines: 
> 
> https://trac.ffmpeg.org/wiki/Hardware/QuickSync
> 
> -------------
> 
> Did you ever test this with a D3D11VA decoder on Windows? 
> I'm not sure whether this can work at all (feeding decoder frames
> into the encoder) due to the differing bind_flags requirements.
> 
> In case you did test that - could you please post the command line?

vaapi on Linux:

$ ffmpeg -y -hwaccel vaapi -hwaccel_output_format vaapi -i input.mp4 -c:v hevc_qsv out.mp4

d3d11va on Windows:

$ ffmpeg.exe -y -hwaccel d3d11va -hwaccel_output_format d3d11 -i input.mp4 -c:v h264_qsv -y out.mp4


Thanks
Haihao
Soft Works June 10, 2022, 11:54 p.m. UTC | #5
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Xiang, Haihao
> Sent: Thursday, June 9, 2022 8:48 AM
> To: ffmpeg-devel@ffmpeg.org
> Cc: Wu, Tong1 <tong1.wu@intel.com>; Chen, Wenbin
> <wenbin.chen@intel.com>
> Subject: Re: [FFmpeg-devel] [PATCH] avcodec/qsvenc: make QSV encoder
> encode VAAPI and D3D11 frames directly
> 
> On Wed, 2022-06-08 at 11:13 +0000, Soft Works wrote:
> > > -----Original Message-----
> > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> Xiang,
> > > Haihao
> > > Sent: Wednesday, June 8, 2022 10:42 AM
> > > To: ffmpeg-devel@ffmpeg.org
> > > Cc: Wu, Tong1 <tong1.wu@intel.com>; Chen, Wenbin
> <wenbin.chen@intel.com>
> > > Subject: Re: [FFmpeg-devel] [PATCH] avcodec/qsvenc: make QSV
> encoder encode
> > > VAAPI and D3D11 frames directly
> > >
> > > On Wed, 2022-06-08 at 05:08 +0000, Soft Works wrote:
> > > > > -----Original Message-----
> > > > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On
> Behalf Of Tong
> > > > > Wu
> > > > > Sent: Tuesday, June 7, 2022 11:22 AM
> > > > > To: ffmpeg-devel@ffmpeg.org
> > > > > Cc: Tong Wu <tong1.wu@intel.com>; Wenbin Chen
> <wenbin.chen@intel.com>
> > > > > Subject: [FFmpeg-devel] [PATCH] avcodec/qsvenc: make QSV
> encoder encode
> > > > > VAAPI
> > > > > and D3D11 frames directly
> >
> > [..]
> >
> > > > > 2.35.1.windows.2
> > > >
> > > > Hi,
> > > >
> > > > thanks for submitting this patch. Though, I'm afraid, but this
> > > >
> > > > - fundamentally contradicts the logic of ffmpeg's handling of
> hw
> > >
> > > acceleration,
> > > >   hw device and hw frames contexts
> > > > - adds code to an encoder, doing things an encoder is not
> supposed to do-
> > >
> > > qsv
> > > > encoders and decoders have their own context => QSV
> > >
> > > nvdec and nvenc have CUDA but nvenc can also support D3D11va, it
> sounds make
> > > sense for me to support D3D11va/vaapi in qsvenc too as
> d3d11va/vaapi are
> > > used
> > > internally in MediaSDK.
> >
> > Can you please post a command line showing nvenc working with input
> > from a D3D11VA decoder and without using any
> hwmap/hwupload/hwdownload
> > filters?
> >
> 
> According to the code below, nvenc may accept d3d11 frames directly,
> 
> https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/nvenc.c#L46-L72
> 
> so the command below should work
> 
> $> ffmpeg -y -hwaccel_output_format d3d11 -hwaccel d3d11va -i input.mp4 -c:v hevc_nvenc out.mp4

Right, it does work. Thanks for the command; I had tried it like that
before, but in a "wrong" branch.

Now I took a deeper look into it and into the ability of NVENC to
encode from plain D3D11 frames. There are quite a few differences
between NVENC and QSVENC.


HW Frames Contexts
------------------

QSVENC

MSDK cannot work with VAAPI frames, D3D9 frames or D3D11 frames directly.
An application is always required to wrap such frames via mfxSurface and
manage a collection of mfxSurface descriptions.
It's an abstraction that allows coding against the MSDK API independently
of the underlying technology.
The technical representation of this in ffmpeg is the QSVFramesContext.
When the input is plain VAAPI or D3D11 frames (a hw frames context), a new
QSVFramesContext must be derived from the input hw frames context (e.g. via
the hwmap filter), where deriving means setting up a new QSVFramesContext
that does the required wrapping (or "mapping", as ffmpeg calls it).

I think the way this logic is reflected in ffmpeg is thought out very
well and provides a high degree of flexibility.


NVENC

The situation is very different here. Nvidia provides platform independence
not by wrapping platform-specific GPU frame types, but by using its own
custom type: CUDA memory/frames. This is what decoders output, filters
use for input/output, and encoders take as input.

What I do not know is whether it would be possible to map D3D11 frames to
CUDA frames and vice versa. If so, that would IMO be the preferable way
to deal with different hw frame types.
At least this isn't implemented at this time; the only frames
derivation/mapping currently possible is from and to Vulkan.

Hence I can't say whether the NVENC implementation taking D3D11 frames
directly was done out of laziness or was the only possible way. If it were
not possible to map D3D11 frames to CUDA frames, and only the NVENC
encoders could process D3D11 frames, then it would of course have been the
only option.

But in any case, it's not the same as with QSVENC, because NVENC can take
D3D11 frames as input directly, without wrapping/mapping first.

----------------

There are more differences, but I don't want to drive it too far.

What stands at the bottom line is:

- NVENC can take a D3D11 frames context directly
- QSVENC can't; it needs to map it to a QSVFramesContext first
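The per-frame side of that mapping, as the patch does it in submit_frame(), can be sketched as follows. This is illustrative and untested; `map_frame_to_qsv` and its arguments are hypothetical names, `dst` is assumed to be a freshly allocated AVFrame, and `qsv_frames_ref` an already-derived QSV frames context:

```c
/* Sketch (untested): wrap one VAAPI/D3D11 frame as a QSV frame by mapping,
 * without copying pixel data. The resulting frame carries an
 * mfxFrameSurface1 pointer in data[3], as AV_PIX_FMT_QSV frames do. */
#include <libavutil/error.h>
#include <libavutil/frame.h>
#include <libavutil/hwcontext.h>

static int map_frame_to_qsv(AVFrame *dst, const AVFrame *src,
                            AVBufferRef *qsv_frames_ref)
{
    int ret;

    dst->format        = AV_PIX_FMT_QSV;
    dst->hw_frames_ctx = av_buffer_ref(qsv_frames_ref);
    if (!dst->hw_frames_ctx)
        return AVERROR(ENOMEM);

    /* Creates a view of the same underlying surface; no data copy. */
    ret = av_hwframe_map(dst, src, 0);
    if (ret < 0)
        return ret;

    /* Timestamps and side data are carried over explicitly. */
    return av_frame_copy_props(dst, src);
}
```

Again, this cannot be built without the FFmpeg development headers; it only illustrates the mapping step named in the second bullet.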


Concluding opinion:

An encoder should not include (duplicated) code for creating a derived
frames context.
The same goal (= getting the command lines that Haihao posted to work)
could be achieved by auto-inserting a hwmap filter in those cases, which
would probably take only a few lines of code.

We don't have a precedent for auto-insertion of a hwmap filter, but we do
auto-insertion in many other cases, so that seems to me at least an option
worth thinking about.

I'm curious about other opinions...

Thanks,
softworkz
Anton Khirnov June 13, 2022, 9:07 a.m. UTC | #6
Quoting Wu, Tong1 (2022-06-08 06:47:27)
> > 
> > Quoting Tong Wu (2022-06-07 11:22:16)
> > > QSV encoder is able to encode frames with VAAPI or D3D11 pixel format
> > > directly. This patch adds support for qsv encoder to accept VAAPI and
> > > D3D11 pixel formats as input.
> > 
> > This looks like an ad-hoc hack to me. Encoders should not do these kinds of
> > tricks.
> > 
> > --
> > Anton Khirnov
> 
> Thanks for the comments. The MFXSurface is based on VaSurface on Linux
> and D3D texture on Windows. Since the QSV encoder can accept
> AV_PIX_FMT_QSV as input, it seems kind of reasonable to accept VAAPI
> and D3D as its input. And it just may not look like a 'real' trick,
> let's say, for example, make QSV encoder accept VULKAN format
> directly. By adding this patch, we just want QSV encoder have more
> input format supports like what nvenc does.

The difference with nvenc is that the nvenc API actually supports d3d
textures directly; our encoder wrapper merely passes them through.

Your patch, on the other hand, derives a new device inside the encoder.
The intent behind the hwcontext interface is that such operations should
be left to the library caller, and they are actually quite easy to do. So I
don't see why this patch is really needed.

> Plus, this patch can really help the users who have hybrid transcode needs.

Could you elaborate? How would this patch be useful in this specific
case? Why can't the callers derive the device themselves?
Wu, Tong1 June 15, 2022, 2:48 a.m. UTC | #7
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> > Xiang, Haihao
> > Sent: Thursday, June 9, 2022 8:48 AM
> > To: ffmpeg-devel@ffmpeg.org
> > Cc: Wu, Tong1 <tong1.wu@intel.com>; Chen, Wenbin
> > <wenbin.chen@intel.com>
> > Subject: Re: [FFmpeg-devel] [PATCH] avcodec/qsvenc: make QSV encoder
> > encode VAAPI and D3D11 frames directly
> >
> > On Wed, 2022-06-08 at 11:13 +0000, Soft Works wrote:
> > > > -----Original Message-----
> > > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of
> > Xiang,
> > > > Haihao
> > > > Sent: Wednesday, June 8, 2022 10:42 AM
> > > > To: ffmpeg-devel@ffmpeg.org
> > > > Cc: Wu, Tong1 <tong1.wu@intel.com>; Chen, Wenbin
> > <wenbin.chen@intel.com>
> > > > Subject: Re: [FFmpeg-devel] [PATCH] avcodec/qsvenc: make QSV
> > encoder encode
> > > > VAAPI and D3D11 frames directly
> > > >
> > > > On Wed, 2022-06-08 at 05:08 +0000, Soft Works wrote:
> > > > > > -----Original Message-----
> > > > > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On
> > Behalf Of Tong
> > > > > > Wu
> > > > > > Sent: Tuesday, June 7, 2022 11:22 AM
> > > > > > To: ffmpeg-devel@ffmpeg.org
> > > > > > Cc: Tong Wu <tong1.wu@intel.com>; Wenbin Chen
> > <wenbin.chen@intel.com>
> > > > > > Subject: [FFmpeg-devel] [PATCH] avcodec/qsvenc: make QSV
> > encoder encode
> > > > > > VAAPI
> > > > > > and D3D11 frames directly
> > >
> > > [..]
> > >
> > > > > > 2.35.1.windows.2
> > > > >
> > > > > Hi,
> > > > >
> > > > > thanks for submitting this patch. Though, I'm afraid, but this
> > > > >
> > > > > - fundamentally contradicts the logic of ffmpeg's handling of
> > hw
> > > >
> > > > acceleration,
> > > > >   hw device and hw frames contexts
> > > > > - adds code to an encoder, doing things an encoder is not
> > supposed to do-
> > > >
> > > > qsv
> > > > > encoders and decoders have their own context => QSV
> > > >
> > > > nvdec and nvenc have CUDA but nvenc can also support D3D11va, it
> > sounds make
> > > > sense for me to support D3D11va/vaapi in qsvenc too as
> > d3d11va/vaapi are
> > > > used
> > > > internally in MediaSDK.
> > >
> > > Can you please post a command line showing nvenc working with input
> > > from a D3D11VA decoder and without using any
> > hwmap/hwupload/hwdownload
> > > filters?
> > >
> >
> > According to the code below, nvenc may accept d3d11 frames directly,
> >
> > https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/nvenc.c#L46-L72
> >
> > so the command below should work
> >
> > $> ffmpeg -y -hwaccel_output_format d3d11 -hwaccel d3d11va -i input.mp4 -c:v hevc_nvenc out.mp4
> 
> Right, it does work. Thanks for the command, I had tried like that before, but
> in a "wrong" branch.
> 
> 
> Now I took a bit of a deeper look into it and the ability of NVENC to encode
> from plain D3D11 frames. There are quite a few differences between NVENC
> and QSVENC.
> 
> 
> HW Frames Contexts
> ------------------
> 
> QSVENV
> 
> MSDK cannot work with VAAPI frames, D3D9 frames or D3D11 frames
> directly.
> An application is always required to wrap such frames via mfxSurface and
> manage a collection of mfxSurface descriptions.
> It's an abstraction that allows coding against the MSDK API independent from
> the underlying technology.
> The technical representation of this in ffmpeg is the QSVFramesContext.
> When there's an input of plain VAAPI or D3D11 frames (hw frames context),
> then it is required to derive a new QSVFramesContext from the input hw
> frames context (e.g. via hwmap filter) where the procedure of deriving
> means to set up a new QSVFramesContext which does the required wrapping
> (or "mapping" as ffmpeg calls it).
> 
> I think that the way how this logic is reflected in ffmpeg is thought out very
> well and provides a high degree of flexibility.
> 
> 
> NVENC
> 
> The situation is very different here. Nvidia provides platform independency
> not by wrapping platform-specific GPU frame types, but instead uses its own
> custom type - CUDA memory/frames. This is what decoders are outputting,
> filters are using for input/output and encoders take as input.
> 
> What I do not know, is whether it would be possible to map D3D11 frames to
> CUDA frames and vice versa. In case, that would be the preferable way IMO
> to deal with different hw frame types.
> At least this isn't implemented at this time. The only possible frames
> derivation/mapping is from and to Vulkan.
> 
> Hence I can't say whether the NVENC implementation to take D3D11 frames
> directly has been done out of laziness or whether it was the only possible
> way. In case when it wouldn't be possible to map D3D11 frames to CUDA
> frames, and only NVENC encoders would be able to process D3D11 frames,
> then it would have been the only option of course.
> 
> But in any way, it's not the same as with QSVENC, because NVENC can take
> D3D11 frames as input directly without wrapping/mapping first.
> 
> ----------------
> 
> There are more differences, but I don't want to drive it too far.
> 
> What stands at the bottom line is:
> 
> - NVENC can take D3D11 frames context directly
> - QSVENC can't - it needs to map it to a QSVFramesContext first
> 
> 
> Concluding opinion:
> 
> An encoder should not include (duplicated) code for creating a derived
> frames context.
> The same goal (= getting those command lines working that Haihao has
> posted) could be achieved by auto-inserting a hwmap filter in those cases,
> which would probably take a few lines of code only.
> 
> We don't have a precedence for auto-insertion of a hwmap filter, but we do
> that in many other cases, so that would seem to me at least an option to
> think about.
> 
> I'm curious about other opinions...
> 
> Thanks,
> softworkz

Thanks for the opinion. It makes sense to me. That looks like a more feasible way, and I will investigate adding the auto-insertion of hwmap further.

Thanks,
Tong
Wu, Tong1 June 15, 2022, 2:54 a.m. UTC | #8
> Quoting Wu, Tong1 (2022-06-08 06:47:27)
> > >
> > > Quoting Tong Wu (2022-06-07 11:22:16)
> > > > QSV encoder is able to encode frames with VAAPI or D3D11 pixel
> > > > format directly. This patch adds support for qsv encoder to accept
> > > > VAAPI and
> > > > D3D11 pixel formats as input.
> > >
> > > This looks like an ad-hoc hack to me. Encoders should not do these
> > > kinds of tricks.
> > >
> > > --
> > > Anton Khirnov
> >
> > Thanks for the comments. The MFXSurface is based on VaSurface on Linux
> > and D3D texture on Windows. Since the QSV encoder can accept
> > AV_PIX_FMT_QSV as input, it seems kind of reasonable to accept VAAPI
> > and D3D as its input. And it just may not look like a 'real' trick,
> > let's say, for example, make QSV encoder accept VULKAN format
> > directly. By adding this patch, we just want QSV encoder have more
> > input format supports like what nvenc does.
> 
> The difference with nvenc is that the nvenc API actually supports d3d
> textures directly, our encoder wrapper merely passes them through.
> 
> Your patch, on the other hand, derives a new device inside the decoder.
> The intent behind the hwcontext interface is that such operations should be
> left to the library caller, and are actually quite easy to do. So I don't see why
> is this patch really needed.
> 
> > Plus, this patch can really help the users who have hybrid transcode needs.
> 
> Could you elaborate? How would this patch be useful in this specific case.
> Why can't the callers dervice the device themselves?
> 
> --
> Anton Khirnov

It looks easier and more convenient for users because they do not have to derive the device manually.
But yes, I am convinced that this may not be work an encoder should do. Thanks for the comments.

Regards,
Tong
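
For completeness, the caller-side derivation Anton refers to is a small amount of code with the existing hwcontext API. A minimal sketch, mirroring the calls used in the patch below but moved to the application side (the function and variable names are illustrative, not from the patch; error handling is shortened):

```c
#include <libavutil/hwcontext.h>
#include <libavutil/pixfmt.h>

/* Sketch: derive a QSV device and frames context from an existing
 * VAAPI (or D3D11) frames context before opening the QSV encoder,
 * instead of having the encoder do it internally. */
static int derive_qsv_frames(AVBufferRef *src_frames_ref,
                             AVBufferRef **qsv_device_ref,
                             AVBufferRef **qsv_frames_ref)
{
    AVHWFramesContext *frames = (AVHWFramesContext *)src_frames_ref->data;
    int ret;

    /* Derive a QSV device from the source frames' device. */
    ret = av_hwdevice_ctx_create_derived(qsv_device_ref, AV_HWDEVICE_TYPE_QSV,
                                         frames->device_ref, 0);
    if (ret < 0)
        return ret;

    /* Derive AV_PIX_FMT_QSV frames mapping the source surfaces. */
    ret = av_hwframe_ctx_create_derived(qsv_frames_ref, AV_PIX_FMT_QSV,
                                        *qsv_device_ref, src_frames_ref, 0);
    if (ret < 0)
        av_buffer_unref(qsv_device_ref);
    return ret;
}
```

The resulting references would then be set on the encoder's AVCodecContext (hw_device_ctx / hw_frames_ctx) before avcodec_open2().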
diff mbox series

Patch

diff --git a/libavcodec/qsvenc.c b/libavcodec/qsvenc.c
index 2b3b06767d..132d9ba93b 100644
--- a/libavcodec/qsvenc.c
+++ b/libavcodec/qsvenc.c
@@ -524,7 +524,9 @@  static int check_enc_param(AVCodecContext *avctx, QSVEncContext *q)
 
 static int init_video_param_jpeg(AVCodecContext *avctx, QSVEncContext *q)
 {
-    enum AVPixelFormat sw_format = avctx->pix_fmt == AV_PIX_FMT_QSV ?
+    enum AVPixelFormat sw_format = avctx->pix_fmt == AV_PIX_FMT_QSV ||
+                                   avctx->pix_fmt == AV_PIX_FMT_VAAPI ||
+                                   avctx->pix_fmt == AV_PIX_FMT_D3D11 ?
                                    avctx->sw_pix_fmt : avctx->pix_fmt;
     const AVPixFmtDescriptor *desc;
     int ret;
@@ -591,7 +593,9 @@  static int init_video_param_jpeg(AVCodecContext *avctx, QSVEncContext *q)
 
 static int init_video_param(AVCodecContext *avctx, QSVEncContext *q)
 {
-    enum AVPixelFormat sw_format = avctx->pix_fmt == AV_PIX_FMT_QSV ?
+    enum AVPixelFormat sw_format = avctx->pix_fmt == AV_PIX_FMT_QSV ||
+                                   avctx->pix_fmt == AV_PIX_FMT_VAAPI ||
+                                   avctx->pix_fmt == AV_PIX_FMT_D3D11 ?
                                    avctx->sw_pix_fmt : avctx->pix_fmt;
     const AVPixFmtDescriptor *desc;
     float quant;
@@ -1247,7 +1251,31 @@  int ff_qsv_enc_init(AVCodecContext *avctx, QSVEncContext *q)
 
     if (avctx->hw_frames_ctx) {
         AVHWFramesContext    *frames_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
-        AVQSVFramesContext *frames_hwctx = frames_ctx->hwctx;
+        AVQSVFramesContext *frames_hwctx = NULL;
+
+        if (frames_ctx->format == AV_PIX_FMT_VAAPI || frames_ctx->format == AV_PIX_FMT_D3D11) {
+            AVBufferRef *derive_device_ref = NULL;
+            AVBufferRef *derive_frames_ref = NULL;
+            ret = av_hwdevice_ctx_create_derived(&derive_device_ref,
+                                                 AV_HWDEVICE_TYPE_QSV, frames_ctx->device_ref, 0);
+            if (ret < 0) {
+                av_log(avctx, AV_LOG_ERROR, "Failed to derive QSV device context: %d.\n", ret);
+                return ret;
+            }
+            ret = av_hwframe_ctx_create_derived(&derive_frames_ref,
+                                                AV_PIX_FMT_QSV, derive_device_ref, avctx->hw_frames_ctx, 0);
+            if (ret < 0) {
+                av_log(avctx, AV_LOG_ERROR, "Failed to derive QSV frames context: %d.\n", ret);
+                av_buffer_unref(&derive_device_ref);
+                return ret;
+            }
+            av_buffer_unref(&avctx->hw_device_ctx);
+            avctx->hw_device_ctx = derive_device_ref;
+            av_buffer_unref(&avctx->hw_frames_ctx);
+            avctx->hw_frames_ctx = derive_frames_ref;
+            frames_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
+        }
+        frames_hwctx = frames_ctx->hwctx;
 
         if (!iopattern) {
             if (frames_hwctx->frame_type & MFX_MEMTYPE_OPAQUE_FRAME)
@@ -1437,10 +1465,25 @@  static int submit_frame(QSVEncContext *q, const AVFrame *frame,
     if (ret < 0)
         return ret;
 
-    if (frame->format == AV_PIX_FMT_QSV) {
-        ret = av_frame_ref(qf->frame, frame);
-        if (ret < 0)
-            return ret;
+    if (frame->format == AV_PIX_FMT_QSV || frame->format == AV_PIX_FMT_VAAPI || frame->format == AV_PIX_FMT_D3D11) {
+        if (frame->format == AV_PIX_FMT_QSV) {
+            ret = av_frame_ref(qf->frame, frame);
+            if (ret < 0)
+                return ret;
+        } else {
+            qf->frame->format = AV_PIX_FMT_QSV;
+            qf->frame->hw_frames_ctx = av_buffer_ref(q->avctx->hw_frames_ctx);
+            if (!qf->frame->hw_frames_ctx)
+                return AVERROR(ENOMEM);
+            ret = av_hwframe_map(qf->frame, frame, 0);
+            if (ret < 0) {
+                av_log(q->avctx, AV_LOG_ERROR, "Failed to map to QSV frames\n");
+                return ret;
+            }
+            ret = av_frame_copy_props(qf->frame, frame);
+            if (ret < 0)
+                return ret;
+        }
 
         qf->surface = *(mfxFrameSurface1*)qf->frame->data[3];
 
@@ -1735,6 +1778,8 @@  int ff_qsv_enc_close(AVCodecContext *avctx, QSVEncContext *q)
 
 const AVCodecHWConfigInternal *const ff_qsv_enc_hw_configs[] = {
     HW_CONFIG_ENCODER_FRAMES(QSV,  QSV),
+    HW_CONFIG_ENCODER_FRAMES(VAAPI,VAAPI),
+    HW_CONFIG_ENCODER_FRAMES(D3D11,D3D11VA),
     HW_CONFIG_ENCODER_DEVICE(NV12, QSV),
     HW_CONFIG_ENCODER_DEVICE(P010, QSV),
     NULL,
diff --git a/libavcodec/qsvenc_h264.c b/libavcodec/qsvenc_h264.c
index cf77ea575b..93ba8d8ded 100644
--- a/libavcodec/qsvenc_h264.c
+++ b/libavcodec/qsvenc_h264.c
@@ -196,6 +196,8 @@  const FFCodec ff_h264_qsv_encoder = {
     .p.pix_fmts     = (const enum AVPixelFormat[]){ AV_PIX_FMT_NV12,
                                                     AV_PIX_FMT_P010,
                                                     AV_PIX_FMT_QSV,
+                                                    AV_PIX_FMT_VAAPI,
+                                                    AV_PIX_FMT_D3D11,
                                                     AV_PIX_FMT_NONE },
     .p.priv_class   = &class,
     .defaults       = qsv_enc_defaults,
diff --git a/libavcodec/qsvenc_hevc.c b/libavcodec/qsvenc_hevc.c
index a6bf39c148..63b6ad9150 100644
--- a/libavcodec/qsvenc_hevc.c
+++ b/libavcodec/qsvenc_hevc.c
@@ -309,6 +309,8 @@  const FFCodec ff_hevc_qsv_encoder = {
                                                     AV_PIX_FMT_YUYV422,
                                                     AV_PIX_FMT_Y210,
                                                     AV_PIX_FMT_QSV,
+                                                    AV_PIX_FMT_VAAPI,
+                                                    AV_PIX_FMT_D3D11,
                                                     AV_PIX_FMT_BGRA,
                                                     AV_PIX_FMT_X2RGB10,
                                                     AV_PIX_FMT_NONE },
diff --git a/libavcodec/qsvenc_jpeg.c b/libavcodec/qsvenc_jpeg.c
index 825eb8dc06..5b7611bb85 100644
--- a/libavcodec/qsvenc_jpeg.c
+++ b/libavcodec/qsvenc_jpeg.c
@@ -91,6 +91,8 @@  const FFCodec ff_mjpeg_qsv_encoder = {
     .p.capabilities = AV_CODEC_CAP_DELAY | AV_CODEC_CAP_HYBRID,
     .p.pix_fmts     = (const enum AVPixelFormat[]){ AV_PIX_FMT_NV12,
                                                     AV_PIX_FMT_QSV,
+                                                    AV_PIX_FMT_VAAPI,
+                                                    AV_PIX_FMT_D3D11,
                                                     AV_PIX_FMT_NONE },
     .p.priv_class   = &class,
     .defaults       = qsv_enc_defaults,
diff --git a/libavcodec/qsvenc_mpeg2.c b/libavcodec/qsvenc_mpeg2.c
index 5cb12a2582..cba4001ee1 100644
--- a/libavcodec/qsvenc_mpeg2.c
+++ b/libavcodec/qsvenc_mpeg2.c
@@ -105,6 +105,8 @@  const FFCodec ff_mpeg2_qsv_encoder = {
     .p.capabilities = AV_CODEC_CAP_DELAY | AV_CODEC_CAP_HYBRID,
     .p.pix_fmts     = (const enum AVPixelFormat[]){ AV_PIX_FMT_NV12,
                                                     AV_PIX_FMT_QSV,
+                                                    AV_PIX_FMT_VAAPI,
+                                                    AV_PIX_FMT_D3D11,
                                                     AV_PIX_FMT_NONE },
     .p.priv_class   = &class,
     .defaults       = qsv_enc_defaults,
diff --git a/libavcodec/qsvenc_vp9.c b/libavcodec/qsvenc_vp9.c
index 4b2a6ce77f..2825b98a4a 100644
--- a/libavcodec/qsvenc_vp9.c
+++ b/libavcodec/qsvenc_vp9.c
@@ -115,6 +115,8 @@  const FFCodec ff_vp9_qsv_encoder = {
     .p.pix_fmts     = (const enum AVPixelFormat[]){ AV_PIX_FMT_NV12,
                                                     AV_PIX_FMT_P010,
                                                     AV_PIX_FMT_QSV,
+                                                    AV_PIX_FMT_VAAPI,
+                                                    AV_PIX_FMT_D3D11,
                                                     AV_PIX_FMT_NONE },
     .p.priv_class   = &class,
     .defaults       = qsv_enc_defaults,