diff mbox series

[FFmpeg-devel] avcodec/nvdec: support resizing while decoding

Message ID CAOSt7Dm_2vmvPRpj3EeX1RRpu6+tH7Z56Bg--Z+vZU5QivccmA@mail.gmail.com
State New

Checks

Context Check Description
yinshiyou/configure_loongarch64 warning Failed to apply patch
andriy/configure_x86 warning Failed to apply patch

Commit Message

Carlos Ruiz Sept. 19, 2024, 11:16 p.m. UTC
Hi!

This is my first contribution to the project so please excuse any bad
etiquette, I tried to read all the FAQs before posting. Would love to start
by thanking everyone for such an amazing framework you've built!

Anyway, here's my proposed patch to support video resizing when using NVDEC
hwaccel to decode hevc video (I could look into a similar patch for h264,
av1, etc if this looks useful). There's a bit more context/explanation in
the commit description in the patch, but please let me know if the use case
isn't clear.

I tested locally and all of these 4 scenarios work as expected:
 * Using hevc codec with nvdec hwaccel, leaving avctx->width and
avctx->height unset. On a 1920x1080 input video, I get 1920x1080 cuda
frames out.
 * Using hevc codec with nvdec hwaccel, setting avctx->width and
avctx->height to some arbitrary value (e.g. 640x360). On the same input
video, I get 640x360 cuda frames out.
 * Using hevc codec without hwaccel, leaving avctx->width and avctx->height
unset. I get 1920x1080 yuvj420p frames (in cpu) out.
 * Using hevc codec without hwaccel, setting avctx->width and avctx->height
to some arbitrary value (e.g. 640x360). The values get ignored (as in
FFmpeg master) and I again get 1920x1080 yuvj420p frames out.

I'm not extremely familiar with hevcdec.c so I'm not sure if this would
accidentally break something else. Looking forward to hearing your thoughts!


From 850afda5f6479064c75a4b905f12e48f97b6d551 Mon Sep 17 00:00:00 2001
From: Carlos Ruiz <carlos.r.domin@gmail.com>
Date: Thu, 19 Sep 2024 14:00:05 +0200
Subject: [PATCH] avcodec/nvdec: support resizing while decoding

Nvidia chips support accelerated resizing while decoding video. The *_cuvid
codecs (cuviddec.c) already support resizing and cropping, but have two big
downsides:
  1) they have a minimum latency of two packets (even with the LOW_DELAY
flag enabled)
  2) AV_CODEC_FLAG_COPY_OPAQUE is not respected (opaque and opaque_ref
aren't transferred from packets to frames)

Instead, parsing the video using a non-accelerated codec (hevcdec.c) solves
both downsides above. This commit brings resizing capabilities to the
*_nvdec hwaccel, similar to what *_cuvid does, to combine the best of both
worlds (proper parsing + accelerated decoding and resizing).
---
 libavcodec/hevc/hevcdec.c |  8 ++++++--
 libavcodec/nvdec.c        | 21 +++++++++++++++++----
 2 files changed, 23 insertions(+), 6 deletions(-)

+
 int ff_nvdec_decode_init(AVCodecContext *avctx)
 {
     NVDECContext *ctx = avctx->internal->hwaccel_priv_data;
@@ -393,8 +405,9 @@ int ff_nvdec_decode_init(AVCodecContext *avctx)

     params.ulWidth             = avctx->coded_width;
     params.ulHeight            = avctx->coded_height;
-    params.ulTargetWidth       = avctx->coded_width;
-    params.ulTargetHeight      = avctx->coded_height;
+    avctx->get_buffer2         = get_buffer2;
+    params.ulTargetWidth       = avctx->width;
+    params.ulTargetHeight      = avctx->height;
     params.bitDepthMinus8      = sw_desc->comp[0].depth - 8;
     params.OutputFormat        = output_format;
     params.CodecType           = cuvid_codec_type;
@@ -719,8 +732,8 @@ int ff_nvdec_frame_params(AVCodecContext *avctx,
     chroma_444 = supports_444 && cuvid_chroma_format == cudaVideoChromaFormat_444;

     frames_ctx->format            = AV_PIX_FMT_CUDA;
-    frames_ctx->width             = (avctx->coded_width + 1) & ~1;
-    frames_ctx->height            = (avctx->coded_height + 1) & ~1;
+    frames_ctx->width             = (avctx->width + 1) & ~1;
+    frames_ctx->height            = (avctx->height + 1) & ~1;
     /*
      * We add two extra frames to the pool to account for deinterlacing filters
      * holding onto their frames.
--
2.43.0
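
The frames_ctx hunk above keeps the existing `(x + 1) & ~1` expression and only changes its input from the coded size to the display size. As a minimal sketch of what that expression does (the helper name `round_up_to_even` is hypothetical and not part of nvdec.c), it rounds a non-negative dimension up to the next even value, so an odd requested size would still yield an even-sized frame pool:

```c
#include <assert.h>

/* Hypothetical helper, for illustration only: round a non-negative
 * dimension up to the nearest even value. This is the same expression
 * the patch applies to avctx->width and avctx->height. */
static int round_up_to_even(int x)
{
    assert(x >= 0);
    /* Adding 1 pushes odd values to the next even number; clearing the
     * lowest bit leaves even values unchanged. */
    return (x + 1) & ~1;
}
```

For example, `round_up_to_even(359)` gives 360, while `round_up_to_even(360)` stays at 360.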

Comments

Hendrik Leppkes Sept. 20, 2024, 5:18 a.m. UTC | #1
On Fri, Sep 20, 2024 at 1:24 AM Carlos Ruiz <carlos.r.domin@gmail.com> wrote:
>
> Hi!
>
> This is my first contribution to the project so please excuse any bad
> etiquette, I tried to read all the FAQs before posting. Would love to start
> by thanking everyone for such an amazing framework you've built!
>
> Anyway, here's my proposed patch to support video resizing when using NVDEC
> hwaccel to decode hevc video (I could look into a similar patch for h264,
> av1, etc if this looks useful). There's a bit more context/explanation in
> the commit description in the patch, but please let me know if the use case
> isn't clear.
>

We don't really leverage these extra functions of NVDEC because it
breaks many assumptions about hwaccels, which are meant to be exact
decoders.
If anything, just fudging the width/height is certainly an API
violation and will likely not be possible as it breaks many
assumptions in the code otherwise, see below.

> ---
>  libavcodec/hevc/hevcdec.c |  8 ++++++--
>  libavcodec/nvdec.c        | 21 +++++++++++++++++----
>  2 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/libavcodec/hevc/hevcdec.c b/libavcodec/hevc/hevcdec.c
> index d915d74d22..d63fc5875f 100644
> --- a/libavcodec/hevc/hevcdec.c
> +++ b/libavcodec/hevc/hevcdec.c
> @@ -351,8 +351,12 @@ static void export_stream_params(HEVCContext *s, const HEVCSPS *sps)
>      avctx->pix_fmt             = sps->pix_fmt;
>      avctx->coded_width         = sps->width;
>      avctx->coded_height        = sps->height;
> -    avctx->width               = sps->width  - ow->left_offset - ow->right_offset;
> -    avctx->height              = sps->height - ow->top_offset  - ow->bottom_offset;
> +    if (avctx->width <= 0 || avctx->height <= 0) {
> +        avctx->width           = sps->width;
> +        avctx->height          = sps->height;
> +    }
> +    avctx->width               = avctx->width - ow->left_offset - ow->right_offset;
> +    avctx->height              = avctx->height - ow->top_offset  - ow->bottom_offset;

You cannot do that here. The frame size can change mid-stream, and
this would suppress any such change.
Additionally, if this code runs more than once, then the offset is
applied repeatedly without actually resetting width/height.
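
Both problems can be reproduced outside FFmpeg with a small model. The sketch below is an illustration only: `Ctx`, `Sps`, `export_patched` and `export_original` are hypothetical stand-ins that copy just the arithmetic of the hunk above, not the real AVCodecContext or HEVCSPS structures.

```c
#include <assert.h>

/* Hypothetical stand-ins for the fields involved (not FFmpeg types). */
typedef struct { int width, height; } Ctx;
typedef struct { int width, height; int left, right, top, bottom; } Sps;

/* Mirrors the patched logic: take the SPS size only when the context
 * size is unset, then subtract the cropping offsets in place. */
static void export_patched(Ctx *c, const Sps *s)
{
    assert(c && s);
    if (c->width <= 0 || c->height <= 0) {
        c->width  = s->width;
        c->height = s->height;
    }
    c->width  = c->width  - s->left - s->right;
    c->height = c->height - s->top  - s->bottom;
}

/* Mirrors the original logic: always derive the size from the SPS, so
 * repeated calls are idempotent and a new SPS takes effect. */
static void export_original(Ctx *c, const Sps *s)
{
    assert(c && s);
    c->width  = s->width  - s->left - s->right;
    c->height = s->height - s->top  - s->bottom;
}
```

With an assumed SPS of 1920x1088 and a bottom offset of 8, calling `export_patched` twice on a zeroed context gives a height of 1080 and then 1072, whereas `export_original` gives 1080 both times. Feeding a second, 1280x720 SPS into `export_patched` leaves the context at the old width, which is the suppressed mid-stream change.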

- Hendrik
Carlos Ruiz Sept. 23, 2024, 11:56 a.m. UTC | #2
> We don't really leverage these extra functions of NVDEC because it
> breaks many assumptions about hwaccels, which are meant to be exact
> decoders.

Yeah I understand that, and expected this kind of feedback. Do you envision
a way for hwaccels (perhaps in the future) to support some additional level
of *hardware acceleration*, such as resizing and cropping? Let me try to
motivate the rationale.

Think about a common use case in the AI world we live in: you receive a
bunch of simultaneous 4k (or 1080p) incoming RTSP streams and you want to
decode the video and pass it through some ML model. The native hevc codec
doesn't support resizing, so you decode video at full 4k on the GPU by
leveraging the nvdec hwaccel, which allocates something like 5-10 surfaces
at 3840x2160, totalling around 250MB of VRAM (GPU memory), and then you
have to immediately take all of those frames, pass them through a
filterchain, and scale them down to e.g. 640x360. Leaving aside the waste
of additional CUDA cores and the added latency of having to synchronize the
CUDA stream twice (once to receive the frame out of the decoder, the second
to pull from the filterchain), do this for 50 camera streams and you'll
quickly run out of GPU memory with a GPU utilization under 10%... Instead,
my patch allows decoding at 640x360 (or whatever you choose) directly,
allocating only 6MB of GPU VRAM (instead of 250MB at 4k), so now you can
fit 40x more video decoding streams in the same GPU card.
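
As a rough cross-check of the scaling argument, here is a minimal sketch of the surface-size arithmetic. It assumes 4:2:0 chroma subsampling (1.5 samples per pixel) and ignores pitch alignment and any driver-side overhead; `surface_bytes_420` and `pool_bytes` are hypothetical helper names, and the surface count and sample size are parameters rather than values taken from nvdec.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for one 4:2:0 surface: a full-resolution luma plane plus two
 * chroma planes at half resolution in each direction, i.e. 1.5 samples
 * per pixel. Alignment and padding are deliberately ignored. */
static size_t surface_bytes_420(size_t width, size_t height,
                                size_t bytes_per_sample)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height * 3 / 2 * bytes_per_sample;
}

/* Total bytes for a pool of identical surfaces. */
static size_t pool_bytes(size_t width, size_t height,
                         size_t bytes_per_sample, size_t surfaces)
{
    return surfaces * surface_bytes_420(width, height, bytes_per_sample);
}
```

Under these assumptions, 10 surfaces of 8-bit 3840x2160 come to 124,416,000 bytes and 10 surfaces of 8-bit 640x360 come to 3,456,000 bytes, a ratio of 36. The absolute figures depend on the surface count, bit depth and alignment actually used, but the ratio is of the same order as the 40x quoted above.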

I understand the challenges of allowing hwaccels to resize/crop as part of
the decoding process and how it currently breaks the API in several ways
I'm of course unfamiliar with. However, I'd love to help support this
feature in the future, since it would add tremendous value and enable more
efficient real-time Computer Vision pipelines to leverage the amazing
framework FFmpeg is.


> You cannot do that here. The frame size can change mid-stream, and
> this would suppress any such change.
> Additionally, if this code runs more than once, then the offset is
> applied repeatedly without actually resetting width/height.

Very good points! Indeed I'm not yet super familiar with the structure of
the hevc codec implementation, nor with the available functions to
reconfigure a hwaccel (I know the Video Codec SDK supports this, but have
never implemented such functionality), but I see how this would break.
Would you see value in me trying to figure out a way to support
reconfiguring the codec if the width/height/cropping changes mid-stream, or
would you reject the patch/feature regardless?

Thanks!


Patch

diff --git a/libavcodec/hevc/hevcdec.c b/libavcodec/hevc/hevcdec.c
index d915d74d22..d63fc5875f 100644
--- a/libavcodec/hevc/hevcdec.c
+++ b/libavcodec/hevc/hevcdec.c
@@ -351,8 +351,12 @@  static void export_stream_params(HEVCContext *s, const HEVCSPS *sps)
     avctx->pix_fmt             = sps->pix_fmt;
     avctx->coded_width         = sps->width;
     avctx->coded_height        = sps->height;
-    avctx->width               = sps->width  - ow->left_offset - ow->right_offset;
-    avctx->height              = sps->height - ow->top_offset  - ow->bottom_offset;
+    if (avctx->width <= 0 || avctx->height <= 0) {
+        avctx->width           = sps->width;
+        avctx->height          = sps->height;
+    }
+    avctx->width               = avctx->width - ow->left_offset - ow->right_offset;
+    avctx->height              = avctx->height - ow->top_offset  - ow->bottom_offset;
     avctx->has_b_frames        = sps->temporal_layer[sps->max_sub_layers - 1].num_reorder_pics;
     avctx->profile             = sps->ptl.general_ptl.profile_idc;
     avctx->level               = sps->ptl.general_ptl.level_idc;
diff --git a/libavcodec/nvdec.c b/libavcodec/nvdec.c
index 932544564a..86143de74c 100644
--- a/libavcodec/nvdec.c
+++ b/libavcodec/nvdec.c
@@ -324,6 +324,18 @@  static int nvdec_init_hwframes(AVCodecContext *avctx, AVBufferRef **out_frames_r
     return 0;
 }

+static int get_buffer2(AVCodecContext *avctx, AVFrame *frame, int flags) {
+    /*
+     * HEVC codec includes FF_CODEC_CAP_EXPORTS_CROPPING in its caps_internal, so by default frames will be set
+     * to width=avctx->coded_width and height=avctx->coded_height. Now that we support resizing as part of decoding,
+     * overwrite the frame dimensions with display values rather than coded.
+     */
+    int ret = avcodec_default_get_buffer2(avctx, frame, flags);
+    frame->width = avctx->width;
+    frame->height = avctx->height;
+    return ret;
+}