[FFmpeg-devel] libavfilter: scale_cuda filter adds dynamic command values

Submitted by msanders on Nov. 26, 2018, 6:09 p.m.

Details

Message ID gNrIzrq1ridEUTA8-CamdmmHVYnEFcaGttFi02Ts4RDvu1xAOUV_qUbhDag0ntBDC6t7NzO6CbbGjuk9ayuWgf_paMeJT4H6XNXiokn6JNM=@protonmail.com
State New
Headers show

Commit Message

msanders Nov. 26, 2018, 6:09 p.m.
Hi,

This patch adds command support for dynamic change the size in the “scale_cuda” resize filter. In fact, it’s the first GPU filter accepting realtime commands. Using similar changes it’s possible to port it to other hwaccelerators. The only limitation is that the values cannot exceed the initial values. It is therefore necessary to set up the graph with the higher values you may need.

One example: { -filter_complex "scale_cuda=720:576,hwdownload,format=nv12,zmq" }
And then you can use the <c> or ZMQ messages to change the width and/or height.

Warning: This patch requires, to have sense, to apply too this other patch that fixes the hwdownload filter.
https://patchwork.ffmpeg.org/patch/11165/
From 46c58c1da02ba40c1fab0ec93febc995db90caca Mon Sep 17 00:00:00 2001
From: M. Sanders <hidden at gmail.com>
Date: Mon, 26 Nov 2018 17:52:52 +0000
Subject: [PATCH 2/2] libavfilter: scale_cuda dynamic command values

---
 libavfilter/vf_scale_cuda.c |   72 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 62 insertions(+), 10 deletions(-)

Comments

Timo Rothenpieler Nov. 26, 2018, 7:10 p.m.
On 26.11.2018 19:09, msanders wrote:
> Hi,
> 
> This patch adds command support for dynamic change the size in the “scale_cuda” resize filter. In fact, it’s the first GPU filter accepting realtime commands. Using similar changes it’s possible to port it to other hwaccelerators. The only limitation is that the values cannot exceed the initial values. It is therefore necessary to set up the graph with the higher values you may need.
> 
> One example: { -filter_complex "scale_cuda=720:576,hwdownload,format=nv12,zmq" }
> And then you can use the <c> or ZMQ messages to change the width and/or height.
> 
> Warning: This patch requires, to have sense, to apply too this other patch that fixes the hwdownload filter.
> https://patchwork.ffmpeg.org/patch/11165/
> 

I'm not sure if this is such a good idea. There's a lot of places all 
over the codebase that have a hardcoded assumption about how hardware 
frames in general, and CUDA frames in particular work.

A lot of code checks the width/height from the hwframes ctx instead of 
the frame itself, because it needs the real size(1088 for 1080p for 
example) of the underlying buffer at all times.
So those consumers would straight up ignore any scaling done to the 
frames, reading messed up data instead.

On top of that, in the specific case of CUDA, the CUDA pix_fmt is 
defined with the assumption that the entire frame is in a single 
continuous buffer with all planes right after one another. This would 
most prominently affect nvenc, as its API only takes one lone CUdevptr 
as input, and then has a fixed idea about how the data behind it looks.

So it would produce output with random data to the right and bottom of 
the scaled frame, still with the outer dimensions before the 
re-configuration.

The only way this could possibly work is if a new hw_frames_ctx is 
created on reconfiguration.
With nvenc this would actually work without any changes, as it re-reads 
the width/height out of it on every frame already, and initially only 
gets the CUDA-Context and sw_pix_fmt out of it, so those would need to 
stay the same, which isn't an issue.
But for a bunch of other hardware filters more work is needed. Specially 
as some parts overlap with other APIs.
msanders Nov. 26, 2018, 7:35 p.m.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, 26 de November de 2018 20:10, Timo Rothenpieler <timo@rothenpieler.org> wrote:

> On 26.11.2018 19:09, msanders wrote:
>
> > Hi,
> > This patch adds command support for dynamic change the size in the “scale_cuda” resize filter. In fact, it’s the first GPU filter accepting realtime commands. Using similar changes it’s possible to port it to other hwaccelerators. The only limitation is that the values cannot exceed the initial values. It is therefore necessary to set up the graph with the higher values you may need.
> > One example: { -filter_complex "scale_cuda=720:576,hwdownload,format=nv12,zmq" }
> > And then you can use the <c> or ZMQ messages to change the width and/or height.
> > Warning: This patch requires, to have sense, to apply too this other patch that fixes the hwdownload filter.
> > https://patchwork.ffmpeg.org/patch/11165/
>
> I'm not sure if this is such a good idea. There's a lot of places all
> over the codebase that have a hardcoded assumption about how hardware
> frames in general, and CUDA frames in particular work.
>
I think it's a good idea because: I implemented it and I'm using it. In addition, it works!

For sure you can't change the HW frames, only the "size" of images in the HW frames. That's all.


> A lot of code checks the width/height from the hwframes ctx instead of
> the frame itself, because it needs the real size(1088 for 1080p for
> example) of the underlying buffer at all times.
> So those consumers would straight up ignore any scaling done to the
> frames, reading messed up data instead.
>
For all checks that I have done, all works as expected.
Only the "hwdownload" has some troubles. But I also provided the patch that solves it.


> On top of that, in the specific case of CUDA, the CUDA pix_fmt is
> defined with the assumption that the entire frame is in a single
> continuous buffer with all planes right after one another. This would
> most prominently affect nvenc, as its API only takes one lone CUdevptr
> as input, and then has a fixed idea about how the data behind it looks.
>
I take account of this in the code. See the part where I always use the original (context) boundaries for secondary planes.


> So it would produce output with random data to the right and bottom of
> the scaled frame, still with the outer dimensions before the
> re-configuration.
>
No. The output is always close to the top (begining). No other alignement is allowed. Nothing random at all.


> The only way this could possibly work is if a new hw_frames_ctx is
> created on reconfiguration.

No. This isn't viable. If you recreate the hw_context then the next HW filters fails.
Please, belive me. I've already tried it and it doesn't work.

> With nvenc this would actually work without any changes, as it re-reads
> the width/height out of it on every frame already, and initially only
> gets the CUDA-Context and sw_pix_fmt out of it, so those would need to
> stay the same, which isn't an issue.

Why you like to "resize" before the "encoder"? This will produce a bitstream with size changes. Not a good idea.

Think on what does the current "vf_scale"... that already supports live size changes: It's useful when you're doing some filtering/processing. In such situation it's good that the "scale_cuda" does equal to "scale" (aka dynamic commands). Then you can do more work inside the GPU before downloading frames to RAM (remember that with my patch for "hwdonwload" it works). Only think in the wasted CPU consumed for a simple "scale" instead of "scale_cuda" when you need "live size changes".

> But for a bunch of other hardware filters more work is needed. Specially
> as some parts overlap with other APIs.

I don't see the problem. Please explain it better.

Regards.

Patch hide | download patch | download mbox

diff --git a/libavfilter/vf_scale_cuda.c b/libavfilter/vf_scale_cuda.c
index 53b7aa9..ac8f8cc 100644
--- a/libavfilter/vf_scale_cuda.c
+++ b/libavfilter/vf_scale_cuda.c
@@ -78,6 +78,8 @@  typedef struct CUDAScaleContext {
 
     char *w_expr;               ///< width  expression string
     char *h_expr;               ///< height expression string
+    int target_width;
+    int target_height;
 
     CUcontext   cu_ctx;
     CUevent     cu_event;
@@ -314,9 +316,22 @@  static av_cold int cudascale_config_props(AVFilterLink *outlink)
     outlink->w = w;
     outlink->h = h;
 
-    ret = init_processing_chain(ctx, inlink->w, inlink->h, w, h);
-    if (ret < 0)
-        return ret;
+    if (!s->frames_ctx) {
+        ret = init_processing_chain(ctx, inlink->w, inlink->h, w, h);
+        if (ret < 0)
+            return ret;
+    } else {
+        // We are reconfiguring, so reuse the current hw_context
+        av_log(s, AV_LOG_DEBUG, "Reusing the current hw_context.\n");
+        if (s->planes_out && outlink->w > s->planes_out[0].width) {
+            outlink->w = s->planes_out[0].width;
+            av_log(s, AV_LOG_ERROR, "Impossible to resize above the Context width.\n");
+        }
+        if (s->planes_out && outlink->h > s->planes_out[0].height) {
+            outlink->h = s->planes_out[0].height;
+            av_log(s, AV_LOG_ERROR, "Impossible to resize above the Context height.\n");
+        }
+    }
 
     av_log(ctx, AV_LOG_VERBOSE, "w:%d h:%d -> w:%d h:%d\n",
            inlink->w, inlink->h, outlink->w, outlink->h);
@@ -329,6 +344,9 @@  static av_cold int cudascale_config_props(AVFilterLink *outlink)
         outlink->sample_aspect_ratio = inlink->sample_aspect_ratio;
     }
 
+    s->target_width =  outlink->w;
+    s->target_height = outlink->h;
+
     return 0;
 
 fail:
@@ -375,11 +393,11 @@  static int scalecuda_resize(AVFilterContext *ctx,
                            1);
         call_resize_kernel(s, s->cu_func_uchar, s->cu_tex_uchar, 1,
                            in->data[0]+in->linesize[0]*in->height, in->width/2, in->height/2, in->linesize[0]/2,
-                           out->data[0]+out->linesize[0]*out->height, out->width/2, out->height/2, out->linesize[0]/2,
+                           out->data[0]+out->linesize[0]*s->planes_out[0].height, out->width/2, out->height/2, out->linesize[0]/2,
                            1);
         call_resize_kernel(s, s->cu_func_uchar, s->cu_tex_uchar, 1,
                            in->data[0]+ ALIGN_UP((in->linesize[0]*in->height*5)/4, s->tex_alignment), in->width/2, in->height/2, in->linesize[0]/2,
-                           out->data[0]+(out->linesize[0]*out->height*5)/4, out->width/2, out->height/2, out->linesize[0]/2,
+                           out->data[0]+(out->linesize[0]*s->planes_out[0].height*5)/4, out->width/2, out->height/2, out->linesize[0]/2,
                            1);
         break;
     case AV_PIX_FMT_YUV444P:
@@ -389,11 +407,11 @@  static int scalecuda_resize(AVFilterContext *ctx,
                            1);
         call_resize_kernel(s, s->cu_func_uchar, s->cu_tex_uchar, 1,
                            in->data[0]+in->linesize[0]*in->height, in->width, in->height, in->linesize[0],
-                           out->data[0]+out->linesize[0]*out->height, out->width, out->height, out->linesize[0],
+                           out->data[0]+out->linesize[0]*s->planes_out[0].height, out->width, out->height, out->linesize[0],
                            1);
         call_resize_kernel(s, s->cu_func_uchar, s->cu_tex_uchar, 1,
                            in->data[0]+in->linesize[0]*in->height*2, in->width, in->height, in->linesize[0],
-                           out->data[0]+out->linesize[0]*out->height*2, out->width, out->height, out->linesize[0],
+                           out->data[0]+out->linesize[0]*s->planes_out[0].height*2, out->width, out->height, out->linesize[0],
                            1);
         break;
     case AV_PIX_FMT_NV12:
@@ -401,9 +419,10 @@  static int scalecuda_resize(AVFilterContext *ctx,
                            in->data[0], in->width, in->height, in->linesize[0],
                            out->data[0], out->width, out->height, out->linesize[0],
                            1);
+
         call_resize_kernel(s, s->cu_func_uchar2, s->cu_tex_uchar2, 2,
                            in->data[1], in->width/2, in->height/2, in->linesize[1],
-                           out->data[0] + out->linesize[0] * ((out->height + 31) & ~0x1f), out->width/2, out->height/2, out->linesize[1]/2,
+                           out->data[0] + out->linesize[0] * ((s->planes_out[0].height + 31) & ~0x1f), out->width/2, out->height/2, out->linesize[1]/2,
                            1);
         break;
     case AV_PIX_FMT_P010LE:
@@ -413,7 +432,7 @@  static int scalecuda_resize(AVFilterContext *ctx,
                            2);
         call_resize_kernel(s, s->cu_func_ushort2, s->cu_tex_ushort2, 2,
                            in->data[1], in->width / 2, in->height / 2, in->linesize[1]/2,
-                           out->data[0] + out->linesize[0] * ((out->height + 31) & ~0x1f), out->width / 2, out->height / 2, out->linesize[1] / 4,
+                           out->data[0] + out->linesize[0] * ((s->planes_out[0].height + 31) & ~0x1f), out->width / 2, out->height / 2, out->linesize[1] / 4,
                            2);
         break;
     case AV_PIX_FMT_P016LE:
@@ -423,7 +442,7 @@  static int scalecuda_resize(AVFilterContext *ctx,
                            2);
         call_resize_kernel(s, s->cu_func_ushort2, s->cu_tex_ushort2, 2,
                            in->data[1], in->width / 2, in->height / 2, in->linesize[1] / 2,
-                           out->data[0] + out->linesize[0] * ((out->height + 31) & ~0x1f), out->width / 2, out->height / 2, out->linesize[1] / 4,
+                           out->data[0] + out->linesize[0] * ((s->planes_out[0].height + 31) & ~0x1f), out->width / 2, out->height / 2, out->linesize[1] / 4,
                            2);
         break;
     default:
@@ -439,6 +458,11 @@  static int cudascale_scale(AVFilterContext *ctx, AVFrame *out, AVFrame *in)
     AVFrame *src = in;
     int ret;
 
+    out->width  = s->target_width;
+    out->height = s->target_height;
+    s->frame->width  = out->width;
+    s->frame->height = out->height;
+
     ret = scalecuda_resize(ctx, s->frame, src);
     if (ret < 0)
         return ret;
@@ -499,6 +523,33 @@  fail:
     return ret;
 }
 
+static int process_command(AVFilterContext *ctx, const char *cmd, const char *args,
+                           char *res, int res_len, int flags)
+{
+    CUDAScaleContext *s  = ctx->priv;
+    int ret;
+
+    if (   !strcmp(cmd, "width")  || !strcmp(cmd, "w")
+        || !strcmp(cmd, "height") || !strcmp(cmd, "h")) {
+
+        int old_w = s->target_width;
+        int old_h = s->target_height;
+        AVFilterLink *outlink = ctx->outputs[0];
+
+        av_opt_set(s, cmd, args, 0);
+        if ((ret = cudascale_config_props(outlink)) < 0) {
+            s->target_width  = old_w;
+            s->target_height = old_h;
+        } else {
+            av_log(ctx, AV_LOG_VERBOSE, "Reconfiguration w:%d h:%d -> w:%d h:%d\n",
+                   ctx->inputs[0]->w, ctx->inputs[0]->h, ctx->outputs[0]->w, ctx->outputs[0]->h);
+        }
+    } else
+        ret = AVERROR(ENOSYS);
+
+    return ret;
+}
+
 #define OFFSET(x) offsetof(CUDAScaleContext, x)
 #define FLAGS (AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM)
 static const AVOption options[] = {
@@ -545,6 +596,7 @@  AVFilter ff_vf_scale_cuda = {
 
     .inputs    = cudascale_inputs,
     .outputs   = cudascale_outputs,
+    .process_command = process_command,
 
     .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
 };