[FFmpeg-devel] (for discussion): cuvid: allow to crop and resize in decoder

Submitted by Miroslav Slugeň on Feb. 12, 2017, 7:51 p.m.

Details

Message ID 58A0BCA5.7030009@email.cz
State New

Commit Message

Miroslav Slugeň Feb. 12, 2017, 7:51 p.m.
This patch is for discussion only, not ready to commit yet.

1. The cuvid decoder actually supports scaling the input to a requested
resolution (as libnpp does) without any performance penalty, so this
patch is a proof of concept that it works as expected.

2. Cuvid also supports cropping, though from tests only in 4px steps;
this is also very nice, because the cuvid hwaccel has no cropping
filter.

Anyone should feel free to adopt this patch and modify it for a final
commit.
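For reference, a hedged usage sketch of how the proposed options would be invoked. The option names (`-resize` and the cuvid private `crop`/`resize` options) are taken from the patch attached at the end of this thread, which was still under discussion, so the exact flags may differ in whatever form was eventually merged:

```shell
# Illustrative only: decode with cuvid, crop 4 px from top and bottom and
# scale to SD inside the decoder, then encode with NVENC.
# Crop format per the patch: (top)x(bottom)x(left)x(right).
ffmpeg -hwaccel cuvid -c:v h264_cuvid \
       -crop 4x4x0x0 -resize 720x576 \
       -i input.ts -c:v h264_nvenc output.ts
```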

Comments

Hendrik Leppkes Feb. 12, 2017, 7:59 p.m.
On Sun, Feb 12, 2017 at 8:51 PM, Miroslav Slugeň <thunder.m@email.cz> wrote:
> This patch is for discussion only, not ready to commit yet.
>
> 1. Cuvid decoder actualy support scaling input to requested resolution
> without any performance penalty (like libnpp does), so this patch is proof
> of concept that it is working like expected.
>

I don't think scaling is something a decoder should be doing; we don't
really want all sorts of video processing jumbled up into one
monolithic cuvid thing, but would rather keep tasks separated.

- Hendrik
Miroslav Slugeň Feb. 12, 2017, 8:07 p.m.
Dne 12.2.2017 v 20:59 Hendrik Leppkes napsal(a):
> On Sun, Feb 12, 2017 at 8:51 PM, Miroslav Slugeň <thunder.m@email.cz> wrote:
>> This patch is for discussion only, not ready to commit yet.
>>
>> 1. Cuvid decoder actualy support scaling input to requested resolution
>> without any performance penalty (like libnpp does), so this patch is proof
>> of concept that it is working like expected.
>>
> I don't think scaling is something a decoder should be doing, we don't
> really want all sorts of video processing jumbled up into one
> monolithic cuvid thing, but rather keep tasks separated.
>
> - Hendrik
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Yes, but when you are transcoding from FHD or 4K to SD quality it can
save a lot of GPU resources.

We have one example where a single Quadro P5000 (2x NVENC) is
downscaling about 74 FHD streams to SD in realtime.

I know this is not something that is acceptable in current ffmpeg;
maybe libav could adopt this patch.

M.
wm4 Feb. 13, 2017, 4:03 a.m.
On Sun, 12 Feb 2017 21:07:40 +0100
Miroslav Slugeň <thunder.m@email.cz> wrote:

> Dne 12.2.2017 v 20:59 Hendrik Leppkes napsal(a):
> > On Sun, Feb 12, 2017 at 8:51 PM, Miroslav Slugeň <thunder.m@email.cz> wrote:  
> >> This patch is for discussion only, not ready to commit yet.
> >>
> >> 1. Cuvid decoder actualy support scaling input to requested resolution
> >> without any performance penalty (like libnpp does), so this patch is proof
> >> of concept that it is working like expected.
> >>  
> > I don't think scaling is something a decoder should be doing, we don't
> > really want all sorts of video processing jumbled up into one
> > monolithic cuvid thing, but rather keep tasks separated.
> >
> > - Hendrik
> Yes, but when you transcoding from FHD or 4K to SD quality it could save 
> alotof GPU resources.
> 
> We have one example where "ONE" Quadro P5000 (2xNVENC) is downscaling 
> about 74 FHD streams to SD at realtime.
> 
> I know it is not something that is acceptable in current ffmpeg, maybe 
> libav could adopt this patch.

You mean the Libav project? They'd be even less likely to accept such a
patch.

Anyway, I don't think this would be slower than doing it in some sort
of separate cuda video filter.
Miroslav Slugeň Feb. 13, 2017, 8:03 a.m.
Dne 13.2.2017 v 05:03 wm4 napsal(a):
> On Sun, 12 Feb 2017 21:07:40 +0100
> Miroslav Slugeň <thunder.m@email.cz> wrote:
>
>> Dne 12.2.2017 v 20:59 Hendrik Leppkes napsal(a):
>>> On Sun, Feb 12, 2017 at 8:51 PM, Miroslav Slugeň <thunder.m@email.cz> wrote:
>>>> This patch is for discussion only, not ready to commit yet.
>>>>
>>>> 1. Cuvid decoder actualy support scaling input to requested resolution
>>>> without any performance penalty (like libnpp does), so this patch is proof
>>>> of concept that it is working like expected.
>>>>   
>>> I don't think scaling is something a decoder should be doing, we don't
>>> really want all sorts of video processing jumbled up into one
>>> monolithic cuvid thing, but rather keep tasks separated.
>>>
>>> - Hendrik
>> Yes, but when you transcoding from FHD or 4K to SD quality it could save
>> alotof GPU resources.
>>
>> We have one example where "ONE" Quadro P5000 (2xNVENC) is downscaling
>> about 74 FHD streams to SD at realtime.
>>
>> I know it is not something that is acceptable in current ffmpeg, maybe
>> libav could adopt this patch.
> You mean the Libav project? They'd be even less likely to accept such a
> patch.
>
> Anyway, I don't think this would be slower than doing it in some sort
> of separate cuda video filter.
This is not true: NVDEC (cuvid) is a separate chip and has its own
NVDEC load in the nvidia-smi monitoring tool, while resizing with
libnpp is completely done on the CUDA cores. In NVDEC, only ADAPTIVE
deinterlacing uses the CUDA cores more intensively; cropping and
resizing in NVDEC are for free :)

M.
wm4 Feb. 13, 2017, 8:09 a.m.
On Mon, 13 Feb 2017 09:03:09 +0100
Miroslav Slugeň <thunder.m@email.cz> wrote:

> Dne 13.2.2017 v 05:03 wm4 napsal(a):
> > On Sun, 12 Feb 2017 21:07:40 +0100
> > Miroslav Slugeň <thunder.m@email.cz> wrote:
> >  
> >> Dne 12.2.2017 v 20:59 Hendrik Leppkes napsal(a):  
> >>> On Sun, Feb 12, 2017 at 8:51 PM, Miroslav Slugeň <thunder.m@email.cz> wrote:  
> >>>> This patch is for discussion only, not ready to commit yet.
> >>>>
> >>>> 1. Cuvid decoder actualy support scaling input to requested resolution
> >>>> without any performance penalty (like libnpp does), so this patch is proof
> >>>> of concept that it is working like expected.
> >>>>     
> >>> I don't think scaling is something a decoder should be doing, we don't
> >>> really want all sorts of video processing jumbled up into one
> >>> monolithic cuvid thing, but rather keep tasks separated.
> >>>
> >>> - Hendrik
> >> Yes, but when you transcoding from FHD or 4K to SD quality it could save
> >> alotof GPU resources.
> >>
> >> We have one example where "ONE" Quadro P5000 (2xNVENC) is downscaling
> >> about 74 FHD streams to SD at realtime.
> >>
> >> I know it is not something that is acceptable in current ffmpeg, maybe
> >> libav could adopt this patch.  
> > You mean the Libav project? They'd be even less likely to accept such a
> > patch.
> >
> > Anyway, I don't think this would be slower than doing it in some sort
> > of separate cuda video filter.
> This is not true, NVDEC (cuvid) is separate chipset and has its own 
> NVDEC load in nvidia-smi monitoring tool, while resizing with libnpp is 
> completly done on CUDA cores. In NVDEC only deinterlacing ADAPTIVE is 
> using CUDA cores more intensively, cropping and resizing in NVDEC is for 
> free :)

I wasn't talking about libnpp. I'm assuming they provide their
processing stuff as separate APIs somewhere.
Timo Rothenpieler Feb. 13, 2017, 10:36 a.m.
Am 12.02.2017 um 20:59 schrieb Hendrik Leppkes:
> On Sun, Feb 12, 2017 at 8:51 PM, Miroslav Slugeň <thunder.m@email.cz> wrote:
>> This patch is for discussion only, not ready to commit yet.
>>
>> 1. Cuvid decoder actualy support scaling input to requested resolution
>> without any performance penalty (like libnpp does), so this patch is proof
>> of concept that it is working like expected.
>>
> 
> I don't think scaling is something a decoder should be doing, we don't
> really want all sorts of video processing jumbled up into one
> monolithic cuvid thing, but rather keep tasks separated.

I'm generally in favor of adding this, but I don't see why ffmpeg.c
needs changes for this.
The decoder should already be free to return any video size it likes.

CUVID is kind of a huge special case with its deinterlacing already,
cropping/resizing the output is quite trivial compared to that.
Hendrik Leppkes Feb. 13, 2017, 11:43 a.m.
On Mon, Feb 13, 2017 at 11:36 AM, Timo Rothenpieler
<timo@rothenpieler.org> wrote:
> Am 12.02.2017 um 20:59 schrieb Hendrik Leppkes:
>> On Sun, Feb 12, 2017 at 8:51 PM, Miroslav Slugeň <thunder.m@email.cz> wrote:
>>> This patch is for discussion only, not ready to commit yet.
>>>
>>> 1. Cuvid decoder actualy support scaling input to requested resolution
>>> without any performance penalty (like libnpp does), so this patch is proof
>>> of concept that it is working like expected.
>>>
>>
>> I don't think scaling is something a decoder should be doing, we don't
>> really want all sorts of video processing jumbled up into one
>> monolithic cuvid thing, but rather keep tasks separated.
>
> I'm generally in favor of adding this, but I don't see why ffmpeg.c
> needs changes for this.
> The decoder should already be free to return any video size it likes.
>
> CUVID is kind of a huge special case with its deinterlacing already,
> cropping/resizing the output is quite trivial compared to that.
>

We just recently had all sorts of discussions about what decoders
should and should not do; I don't think scaling in a decoder is a good
thing to start doing here.

- Hendrik
Michael Niedermayer Feb. 13, 2017, 1:08 p.m.
On Mon, Feb 13, 2017 at 12:43:51PM +0100, Hendrik Leppkes wrote:
> On Mon, Feb 13, 2017 at 11:36 AM, Timo Rothenpieler
> <timo@rothenpieler.org> wrote:
> > Am 12.02.2017 um 20:59 schrieb Hendrik Leppkes:
> >> On Sun, Feb 12, 2017 at 8:51 PM, Miroslav Slugeň <thunder.m@email.cz> wrote:
> >>> This patch is for discussion only, not ready to commit yet.
> >>>
> >>> 1. Cuvid decoder actualy support scaling input to requested resolution
> >>> without any performance penalty (like libnpp does), so this patch is proof
> >>> of concept that it is working like expected.
> >>>
> >>
> >> I don't think scaling is something a decoder should be doing, we don't
> >> really want all sorts of video processing jumbled up into one
> >> monolithic cuvid thing, but rather keep tasks separated.
> >
> > I'm generally in favor of adding this, but I don't see why ffmpeg.c
> > needs changes for this.
> > The decoder should already be free to return any video size it likes.
> >
> > CUVID is kind of a huge special case with its deinterlacing already,
> > cropping/resizing the output is quite trivial compared to that.
> >
> 
> We recently just had all sorts of discussions what decoders should and
> should not do, I don't think scaling in a decoder is a good thing to
> start doing here.

Scaling in some decoders is mandated by some specs: some standards
support reduced-resolution coding, which can switch from frame to frame
without the decoder output changing. There is also the possibility of
scalability where the reference stream has a lower resolution, IIRC.

This is kind of different of course, but scaling code in decoders is
part of some specifications.

[...]
Timo Rothenpieler March 1, 2017, 10:58 a.m.
>> We recently just had all sorts of discussions what decoders should and
>> should not do, I don't think scaling in a decoder is a good thing to
>> start doing here.
> 
> scaling in some decoders is mandated by some specs
> some standards support reduced resolution which can switch from frame
> to frame without the decoder output changing
> There is also the possiblity of scalability where the reference stream
> has lower resolution IIRC.
> 
> This is kind of different of course but, scaling code in decoders is
> part of some specifications.

I'd like to bring this back up.
I'd like to merge this, especially as the scaling is done for free by
the video ASIC, offering a possibility to scale without requiring the
non-free libnpp. And cropping is so far not possible at all.

Yes, scaling and cropping are not something a decoder usually does, but
this exposes a hardware feature that has no other way of being
accessed, which offers valuable functionality to users.
Michael Niedermayer March 4, 2017, 4:39 p.m.
On Wed, Mar 01, 2017 at 11:58:39AM +0100, Timo Rothenpieler wrote:
> >> We recently just had all sorts of discussions what decoders should and
> >> should not do, I don't think scaling in a decoder is a good thing to
> >> start doing here.
> > 
> > scaling in some decoders is mandated by some specs
> > some standards support reduced resolution which can switch from frame
> > to frame without the decoder output changing
> > There is also the possiblity of scalability where the reference stream
> > has lower resolution IIRC.
> > 
> > This is kind of different of course but, scaling code in decoders is
> > part of some specifications.
> 
> Would like to bring this back up.
> I'd like to merge this, as specially the scaling is freely done by the
> video asic, offering a possibility to scale without requiring non-free
> libnpp. And cropping so far is not possible at all.
> 
> Yes, scaling and cropping is not something a decoder usually does, but
> it exposes a hardware feature that has no other way of accessing it,
> which offers valuable functionality to users.

I am fine with this, but I am not sure others are.


[...]
Philip Langdale March 4, 2017, 5:16 p.m.

On Wed, 1 Mar 2017 11:58:39 +0100
Timo Rothenpieler <timo@rothenpieler.org> wrote:

> >> We recently just had all sorts of discussions what decoders should
> >> and should not do, I don't think scaling in a decoder is a good
> >> thing to start doing here.
> >
> > scaling in some decoders is mandated by some specs
> > some standards support reduced resolution which can switch from
> > frame to frame without the decoder output changing
> > There is also the possiblity of scalability where the reference
> > stream has lower resolution IIRC.
> >
> > This is kind of different of course but, scaling code in decoders is
> > part of some specifications.
>
> Would like to bring this back up.
> I'd like to merge this, as specially the scaling is freely done by the
> video asic, offering a possibility to scale without requiring non-free
> libnpp. And cropping so far is not possible at all.
>
> Yes, scaling and cropping is not something a decoder usually does, but
> it exposes a hardware feature that has no other way of accessing it,
> which offers valuable functionality to users.

I'm ok with it. I agree it's ugly, but if this is the only way, so be
it.

For what it's worth, there's precedent in crystalhd: I exposed the
hardware's ability to do downscaling, which was valuable because it
allowed you to downscale before the memcpy, which made the difference
between playable and unplayable on some low-end machines.


--phil
u-9iep@aetey.se March 4, 2017, 7:30 p.m.
On Sat, Mar 04, 2017 at 09:16:30AM -0800, Philip Langdale wrote:
> On Wed, 1 Mar 2017 11:58:39 +0100
> Timo Rothenpieler <timo@rothenpieler.org> wrote:
> > Would like to bring this back up.
> > I'd like to merge this, as specially the scaling is freely done by the
> > video asic, offering a possibility to scale without requiring non-free
> > libnpp. And cropping so far is not possible at all.
> > 
> > Yes, scaling and cropping is not something a decoder usually does, but
> > it exposes a hardware feature that has no other way of accessing it,
> > which offers valuable functionality to users.
> 
> I'm ok with it. I agree it's ugly, but if this is the only way, so be
> it.

I find it kind of intriguing that doing an operation at a place
where it is most efficient (also where it seems to belong by the codec
or hardware design) is being called "ugly".

> For what it's worth. there's precedence in crystalhd. I exposed the
> hardware's ability to do downscaling, which was valuable because it
> allowed you to downscale before memcpy, which made the difference
> between playable and unplayable for some low end machines.

The cinepak decoder is another precedent of the same kind, even if not
regarding scaling but pixel formats.

"Doing the operation where it costs least" looks like a reasonable
criterion, doesn't it?

Which criterion would make a decoder (or any tool) the wrong place
for something it does much better than anyone else?

Regards,
Rune
Timo Rothenpieler March 4, 2017, 7:33 p.m.
> "Doing the operation where it costs least" looks like a reasonable
> criteria, doesn't it?
>
> Which criteria would make a decoder (or any tool) a wrong place
> for something it does much better than anyone else?

It's about having scaling functionality in libavcodec when it belongs
in libavfilter; but the cuvid API does not offer that possibility.
u-9iep@aetey.se March 4, 2017, 8:24 p.m.
On Sat, Mar 04, 2017 at 08:33:03PM +0100, Timo Rothenpieler wrote:
> >Which criteria would make a decoder (or any tool) a wrong place
> >for something it does much better than anyone else?
> 
> It's about having scaling-functionality in libavcodec, while it belongs into
> libavfilter, but the cuvid API does not offer that possibility.

You take for granted that "it belongs 'there'", but my question was not
about "where" but "why".

In these particular cases (cuvid, cinepak) a libxxxx can perform at
best only a small fraction as well as the decoder itself.

So, again, what is our criterion for choosing the most suitable place?

The libxxxxxxx libraries exist for a good reason: in many cases they
are the best providers of a certain functionality, compared to multiple
scattered ad-hoc implementations.

OTOH, when they are _not_ good at providing a functionality, and for
fundamental reasons cannot be made as good as an alternative, then why
insist on using them?

Regards,
Rune
Mark Thompson March 4, 2017, 9:44 p.m.
On 01/03/17 10:58, Timo Rothenpieler wrote:
>>> We recently just had all sorts of discussions what decoders should and
>>> should not do, I don't think scaling in a decoder is a good thing to
>>> start doing here.
>>
>> scaling in some decoders is mandated by some specs
>> some standards support reduced resolution which can switch from frame
>> to frame without the decoder output changing
>> There is also the possiblity of scalability where the reference stream
>> has lower resolution IIRC.
>>
>> This is kind of different of course but, scaling code in decoders is
>> part of some specifications.
> 
> Would like to bring this back up.
> I'd like to merge this, as specially the scaling is freely done by the
> video asic, offering a possibility to scale without requiring non-free
> libnpp. And cropping so far is not possible at all.
> 
> Yes, scaling and cropping is not something a decoder usually does, but
> it exposes a hardware feature that has no other way of accessing it,
> which offers valuable functionality to users.

To offer an alternative approach to this:

* Make a new CUVID hwcontext implementation - each frame in it consists of some decode parameters (including input bitstream) and a reference to a decoder instance.

* The CUVID decoder in lavc would create a decoder instance, but when asked to decode a packet it would create a new CUVID frame with the appropriate decoding parameters attached to it and return that.

* CUVID scale/crop/deinterlace filters could then be written which just tag the frame with the appropriate transformation to happen later.

* The decoder then actually runs when you try to get the frame data - either by mapping to CUDA (av_hwframe_map() / vf_hwmap) or actually downloading the frame to system memory (av_hwframe_transfer_data() / vf_hwdownload).


Now, while this has rather nice outward behaviour in having the API work like all other hwcontext implementations, it also has a number of difficulties:

* It's even less clear how to get asynchronicity for performance than it is now - decodes are only issued when you try to use the output, so pretty much all overlap possibilities are lost.  Maybe that could be avoided by adding some sort of "crystallise frame" call to hwcontext, but it's still somewhat clumsy.

* The decoder has to be able to determine the intrinsic delay of the stream in advance, because it can't output a frame until it will definitely be decodable without more packets on the input (av_hwframe_transfer_data() can't return AVERROR(EAGAIN) to indicate that you should supply more data with avcodec_send_packet()).

* The non-native output formats of the decoder in lavc (i.e. all current ones - system memory and CUDA) become unwanted, but compatibility would force them to continue to exist as some sort of auto-download setup.  (ffmpeg.c wouldn't use it - the download would happen there (or not) like it does with the true hwaccels, since like them the decoder doesn't actually support system memory or even CUDA frame output without copying at all.)

* This multiple-library approach putting the decoder in lavu might be regarded as madness.


Not really advocating this solution exactly (I rather agree with the final point above), but I think something like this should be considered so that CUVID doesn't end up behaving entirely differently to all other decoders in this respect.

- Mark
Timo Rothenpieler March 5, 2017, 3:36 p.m.
Am 01.03.2017 um 11:58 schrieb Timo Rothenpieler:
>>> We recently just had all sorts of discussions what decoders should and
>>> should not do, I don't think scaling in a decoder is a good thing to
>>> start doing here.
>>
>> scaling in some decoders is mandated by some specs
>> some standards support reduced resolution which can switch from frame
>> to frame without the decoder output changing
>> There is also the possiblity of scalability where the reference stream
>> has lower resolution IIRC.
>>
>> This is kind of different of course but, scaling code in decoders is
>> part of some specifications.
>
> Would like to bring this back up.
> I'd like to merge this, as specially the scaling is freely done by the
> video asic, offering a possibility to scale without requiring non-free
> libnpp. And cropping so far is not possible at all.
>
> Yes, scaling and cropping is not something a decoder usually does, but
> it exposes a hardware feature that has no other way of accessing it,
> which offers valuable functionality to users.
>

With the lazy filter init now merged, this patch can be simplified.
I rewrote most of it; the current version is on GitHub:
https://github.com/BtbN/FFmpeg/commit/f856fa509278392a88c754b8c7755a575e5aeb41

I'm still doing some testing with it, but intend to push it if no
issues are found.

Patch

From 9f5dfd6e9cabd3d419a3a58f7bfa3b3c1e179638 Mon Sep 17 00:00:00 2001
From: Miroslav Slugen <thunder.m@email.cz>
Date: Sun, 12 Feb 2017 20:29:34 +0100
Subject: [PATCH 1/1] cuvid: add resize and crop features

---
 ffmpeg.h           |  2 ++
 ffmpeg_opt.c       | 12 +++++++
 libavcodec/cuvid.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++--------
 3 files changed, 96 insertions(+), 13 deletions(-)

diff --git a/ffmpeg.h b/ffmpeg.h
index 85a8f18..0374f11 100644
--- a/ffmpeg.h
+++ b/ffmpeg.h
@@ -132,6 +132,8 @@  typedef struct OptionsContext {
     int        nb_hwaccel_output_formats;
     SpecifierOpt *autorotate;
     int        nb_autorotate;
+    SpecifierOpt *resize;
+    int        nb_resize;
 
     /* output options */
     StreamMap *stream_maps;
diff --git a/ffmpeg_opt.c b/ffmpeg_opt.c
index 6a47d32..fcf4792 100644
--- a/ffmpeg_opt.c
+++ b/ffmpeg_opt.c
@@ -659,6 +659,7 @@  static void add_input_streams(OptionsContext *o, AVFormatContext *ic)
         char *codec_tag = NULL;
         char *next;
         char *discard_str = NULL;
+        char *resize_str = NULL;
         const AVClass *cc = avcodec_get_class();
         const AVOption *discard_opt = av_opt_find(&cc, "skip_frame", NULL, 0, 0);
 
@@ -722,6 +723,14 @@  static void add_input_streams(OptionsContext *o, AVFormatContext *ic)
         case AVMEDIA_TYPE_VIDEO:
             if(!ist->dec)
                 ist->dec = avcodec_find_decoder(par->codec_id);
+
+            MATCH_PER_STREAM_OPT(resize, str, resize_str, ic, st);
+            if (resize_str) {
+                av_parse_video_size(&ist->dec_ctx->width, &ist->dec_ctx->height, resize_str);
+                ist->dec_ctx->coded_width  = ist->dec_ctx->width;
+                ist->dec_ctx->coded_height = ist->dec_ctx->height;
+            }
+
 #if FF_API_EMU_EDGE
             if (av_codec_get_lowres(st->codec)) {
                 av_codec_set_lowres(ist->dec_ctx, av_codec_get_lowres(st->codec));
@@ -3591,6 +3600,9 @@  const OptionDef options[] = {
     { "hwaccel_output_format", OPT_VIDEO | OPT_STRING | HAS_ARG | OPT_EXPERT |
                           OPT_SPEC | OPT_INPUT,                                  { .off = OFFSET(hwaccel_output_formats) },
         "select output format used with HW accelerated decoding", "format" },
+    { "resize",         OPT_VIDEO | OPT_STRING | HAS_ARG | OPT_EXPERT |
+                        OPT_SPEC | OPT_INPUT | OPT_OUTPUT,                       { .off = OFFSET(resize) },
+        "resizer builtin input or output" },
 #if CONFIG_VDA || CONFIG_VIDEOTOOLBOX
     { "videotoolbox_pixfmt", HAS_ARG | OPT_STRING | OPT_EXPERT, { &videotoolbox_pixfmt}, "" },
 #endif
diff --git a/libavcodec/cuvid.c b/libavcodec/cuvid.c
index a2e125d..7370ed1 100644
--- a/libavcodec/cuvid.c
+++ b/libavcodec/cuvid.c
@@ -21,6 +21,7 @@ 
 
 #include "compat/cuda/dynlink_loader.h"
 
+#include "libavutil/avstring.h"
 #include "libavutil/buffer.h"
 #include "libavutil/mathematics.h"
 #include "libavutil/hwcontext.h"
@@ -43,6 +44,15 @@  typedef struct CuvidContext
     char *cu_gpu;
     int nb_surfaces;
     int drop_second_field;
+    char *crop;
+    char *resize;
+
+    struct {
+        short left;
+        short top;
+        short right;
+        short bottom;
+    } offset;
 
     AVBufferRef *hwdevice;
     AVBufferRef *hwframe;
@@ -57,6 +67,10 @@  typedef struct CuvidContext
     int internal_error;
     int decoder_flushing;
 
+    int width;
+    int height;
+    int coded_width;
+    int coded_height;
     cudaVideoCodec codec_type;
     cudaVideoChromaFormat chroma_format;
 
@@ -105,6 +119,7 @@  static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
     AVHWFramesContext *hwframe_ctx = (AVHWFramesContext*)ctx->hwframe->data;
     CUVIDDECODECREATEINFO cuinfo;
     int surface_fmt;
+    int width, height;
 
     enum AVPixelFormat pix_fmts[3] = { AV_PIX_FMT_CUDA,
                                        AV_PIX_FMT_NONE,  // Will be updated below
@@ -144,8 +159,8 @@  static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
 
     avctx->pix_fmt = surface_fmt;
 
-    avctx->width = format->display_area.right;
-    avctx->height = format->display_area.bottom;
+    width = format->display_area.right - format->display_area.left;
+    height = format->display_area.bottom - format->display_area.top;
 
     ff_set_sar(avctx, av_div_q(
         (AVRational){ format->display_aspect_ratio.x, format->display_aspect_ratio.y },
@@ -174,8 +189,10 @@  static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
     }
 
     if (ctx->cudecoder
-            && avctx->coded_width == format->coded_width
-            && avctx->coded_height == format->coded_height
+            && ctx->width == width
+            && ctx->height == height
+            && ctx->coded_width == format->coded_width
+            && ctx->coded_height == format->coded_height
             && ctx->chroma_format == format->chroma_format
             && ctx->codec_type == format->codec)
         return 1;
@@ -204,11 +221,15 @@  static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
         return 0;
     }
 
-    avctx->coded_width = format->coded_width;
-    avctx->coded_height = format->coded_height;
-
+    ctx->width = width;
+    ctx->height = height;
+    ctx->coded_width = format->coded_width;
+    ctx->coded_height = format->coded_height;
     ctx->chroma_format = format->chroma_format;
 
+    avctx->coded_width = avctx->width;
+    avctx->coded_height = avctx->height;
+
     memset(&cuinfo, 0, sizeof(cuinfo));
 
     cuinfo.CodecType = ctx->codec_type = format->codec;
@@ -228,15 +249,24 @@  static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
         return 0;
     }
 
-    cuinfo.ulWidth = avctx->coded_width;
-    cuinfo.ulHeight = avctx->coded_height;
-    cuinfo.ulTargetWidth = cuinfo.ulWidth;
-    cuinfo.ulTargetHeight = cuinfo.ulHeight;
+    cuinfo.ulWidth = ctx->coded_width;
+    cuinfo.ulHeight = ctx->coded_height;
+
+    /* cropping depends on original resolution */
+    cuinfo.display_area.left = ctx->offset.left;
+    cuinfo.display_area.top = ctx->offset.top;
+    cuinfo.display_area.right = cuinfo.ulWidth - ctx->offset.right;
+    cuinfo.display_area.bottom = cuinfo.ulHeight - ctx->offset.bottom;
 
+    /* scaling to requested resolution */
+    cuinfo.ulTargetWidth = avctx->width;
+    cuinfo.ulTargetHeight = avctx->height;
+
+    /* aspect ratio conversion, 1:1, depends on scaled resolution */
     cuinfo.target_rect.left = 0;
     cuinfo.target_rect.top = 0;
-    cuinfo.target_rect.right = cuinfo.ulWidth;
-    cuinfo.target_rect.bottom = cuinfo.ulHeight;
+    cuinfo.target_rect.right = cuinfo.ulTargetWidth;
+    cuinfo.target_rect.bottom = cuinfo.ulTargetHeight;
 
     cuinfo.ulNumDecodeSurfaces = ctx->nb_surfaces;
     cuinfo.ulNumOutputSurfaces = 1;
@@ -636,6 +666,11 @@  static int cuvid_test_dummy_decoder(AVCodecContext *avctx,
     cuinfo.ulTargetWidth = cuinfo.ulWidth;
     cuinfo.ulTargetHeight = cuinfo.ulHeight;
 
+    cuinfo.display_area.left = 0;
+    cuinfo.display_area.top = 0;
+    cuinfo.display_area.right = cuinfo.ulWidth;
+    cuinfo.display_area.bottom = cuinfo.ulHeight;
+
     cuinfo.target_rect.left = 0;
     cuinfo.target_rect.top = 0;
     cuinfo.target_rect.right = cuinfo.ulWidth;
@@ -822,6 +857,38 @@  static av_cold int cuvid_decode_init(AVCodecContext *avctx)
                FFMIN(sizeof(ctx->cuparse_ext.raw_seqhdr_data), avctx->extradata_size));
     }
 
+    ctx->offset.top = 0;
+    ctx->offset.bottom = 0;
+    ctx->offset.left = 0;
+    ctx->offset.right = 0;
+    if (ctx->crop) {
+        char *crop_str, *saveptr;
+        int crop_idx = 0;
+        crop_str = av_strdup(ctx->crop);
+        crop_str = av_strtok(crop_str, "x", &saveptr);
+        while (crop_str) {
+            switch (crop_idx++) {
+            case 0:
+                ctx->offset.top = atoi(crop_str);
+                break;
+            case 1:
+                ctx->offset.bottom = atoi(crop_str);
+                break;
+            case 2:
+                ctx->offset.left = atoi(crop_str);
+                break;
+            case 3:
+                ctx->offset.right = atoi(crop_str);
+                break;
+            default:
+                break;
+            }
+            crop_str = av_strtok(NULL, "x", &saveptr);
+        }
+        free(crop_str);
+    }
+
+
     ctx->cuparseinfo.ulMaxNumDecodeSurfaces = ctx->nb_surfaces;
     ctx->cuparseinfo.ulMaxDisplayDelay = 4;
     ctx->cuparseinfo.pUserData = avctx;
@@ -934,6 +1001,8 @@  static const AVOption options[] = {
     { "gpu",      "GPU to be used for decoding", OFFSET(cu_gpu), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, VD },
     { "surfaces", "Maximum surfaces to be used for decoding", OFFSET(nb_surfaces), AV_OPT_TYPE_INT, { .i64 = 25 }, 0, INT_MAX, VD },
     { "drop_second_field", "Drop second field when deinterlacing", OFFSET(drop_second_field), AV_OPT_TYPE_BOOL, { .i64 = 1 }, 0, 1, VD },
+    { "crop",     "Crop (top)x(bottom)x(left)x(right)", OFFSET(crop), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, VD },
+    { "resize",   "Resize (width)x(height)", OFFSET(resize), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, VD },
     { NULL }
 };
 
-- 
2.1.4