diff mbox

[FFmpeg-devel] NVENC: Better surface allocation alghoritm, fix rc_lookahead

Message ID 5832D7D7.5000202@email.cz
State Accepted
Headers show

Commit Message

Miroslav Slugeň Nov. 21, 2016, 11:17 a.m. UTC
User selectable surfaces are not working correctly, if you set number of 
surfaces on cmdline, it will always use minimum 32 or 48 depends on 
selected resolution, but in nvenc it is not necessary to use so many 
surfaces.

So from now you can define as low as 1 surface and nvenc will still 
work, it will ofcourse lower GPU memory usage by 95% and async_delay to zero

That was the easy part, now littlebit more...

Next part of this patch is to always prefer rc_lookahead to be more 
important for number of surfaces, than user defined surfaces value. 
Maximum rc_lookahead from nvidia documentation is 32, but could increase 
in future generations so there is no limit for this yet. Value 
async_depth is still accepted and prefered over rc_lookahead.

There were also bug when you request more than rc_lookahead > 31, it 
will always set maximum 31, because surface numbers recalculation was 
after setting lookahead, which is now fixed.

Results:
If you set -rc_lookahead 32 and -bf 3 it will now use only 40 surfaces 
and lower GPU memory usage by 20%, also it will now increase PSNR by 0.012dB

Two more comments:

1. from my internal test, i don't understand addition of 4 more surfaces 
when lookahead is calculated, i didn't used this and everything works as 
with those 4 more extra surfaces, does anybody know what is going on 
there? I looks like it was used for B frames which are calculated 
separately, because B frames maximum is 4.

2. rc_lookahead is defined default to -1, but in test condition if 
(ctx->rc_lookahead) which sets lookahead it will be always true, i don't 
know if this is intended behavior, so in default behavior is lookahead 
always on!

This is default condition when rc_lokkahead is -1 (not defined on 
cmdline), whis is maybe something that is not intended:
ctx->encode_config.rcParams.enableLookahead = 1;
ctx->encode_config.rcParams.lookaheadDepth  = 0;
ctx->encode_config.rcParams.disableIadapt   = 0;
ctx->encode_config.rcParams.disableBadapt   = 0;

Comments

Carl Eugen Hoyos Nov. 21, 2016, 11:47 a.m. UTC | #1
2016-11-21 12:17 GMT+01:00 Miroslav Slugeň <thunder.m@email.cz>:

> That was the easy part, now littlebit more...
>
> Next part of this patch is to always prefer rc_lookahead

[...]

> There were also bug

Please split the patch into two (or three) patches to make
the review and possible regression tests easier.

Thank you, Carl Eugen

[...]
Timo Rothenpieler Nov. 21, 2016, 11:57 a.m. UTC | #2
> Please split the patch into two (or three) patches to make
> the review and possible regression tests easier.

The bug was implicitly fixed by the new code, it doesn't seem necessary
to me to fix it independently, specially as so far nobody seems to have
run into it.

Patch LGTM, applied locally, will push most likely tomorrow.
Miroslav Slugeň Nov. 21, 2016, 12:06 p.m. UTC | #3
Thx, i has same opinion :)

Miroslav Slugeň
+420 724 825 885

Dne 21.11.2016 v 12:57 Timo Rothenpieler napsal(a):
>> Please split the patch into two (or three) patches to make
>> the review and possible regression tests easier.
> The bug was implicitly fixed by the new code, it doesn't seem necessary
> to me to fix it independently, specially as so far nobody seems to have
> run into it.
>
> Patch LGTM, applied locally, will push most likely tomorrow.
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Moritz Barsnick Nov. 21, 2016, 10:09 p.m. UTC | #4
On Mon, Nov 21, 2016 at 12:17:43 +0100, Miroslav Slugeň wrote:
> +                   "Defined rc_lookahead require more surfaces, "
Nit:                                        ^ requires

Moritz
diff mbox

Patch

From ab98c06a19086ee3763722556295fa32ab8b8789 Mon Sep 17 00:00:00 2001
From: Miroslav Slugen <thunder.m@email.cz>
Date: Mon, 21 Nov 2016 11:30:27 +0100
Subject: [PATCH] NVENC: Better surface allocation alghoritm, fix rc_lookahead
 limit

---
 libavcodec/nvenc.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/libavcodec/nvenc.c b/libavcodec/nvenc.c
index a3a2ef5..fdf8e5d 100644
--- a/libavcodec/nvenc.c
+++ b/libavcodec/nvenc.c
@@ -674,6 +674,27 @@  static void nvenc_override_rate_control(AVCodecContext *avctx)
     rc->rateControlMode = ctx->rc;
 }
 
+static av_cold int nvenc_recalc_surfaces(AVCodecContext *avctx)
+{
+    NvencContext *ctx = avctx->priv_data;
+    int nb_surfaces = 0;
+
+    if (ctx->rc_lookahead > 0) {
+        nb_surfaces = ctx->rc_lookahead + ((ctx->encode_config.frameIntervalP > 0) ? ctx->encode_config.frameIntervalP : 0) + 1 + 4;
+        if (ctx->nb_surfaces < nb_surfaces) {
+            av_log(avctx, AV_LOG_WARNING,
+                   "Defined rc_lookahead require more surfaces, "
+                   "increasing used surfaces %d -> %d\n", ctx->nb_surfaces, nb_surfaces);
+            ctx->nb_surfaces = nb_surfaces;
+        }
+    }
+
+    ctx->nb_surfaces = FFMAX(1, FFMIN(MAX_REGISTERED_FRAMES, ctx->nb_surfaces));
+    ctx->async_depth = FFMIN(ctx->async_depth, ctx->nb_surfaces - 1);
+
+    return 0;
+}
+
 static av_cold void nvenc_setup_rate_control(AVCodecContext *avctx)
 {
     NvencContext *ctx = avctx->priv_data;
@@ -1030,6 +1051,8 @@  static av_cold int nvenc_setup_encoder(AVCodecContext *avctx)
     ctx->initial_pts[0] = AV_NOPTS_VALUE;
     ctx->initial_pts[1] = AV_NOPTS_VALUE;
 
+    nvenc_recalc_surfaces(avctx);
+
     nvenc_setup_rate_control(avctx);
 
     if (avctx->flags & AV_CODEC_FLAG_INTERLACED_DCT) {
@@ -1156,11 +1179,6 @@  static av_cold int nvenc_setup_surfaces(AVCodecContext *avctx)
 {
     NvencContext *ctx = avctx->priv_data;
     int i, res;
-    int num_mbs = ((avctx->width + 15) >> 4) * ((avctx->height + 15) >> 4);
-    ctx->nb_surfaces = FFMAX((num_mbs >= 8160) ? 32 : 48,
-                             ctx->nb_surfaces);
-    ctx->async_depth = FFMIN(ctx->async_depth, ctx->nb_surfaces - 1);
-
 
     ctx->surfaces = av_mallocz_array(ctx->nb_surfaces, sizeof(*ctx->surfaces));
     if (!ctx->surfaces)
-- 
2.1.4