From patchwork Mon Nov 21 11:17:43 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
X-Patchwork-Submitter: Miroslav Slugeň
X-Patchwork-Id: 1506
From: Miroslav Slugeň
To: ffmpeg-devel@ffmpeg.org
Message-ID: <5832D7D7.5000202@email.cz>
Date: Mon, 21 Nov 2016 12:17:43 +0100
Subject: [FFmpeg-devel] [PATCH] NVENC: Better surface allocation algorithm, fix rc_lookahead
List-Id: FFmpeg development discussions and patches

User-selectable surfaces do not work correctly: if you set the number of
surfaces on the command line, nvenc still uses a minimum of 32 or 48,
depending on the selected resolution, even though nvenc does not need
that many surfaces.

With this patch you can request as few as 1 surface and nvenc will still
work. This of course lowers GPU memory usage by 95% and reduces
async_delay to zero.

That was the easy part, now a little bit more...

The next part of this patch makes rc_lookahead take priority over the
user-defined surfaces value when the number of surfaces is determined.
The maximum rc_lookahead according to the NVIDIA documentation is 32,
but it could increase in future generations, so the patch does not
enforce a limit yet. The async_depth value is still accepted and
preferred over rc_lookahead.

There was also a bug: when you requested an rc_lookahead greater than
31, it was always clamped to 31, because the surface count was
recalculated after the lookahead had been set. This is now fixed.

Results: if you set -rc_lookahead 32 and -bf 3, it now uses only 40
surfaces and lowers GPU memory usage by 20%; it also increases PSNR by
0.012 dB.

Two more comments:

1. From my internal tests, I don't understand the addition of 4 more
   surfaces when the lookahead is calculated. I didn't use them and
   everything works the same as with those 4 extra surfaces. Does
   anybody know what is going on there? It looks like they were meant
   for B-frames, which are counted separately, because the maximum
   number of B-frames is 4.

2. rc_lookahead defaults to -1, but the condition
   if (ctx->rc_lookahead), which enables lookahead, is then true. I
   don't know if this is intended behavior, but it means lookahead is
   always on by default!
   This is the resulting default state when rc_lookahead is -1 (not set
   on the command line), which may not be intended:

       ctx->encode_config.rcParams.enableLookahead = 1;
       ctx->encode_config.rcParams.lookaheadDepth = 0;
       ctx->encode_config.rcParams.disableIadapt = 0;
       ctx->encode_config.rcParams.disableBadapt = 0;

From ab98c06a19086ee3763722556295fa32ab8b8789 Mon Sep 17 00:00:00 2001
From: Miroslav Slugen
Date: Mon, 21 Nov 2016 11:30:27 +0100
Subject: [PATCH] NVENC: Better surface allocation algorithm, fix rc_lookahead limit

---
 libavcodec/nvenc.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/libavcodec/nvenc.c b/libavcodec/nvenc.c
index a3a2ef5..fdf8e5d 100644
--- a/libavcodec/nvenc.c
+++ b/libavcodec/nvenc.c
@@ -674,6 +674,27 @@ static void nvenc_override_rate_control(AVCodecContext *avctx)
     rc->rateControlMode = ctx->rc;
 }
 
+static av_cold int nvenc_recalc_surfaces(AVCodecContext *avctx)
+{
+    NvencContext *ctx = avctx->priv_data;
+    int nb_surfaces = 0;
+
+    if (ctx->rc_lookahead > 0) {
+        nb_surfaces = ctx->rc_lookahead + ((ctx->encode_config.frameIntervalP > 0) ? ctx->encode_config.frameIntervalP : 0) + 1 + 4;
+        if (ctx->nb_surfaces < nb_surfaces) {
+            av_log(avctx, AV_LOG_WARNING,
+                   "Defined rc_lookahead require more surfaces, "
+                   "increasing used surfaces %d -> %d\n", ctx->nb_surfaces, nb_surfaces);
+            ctx->nb_surfaces = nb_surfaces;
+        }
+    }
+
+    ctx->nb_surfaces = FFMAX(1, FFMIN(MAX_REGISTERED_FRAMES, ctx->nb_surfaces));
+    ctx->async_depth = FFMIN(ctx->async_depth, ctx->nb_surfaces - 1);
+
+    return 0;
+}
+
 static av_cold void nvenc_setup_rate_control(AVCodecContext *avctx)
 {
     NvencContext *ctx = avctx->priv_data;
@@ -1030,6 +1051,8 @@ static av_cold int nvenc_setup_encoder(AVCodecContext *avctx)
     ctx->initial_pts[0] = AV_NOPTS_VALUE;
     ctx->initial_pts[1] = AV_NOPTS_VALUE;
 
+    nvenc_recalc_surfaces(avctx);
+
     nvenc_setup_rate_control(avctx);
 
     if (avctx->flags & AV_CODEC_FLAG_INTERLACED_DCT) {
@@ -1156,11 +1179,6 @@ static av_cold int nvenc_setup_surfaces(AVCodecContext *avctx)
 {
     NvencContext *ctx = avctx->priv_data;
     int i, res;
-    int num_mbs = ((avctx->width + 15) >> 4) * ((avctx->height + 15) >> 4);
-    ctx->nb_surfaces = FFMAX((num_mbs >= 8160) ? 32 : 48,
-                             ctx->nb_surfaces);
-    ctx->async_depth = FFMIN(ctx->async_depth, ctx->nb_surfaces - 1);
-
 
     ctx->surfaces = av_mallocz_array(ctx->nb_surfaces, sizeof(*ctx->surfaces));
     if (!ctx->surfaces)
-- 
2.1.4
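For readers who want to try the numbers without building FFmpeg, the surface recalculation and the rc_lookahead default question can be sketched in standalone C. This is only an illustration, not the nvenc code itself: SurfaceParams, sketch_recalc_surfaces, the two lookahead_on_* helpers and the SKETCH_* macros are hypothetical names, and the cap of 64 is an assumed stand-in for MAX_REGISTERED_FRAMES.

```c
/* Assumed cap, standing in for MAX_REGISTERED_FRAMES from nvenc.h. */
#define SKETCH_MAX_SURFACES 64

/* Local stand-ins for FFMAX/FFMIN so the sketch needs no FFmpeg headers. */
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for the few context fields the patch touches. */
typedef struct SurfaceParams {
    int rc_lookahead;     /* -1 means "not set on the command line" */
    int frame_interval_p; /* stands in for encode_config.frameIntervalP */
    int nb_surfaces;      /* surface count requested by the user */
    int async_depth;      /* async depth requested by the user */
} SurfaceParams;

/* Same arithmetic as nvenc_recalc_surfaces() in the patch, minus logging. */
static void sketch_recalc_surfaces(SurfaceParams *p)
{
    if (p->rc_lookahead > 0) {
        /* lookahead frames + P-frame interval + 1 + the 4 extra surfaces */
        int needed = p->rc_lookahead
                   + (p->frame_interval_p > 0 ? p->frame_interval_p : 0)
                   + 1 + 4;
        if (p->nb_surfaces < needed)
            p->nb_surfaces = needed;
    }

    /* Clamp to [1, cap], then keep async depth below the surface count. */
    p->nb_surfaces = SKETCH_MAX(1, SKETCH_MIN(SKETCH_MAX_SURFACES, p->nb_surfaces));
    p->async_depth = SKETCH_MIN(p->async_depth, p->nb_surfaces - 1);
}

/* A bare truthiness test treats the default of -1 as "enabled". */
static int lookahead_on_truthy(int rc_lookahead)
{
    return rc_lookahead ? 1 : 0;
}

/* A "> 0" test, as used in the new function, treats -1 as "disabled". */
static int lookahead_on_positive(int rc_lookahead)
{
    return rc_lookahead > 0;
}
```

Under these assumptions, a lookahead of 32 with a frame interval of 3 and one requested surface comes out at 32 + 3 + 1 + 4 = 40 surfaces, while one requested surface with rc_lookahead left at -1 stays at 1 surface with an async depth of 0. The two helper functions disagree only for negative values, which is exactly the default case raised in comment 2.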