From patchwork Tue Aug 23 17:10:21 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Oliver Collyer
X-Patchwork-Id: 270
From: Oliver Collyer
Message-id: <26E0E9A5-8B82-4D90-A97C-50E62FF69AB6@mac.com>
Date: Tue, 23 Aug 2016 20:10:21 +0300
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH] Nvidia NVENC 10-bit HEVC encoding and rate control lookahead support
List-Id: FFmpeg development discussions and patches

Hi all

Attached is a patch for the above.

10-bit HEVC encoding is a new feature of the latest Pascal Nvidia GPUs, released in the past few months; I’ve added support for the yuv420p10le and yuv444p10le pixel formats.
Rate control lookahead is available on pre-Pascal models too, but only with the latest SDK and latest drivers.

As part of this I’ve bumped the required SDK version to the latest, which is 7.

Feedback welcome. This is only my second patch; I seem to average about one a year :)

Regards

Oliver
---
 configure               |   4 +-
 libavcodec/nvenc.c      | 120 ++++++++++++++++++++++++++++++++++++++++++++++--
 libavcodec/nvenc.h      |   6 +++
 libavcodec/nvenc_hevc.c |   6 ++-
 4 files changed, 129 insertions(+), 7 deletions(-)

diff --git a/configure b/configure
index 9b92426..46ff144 100755
--- a/configure
+++ b/configure
@@ -5774,8 +5774,8 @@ enabled mmal && check_func_headers interface/mmal/mmal.h "MMAL_PARAMETER_VIDEO_M
 enabled netcdf            && require_pkg_config netcdf netcdf.h nc_inq_libvers
 enabled nvenc             && { check_header nvEncodeAPI.h || die "ERROR: nvEncodeAPI.h not found."; } &&
-                             { check_cpp_condition nvEncodeAPI.h "NVENCAPI_MAJOR_VERSION >= 6" ||
-                               die "ERROR: NVENC API version 5 or older is not supported"; } &&
+                             { check_cpp_condition nvEncodeAPI.h "NVENCAPI_MAJOR_VERSION >= 7" ||
+                               die "ERROR: NVENC API version 6 or older is not supported"; } &&
                              { [ $target_os != cygwin ] || die "ERROR: NVENC is not supported on Cygwin currently."; }
 enabled openal            && { { for al_libs in "${OPENAL_LIBS}" "-lopenal" "-lOpenAL32"; do
                                check_lib 'AL/al.h' alGetError "${al_libs}" && break; done } ||
diff --git a/libavcodec/nvenc.c b/libavcodec/nvenc.c
index 984dd3b..685dd7d 100644
--- a/libavcodec/nvenc.c
+++ b/libavcodec/nvenc.c
@@ -75,8 +75,10 @@
 
 const enum AVPixelFormat ff_nvenc_pix_fmts[] = {
     AV_PIX_FMT_YUV420P,
+    AV_PIX_FMT_YUV420P10LE,
     AV_PIX_FMT_NV12,
     AV_PIX_FMT_YUV444P,
+    AV_PIX_FMT_YUV444P10LE,
 #if CONFIG_CUDA
     AV_PIX_FMT_CUDA,
 #endif
@@ -314,6 +316,18 @@ static int nvenc_check_capabilities(AVCodecContext *avctx)
         return AVERROR(ENOSYS);
     }
 
+    ret = nvenc_check_cap(avctx, NV_ENC_CAPS_SUPPORT_10BIT_ENCODE);
+    if ((ctx->data_pix_fmt == AV_PIX_FMT_YUV420P10LE || ctx->data_pix_fmt == AV_PIX_FMT_YUV444P10LE) && ret <= 0) {
+        av_log(avctx, AV_LOG_VERBOSE, "10 bit encode not supported\n");
+        return AVERROR(ENOSYS);
+    }
+
+    ret = nvenc_check_cap(avctx, NV_ENC_CAPS_SUPPORT_LOOKAHEAD);
+    if (ctx->rc_lookahead > 0 && ret <= 0) {
+        av_log(avctx, AV_LOG_VERBOSE, "RC lookahead not supported\n");
+        return AVERROR(ENOSYS);
+    }
+
     return 0;
 }
 
@@ -673,6 +687,11 @@ static av_cold void nvenc_setup_rate_control(AVCodecContext *avctx)
     } else if (ctx->encode_config.rcParams.averageBitRate > 0) {
         ctx->encode_config.rcParams.vbvBufferSize = 2 * ctx->encode_config.rcParams.averageBitRate;
     }
+
+    if (ctx->rc_lookahead > 0) {
+        ctx->encode_config.rcParams.enableLookahead = 1;
+        ctx->encode_config.rcParams.lookaheadDepth = FFMIN(ctx->rc_lookahead, 32);
+    }
 }
 
 static av_cold int nvenc_setup_h264_config(AVCodecContext *avctx)
@@ -800,9 +819,26 @@ static av_cold int nvenc_setup_hevc_config(AVCodecContext *avctx)
         hevc->outputPictureTimingSEI = 1;
     }
 
-    /* No other profile is supported in the current SDK version 5 */
-    cc->profileGUID = NV_ENC_HEVC_PROFILE_MAIN_GUID;
-    avctx->profile = FF_PROFILE_HEVC_MAIN;
+    switch(ctx->profile) {
+    case NV_ENC_HEVC_PROFILE_MAIN:
+        cc->profileGUID = NV_ENC_HEVC_PROFILE_MAIN_GUID;
+        avctx->profile = FF_PROFILE_HEVC_MAIN;
+        break;
+    case NV_ENC_HEVC_PROFILE_MAIN_10:
+        cc->profileGUID = NV_ENC_HEVC_PROFILE_MAIN10_GUID;
+        avctx->profile = FF_PROFILE_HEVC_MAIN_10;
+        break;
+    }
+
+    // force setting profile as main10 if input is AV_PIX_FMT_YUVXXXP10LE
+    if (ctx->data_pix_fmt == AV_PIX_FMT_YUV420P10LE || ctx->data_pix_fmt == AV_PIX_FMT_YUV444P10LE) {
+        cc->profileGUID = NV_ENC_HEVC_PROFILE_MAIN10_GUID;
+        avctx->profile = FF_PROFILE_HEVC_MAIN_10;
+    }
+
+    hevc->chromaFormatIDC = ctx->data_pix_fmt == AV_PIX_FMT_YUV444P || ctx->data_pix_fmt == AV_PIX_FMT_YUV444P10LE ? 3 : 1;
+
+    hevc->pixelBitDepthMinus8 = ctx->data_pix_fmt == AV_PIX_FMT_YUV420P10LE || ctx->data_pix_fmt == AV_PIX_FMT_YUV444P10LE ? 2 : 0;
 
     hevc->level = ctx->level;
 
@@ -954,6 +990,10 @@ static av_cold int nvenc_alloc_surface(AVCodecContext *avctx, int idx)
         ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_YV12_PL;
         break;
 
+    case AV_PIX_FMT_YUV420P10LE:
+        ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_YUV420_10BIT;
+        break;
+
     case AV_PIX_FMT_NV12:
         ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_NV12_PL;
         break;
@@ -962,6 +1002,10 @@ static av_cold int nvenc_alloc_surface(AVCodecContext *avctx, int idx)
         ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_YUV444_PL;
         break;
 
+    case AV_PIX_FMT_YUV444P10LE:
+        ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_YUV444_10BIT;
+        break;
+
     default:
         av_log(avctx, AV_LOG_FATAL, "Invalid input pixel format\n");
         return AVERROR(EINVAL);
@@ -1206,6 +1250,49 @@ static NvencSurface *get_free_frame(NvencContext *ctx)
     return NULL;
 }
 
+static void copy_single_10bit_plane(uint8_t *dst, int dst_linesize,
+                                    const uint8_t *src, int src_linesize,
+                                    int width, int height)
+{
+    if (!dst || !src)
+        return;
+    av_assert0(abs(src_linesize) >= width << 1);
+    av_assert0(abs(dst_linesize) >= width << 1);
+    for (;height > 0; height--) {
+        uint16_t* tdst = (uint16_t*)dst;
+        uint16_t* tsrc = (uint16_t*)src;
+        for (int w = width; w > 0; w--) {
+            *tdst++ = *tsrc++ << 6;
+        }
+        dst += dst_linesize;
+        src += src_linesize;
+    }
+}
+
+static void interleave_10bit_planes(uint8_t *dst, int dst_linesize,
+                                    const uint8_t *src1, int src1_linesize,
+                                    const uint8_t *src2, int src2_linesize,
+                                    int width, int height)
+{
+    if (!dst || !src1 || !src2)
+        return;
+    av_assert0(abs(src1_linesize) >= width);
+    av_assert0(abs(src2_linesize) >= width);
+    av_assert0(abs(dst_linesize) >= width << 1);
+    for (;height > 0; height--) {
+        uint16_t* tdst = (uint16_t*)dst;
+        uint16_t* tsrc1 = (uint16_t*)src1;
+        uint16_t* tsrc2 = (uint16_t*)src2;
+        for (int w = width; w > 0; w-=2) {
+            *tdst++ = *tsrc1++ << 6;
+            *tdst++ = *tsrc2++ << 6;
+        }
+        dst += dst_linesize;
+        src1 += src1_linesize;
+        src2 += src2_linesize;
+    }
+}
+
 static int nvenc_copy_frame(AVCodecContext *avctx, NvencSurface *inSurf,
             NV_ENC_LOCK_INPUT_BUFFER *lockBufferParams, const AVFrame *frame)
 {
@@ -1228,6 +1315,17 @@ static int nvenc_copy_frame(AVCodecContext *avctx, NvencSurface *inSurf,
         av_image_copy_plane(buf, lockBufferParams->pitch >> 1,
             frame->data[1], frame->linesize[1],
             avctx->width >> 1, avctx->height >> 1);
+    } else if (frame->format == AV_PIX_FMT_YUV420P10LE) {
+        copy_single_10bit_plane(buf, lockBufferParams->pitch,
+            frame->data[0], frame->linesize[0],
+            avctx->width, avctx->height);
+
+        buf += off;
+
+        interleave_10bit_planes(buf, lockBufferParams->pitch,
+            frame->data[1], frame->linesize[1],
+            frame->data[2], frame->linesize[2],
+            avctx->width, avctx->height >> 1);
     } else if (frame->format == AV_PIX_FMT_NV12) {
         av_image_copy_plane(buf, lockBufferParams->pitch,
             frame->data[0], frame->linesize[0],
@@ -1254,6 +1352,22 @@ static int nvenc_copy_frame(AVCodecContext *avctx, NvencSurface *inSurf,
         av_image_copy_plane(buf, lockBufferParams->pitch,
             frame->data[2], frame->linesize[2],
             avctx->width, avctx->height);
+    } else if (frame->format == AV_PIX_FMT_YUV444P10LE) {
+        copy_single_10bit_plane(buf, lockBufferParams->pitch,
+            frame->data[0], frame->linesize[0],
+            avctx->width, avctx->height);
+
+        buf += off;
+
+        copy_single_10bit_plane(buf, lockBufferParams->pitch,
+            frame->data[1], frame->linesize[1],
+            avctx->width, avctx->height);
+
+        buf += off;
+
+        copy_single_10bit_plane(buf, lockBufferParams->pitch,
+            frame->data[2], frame->linesize[2],
+            avctx->width, avctx->height);
     } else {
         av_log(avctx, AV_LOG_FATAL, "Invalid pixel format!\n");
         return AVERROR(EINVAL);
diff --git a/libavcodec/nvenc.h b/libavcodec/nvenc.h
index 961cbc7..9366a26 100644
--- a/libavcodec/nvenc.h
+++ b/libavcodec/nvenc.h
@@ -117,6 +117,11 @@ enum {
 };
 
 enum {
+    NV_ENC_HEVC_PROFILE_MAIN,
+    NV_ENC_HEVC_PROFILE_MAIN_10,
+};
+
+enum {
     NVENC_LOWLATENCY = 1,
     NVENC_LOSSLESS = 2,
     NVENC_ONE_PASS = 4,
@@ -174,6 +179,7 @@ typedef struct NvencContext
     int device;
     int flags;
     int async_depth;
+    int rc_lookahead;
 } NvencContext;
 
 int ff_nvenc_encode_init(AVCodecContext *avctx);
diff --git a/libavcodec/nvenc_hevc.c b/libavcodec/nvenc_hevc.c
index 1ce7c89..04e351a 100644
--- a/libavcodec/nvenc_hevc.c
+++ b/libavcodec/nvenc_hevc.c
@@ -39,8 +39,9 @@ static const AVOption options[] = {
     { "llhp", "low latency hp", 0, AV_OPT_TYPE_CONST, { .i64 = PRESET_LOW_LATENCY_HP }, 0, 0, VE, "preset" },
     { "lossless", "lossless", 0, AV_OPT_TYPE_CONST, { .i64 = PRESET_LOSSLESS_DEFAULT }, 0, 0, VE, "preset" },
     { "losslesshp", "lossless hp", 0, AV_OPT_TYPE_CONST, { .i64 = PRESET_LOSSLESS_HP }, 0, 0, VE, "preset" },
-    { "profile", "Set the encoding profile", OFFSET(profile), AV_OPT_TYPE_INT, { .i64 = FF_PROFILE_HEVC_MAIN }, FF_PROFILE_HEVC_MAIN, FF_PROFILE_HEVC_MAIN, VE, "profile" },
-    { "main", "", 0, AV_OPT_TYPE_CONST, { .i64 = FF_PROFILE_HEVC_MAIN }, 0, 0, VE, "profile" },
+    { "profile", "Set the encoding profile", OFFSET(profile), AV_OPT_TYPE_INT, { .i64 = NV_ENC_HEVC_PROFILE_MAIN }, NV_ENC_HEVC_PROFILE_MAIN, FF_PROFILE_HEVC_MAIN_10, VE, "profile" },
+    { "main", "", 0, AV_OPT_TYPE_CONST, { .i64 = NV_ENC_HEVC_PROFILE_MAIN }, 0, 0, VE, "profile" },
+    { "main10", "", 0, AV_OPT_TYPE_CONST, { .i64 = NV_ENC_HEVC_PROFILE_MAIN_10 }, 0, 0, VE, "profile" },
     { "level", "Set the encoding level restriction", OFFSET(level), AV_OPT_TYPE_INT, { .i64 = NV_ENC_LEVEL_AUTOSELECT }, NV_ENC_LEVEL_AUTOSELECT, NV_ENC_LEVEL_HEVC_62, VE, "level" },
     { "auto", "", 0, AV_OPT_TYPE_CONST, { .i64 = NV_ENC_LEVEL_AUTOSELECT }, 0, 0, VE, "level" },
     { "1", "", 0, AV_OPT_TYPE_CONST, { .i64 = NV_ENC_LEVEL_HEVC_1 }, 0, 0, VE, "level" },
@@ -73,6 +74,7 @@ static const AVOption options[] = {
     { "ll_2pass_quality", "Multi-pass optimized for image quality (only for low-latency presets)", 0, AV_OPT_TYPE_CONST, { .i64 = NV_ENC_PARAMS_RC_2_PASS_QUALITY }, 0, 0, VE, "rc" },
     { "ll_2pass_size", "Multi-pass optimized for constant frame size (only for low-latency presets)", 0, AV_OPT_TYPE_CONST, { .i64 = NV_ENC_PARAMS_RC_2_PASS_FRAMESIZE_CAP }, 0, 0, VE, "rc" },
     { "vbr_2pass", "Multi-pass variable bitrate mode", 0, AV_OPT_TYPE_CONST, { .i64 = NV_ENC_PARAMS_RC_2_PASS_VBR }, 0, 0, VE, "rc" },
+    { "rc-lookahead", "Number of frames to look ahead for rate-control", OFFSET(rc_lookahead), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, INT_MAX, VE },
     { "surfaces", "Number of concurrent surfaces", OFFSET(nb_surfaces), AV_OPT_TYPE_INT, { .i64 = 32 }, 0, INT_MAX, VE },
     { "cbr", "Use cbr encoding mode", OFFSET(cbr), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
     { "2pass", "Use 2pass encoding mode", OFFSET(twopass), AV_OPT_TYPE_BOOL, { .i64 = -1 }, -1, 1, VE },
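
For anyone following the 10-bit copy path, the repacking done by copy_single_10bit_plane and interleave_10bit_planes can be sketched in isolation. This is a minimal standalone sketch, not code from the patch: the helper names shift_plane_10bit and interleave_planes_10bit are hypothetical, linesizes are assumed positive, and the chroma helper takes the chroma width in samples (the patch instead passes the luma width and steps by 2). It assumes, as the << 6 in the patch suggests, that the encoder wants each 10-bit sample in the high bits of a 16-bit word, with U and V interleaved for 4:2:0 input.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: move each 10-bit sample from
 * the low bits of its 16-bit word into the high bits (value << 6).
 * Linesizes are in bytes and assumed positive; width is in samples.
 * Like the patch, this reads samples as native uint16_t, so it assumes a
 * little-endian host for yuv420p10le / yuv444p10le data.
 */
static void shift_plane_10bit(uint8_t *dst, int dst_linesize,
                              const uint8_t *src, int src_linesize,
                              int width, int height)
{
    int x, y;

    assert(dst_linesize >= width * 2);
    assert(src_linesize >= width * 2);

    for (y = 0; y < height; y++) {
        uint16_t *d = (uint16_t *)(dst + y * dst_linesize);
        const uint16_t *s = (const uint16_t *)(src + y * src_linesize);
        for (x = 0; x < width; x++)
            d[x] = (uint16_t)(s[x] << 6);
    }
}

/*
 * Hypothetical helper, not part of the patch: interleave separate U and V
 * planes into one UVUV... plane, shifting each sample as above.
 * chroma_width and chroma_height are in chroma samples.
 */
static void interleave_planes_10bit(uint8_t *dst, int dst_linesize,
                                    const uint8_t *u, int u_linesize,
                                    const uint8_t *v, int v_linesize,
                                    int chroma_width, int chroma_height)
{
    int x, y;

    assert(dst_linesize >= chroma_width * 4);
    assert(u_linesize >= chroma_width * 2);
    assert(v_linesize >= chroma_width * 2);

    for (y = 0; y < chroma_height; y++) {
        uint16_t *d = (uint16_t *)(dst + y * dst_linesize);
        const uint16_t *su = (const uint16_t *)(u + y * u_linesize);
        const uint16_t *sv = (const uint16_t *)(v + y * v_linesize);
        for (x = 0; x < chroma_width; x++) {
            d[2 * x]     = (uint16_t)(su[x] << 6);
            d[2 * x + 1] = (uint16_t)(sv[x] << 6);
        }
    }
}
```

For a 4x2 yuv420p10le frame, the luma call would use width 4 and height 2, and the chroma call chroma_width 2 and chroma_height 1. A 10-bit value of 1023 becomes 65472 after the shift.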