From patchwork Tue Aug 23 17:10:21 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Oliver Collyer <ovcollyer@mac.com>
X-Patchwork-Id: 270
From: Oliver Collyer <ovcollyer@mac.com>
Message-id: <26E0E9A5-8B82-4D90-A97C-50E62FF69AB6@mac.com>
Date: Tue, 23 Aug 2016 20:10:21 +0300
To: ffmpeg-devel@ffmpeg.org
MIME-version: 1.0 (Mac OS X Mail 9.3 \(3124\))
X-Mailer: Apple Mail (2.3124)
Subject: [FFmpeg-devel] [PATCH] Nvidia NVENC 10-bit HEVC encoding and rate
	control lookahead support

Hi all

Attached is a patch for the above.

10-bit HEVC encoding is a new feature of the latest Pascal Nvidia GPUs, released in the past few months; I’ve added support for the yuv420p10le and yuv444p10le pixel formats.
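
For anyone reviewing the copy helpers: FFmpeg's yuv420p10le and yuv444p10le keep each 10-bit sample in the low bits of a 16-bit little-endian word, and the patch shifts every sample left by 6 while copying into the NVENC surface. Below is a minimal standalone sketch of that per-row conversion. The function name shift_10bit_row is hypothetical and not part of the patch, and the assumption that the encoder expects the sample in the top 10 bits is inferred only from the "<< 6" in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: move each 10-bit sample
 * from the low bits of its 16-bit word into the high bits. */
static void shift_10bit_row(uint16_t *dst, const uint16_t *src, size_t n)
{
    size_t i;

    assert(dst && src);
    for (i = 0; i < n; i++)
        dst[i] = (uint16_t)(src[i] << 6); /* 10 significant bits + 6 = 16 */
}
```

With this sketch, an input sample of 1 becomes 64 and the 10-bit maximum 1023 becomes 65472, so the low 6 bits of every output word are zero.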

Rate control lookahead also works on pre-Pascal models, but only with the latest SDK and drivers.
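
The patch turns lookahead on only when the rc-lookahead option is positive, and it caps the depth at 32 frames. A rough standalone sketch of that decision follows. The struct and function names are hypothetical stand-ins for the fields the patch sets in rcParams, and the value 32 is simply copied from the patch rather than taken from any SDK documentation.

```c
#define MAX_LOOKAHEAD_DEPTH 32 /* cap copied from the patch's FFMIN(..., 32) */

/* Hypothetical stand-in for the two rcParams fields the patch touches. */
struct rc_lookahead_cfg {
    int enable;
    int depth;
};

/* Hypothetical helper: map the user option to an enable flag and depth.
 * Values <= 0 (including the default of -1) leave lookahead disabled. */
static struct rc_lookahead_cfg pick_lookahead(int requested)
{
    struct rc_lookahead_cfg cfg = { 0, 0 };

    if (requested > 0) {
        cfg.enable = 1;
        cfg.depth  = requested < MAX_LOOKAHEAD_DEPTH ? requested
                                                     : MAX_LOOKAHEAD_DEPTH;
    }
    return cfg;
}
```

In this sketch a request of 20 yields a depth of 20, while a request of 100 is clamped to 32.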

As part of this I’ve bumped the required SDK version to the latest, which is 7.

Feedback welcome. This is only my second patch; I seem to average about one a year :)

Regards

Oliver
---
configure               |   4 +-
libavcodec/nvenc.c      | 120 ++++++++++++++++++++++++++++++++++++++++++++++--
libavcodec/nvenc.h      |   6 +++
libavcodec/nvenc_hevc.c |   6 ++-
4 files changed, 129 insertions(+), 7 deletions(-)

diff --git a/configure b/configure
index 9b92426..46ff144 100755
--- a/configure
+++ b/configure
@@ -5774,8 +5774,8 @@ enabled mmal && check_func_headers interface/mmal/mmal.h "MMAL_PARAMETER_VIDEO_M

enabled netcdf            && require_pkg_config netcdf netcdf.h nc_inq_libvers
enabled nvenc             && { check_header nvEncodeAPI.h || die "ERROR: nvEncodeAPI.h not found."; } &&
-                             { check_cpp_condition nvEncodeAPI.h "NVENCAPI_MAJOR_VERSION >= 6" ||
-                               die "ERROR: NVENC API version 5 or older is not supported"; } &&
+                             { check_cpp_condition nvEncodeAPI.h "NVENCAPI_MAJOR_VERSION >= 7" ||
+                               die "ERROR: NVENC API version 6 or older is not supported"; } &&
                             { [ $target_os != cygwin ] || die "ERROR: NVENC is not supported on Cygwin currently."; }
enabled openal            && { { for al_libs in "${OPENAL_LIBS}" "-lopenal" "-lOpenAL32"; do
                               check_lib 'AL/al.h' alGetError "${al_libs}" && break; done } ||
diff --git a/libavcodec/nvenc.c b/libavcodec/nvenc.c
index 984dd3b..685dd7d 100644
--- a/libavcodec/nvenc.c
+++ b/libavcodec/nvenc.c
@@ -75,8 +75,10 @@

const enum AVPixelFormat ff_nvenc_pix_fmts[] = {
    AV_PIX_FMT_YUV420P,
+    AV_PIX_FMT_YUV420P10LE,
    AV_PIX_FMT_NV12,
    AV_PIX_FMT_YUV444P,
+    AV_PIX_FMT_YUV444P10LE,
#if CONFIG_CUDA
    AV_PIX_FMT_CUDA,
#endif
@@ -314,6 +316,18 @@ static int nvenc_check_capabilities(AVCodecContext *avctx)
        return AVERROR(ENOSYS);
    }

+    ret = nvenc_check_cap(avctx, NV_ENC_CAPS_SUPPORT_10BIT_ENCODE);
+    if ((ctx->data_pix_fmt == AV_PIX_FMT_YUV420P10LE || ctx->data_pix_fmt == AV_PIX_FMT_YUV444P10LE) && ret <= 0) {
+        av_log(avctx, AV_LOG_VERBOSE, "10 bit encode not supported\n");
+        return AVERROR(ENOSYS);
+    }
+
+    ret = nvenc_check_cap(avctx, NV_ENC_CAPS_SUPPORT_LOOKAHEAD);
+    if (ctx->rc_lookahead > 0 && ret <= 0) {
+        av_log(avctx, AV_LOG_VERBOSE, "RC lookahead not supported\n");
+        return AVERROR(ENOSYS);
+    }
+
    return 0;
}

@@ -673,6 +687,11 @@ static av_cold void nvenc_setup_rate_control(AVCodecContext *avctx)
    } else if (ctx->encode_config.rcParams.averageBitRate > 0) {
        ctx->encode_config.rcParams.vbvBufferSize = 2 * ctx->encode_config.rcParams.averageBitRate;
    }
+
+    if (ctx->rc_lookahead > 0) {
+        ctx->encode_config.rcParams.enableLookahead = 1;
+        ctx->encode_config.rcParams.lookaheadDepth = FFMIN(ctx->rc_lookahead, 32);
+    }
}

static av_cold int nvenc_setup_h264_config(AVCodecContext *avctx)
@@ -800,9 +819,26 @@ static av_cold int nvenc_setup_hevc_config(AVCodecContext *avctx)
        hevc->outputPictureTimingSEI   = 1;
    }

-    /* No other profile is supported in the current SDK version 5 */
-    cc->profileGUID = NV_ENC_HEVC_PROFILE_MAIN_GUID;
-    avctx->profile = FF_PROFILE_HEVC_MAIN;
+    switch(ctx->profile) {
+    case NV_ENC_HEVC_PROFILE_MAIN:
+        cc->profileGUID = NV_ENC_HEVC_PROFILE_MAIN_GUID;
+        avctx->profile = FF_PROFILE_HEVC_MAIN;
+        break;
+    case NV_ENC_HEVC_PROFILE_MAIN_10:
+        cc->profileGUID = NV_ENC_HEVC_PROFILE_MAIN10_GUID;
+        avctx->profile = FF_PROFILE_HEVC_MAIN_10;
+        break;
+    }
+
+    /* Force the Main 10 profile whenever the input is a 10-bit pixel format. */
+    if (ctx->data_pix_fmt == AV_PIX_FMT_YUV420P10LE || ctx->data_pix_fmt == AV_PIX_FMT_YUV444P10LE) {
+        cc->profileGUID = NV_ENC_HEVC_PROFILE_MAIN10_GUID;
+        avctx->profile = FF_PROFILE_HEVC_MAIN_10;
+    }
+
+    hevc->chromaFormatIDC = (ctx->data_pix_fmt == AV_PIX_FMT_YUV444P || ctx->data_pix_fmt == AV_PIX_FMT_YUV444P10LE) ? 3 : 1;
+
+    hevc->pixelBitDepthMinus8 = (ctx->data_pix_fmt == AV_PIX_FMT_YUV420P10LE || ctx->data_pix_fmt == AV_PIX_FMT_YUV444P10LE) ? 2 : 0;

    hevc->level = ctx->level;

@@ -954,6 +990,10 @@ static av_cold int nvenc_alloc_surface(AVCodecContext *avctx, int idx)
        ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_YV12_PL;
        break;

+    case AV_PIX_FMT_YUV420P10LE:
+        ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_YUV420_10BIT;
+        break;
+
    case AV_PIX_FMT_NV12:
        ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_NV12_PL;
        break;
@@ -962,6 +1002,10 @@ static av_cold int nvenc_alloc_surface(AVCodecContext *avctx, int idx)
        ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_YUV444_PL;
        break;

+    case AV_PIX_FMT_YUV444P10LE:
+        ctx->surfaces[idx].format = NV_ENC_BUFFER_FORMAT_YUV444_10BIT;
+        break;
+
    default:
        av_log(avctx, AV_LOG_FATAL, "Invalid input pixel format\n");
        return AVERROR(EINVAL);
@@ -1206,6 +1250,49 @@ static NvencSurface *get_free_frame(NvencContext *ctx)
    return NULL;
}

+static void copy_single_10bit_plane(uint8_t *dst, int dst_linesize,
+                                    const uint8_t *src, int src_linesize,
+                                    int width, int height)
+{
+    if (!dst || !src)
+        return;
+    av_assert0(abs(src_linesize) >= width << 1);
+    av_assert0(abs(dst_linesize) >= width << 1);
+    for (;height > 0; height--) {
+        uint16_t *tdst = (uint16_t *)dst;
+        const uint16_t *tsrc = (const uint16_t *)src;
+        for (int w = width; w > 0; w--) {
+            *tdst++ = *tsrc++ << 6; /* move the 10-bit sample into the high bits */
+        }
+        dst += dst_linesize;
+        src += src_linesize;
+    }
+}
+
+static void interleave_10bit_planes(uint8_t *dst, int dst_linesize,
+                                    const uint8_t *src1, int src1_linesize,
+                                    const uint8_t *src2, int src2_linesize,
+                                    int width, int height)
+{
+    if (!dst || !src1 || !src2)
+        return;
+    av_assert0(abs(src1_linesize) >= width);
+    av_assert0(abs(src2_linesize) >= width);
+    av_assert0(abs(dst_linesize) >= width << 1);
+    for (;height > 0; height--) {
+        uint16_t *tdst = (uint16_t *)dst;
+        const uint16_t *tsrc1 = (const uint16_t *)src1;
+        const uint16_t *tsrc2 = (const uint16_t *)src2;
+        for (int w = width; w > 0; w-=2) {
+            *tdst++ = *tsrc1++ << 6;
+            *tdst++ = *tsrc2++ << 6;
+        }
+        dst += dst_linesize;
+        src1 += src1_linesize;
+        src2 += src2_linesize;
+    }
+}
+
static int nvenc_copy_frame(AVCodecContext *avctx, NvencSurface *inSurf,
            NV_ENC_LOCK_INPUT_BUFFER *lockBufferParams, const AVFrame *frame)
{
@@ -1228,6 +1315,17 @@ static int nvenc_copy_frame(AVCodecContext *avctx, NvencSurface *inSurf,
        av_image_copy_plane(buf, lockBufferParams->pitch >> 1,
            frame->data[1], frame->linesize[1],
            avctx->width >> 1, avctx->height >> 1);
+    } else if (frame->format == AV_PIX_FMT_YUV420P10LE) {
+        copy_single_10bit_plane(buf, lockBufferParams->pitch,
+            frame->data[0], frame->linesize[0],
+            avctx->width, avctx->height);
+
+        buf += off;
+
+        interleave_10bit_planes(buf, lockBufferParams->pitch,
+            frame->data[1], frame->linesize[1],
+            frame->data[2], frame->linesize[2],
+            avctx->width, avctx->height >> 1);
    } else if (frame->format == AV_PIX_FMT_NV12) {
        av_image_copy_plane(buf, lockBufferParams->pitch,
            frame->data[0], frame->linesize[0],
@@ -1254,6 +1352,22 @@ static int nvenc_copy_frame(AVCodecContext *avctx, NvencSurface *inSurf,
        av_image_copy_plane(buf, lockBufferParams->pitch,
            frame->data[2], frame->linesize[2],
            avctx->width, avctx->height);
+    } else if (frame->format == AV_PIX_FMT_YUV444P10LE) {
+        copy_single_10bit_plane(buf, lockBufferParams->pitch,
+            frame->data[0], frame->linesize[0],
+            avctx->width, avctx->height);
+
+        buf += off;
+
+        copy_single_10bit_plane(buf, lockBufferParams->pitch,
+            frame->data[1], frame->linesize[1],
+            avctx->width, avctx->height);
+
+        buf += off;
+
+        copy_single_10bit_plane(buf, lockBufferParams->pitch,
+            frame->data[2], frame->linesize[2],
+            avctx->width, avctx->height);
    } else {
        av_log(avctx, AV_LOG_FATAL, "Invalid pixel format!\n");
        return AVERROR(EINVAL);
diff --git a/libavcodec/nvenc.h b/libavcodec/nvenc.h
index 961cbc7..9366a26 100644
--- a/libavcodec/nvenc.h
+++ b/libavcodec/nvenc.h
@@ -117,6 +117,11 @@ enum {
};

enum {
+    NV_ENC_HEVC_PROFILE_MAIN,
+    NV_ENC_HEVC_PROFILE_MAIN_10,
+};
+
+enum {
    NVENC_LOWLATENCY = 1,
    NVENC_LOSSLESS   = 2,
    NVENC_ONE_PASS   = 4,
@@ -174,6 +179,7 @@ typedef struct NvencContext
    int device;
    int flags;
    int async_depth;
+    int rc_lookahead;
} NvencContext;

int ff_nvenc_encode_init(AVCodecContext *avctx);
diff --git a/libavcodec/nvenc_hevc.c b/libavcodec/nvenc_hevc.c
index 1ce7c89..04e351a 100644
--- a/libavcodec/nvenc_hevc.c
+++ b/libavcodec/nvenc_hevc.c
@@ -39,8 +39,9 @@ static const AVOption options[] = {
    { "llhp",       "low latency hp",                     0,                   AV_OPT_TYPE_CONST,  { .i64 = PRESET_LOW_LATENCY_HP }, 0, 0, VE, "preset" },
    { "lossless",   "lossless",                           0,                   AV_OPT_TYPE_CONST,  { .i64 = PRESET_LOSSLESS_DEFAULT }, 0, 0, VE, "preset" },
    { "losslesshp", "lossless hp",                        0,                   AV_OPT_TYPE_CONST,  { .i64 = PRESET_LOSSLESS_HP }, 0, 0, VE, "preset" },
-    { "profile", "Set the encoding profile",             OFFSET(profile),      AV_OPT_TYPE_INT,    { .i64 = FF_PROFILE_HEVC_MAIN }, FF_PROFILE_HEVC_MAIN, FF_PROFILE_HEVC_MAIN, VE, "profile" },
-    { "main",    "",                                     0,                    AV_OPT_TYPE_CONST,  { .i64 = FF_PROFILE_HEVC_MAIN }, 0, 0, VE, "profile" },
+    { "profile", "Set the encoding profile",             OFFSET(profile),      AV_OPT_TYPE_INT,    { .i64 = NV_ENC_HEVC_PROFILE_MAIN }, NV_ENC_HEVC_PROFILE_MAIN, NV_ENC_HEVC_PROFILE_MAIN_10, VE, "profile" },
+    { "main",    "",                                     0,                    AV_OPT_TYPE_CONST,  { .i64 = NV_ENC_HEVC_PROFILE_MAIN }, 0, 0, VE, "profile" },
+    { "main10",  "",                                     0,                    AV_OPT_TYPE_CONST,  { .i64 = NV_ENC_HEVC_PROFILE_MAIN_10 }, 0, 0, VE, "profile" },
    { "level",   "Set the encoding level restriction",   OFFSET(level),        AV_OPT_TYPE_INT,    { .i64 = NV_ENC_LEVEL_AUTOSELECT }, NV_ENC_LEVEL_AUTOSELECT, NV_ENC_LEVEL_HEVC_62, VE, "level" },
    { "auto",    "",                                     0,                    AV_OPT_TYPE_CONST,  { .i64 = NV_ENC_LEVEL_AUTOSELECT },  0, 0, VE,  "level" },
    { "1",       "",                                     0,                    AV_OPT_TYPE_CONST,  { .i64 = NV_ENC_LEVEL_HEVC_1 },  0, 0, VE,  "level" },
@@ -73,6 +74,7 @@ static const AVOption options[] = {
    { "ll_2pass_quality", "Multi-pass optimized for image quality (only for low-latency presets)",       0, AV_OPT_TYPE_CONST,  { .i64 = NV_ENC_PARAMS_RC_2_PASS_QUALITY },       0, 0, VE, "rc" },
    { "ll_2pass_size",    "Multi-pass optimized for constant frame size (only for low-latency presets)", 0, AV_OPT_TYPE_CONST,  { .i64 = NV_ENC_PARAMS_RC_2_PASS_FRAMESIZE_CAP }, 0, 0, VE, "rc" },
    { "vbr_2pass",        "Multi-pass variable bitrate mode",                                            0, AV_OPT_TYPE_CONST,  { .i64 = NV_ENC_PARAMS_RC_2_PASS_VBR },           0, 0, VE, "rc" },
+    { "rc-lookahead",  "Number of frames to look ahead for rate-control", OFFSET(rc_lookahead), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, INT_MAX, VE },
    { "surfaces", "Number of concurrent surfaces",        OFFSET(nb_surfaces), AV_OPT_TYPE_INT,    { .i64 = 32 },                   0, INT_MAX, VE },
    { "cbr", "Use cbr encoding mode", OFFSET(cbr), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
    { "2pass", "Use 2pass encoding mode", OFFSET(twopass), AV_OPT_TYPE_BOOL, { .i64 = -1 }, -1, 1, VE },