From patchwork Sun Nov 20 01:18:08 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Philip Langdale <philipl@overt.org>
X-Patchwork-Id: 1494
Delivered-To: ffmpegpatchwork@gmail.com
Received: by 10.103.90.1 with SMTP id o1csp873666vsb;
	Sat, 19 Nov 2016 17:18:24 -0800 (PST)
X-Received: by 10.28.128.211 with SMTP id b202mr6692443wmd.7.1479604704468;
	Sat, 19 Nov 2016 17:18:24 -0800 (PST)
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
	by mx.google.com with ESMTP id
	199si7849107wmm.166.2016.11.19.17.18.23;
	Sat, 19 Nov 2016 17:18:24 -0800 (PST)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
	dkim=neutral (body hash did not verify) header.i=@overt.org;
	dkim=neutral (body hash did not verify) header.i=@overt.org;
	spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id DCA5A689DAA;
	Sun, 20 Nov 2016 03:18:19 +0200 (EET)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from so254-29.mailgun.net (so254-29.mailgun.net [198.61.254.29])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 34F94689CD1
	for <ffmpeg-devel@ffmpeg.org>; Sun, 20 Nov 2016 03:18:14 +0200 (EET)
DKIM-Signature: a=rsa-sha256; v=1; c=relaxed/relaxed; d=overt.org; q=dns/txt;
	s=k1;
	t=1479604693; h=Message-Id: Date: Subject: Cc: To: From: Sender;
	bh=yqB0SMj6zXDOW/mtZlidpPwbaSWUKKTwBRnyfNf3YZY=;
	b=CYTfbkq2yUfJRIpN6H114JvBtXGV18E6Jg2RvDeGYQzAm9m0z0ACpGv8MjerbG5hJKcjzI16
	xRALK5ydD6miGcyG3ZOjLzq1yoEh6VD2n01SNUNGQBxiD3pqHzr9CfhP2dNqe6/24t5rQxmU
	4B0f9L8+uv5nBlCbupzVwBp73OU=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=overt.org; s=k1; q=dns;
	h=Sender: From: To: Cc: Subject: Date: Message-Id;
	b=Jl7yBnEPqkUBuVuy8+7z1evB6EvmN/FF/6aU+vcEvLPpNDL+iQnQ+dn18U0rvDG4cpK+Cj
	Mh4cHmVB5oELItWnMYeZEEWRHCfMogLZimAu6PHz8xtL97djPeCjsh6H5mcO2stD738l8HIp
	a2C2NotEDnRANy/wCm4cAeZNJ6qcY=
X-Mailgun-Sending-Ip: 198.61.254.29
X-Mailgun-Sid: 
 WyIyM2Q3MCIsICJmZm1wZWctZGV2ZWxAZmZtcGVnLm9yZyIsICI0YTg5NjEiXQ==
Received: from mail.overt.org (155.208.178.107.bc.googleusercontent.com
	[107.178.208.155])
	by mxa.mailgun.org with ESMTP id 5830f9d5.7fd63036c6f8-in01;
	Sun, 20 Nov 2016 01:18:13 -0000 (UTC)
Received: from authenticated-user (mail.overt.org [107.178.208.155])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128
	bits)) (No client certificate requested)
	by mail.overt.org (Postfix) with ESMTPSA id EAFA160594;
	Sun, 20 Nov 2016 01:18:12 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=overt.org; s=mail;
	t=1479604693; bh=rggmvxbrrdLz05UOpCO5gtPsR30z67VT9FrsMtQYORw=;
	h=From:To:Cc:Subject:Date:From;
	b=u5DFZ+j1RMgX4ewCgyorYprb6fQKjgP0H7JNCqmjk7tPJTLojp9jIQmv+LGrFlFUj
	UOCIz/DpnQUkV3BrzaLTXyiAVTaiHiLwzFThUzNruvc8c0/q8VS5P21Uv8Sacgk7rJ
	alVsEpMscccfJwHtrre1bn9cOP8ghIeCWoOSrSh48rgnPy9VjbKssoZbj6U5ECLD3T
	mcOtL23rGeXu748rPSjzExKZs3lz93SN13MI+RanFiXSSuvUHqH8N/lMyhnYEDripn
	SZ2KnWrhzi8OmetEvUaie9gApi7vusQEzcbM8WpvlfuX3pl4hux/CTI0wOnawD/gBf
	F8X0R3aTTUEzg==
From: Philip Langdale <philipl@overt.org>
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 19 Nov 2016 17:18:08 -0800
Message-Id: <20161120011808.4352-1-philipl@overt.org>
Subject: [FFmpeg-devel] [PATCH] avcodec/cuvid: Add support for P010 as an
	output surface format
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <http://ffmpeg.org/mailman/options/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <http://ffmpeg.org/pipermail/ffmpeg-devel/>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <http://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches
	<ffmpeg-devel@ffmpeg.org>
Cc: Philip Langdale <philipl@overt.org>
MIME-Version: 1.0
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

The NVIDIA 375.xx driver introduces support for P016 output surfaces
for 10-bit and 12-bit HEVC content (it is also the first driver to
support hardware decoding of 12-bit content).

Technically, we don't support P016, but in practice I don't think we
zero out the extra bits in P010, so it can be used to carry the data.

This change adds P010 output support to the cuvid decoder, for both
hardware and system memory surfaces. For simplicity, it does not
maintain the previous ability to output NV12 for > 8-bit input video;
the user will need to update their driver to decode such videos.

After this change, both cuvid and nvenc support P010, but the
ffmpeg_cuvid transcoding logic will need more work to connect the
two together. Similarly, the scale_npp filter still only works with
8-bit surfaces.

Signed-off-by: Philip Langdale <philipl@overt.org>
---
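Note for reviewers (not part of the commit message): the sketch below
illustrates the two ideas the patch relies on. It is a standalone
illustration under stated assumptions, not code from the tree. The
names ALIGNMENT, ALIGN_UP, p010_pack, p010_unpack, SemiPlanarLayout
and semi_planar_layout are hypothetical, and the value 256 is only an
assumed stand-in for CUDA_FRAME_ALIGNMENT. It shows that P010 keeps
its 10-bit sample in the upper bits of a 16-bit word (which is why a
P016 surface can be carried as P010), and that a two-byte semi-planar
surface needs its row pitch computed from the width in bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment, standing in for CUDA_FRAME_ALIGNMENT. */
#define ALIGNMENT 256
#define ALIGN_UP(x, a) (((x) + (size_t)(a) - 1) & ~((size_t)(a) - 1))

/* P010 stores a 10-bit sample in the top 10 bits of a 16-bit word. */
static uint16_t p010_pack(uint16_t sample10)
{
    return (uint16_t)((sample10 & 0x3FF) << 6);
}

/* Recover the 10-bit sample; any low-order bits are discarded. */
static uint16_t p010_unpack(uint16_t word)
{
    return (uint16_t)(word >> 6);
}

typedef struct SemiPlanarLayout {
    size_t linesize;      /* row pitch in bytes, shared by both planes */
    size_t chroma_offset; /* byte offset of the interleaved UV plane   */
    size_t total;         /* bytes needed for luma plus chroma         */
} SemiPlanarLayout;

/*
 * Layout of a 4:2:0 semi-planar surface (NV12 with 1 byte per sample,
 * P010/P016 with 2). The pitch is aligned after converting the width
 * from samples to bytes.
 */
static SemiPlanarLayout semi_planar_layout(int width, int height,
                                           int bytes_per_sample)
{
    SemiPlanarLayout l;

    l.linesize      = ALIGN_UP((size_t)width * bytes_per_sample, ALIGNMENT);
    l.chroma_offset = l.linesize * (size_t)height;
    l.total         = l.chroma_offset + l.linesize * (size_t)(height / 2);
    return l;
}
```

Calling semi_planar_layout(1920, 1080, 2) gives the P010 case and
semi_planar_layout(1920, 1080, 1) the NV12 case. A 12-bit sample in a
P016 word survives in the low bits that p010_unpack() would drop, which
is the "extra bits" caveat mentioned in the commit message.
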
 compat/cuda/dynlink_cuviddec.h |  3 ++-
 libavcodec/cuvid.c             | 58 +++++++++++++++++++++++++++++++-----------
 libavutil/hwcontext_cuda.c     | 11 +++++++-
 3 files changed, 55 insertions(+), 17 deletions(-)

diff --git a/compat/cuda/dynlink_cuviddec.h b/compat/cuda/dynlink_cuviddec.h
index 17207bc..9ff2741 100644
--- a/compat/cuda/dynlink_cuviddec.h
+++ b/compat/cuda/dynlink_cuviddec.h
@@ -83,7 +83,8 @@ typedef enum cudaVideoCodec_enum {
  * Video Surface Formats Enums
  */
 typedef enum cudaVideoSurfaceFormat_enum {
-    cudaVideoSurfaceFormat_NV12=0       /**< NV12 (currently the only supported output format)  */
+    cudaVideoSurfaceFormat_NV12=0,      /**< NV12  */
+    cudaVideoSurfaceFormat_P016=1       /**< P016  */
 } cudaVideoSurfaceFormat;
 
 /*!
diff --git a/libavcodec/cuvid.c b/libavcodec/cuvid.c
index c3e831a..34b0734 100644
--- a/libavcodec/cuvid.c
+++ b/libavcodec/cuvid.c
@@ -28,6 +28,7 @@
 #include "libavutil/fifo.h"
 #include "libavutil/log.h"
 #include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
 
 #include "avcodec.h"
 #include "internal.h"
@@ -103,11 +104,35 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
     CuvidContext *ctx = avctx->priv_data;
     AVHWFramesContext *hwframe_ctx = (AVHWFramesContext*)ctx->hwframe->data;
     CUVIDDECODECREATEINFO cuinfo;
+    int surface_fmt;
+
+    enum AVPixelFormat pix_fmts_nv12[3] = { AV_PIX_FMT_CUDA,
+                                            AV_PIX_FMT_NV12,
+                                            AV_PIX_FMT_NONE };
+
+    enum AVPixelFormat pix_fmts_p010[3] = { AV_PIX_FMT_CUDA,
+                                            AV_PIX_FMT_P010,
+                                            AV_PIX_FMT_NONE };
 
     av_log(avctx, AV_LOG_TRACE, "pfnSequenceCallback, progressive_sequence=%d\n", format->progressive_sequence);
 
     ctx->internal_error = 0;
 
+    surface_fmt = ff_get_format(avctx, format->bit_depth_luma_minus8 > 0 ?
+                                pix_fmts_p010 : pix_fmts_nv12);
+    if (surface_fmt < 0) {
+        av_log(avctx, AV_LOG_ERROR, "ff_get_format failed: %d\n", surface_fmt);
+        ctx->internal_error = AVERROR(EINVAL);
+        return 0;
+    }
+
+    av_log(avctx, AV_LOG_VERBOSE, "Formats: Original: %s | HW: %s | SW: %s\n",
+           av_get_pix_fmt_name(avctx->pix_fmt),
+           av_get_pix_fmt_name(surface_fmt),
+           av_get_pix_fmt_name(avctx->sw_pix_fmt));
+
+    avctx->pix_fmt = surface_fmt;
+
     avctx->width = format->display_area.right;
     avctx->height = format->display_area.bottom;
 
@@ -156,7 +181,7 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
             hwframe_ctx->width < avctx->width ||
             hwframe_ctx->height < avctx->height ||
             hwframe_ctx->format != AV_PIX_FMT_CUDA ||
-            hwframe_ctx->sw_format != AV_PIX_FMT_NV12)) {
+            hwframe_ctx->sw_format != avctx->sw_pix_fmt)) {
         av_log(avctx, AV_LOG_ERROR, "AVHWFramesContext is already initialized with incompatible parameters\n");
         ctx->internal_error = AVERROR(EINVAL);
         return 0;
@@ -177,7 +202,19 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
 
     cuinfo.CodecType = ctx->codec_type = format->codec;
     cuinfo.ChromaFormat = format->chroma_format;
-    cuinfo.OutputFormat = cudaVideoSurfaceFormat_NV12;
+
+    switch (avctx->sw_pix_fmt) {
+    case AV_PIX_FMT_NV12:
+        cuinfo.OutputFormat = cudaVideoSurfaceFormat_NV12;
+        break;
+    case AV_PIX_FMT_P010:
+        cuinfo.OutputFormat = cudaVideoSurfaceFormat_P016;
+        break;
+    default:
+        av_log(avctx, AV_LOG_ERROR, "Output formats other than NV12 or P010 are not supported\n");
+        ctx->internal_error = AVERROR(EINVAL);
+        return 0;
+    }
 
     cuinfo.ulWidth = avctx->coded_width;
     cuinfo.ulHeight = avctx->coded_height;
@@ -209,7 +246,7 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
 
     if (!hwframe_ctx->pool) {
         hwframe_ctx->format = AV_PIX_FMT_CUDA;
-        hwframe_ctx->sw_format = AV_PIX_FMT_NV12;
+        hwframe_ctx->sw_format = avctx->sw_pix_fmt;
         hwframe_ctx->width = avctx->width;
         hwframe_ctx->height = avctx->height;
 
@@ -417,7 +454,8 @@ static int cuvid_output_frame(AVCodecContext *avctx, AVFrame *frame)
 
                 offset += avctx->coded_height;
             }
-        } else if (avctx->pix_fmt == AV_PIX_FMT_NV12) {
+        } else if (avctx->pix_fmt == AV_PIX_FMT_NV12 ||
+                   avctx->pix_fmt == AV_PIX_FMT_P010) {
             AVFrame *tmp_frame = av_frame_alloc();
             if (!tmp_frame) {
                 av_log(avctx, AV_LOG_ERROR, "av_frame_alloc failed\n");
@@ -615,17 +653,6 @@ static av_cold int cuvid_decode_init(AVCodecContext *avctx)
     const AVBitStreamFilter *bsf;
     int ret = 0;
 
-    enum AVPixelFormat pix_fmts[3] = { AV_PIX_FMT_CUDA,
-                                       AV_PIX_FMT_NV12,
-                                       AV_PIX_FMT_NONE };
-
-    ret = ff_get_format(avctx, pix_fmts);
-    if (ret < 0) {
-        av_log(avctx, AV_LOG_ERROR, "ff_get_format failed: %d\n", ret);
-        return ret;
-    }
-    avctx->pix_fmt = ret;
-
     ret = cuvid_load_functions(&ctx->cvdl);
     if (ret < 0) {
         av_log(avctx, AV_LOG_ERROR, "Failed loading nvcuvid.\n");
@@ -899,6 +926,7 @@ static const AVOption options[] = {
         .capabilities   = AV_CODEC_CAP_DELAY | AV_CODEC_CAP_AVOID_PROBING, \
         .pix_fmts       = (const enum AVPixelFormat[]){ AV_PIX_FMT_CUDA, \
                                                         AV_PIX_FMT_NV12, \
+                                                        AV_PIX_FMT_P010, \
                                                         AV_PIX_FMT_NONE }, \
     };
 
diff --git a/libavutil/hwcontext_cuda.c b/libavutil/hwcontext_cuda.c
index 30de299..e413aa8 100644
--- a/libavutil/hwcontext_cuda.c
+++ b/libavutil/hwcontext_cuda.c
@@ -35,6 +35,7 @@ static const enum AVPixelFormat supported_formats[] = {
     AV_PIX_FMT_NV12,
     AV_PIX_FMT_YUV420P,
     AV_PIX_FMT_YUV444P,
+    AV_PIX_FMT_P010,
 };
 
 static void cuda_buffer_free(void *opaque, uint8_t *data)
@@ -111,6 +112,7 @@ static int cuda_frames_init(AVHWFramesContext *ctx)
             size = aligned_width * ctx->height * 3 / 2;
             break;
         case AV_PIX_FMT_YUV444P:
+        case AV_PIX_FMT_P010:
             size = aligned_width * ctx->height * 3;
             break;
         }
@@ -125,7 +127,13 @@ static int cuda_frames_init(AVHWFramesContext *ctx)
 
 static int cuda_get_buffer(AVHWFramesContext *ctx, AVFrame *frame)
 {
-    int aligned_width = FFALIGN(ctx->width, CUDA_FRAME_ALIGNMENT);
+    int aligned_width;
+    int width_in_bytes = ctx->width;
+
+    if (ctx->sw_format == AV_PIX_FMT_P010) {
+        width_in_bytes *= 2;
+    }
+    aligned_width = FFALIGN(width_in_bytes, CUDA_FRAME_ALIGNMENT);
 
     frame->buf[0] = av_buffer_pool_get(ctx->pool);
     if (!frame->buf[0])
@@ -133,6 +141,7 @@ static int cuda_get_buffer(AVHWFramesContext *ctx, AVFrame *frame)
 
     switch (ctx->sw_format) {
     case AV_PIX_FMT_NV12:
+    case AV_PIX_FMT_P010:
         frame->data[0]     = frame->buf[0]->data;
         frame->data[1]     = frame->data[0] + aligned_width * ctx->height;
         frame->linesize[0] = aligned_width;