From patchwork Tue Nov 22 16:27:48 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Philip Langdale <philipl@overt.org>
X-Patchwork-Id: 1528
Delivered-To: ffmpegpatchwork@gmail.com
Received: by 10.103.90.1 with SMTP id o1csp2277238vsb;
	Tue, 22 Nov 2016 08:28:16 -0800 (PST)
X-Received: by 10.28.169.74 with SMTP id s71mr3103741wme.1.1479832096447;
	Tue, 22 Nov 2016 08:28:16 -0800 (PST)
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
	by mx.google.com with ESMTP id
	e70si3307244wma.135.2016.11.22.08.28.16;
	Tue, 22 Nov 2016 08:28:16 -0800 (PST)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
	dkim=neutral (body hash did not verify) header.i=@overt.org;
	dkim=neutral (body hash did not verify) header.i=@overt.org;
	spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id BD32A689213;
	Tue, 22 Nov 2016 18:28:00 +0200 (EET)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from so254-29.mailgun.net (so254-29.mailgun.net [198.61.254.29])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id AAD78687EE8
	for <ffmpeg-devel@ffmpeg.org>; Tue, 22 Nov 2016 18:27:53 +0200 (EET)
DKIM-Signature: a=rsa-sha256; v=1; c=relaxed/relaxed; d=overt.org; q=dns/txt;
	s=k1; t=1479832075;
	h=References: In-Reply-To: Message-Id: Date: Subject: Cc:
	To: From: Sender; bh=oDwrmFD0PQq2U9Oxig6N6BFZ0x/OR9e6QgsyaXBxd9E=;
	b=LW10ZTxpkatmvCiMq7ii/vw//6dzbzFVK94V5DmjQPfxRb2NrPqWkxD1Ru/EB5W3qr5pVrkd
	bAxZpaCqkp7gv5nyRGx3muUkwJ6r049FPBaVj57OKmLNuhmYmYZEqQzA5K7rkEFJErqevhLE
	1nRXB9vIXQpuRTtXNxZfU20Cd9w=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=overt.org; s=k1; q=dns;
	h=Sender: From: To: Cc: Subject: Date: Message-Id: In-Reply-To:
	References;
	b=jFDpE3Y9jbT77hxTKcHkqgxkNrSkxrrAvLNhjCZHUtk5tWYoNiZupWuCBz3F9FiA01lMII
	ioKQ711Lvh1W/kkj2E6qUBBBENszmE0VYAGFL+hxUibO96duMuL4eoj0uUR0ExgMEZqq+OBn
	EEmSR77hsTMJT6XOYYxqb35Kzo1ck=
X-Mailgun-Sending-Ip: 198.61.254.29
X-Mailgun-Sid: 
 WyIyM2Q3MCIsICJmZm1wZWctZGV2ZWxAZmZtcGVnLm9yZyIsICI0YTg5NjEiXQ==
Received: from mail.overt.org (155.208.178.107.bc.googleusercontent.com
	[107.178.208.155])
	by mxa.mailgun.org with ESMTP id 5834720a.7f21a02aea40-smtp-out-n03;
	Tue, 22 Nov 2016 16:27:54 -0000 (UTC)
Received: from authenticated-user (mail.overt.org [107.178.208.155])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128
	bits)) (No client certificate requested)
	by mail.overt.org (Postfix) with ESMTPSA id EF3FF6819E;
	Tue, 22 Nov 2016 16:27:53 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=overt.org; s=mail;
	t=1479832074; bh=dZ2qj4l5lgbayv63fk8lzBA/ItkbtdujMWES+tUx4ZI=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=WevXb/6Nguf1rR2R3BIVp048DZDuLcVMKsUvenQP4AJHv36330wBHoU9/4HNys69/
	V+ugAcYVbHIy1/KWbMNN+koLNmJyTfLNhyD3xiu+U/80tgUYsiXvM86Axxgj7dLCJX
	+HX/7bQjxjeqw/7WD9fn4hfTB2npe+isX+IMyYP+9I8iwKsUP8ockQRWmw/eMb9xeb
	Gh39bmTShGMGGxgFR6CRbHeU903seKv5Ndfkw44pHH/VkRt7VaRAAfFIKGbu9i4GqC
	0Y8ZKAE2UOvDKLmoOD7mLzsV3xpiFAlDNBsJXUD93kasxulXHEDHMD/lxupE6aumQI
	EYZVoyRmO+gfA==
From: Philip Langdale <philipl@overt.org>
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 22 Nov 2016 08:27:48 -0800
Message-Id: <20161122162748.6476-3-philipl@overt.org>
In-Reply-To: <20161122162748.6476-1-philipl@overt.org>
References: <20161122162748.6476-1-philipl@overt.org>
Subject: [FFmpeg-devel] [PATCH 2/2] avcodec/cuvid: Add support for P010/P016
	as an output surface format
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <http://ffmpeg.org/mailman/options/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <http://ffmpeg.org/pipermail/ffmpeg-devel/>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <http://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches
	<ffmpeg-devel@ffmpeg.org>
Cc: Philip Langdale <philipl@overt.org>
MIME-Version: 1.0
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

The NVIDIA 375.xx driver introduces support for P016 output surfaces
for 10-bit and 12-bit HEVC content (it is also the first driver to
support hardware decoding of 12-bit content).

As far as I can tell, the cuvid API declares only one output format
for this, which the driver strings appear to call P016. However,
10-bit content in P016 is identical to P010, so for compatibility it
is useful to declare such content as P010: this works with other
components that only know how to consume P010, and it avoids
triggering swscale conversions that are lossy when they shouldn't be.

For simplicity, this change does not keep the previous ability to
output dithered NV12 for 10-bit/12-bit input video; users will need
to update their driver to decode such videos.

Signed-off-by: Philip Langdale <philipl@overt.org>
---
 compat/cuda/dynlink_cuviddec.h |  3 +-
 libavcodec/cuvid.c             | 80 +++++++++++++++++++++++++++++++++---------
 2 files changed, 66 insertions(+), 17 deletions(-)

diff --git a/compat/cuda/dynlink_cuviddec.h b/compat/cuda/dynlink_cuviddec.h
index 17207bc..9ff2741 100644
--- a/compat/cuda/dynlink_cuviddec.h
+++ b/compat/cuda/dynlink_cuviddec.h
@@ -83,7 +83,8 @@ typedef enum cudaVideoCodec_enum {
  * Video Surface Formats Enums
  */
 typedef enum cudaVideoSurfaceFormat_enum {
-    cudaVideoSurfaceFormat_NV12=0       /**< NV12 (currently the only supported output format)  */
+    cudaVideoSurfaceFormat_NV12=0,      /**< NV12  */
+    cudaVideoSurfaceFormat_P016=1       /**< P016  */
 } cudaVideoSurfaceFormat;
 
 /*!
diff --git a/libavcodec/cuvid.c b/libavcodec/cuvid.c
index 65468dd..2b2e8ae 100644
--- a/libavcodec/cuvid.c
+++ b/libavcodec/cuvid.c
@@ -28,6 +28,7 @@
 #include "libavutil/fifo.h"
 #include "libavutil/log.h"
 #include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
 
 #include "avcodec.h"
 #include "internal.h"
@@ -102,11 +103,53 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
     CuvidContext *ctx = avctx->priv_data;
     AVHWFramesContext *hwframe_ctx = (AVHWFramesContext*)ctx->hwframe->data;
     CUVIDDECODECREATEINFO cuinfo;
+    int surface_fmt;
+
+    enum AVPixelFormat pix_fmts_nv12[3] = { AV_PIX_FMT_CUDA,
+                                            AV_PIX_FMT_NV12,
+                                            AV_PIX_FMT_NONE };
+
+    enum AVPixelFormat pix_fmts_p010[3] = { AV_PIX_FMT_CUDA,
+                                            AV_PIX_FMT_P010,
+                                            AV_PIX_FMT_NONE };
+
+    enum AVPixelFormat pix_fmts_p016[3] = { AV_PIX_FMT_CUDA,
+                                            AV_PIX_FMT_P016,
+                                            AV_PIX_FMT_NONE };
 
     av_log(avctx, AV_LOG_TRACE, "pfnSequenceCallback, progressive_sequence=%d\n", format->progressive_sequence);
 
     ctx->internal_error = 0;
 
+    switch (format->bit_depth_luma_minus8) {
+    case 0: // 8-bit
+        surface_fmt = ff_get_format(avctx, pix_fmts_nv12);
+        break;
+    case 2: // 10-bit
+        surface_fmt = ff_get_format(avctx, pix_fmts_p010);
+        break;
+    case 4: // 12-bit
+        surface_fmt = ff_get_format(avctx, pix_fmts_p016);
+        break;
+    default:
+        av_log(avctx, AV_LOG_ERROR, "unsupported bit depth: %d\n",
+               format->bit_depth_luma_minus8 + 8);
+        ctx->internal_error = AVERROR(EINVAL);
+        return 0;
+    }
+    if (surface_fmt < 0) {
+        av_log(avctx, AV_LOG_ERROR, "ff_get_format failed: %d\n", surface_fmt);
+        ctx->internal_error = AVERROR(EINVAL);
+        return 0;
+    }
+
+    av_log(avctx, AV_LOG_VERBOSE, "Formats: Original: %s | HW: %s | SW: %s\n",
+           av_get_pix_fmt_name(avctx->pix_fmt),
+           av_get_pix_fmt_name(surface_fmt),
+           av_get_pix_fmt_name(avctx->sw_pix_fmt));
+
+    avctx->pix_fmt = surface_fmt;
+
     avctx->width = format->display_area.right;
     avctx->height = format->display_area.bottom;
 
@@ -155,7 +198,7 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
             hwframe_ctx->width < avctx->width ||
             hwframe_ctx->height < avctx->height ||
             hwframe_ctx->format != AV_PIX_FMT_CUDA ||
-            hwframe_ctx->sw_format != AV_PIX_FMT_NV12)) {
+            hwframe_ctx->sw_format != avctx->sw_pix_fmt)) {
         av_log(avctx, AV_LOG_ERROR, "AVHWFramesContext is already initialized with incompatible parameters\n");
         ctx->internal_error = AVERROR(EINVAL);
         return 0;
@@ -176,7 +219,20 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
 
     cuinfo.CodecType = ctx->codec_type = format->codec;
     cuinfo.ChromaFormat = format->chroma_format;
-    cuinfo.OutputFormat = cudaVideoSurfaceFormat_NV12;
+
+    switch (avctx->sw_pix_fmt) {
+    case AV_PIX_FMT_NV12:
+        cuinfo.OutputFormat = cudaVideoSurfaceFormat_NV12;
+        break;
+    case AV_PIX_FMT_P010:
+    case AV_PIX_FMT_P016:
+        cuinfo.OutputFormat = cudaVideoSurfaceFormat_P016;
+        break;
+    default:
+        av_log(avctx, AV_LOG_ERROR, "Output formats other than NV12, P010 or P016 are not supported\n");
+        ctx->internal_error = AVERROR(EINVAL);
+        return 0;
+    }
 
     cuinfo.ulWidth = avctx->coded_width;
     cuinfo.ulHeight = avctx->coded_height;
@@ -208,7 +264,7 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
 
     if (!hwframe_ctx->pool) {
         hwframe_ctx->format = AV_PIX_FMT_CUDA;
-        hwframe_ctx->sw_format = AV_PIX_FMT_NV12;
+        hwframe_ctx->sw_format = avctx->sw_pix_fmt;
         hwframe_ctx->width = avctx->width;
         hwframe_ctx->height = avctx->height;
 
@@ -416,7 +472,9 @@ static int cuvid_output_frame(AVCodecContext *avctx, AVFrame *frame)
 
                 offset += avctx->coded_height;
             }
-        } else if (avctx->pix_fmt == AV_PIX_FMT_NV12) {
+        } else if (avctx->pix_fmt == AV_PIX_FMT_NV12 ||
+                   avctx->pix_fmt == AV_PIX_FMT_P010 ||
+                   avctx->pix_fmt == AV_PIX_FMT_P016) {
             AVFrame *tmp_frame = av_frame_alloc();
             if (!tmp_frame) {
                 av_log(avctx, AV_LOG_ERROR, "av_frame_alloc failed\n");
@@ -446,7 +504,6 @@ static int cuvid_output_frame(AVCodecContext *avctx, AVFrame *frame)
                 av_frame_free(&tmp_frame);
                 goto error;
             }
-
             av_frame_free(&tmp_frame);
         } else {
             ret = AVERROR_BUG;
@@ -614,17 +671,6 @@ static av_cold int cuvid_decode_init(AVCodecContext *avctx)
     const AVBitStreamFilter *bsf;
     int ret = 0;
 
-    enum AVPixelFormat pix_fmts[3] = { AV_PIX_FMT_CUDA,
-                                       AV_PIX_FMT_NV12,
-                                       AV_PIX_FMT_NONE };
-
-    ret = ff_get_format(avctx, pix_fmts);
-    if (ret < 0) {
-        av_log(avctx, AV_LOG_ERROR, "ff_get_format failed: %d\n", ret);
-        return ret;
-    }
-    avctx->pix_fmt = ret;
-
     ret = cuvid_load_functions(&ctx->cvdl);
     if (ret < 0) {
         av_log(avctx, AV_LOG_ERROR, "Failed loading nvcuvid.\n");
@@ -899,6 +945,8 @@ static const AVOption options[] = {
         .capabilities   = AV_CODEC_CAP_DELAY | AV_CODEC_CAP_AVOID_PROBING, \
         .pix_fmts       = (const enum AVPixelFormat[]){ AV_PIX_FMT_CUDA, \
                                                         AV_PIX_FMT_NV12, \
+                                                        AV_PIX_FMT_P010, \
+                                                        AV_PIX_FMT_P016, \
                                                         AV_PIX_FMT_NONE }, \
     };