From patchwork Mon Nov 21 15:28:03 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Philip Langdale
X-Patchwork-Id: 1515
Delivered-To: ffmpegpatchwork@gmail.com
Received: by 10.103.90.1 with SMTP id o1csp1660098vsb;
 Mon, 21 Nov 2016 07:28:30 -0800 (PST)
X-Received: by 10.46.9.17 with SMTP id 17mr8927172ljj.18.1479742110358;
 Mon, 21 Nov 2016 07:28:30 -0800 (PST)
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
 by mx.google.com with ESMTP id q31si8992108lfi.259.2016.11.21.07.28.28;
 Mon, 21 Nov 2016 07:28:30 -0800 (PST)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
 designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
 dkim=neutral (body hash did not verify) header.i=@overt.org;
 dkim=neutral (body hash did not verify) header.i=@overt.org;
 spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates
 79.124.17.100 as permitted sender)
 smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Received: from [127.0.1.1] (localhost [127.0.0.1])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id A8A02689C7C;
 Mon, 21 Nov 2016 17:28:22 +0200 (EET)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from so254-29.mailgun.net (so254-29.mailgun.net [198.61.254.29])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id AE591689B50
 for ; Mon, 21 Nov 2016 17:28:15 +0200 (EET)
DKIM-Signature: a=rsa-sha256; v=1; c=relaxed/relaxed; d=overt.org; q=dns/txt;
 s=k1; t=1479742097; h=References: In-Reply-To: Message-Id: Date: Subject: Cc:
 To: From: Sender; bh=UDJcEXNfgbpDfEosue8PHFIAeZa1Ih7FHxSCCCZ/A/E=;
 b=l1/nXtA/SHzI+itI0sUr9BRHOAXvz3wpsaJT0BTPPWBzOYDXQpY8ASgVvwvopWr91hvXB4pc
 xqe7SbYxM65fuYUBbl8Y5Z4kiBZzsKIKe1ufJVvXG2XM/+LhbcOLAGHvdyEnCmYT9K2+TPaz
 g/X09aXjfX6ewrSlFpKLxptyo58=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=overt.org; s=k1; q=dns;
 h=Sender: From: To: Cc: Subject: Date: Message-Id: In-Reply-To: References;
 b=NfKsj+MZK0XPs0wKy87xBPfxh5x4VwIjMsNx7Hlfm3JV7DBCqGOAT3b6uixaMayd5JxH5a
 tNpd1LRCRQzwRPtvM0FRnBQTRjNRQ9TEC/JOnO/PTZls6dSaP7LwxhwpcxLf5I8X+K0VG1k4
 EYjW9d36ffyGgVjHfJXzcnD9GIrsc=
X-Mailgun-Sending-Ip: 198.61.254.29
X-Mailgun-Sid: WyIyM2Q3MCIsICJmZm1wZWctZGV2ZWxAZmZtcGVnLm9yZyIsICI0YTg5NjEiXQ==
Received: from mail.overt.org (155.208.178.107.bc.googleusercontent.com
 [107.178.208.155]) by mxa.mailgun.org with ESMTP id
 5833128d.7fa4741526c0-smtp-out-n02; Mon, 21 Nov 2016 15:28:13 -0000 (UTC)
Received: from authenticated-user (mail.overt.org [107.178.208.155])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.overt.org (Postfix) with ESMTPSA id 83B3168193;
 Mon, 21 Nov 2016 15:28:11 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=overt.org; s=mail;
 t=1479742091; bh=sNvqdVvt12YwShkawcAZowPbCNy4jQ7oNCyFwtISNiw=;
 h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
 b=Ddmui9aPUMq38ixeM6CjGM/PeX7q21fEqmE+MLm2ALmdc0i5J4a7Cw9lRUzTkfJUw
 0ls4hqKNqMndwfIe1jiLbHxzV5YICACPHkV0u7ZwEWL7kz0DfKCBheDUeriLFcbq8b
 CPyor6jvmzWpjHZwuFvJuSLx1KYp8zFRbf2gEXTVgbRWtVs3MN7ovLJo1NOa7TH75g
 2CKry02B6kDRs0oReO+lWACrSWWGgd3xpY1WEj0QZpk+QbYVm6D6Qw7KauU3avT9Om
 rws1mRNYSTbkCcEKSodfOLmHYs1dgeoaiMvYlBoPN9KwzxBkBWJpycLc03fxIkNZbq
 tcGUpsMvXfN+A==
From: Philip Langdale
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 21 Nov 2016 07:28:03 -0800
Message-Id: <20161121152804.5605-2-philipl@overt.org>
In-Reply-To: <20161121152804.5605-1-philipl@overt.org>
References: <20161121152804.5605-1-philipl@overt.org>
Subject: [FFmpeg-devel] [PATCH 1/2] avcodec/cuvid: Add support for P016 as an output surface format
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Cc: Philip Langdale
MIME-Version: 1.0
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

The NVIDIA 375.xx driver introduces support for P016 output surfaces,
for 10-bit and 12-bit HEVC content (it's also the first driver to support
hardware decoding of 12-bit content).

This change adds cuvid decoder support for P016 output to both hardware
and system memory surfaces. For simplicity, it does not maintain the
previous ability to output NV12 for > 8-bit input video - the user will
need to update their driver to decode such videos.

Signed-off-by: Philip Langdale
---
 compat/cuda/dynlink_cuviddec.h |  3 ++-
 libavcodec/cuvid.c             | 59 ++++++++++++++++++++++++++++++------------
 libavutil/hwcontext_cuda.c     | 11 +++++++-
 3 files changed, 55 insertions(+), 18 deletions(-)

diff --git a/compat/cuda/dynlink_cuviddec.h b/compat/cuda/dynlink_cuviddec.h
index 17207bc..9ff2741 100644
--- a/compat/cuda/dynlink_cuviddec.h
+++ b/compat/cuda/dynlink_cuviddec.h
@@ -83,7 +83,8 @@ typedef enum cudaVideoCodec_enum {
  * Video Surface Formats Enums
  */
 typedef enum cudaVideoSurfaceFormat_enum {
-    cudaVideoSurfaceFormat_NV12=0       /**< NV12 (currently the only supported output format) */
+    cudaVideoSurfaceFormat_NV12=0,      /**< NV12 */
+    cudaVideoSurfaceFormat_P016=1       /**< P016 */
 } cudaVideoSurfaceFormat;
 
 /*!
diff --git a/libavcodec/cuvid.c b/libavcodec/cuvid.c
index c3e831a..6798bac 100644
--- a/libavcodec/cuvid.c
+++ b/libavcodec/cuvid.c
@@ -28,6 +28,7 @@
 #include "libavutil/fifo.h"
 #include "libavutil/log.h"
 #include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
 
 #include "avcodec.h"
 #include "internal.h"
@@ -103,11 +104,35 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
     CuvidContext *ctx = avctx->priv_data;
     AVHWFramesContext *hwframe_ctx = (AVHWFramesContext*)ctx->hwframe->data;
     CUVIDDECODECREATEINFO cuinfo;
+    int surface_fmt;
+
+    enum AVPixelFormat pix_fmts_nv12[3] = { AV_PIX_FMT_CUDA,
+                                            AV_PIX_FMT_NV12,
+                                            AV_PIX_FMT_NONE };
+
+    enum AVPixelFormat pix_fmts_p016[3] = { AV_PIX_FMT_CUDA,
+                                            AV_PIX_FMT_P016,
+                                            AV_PIX_FMT_NONE };
 
     av_log(avctx, AV_LOG_TRACE, "pfnSequenceCallback, progressive_sequence=%d\n", format->progressive_sequence);
 
     ctx->internal_error = 0;
 
+    surface_fmt = ff_get_format(avctx, format->bit_depth_luma_minus8 > 0 ?
+                                pix_fmts_p016 : pix_fmts_nv12);
+    if (surface_fmt < 0) {
+        av_log(avctx, AV_LOG_ERROR, "ff_get_format failed: %d\n", surface_fmt);
+        ctx->internal_error = AVERROR(EINVAL);
+        return 0;
+    }
+
+    av_log(avctx, AV_LOG_VERBOSE, "Formats: Original: %s | HW: %s | SW: %s\n",
+           av_get_pix_fmt_name(avctx->pix_fmt),
+           av_get_pix_fmt_name(surface_fmt),
+           av_get_pix_fmt_name(avctx->sw_pix_fmt));
+
+    avctx->pix_fmt = surface_fmt;
+
     avctx->width = format->display_area.right;
     avctx->height = format->display_area.bottom;
 
@@ -156,7 +181,7 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
                               hwframe_ctx->width < avctx->width ||
                               hwframe_ctx->height < avctx->height ||
                               hwframe_ctx->format != AV_PIX_FMT_CUDA ||
-                              hwframe_ctx->sw_format != AV_PIX_FMT_NV12)) {
+                              hwframe_ctx->sw_format != avctx->sw_pix_fmt)) {
         av_log(avctx, AV_LOG_ERROR, "AVHWFramesContext is already initialized with incompatible parameters\n");
         ctx->internal_error = AVERROR(EINVAL);
         return 0;
@@ -177,7 +202,19 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
 
     cuinfo.CodecType = ctx->codec_type = format->codec;
     cuinfo.ChromaFormat = format->chroma_format;
-    cuinfo.OutputFormat = cudaVideoSurfaceFormat_NV12;
+
+    switch (avctx->sw_pix_fmt) {
+    case AV_PIX_FMT_NV12:
+        cuinfo.OutputFormat = cudaVideoSurfaceFormat_NV12;
+        break;
+    case AV_PIX_FMT_P016:
+        cuinfo.OutputFormat = cudaVideoSurfaceFormat_P016;
+        break;
+    default:
+        av_log(avctx, AV_LOG_ERROR, "Output formats other than NV12 or P016 are not supported\n");
+        ctx->internal_error = AVERROR(EINVAL);
+        return 0;
+    }
 
     cuinfo.ulWidth = avctx->coded_width;
     cuinfo.ulHeight = avctx->coded_height;
@@ -209,7 +246,7 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
 
     if (!hwframe_ctx->pool) {
         hwframe_ctx->format = AV_PIX_FMT_CUDA;
-        hwframe_ctx->sw_format = AV_PIX_FMT_NV12;
+        hwframe_ctx->sw_format = avctx->sw_pix_fmt;
         hwframe_ctx->width = avctx->width;
         hwframe_ctx->height = avctx->height;
 
@@ -417,7 +454,8 @@ static int cuvid_output_frame(AVCodecContext *avctx, AVFrame *frame)
 
                 offset += avctx->coded_height;
             }
-        } else if (avctx->pix_fmt == AV_PIX_FMT_NV12) {
+        } else if (avctx->pix_fmt == AV_PIX_FMT_NV12 ||
+                   avctx->pix_fmt == AV_PIX_FMT_P016) {
             AVFrame *tmp_frame = av_frame_alloc();
             if (!tmp_frame) {
                 av_log(avctx, AV_LOG_ERROR, "av_frame_alloc failed\n");
@@ -447,7 +485,6 @@ static int cuvid_output_frame(AVCodecContext *avctx, AVFrame *frame)
                 av_frame_free(&tmp_frame);
                 goto error;
             }
-
             av_frame_free(&tmp_frame);
         } else {
             ret = AVERROR_BUG;
@@ -615,17 +652,6 @@ static av_cold int cuvid_decode_init(AVCodecContext *avctx)
     const AVBitStreamFilter *bsf;
     int ret = 0;
 
-    enum AVPixelFormat pix_fmts[3] = { AV_PIX_FMT_CUDA,
-                                       AV_PIX_FMT_NV12,
-                                       AV_PIX_FMT_NONE };
-
-    ret = ff_get_format(avctx, pix_fmts);
-    if (ret < 0) {
-        av_log(avctx, AV_LOG_ERROR, "ff_get_format failed: %d\n", ret);
-        return ret;
-    }
-    avctx->pix_fmt = ret;
-
     ret = cuvid_load_functions(&ctx->cvdl);
     if (ret < 0) {
         av_log(avctx, AV_LOG_ERROR, "Failed loading nvcuvid.\n");
@@ -899,6 +925,7 @@ static const AVOption options[] = {
         .capabilities   = AV_CODEC_CAP_DELAY | AV_CODEC_CAP_AVOID_PROBING, \
         .pix_fmts       = (const enum AVPixelFormat[]){ AV_PIX_FMT_CUDA, \
                                                         AV_PIX_FMT_NV12, \
+                                                        AV_PIX_FMT_P016, \
                                                         AV_PIX_FMT_NONE }, \
     };
 
diff --git a/libavutil/hwcontext_cuda.c b/libavutil/hwcontext_cuda.c
index 30de299..1c49b6e 100644
--- a/libavutil/hwcontext_cuda.c
+++ b/libavutil/hwcontext_cuda.c
@@ -35,6 +35,7 @@ static const enum AVPixelFormat supported_formats[] = {
     AV_PIX_FMT_NV12,
     AV_PIX_FMT_YUV420P,
     AV_PIX_FMT_YUV444P,
+    AV_PIX_FMT_P016,
 };
 
 static void cuda_buffer_free(void *opaque, uint8_t *data)
@@ -111,6 +112,7 @@ static int cuda_frames_init(AVHWFramesContext *ctx)
             size = aligned_width * ctx->height * 3 / 2;
             break;
         case AV_PIX_FMT_YUV444P:
+        case AV_PIX_FMT_P016:
             size = aligned_width * ctx->height * 3;
             break;
         }
@@ -125,7 +127,13 @@ static int cuda_frames_init(AVHWFramesContext *ctx)
 
 static int cuda_get_buffer(AVHWFramesContext *ctx, AVFrame *frame)
 {
-    int aligned_width = FFALIGN(ctx->width, CUDA_FRAME_ALIGNMENT);
+    int aligned_width;
+    int width_in_bytes = ctx->width;
+
+    if (ctx->sw_format == AV_PIX_FMT_P016) {
+        width_in_bytes *= 2;
+    }
+    aligned_width = FFALIGN(width_in_bytes, CUDA_FRAME_ALIGNMENT);
 
     frame->buf[0] = av_buffer_pool_get(ctx->pool);
     if (!frame->buf[0])
@@ -133,6 +141,7 @@ static int cuda_get_buffer(AVHWFramesContext *ctx, AVFrame *frame)
 
     switch (ctx->sw_format) {
     case AV_PIX_FMT_NV12:
+    case AV_PIX_FMT_P016:
         frame->data[0] = frame->buf[0]->data;
         frame->data[1] = frame->data[0] + aligned_width * ctx->height;
         frame->linesize[0] = aligned_width;