From patchwork Sun Nov 20 19:44:22 2016
X-Patchwork-Submitter: Philip Langdale
X-Patchwork-Id: 1500
From: Philip Langdale
To: ffmpeg-devel@ffmpeg.org
Cc: Philip Langdale
Date: Sun, 20 Nov 2016 11:44:22 -0800
Message-Id: <20161120194422.20746-1-philipl@overt.org>
Subject: [FFmpeg-devel] [PATCH] avcodec/cuvid: Add support for P010 as an output surface format

The nvidia 375.xx driver introduces support for P016 output surfaces, for 10bit and 12bit HEVC content
(it's also the first driver to support hardware decoding of 12bit content).
Technically, we don't support P016, but in practice I don't think we
zero out the extra bits in P010 so it can be used to carry the data.

This change introduces cuvid decoder support for P010 output to both
hardware and system memory surfaces. For simplicity, it does not
maintain the previous ability to output NV12 for > 8 bit input video -
the user will need to update their driver to decode such videos.

After this change, both cuvid and nvenc support P010, but the
ffmpeg_cuvid transcoding logic will need more work to connect the
two together. Similarly, the scale_npp filter still only works with
8bit surfaces.

Signed-off-by: Philip Langdale
---
 compat/cuda/cuviddec.h     |  3 ++-
 libavcodec/cuvid.c         | 57 ++++++++++++++++++++++++++++++++++------------
 libavutil/hwcontext_cuda.c | 11 ++++++++-
 3 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/compat/cuda/cuviddec.h b/compat/cuda/cuviddec.h
index f9257ea..357289b 100644
--- a/compat/cuda/cuviddec.h
+++ b/compat/cuda/cuviddec.h
@@ -87,7 +87,8 @@ typedef enum cudaVideoCodec_enum {
  * Video Surface Formats Enums
  */
 typedef enum cudaVideoSurfaceFormat_enum {
-    cudaVideoSurfaceFormat_NV12=0       /**< NV12 (currently the only supported output format) */
+    cudaVideoSurfaceFormat_NV12=0,      /**< NV12 */
+    cudaVideoSurfaceFormat_P016=1       /**< P016 */
 } cudaVideoSurfaceFormat;
 
 /*!
diff --git a/libavcodec/cuvid.c b/libavcodec/cuvid.c
index eafce0a..a6fb302 100644
--- a/libavcodec/cuvid.c
+++ b/libavcodec/cuvid.c
@@ -26,6 +26,7 @@
 #include "libavutil/fifo.h"
 #include "libavutil/log.h"
 #include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
 
 #include "avcodec.h"
 #include "internal.h"
@@ -99,11 +100,35 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
     CuvidContext *ctx = avctx->priv_data;
     AVHWFramesContext *hwframe_ctx = (AVHWFramesContext*)ctx->hwframe->data;
     CUVIDDECODECREATEINFO cuinfo;
+    int surface_fmt;
+
+    enum AVPixelFormat pix_fmts_nv12[3] = { AV_PIX_FMT_CUDA,
+                                            AV_PIX_FMT_NV12,
+                                            AV_PIX_FMT_NONE };
+
+    enum AVPixelFormat pix_fmts_p010[3] = { AV_PIX_FMT_CUDA,
+                                            AV_PIX_FMT_P010,
+                                            AV_PIX_FMT_NONE };
 
     av_log(avctx, AV_LOG_TRACE, "pfnSequenceCallback, progressive_sequence=%d\n", format->progressive_sequence);
 
     ctx->internal_error = 0;
 
+    surface_fmt = ff_get_format(avctx, format->bit_depth_luma_minus8 > 0 ?
+                                pix_fmts_p010 : pix_fmts_nv12);
+    if (surface_fmt < 0) {
+        av_log(avctx, AV_LOG_ERROR, "ff_get_format failed: %d\n", surface_fmt);
+        ctx->internal_error = AVERROR(EINVAL);
+        return 0;
+    }
+
+    av_log(avctx, AV_LOG_VERBOSE, "Formats: Original: %s | HW: %s | SW: %s\n",
+           av_get_pix_fmt_name(avctx->pix_fmt),
+           av_get_pix_fmt_name(surface_fmt),
+           av_get_pix_fmt_name(avctx->sw_pix_fmt));
+
+    avctx->pix_fmt = surface_fmt;
+
     avctx->width = format->display_area.right;
     avctx->height = format->display_area.bottom;
 
@@ -152,7 +177,7 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
          hwframe_ctx->width < avctx->width ||
          hwframe_ctx->height < avctx->height ||
          hwframe_ctx->format != AV_PIX_FMT_CUDA ||
-         hwframe_ctx->sw_format != AV_PIX_FMT_NV12)) {
+         hwframe_ctx->sw_format != avctx->sw_pix_fmt)) {
         av_log(avctx, AV_LOG_ERROR, "AVHWFramesContext is already initialized with incompatible parameters\n");
         ctx->internal_error = AVERROR(EINVAL);
         return 0;
@@ -173,7 +198,19 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
 
     cuinfo.CodecType = ctx->codec_type = format->codec;
     cuinfo.ChromaFormat = format->chroma_format;
-    cuinfo.OutputFormat = cudaVideoSurfaceFormat_NV12;
+
+    switch (avctx->sw_pix_fmt) {
+    case AV_PIX_FMT_NV12:
+        cuinfo.OutputFormat = cudaVideoSurfaceFormat_NV12;
+        break;
+    case AV_PIX_FMT_P010:
+        cuinfo.OutputFormat = cudaVideoSurfaceFormat_P016;
+        break;
+    default:
+        av_log(avctx, AV_LOG_ERROR, "Output formats other than NV12 or P010 are not supported\n");
+        ctx->internal_error = AVERROR(EINVAL);
+        return 0;
+    }
 
     cuinfo.ulWidth = avctx->coded_width;
     cuinfo.ulHeight = avctx->coded_height;
@@ -205,7 +242,7 @@ static int CUDAAPI cuvid_handle_video_sequence(void *opaque, CUVIDEOFORMAT* form
 
     if (!hwframe_ctx->pool) {
         hwframe_ctx->format = AV_PIX_FMT_CUDA;
-        hwframe_ctx->sw_format = AV_PIX_FMT_NV12;
+        hwframe_ctx->sw_format = avctx->sw_pix_fmt;
         hwframe_ctx->width = avctx->width;
         hwframe_ctx->height = avctx->height;
 
@@ -413,7 +450,8 @@ static int cuvid_output_frame(AVCodecContext *avctx, AVFrame *frame)
 
                 offset += avctx->coded_height;
             }
-        } else if (avctx->pix_fmt == AV_PIX_FMT_NV12) {
+        } else if (avctx->pix_fmt == AV_PIX_FMT_NV12 ||
+                   avctx->pix_fmt == AV_PIX_FMT_P010) {
             AVFrame *tmp_frame = av_frame_alloc();
             if (!tmp_frame) {
                 av_log(avctx, AV_LOG_ERROR, "av_frame_alloc failed\n");
@@ -606,16 +644,6 @@ static av_cold int cuvid_decode_init(AVCodecContext *avctx)
     const AVBitStreamFilter *bsf;
     int ret = 0;
 
-    enum AVPixelFormat pix_fmts[3] = { AV_PIX_FMT_CUDA,
-                                       AV_PIX_FMT_NV12,
-                                       AV_PIX_FMT_NONE };
-
-    ret = ff_get_format(avctx, pix_fmts);
-    if (ret < 0) {
-        av_log(avctx, AV_LOG_ERROR, "ff_get_format failed: %d\n", ret);
-        return ret;
-    }
-
     ctx->frame_queue = av_fifo_alloc(MAX_FRAME_COUNT * sizeof(CuvidParsedFrame));
     if (!ctx->frame_queue) {
         ret = AVERROR(ENOMEM);
@@ -883,6 +911,7 @@ static const AVOption options[] = {
         .capabilities   = AV_CODEC_CAP_DELAY | AV_CODEC_CAP_AVOID_PROBING, \
         .pix_fmts       = (const enum AVPixelFormat[]){ AV_PIX_FMT_CUDA, \
                                                         AV_PIX_FMT_NV12, \
+                                                        AV_PIX_FMT_P010, \
                                                         AV_PIX_FMT_NONE }, \
     };
 
diff --git a/libavutil/hwcontext_cuda.c b/libavutil/hwcontext_cuda.c
index e1dcab0..add4adb 100644
--- a/libavutil/hwcontext_cuda.c
+++ b/libavutil/hwcontext_cuda.c
@@ -35,6 +35,7 @@ static const enum AVPixelFormat supported_formats[] = {
     AV_PIX_FMT_NV12,
     AV_PIX_FMT_YUV420P,
     AV_PIX_FMT_YUV444P,
+    AV_PIX_FMT_P010,
 };
 
 static void cuda_buffer_free(void *opaque, uint8_t *data)
@@ -109,6 +110,7 @@ static int cuda_frames_init(AVHWFramesContext *ctx)
             size = aligned_width * ctx->height * 3 / 2;
             break;
         case AV_PIX_FMT_YUV444P:
+        case AV_PIX_FMT_P010:
             size = aligned_width * ctx->height * 3;
             break;
         }
@@ -123,7 +125,13 @@ static int cuda_frames_init(AVHWFramesContext *ctx)
 
 static int cuda_get_buffer(AVHWFramesContext *ctx, AVFrame *frame)
 {
-    int aligned_width = FFALIGN(ctx->width, CUDA_FRAME_ALIGNMENT);
+    int aligned_width;
+    int width_in_bytes = ctx->width;
+
+    if (ctx->sw_format == AV_PIX_FMT_P010) {
+        width_in_bytes *= 2;
+    }
+    aligned_width = FFALIGN(width_in_bytes, CUDA_FRAME_ALIGNMENT);
 
     frame->buf[0] = av_buffer_pool_get(ctx->pool);
     if (!frame->buf[0])
@@ -131,6 +139,7 @@ static int cuda_get_buffer(AVHWFramesContext *ctx, AVFrame *frame)
 
     switch (ctx->sw_format) {
     case AV_PIX_FMT_NV12:
+    case AV_PIX_FMT_P010:
         frame->data[0] = frame->buf[0]->data;
         frame->data[1] = frame->data[0] + aligned_width * ctx->height;
         frame->linesize[0] = aligned_width;