From patchwork Sun Oct 7 02:19:55 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Philip Langdale
X-Patchwork-Id: 10630
From: Philip Langdale
To: ffmpeg-devel@ffmpeg.org, Timo Rothenpieler
Cc: Philip Langdale
Date: Sat, 6 Oct 2018 19:19:55 -0700
Message-Id: <20181007021955.6668-1-philipl@overt.org>
Subject: [FFmpeg-devel] [PATCH] avcodec/nvdec: Add support for decoding HEVC 4:4:4 content
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

The latest generation video decoder on the Turing chips supports
decoding HEVC 4:4:4. Supporting this is relatively straightforward;
we need to account for the different chroma format and pick the
right output and sw formats at the right times.

There was one bug: the hard-coded assumption that the first chroma
plane would be half-height. I fixed this to use the plane's actual
vertical shift value.

The output formats ('2' and '3') are currently undocumented but
appear to be YUV444P and YUV444P16, based on how they behave.
---
 libavcodec/hevcdec.c |  2 ++
 libavcodec/nvdec.c   | 43 +++++++++++++++++++++++++++++++++++--------
 2 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/libavcodec/hevcdec.c b/libavcodec/hevcdec.c
index a3b5c8cb71..508e093ea3 100644
--- a/libavcodec/hevcdec.c
+++ b/libavcodec/hevcdec.c
@@ -409,6 +409,8 @@ static enum AVPixelFormat get_format(HEVCContext *s, const HEVCSPS *sps)
 #endif
         break;
     case AV_PIX_FMT_YUV420P12:
+    case AV_PIX_FMT_YUV444P10:
+    case AV_PIX_FMT_YUV444P12:
 #if CONFIG_HEVC_NVDEC_HWACCEL
         *fmt++ = AV_PIX_FMT_CUDA;
 #endif
diff --git a/libavcodec/nvdec.c b/libavcodec/nvdec.c
index e779be3a45..7e5c1791ea 100644
--- a/libavcodec/nvdec.c
+++ b/libavcodec/nvdec.c
@@ -34,6 +34,9 @@
 #include "nvdec.h"
 #include "internal.h"
 
+#define cudaVideoSurfaceFormat_YUV444P 2
+#define cudaVideoSurfaceFormat_YUV444P16 3
+
 typedef struct NVDECDecoder {
     CUvideodecoder decoder;
 
@@ -273,7 +276,8 @@ int ff_nvdec_decode_init(AVCodecContext *avctx)
 
     CUVIDDECODECREATEINFO params = { 0 };
 
-    int cuvid_codec_type, cuvid_chroma_format;
+    cudaVideoSurfaceFormat output_format;
+    int cuvid_codec_type, cuvid_chroma_format, chroma_444;
     int ret = 0;
 
     sw_desc = av_pix_fmt_desc_get(avctx->sw_pix_fmt);
@@ -291,6 +295,7 @@ int ff_nvdec_decode_init(AVCodecContext *avctx)
         av_log(avctx, AV_LOG_ERROR, "Unsupported chroma format\n");
         return AVERROR(ENOSYS);
     }
+    chroma_444 = cuvid_chroma_format == cudaVideoChromaFormat_444;
 
     if (!avctx->hw_frames_ctx) {
         ret = ff_decode_get_hw_frames_ctx(avctx, AV_HWDEVICE_TYPE_CUDA);
@@ -298,6 +303,21 @@ int ff_nvdec_decode_init(AVCodecContext *avctx)
             return ret;
     }
 
+    switch (sw_desc->comp[0].depth) {
+    case 8:
+        output_format = chroma_444 ? cudaVideoSurfaceFormat_YUV444P :
+                                     cudaVideoSurfaceFormat_NV12;
+        break;
+    case 10:
+    case 12:
+        output_format = chroma_444 ? cudaVideoSurfaceFormat_YUV444P16 :
+                                     cudaVideoSurfaceFormat_P016;
+        break;
+    default:
+        av_log(avctx, AV_LOG_ERROR, "Unsupported bit depth\n");
+        return AVERROR(ENOSYS);
+    }
+
     frames_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
 
     params.ulWidth             = avctx->coded_width;
@@ -305,8 +325,7 @@ int ff_nvdec_decode_init(AVCodecContext *avctx)
     params.ulTargetWidth       = avctx->coded_width;
     params.ulTargetHeight      = avctx->coded_height;
     params.bitDepthMinus8      = sw_desc->comp[0].depth - 8;
-    params.OutputFormat        = params.bitDepthMinus8 ?
-                                 cudaVideoSurfaceFormat_P016 : cudaVideoSurfaceFormat_NV12;
+    params.OutputFormat        = output_format;
     params.CodecType           = cuvid_codec_type;
     params.ChromaFormat        = cuvid_chroma_format;
     params.ulNumDecodeSurfaces = frames_ctx->initial_pool_size;
@@ -388,6 +407,8 @@ static int nvdec_retrieve_data(void *logctx, AVFrame *frame)
     NVDECFrame        *cf = (NVDECFrame*)fdd->hwaccel_priv;
     NVDECDecoder *decoder = (NVDECDecoder*)cf->decoder_ref->data;
 
+    AVHWFramesContext *hwctx = (AVHWFramesContext *)frame->hw_frames_ctx->data;
+
     CUVIDPROCPARAMS vpp = { 0 };
     NVDECFrame *unmap_data = NULL;
 
@@ -397,6 +418,7 @@ static int nvdec_retrieve_data(void *logctx, AVFrame *frame)
 
     unsigned int pitch, i;
     unsigned int offset = 0;
+    int shift_h = 0, shift_v = 0;
     int ret = 0;
 
     vpp.progressive_frame = 1;
@@ -433,10 +455,11 @@ static int nvdec_retrieve_data(void *logctx, AVFrame *frame)
     unmap_data->idx_ref = av_buffer_ref(cf->idx_ref);
     unmap_data->decoder_ref = av_buffer_ref(cf->decoder_ref);
 
+    av_pix_fmt_get_chroma_sub_sample(hwctx->sw_format, &shift_h, &shift_v);
     for (i = 0; frame->linesize[i]; i++) {
         frame->data[i] = (uint8_t*)(devptr + offset);
         frame->linesize[i] = pitch;
-        offset += pitch * (frame->height >> (i ? 1 : 0));
+        offset += pitch * (frame->height >> (i ? shift_v : 0));
     }
 
     goto finish;
@@ -576,7 +599,7 @@ int ff_nvdec_frame_params(AVCodecContext *avctx,
 {
     AVHWFramesContext *frames_ctx = (AVHWFramesContext*)hw_frames_ctx->data;
     const AVPixFmtDescriptor *sw_desc;
-    int cuvid_codec_type, cuvid_chroma_format;
+    int cuvid_codec_type, cuvid_chroma_format, chroma_444;
 
     sw_desc = av_pix_fmt_desc_get(avctx->sw_pix_fmt);
     if (!sw_desc)
@@ -593,6 +616,7 @@ int ff_nvdec_frame_params(AVCodecContext *avctx,
         av_log(avctx, AV_LOG_VERBOSE, "Unsupported chroma format\n");
         return AVERROR(EINVAL);
     }
+    chroma_444 = cuvid_chroma_format == cudaVideoChromaFormat_444;
 
     frames_ctx->format            = AV_PIX_FMT_CUDA;
     frames_ctx->width             = (avctx->coded_width + 1) & ~1;
@@ -605,15 +629,18 @@ int ff_nvdec_frame_params(AVCodecContext *avctx,
     if (!frames_ctx->pool)
         return AVERROR(ENOMEM);
 
+    // It is semantically incorrect to use AV_PIX_FMT_YUV444P16 for either the 10
+    // or 12 bit case, but ffmpeg and nvidia disagree on which end the padding
+    // bits go at. P16 is unambiguous and matches.
     switch (sw_desc->comp[0].depth) {
     case 8:
-        frames_ctx->sw_format = AV_PIX_FMT_NV12;
+        frames_ctx->sw_format = chroma_444 ? AV_PIX_FMT_YUV444P : AV_PIX_FMT_NV12;
         break;
     case 10:
-        frames_ctx->sw_format = AV_PIX_FMT_P010;
+        frames_ctx->sw_format = chroma_444 ? AV_PIX_FMT_YUV444P16 : AV_PIX_FMT_P010;
         break;
     case 12:
-        frames_ctx->sw_format = AV_PIX_FMT_P016;
+        frames_ctx->sw_format = chroma_444 ? AV_PIX_FMT_YUV444P16 : AV_PIX_FMT_P016;
         break;
     default:
         return AVERROR(EINVAL);
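
The plane-offset fix described in the commit message can be illustrated in isolation. The following is only a sketch, not code from the patch: `plane_offsets` is a hypothetical helper, and it assumes (as the patched loop in `nvdec_retrieve_data()` does) that all planes sit back to back in one allocation and share one pitch. `shift_v` stands for the vertical chroma shift that the patch obtains from `av_pix_fmt_get_chroma_sub_sample()`: 1 for NV12-style 4:2:0 layouts, 0 for 4:4:4.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of FFmpeg): compute the byte offset of each
 * plane inside one contiguous allocation in which every plane uses the same
 * pitch. Mirrors the arithmetic of the patched loop, nothing more.
 *
 * shift_v is log2 of the vertical chroma subsampling factor.
 * Returns the total number of bytes spanned by all planes.
 */
static size_t plane_offsets(size_t pitch, unsigned height, int shift_v,
                            int nb_planes, size_t *offsets)
{
    size_t offset = 0;

    assert(shift_v >= 0 && nb_planes >= 0);

    for (int i = 0; i < nb_planes; i++) {
        offsets[i] = offset;
        /* Plane 0 is luma (full height); later planes are chroma. */
        offset += pitch * (height >> (i ? shift_v : 0));
    }
    return offset;
}
```

With illustrative values of pitch 2048 and height 1080, a three-plane 4:4:4 frame (`shift_v = 0`) places its third plane at byte 4423680. Hard-coding a shift of 1 would place it at 3317760, inside the full-height first chroma plane. The offset of the first chroma plane is the same in both cases, so the error only shows up once a second chroma plane follows it.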