From patchwork Sun Oct 7 17:50:54 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Philip Langdale
X-Patchwork-Id: 10634
From: Philip Langdale
To: ffmpeg-devel@ffmpeg.org, Timo Rothenpieler
Cc: Philip Langdale
Date: Sun, 7 Oct 2018 10:50:54 -0700
Message-Id: <20181007175057.31070-3-philipl@overt.org>
In-Reply-To: <20181007175057.31070-1-philipl@overt.org>
References: <20181007175057.31070-1-philipl@overt.org>
Subject: [FFmpeg-devel] [PATCH 2/5] avcodec/nvdec: Add support for decoding HEVC 4:4:4 content

The latest generation video decoder on the Turing chips supports decoding
HEVC 4:4:4. Supporting this is relatively straightforward; we need to
account for the different chroma format and pick the right output and
sw formats at the right times.

There was one bug: a hard-coded assumption that the first chroma plane
would be half-height; I fixed this to use the actual shift value on the
plane.

The output formats ('2' and '3') are currently undocumented but appear
to be YUV444P and YUV444P16 based on how they behave.

Signed-off-by: Philip Langdale
---
 libavcodec/hevcdec.c       |  3 +++
 libavcodec/nvdec.c         | 43 +++++++++++++++++++++++++++++++-------
 libavutil/hwcontext_cuda.c |  2 ++
 3 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/libavcodec/hevcdec.c b/libavcodec/hevcdec.c
index a3b5c8cb71..972f2b56b6 100644
--- a/libavcodec/hevcdec.c
+++ b/libavcodec/hevcdec.c
@@ -409,6 +409,9 @@ static enum AVPixelFormat get_format(HEVCContext *s, const HEVCSPS *sps)
 #endif
         break;
     case AV_PIX_FMT_YUV420P12:
+    case AV_PIX_FMT_YUV444P:
+    case AV_PIX_FMT_YUV444P10:
+    case AV_PIX_FMT_YUV444P12:
 #if CONFIG_HEVC_NVDEC_HWACCEL
         *fmt++ = AV_PIX_FMT_CUDA;
 #endif
diff --git a/libavcodec/nvdec.c b/libavcodec/nvdec.c
index e779be3a45..e1ac06f852 100644
--- a/libavcodec/nvdec.c
+++ b/libavcodec/nvdec.c
@@ -34,6 +34,9 @@
 #include "nvdec.h"
 #include "internal.h"

+#define NVDEC_FORMAT_YUV444P 2
+#define NVDEC_FORMAT_YUV444P16 3
+
 typedef struct NVDECDecoder {
     CUvideodecoder decoder;

@@ -273,7 +276,8 @@ int ff_nvdec_decode_init(AVCodecContext *avctx)

     CUVIDDECODECREATEINFO params = { 0 };

-    int cuvid_codec_type, cuvid_chroma_format;
+    cudaVideoSurfaceFormat output_format;
+    int cuvid_codec_type, cuvid_chroma_format, chroma_444;
     int ret = 0;

     sw_desc = av_pix_fmt_desc_get(avctx->sw_pix_fmt);
@@ -291,6 +295,7 @@ int ff_nvdec_decode_init(AVCodecContext *avctx)
         av_log(avctx, AV_LOG_ERROR, "Unsupported chroma format\n");
         return AVERROR(ENOSYS);
     }
+    chroma_444 = cuvid_chroma_format == cudaVideoChromaFormat_444;

     if (!avctx->hw_frames_ctx) {
         ret = ff_decode_get_hw_frames_ctx(avctx, AV_HWDEVICE_TYPE_CUDA);
@@ -298,6 +303,21 @@ int ff_nvdec_decode_init(AVCodecContext *avctx)
             return ret;
     }

+    switch (sw_desc->comp[0].depth) {
+    case 8:
+        output_format = chroma_444 ? NVDEC_FORMAT_YUV444P :
+                                     cudaVideoSurfaceFormat_NV12;
+        break;
+    case 10:
+    case 12:
+        output_format = chroma_444 ? NVDEC_FORMAT_YUV444P16 :
+                                     cudaVideoSurfaceFormat_P016;
+        break;
+    default:
+        av_log(avctx, AV_LOG_ERROR, "Unsupported bit depth\n");
+        return AVERROR(ENOSYS);
+    }
+
     frames_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;

     params.ulWidth             = avctx->coded_width;
@@ -305,8 +325,7 @@ int ff_nvdec_decode_init(AVCodecContext *avctx)
     params.ulTargetWidth       = avctx->coded_width;
     params.ulTargetHeight      = avctx->coded_height;
     params.bitDepthMinus8      = sw_desc->comp[0].depth - 8;
-    params.OutputFormat        = params.bitDepthMinus8 ?
-                                 cudaVideoSurfaceFormat_P016 : cudaVideoSurfaceFormat_NV12;
+    params.OutputFormat        = output_format;
     params.CodecType           = cuvid_codec_type;
     params.ChromaFormat        = cuvid_chroma_format;
     params.ulNumDecodeSurfaces = frames_ctx->initial_pool_size;
@@ -388,6 +407,8 @@ static int nvdec_retrieve_data(void *logctx, AVFrame *frame)
     NVDECFrame        *cf = (NVDECFrame*)fdd->hwaccel_priv;
     NVDECDecoder *decoder = (NVDECDecoder*)cf->decoder_ref->data;

+    AVHWFramesContext *hwctx = (AVHWFramesContext *)frame->hw_frames_ctx->data;
+
     CUVIDPROCPARAMS vpp = { 0 };
     NVDECFrame *unmap_data = NULL;

@@ -397,6 +418,7 @@ static int nvdec_retrieve_data(void *logctx, AVFrame *frame)

     unsigned int pitch, i;
     unsigned int offset = 0;
+    int shift_h = 0, shift_v = 0;
     int ret = 0;

     vpp.progressive_frame = 1;
@@ -433,10 +455,11 @@ static int nvdec_retrieve_data(void *logctx, AVFrame *frame)
     unmap_data->idx_ref = av_buffer_ref(cf->idx_ref);
     unmap_data->decoder_ref = av_buffer_ref(cf->decoder_ref);

+    av_pix_fmt_get_chroma_sub_sample(hwctx->sw_format, &shift_h, &shift_v);
     for (i = 0; frame->linesize[i]; i++) {
         frame->data[i] = (uint8_t*)(devptr + offset);
         frame->linesize[i] = pitch;
-        offset += pitch * (frame->height >> (i ? 1 : 0));
+        offset += pitch * (frame->height >> (i ? shift_v : 0));
     }

     goto finish;
@@ -576,7 +599,7 @@ int ff_nvdec_frame_params(AVCodecContext *avctx,
 {
     AVHWFramesContext *frames_ctx = (AVHWFramesContext*)hw_frames_ctx->data;
     const AVPixFmtDescriptor *sw_desc;
-    int cuvid_codec_type, cuvid_chroma_format;
+    int cuvid_codec_type, cuvid_chroma_format, chroma_444;

     sw_desc = av_pix_fmt_desc_get(avctx->sw_pix_fmt);
     if (!sw_desc)
@@ -593,6 +616,7 @@ int ff_nvdec_frame_params(AVCodecContext *avctx,
         av_log(avctx, AV_LOG_VERBOSE, "Unsupported chroma format\n");
         return AVERROR(EINVAL);
     }
+    chroma_444 = cuvid_chroma_format == cudaVideoChromaFormat_444;

     frames_ctx->format            = AV_PIX_FMT_CUDA;
     frames_ctx->width             = (avctx->coded_width + 1) & ~1;
@@ -605,15 +629,18 @@ int ff_nvdec_frame_params(AVCodecContext *avctx,
     if (!frames_ctx->pool)
         return AVERROR(ENOMEM);

+    // It is semantically incorrect to use AV_PIX_FMT_YUV444P16 for either the 10
+    // or 12 bit case, but ffmpeg and nvidia disagree on which end the padding
+    // bits go at. P16 is unambiguous and matches.
     switch (sw_desc->comp[0].depth) {
     case 8:
-        frames_ctx->sw_format = AV_PIX_FMT_NV12;
+        frames_ctx->sw_format = chroma_444 ? AV_PIX_FMT_YUV444P : AV_PIX_FMT_NV12;
         break;
     case 10:
-        frames_ctx->sw_format = AV_PIX_FMT_P010;
+        frames_ctx->sw_format = chroma_444 ? AV_PIX_FMT_YUV444P10_LSB : AV_PIX_FMT_P010;
         break;
     case 12:
-        frames_ctx->sw_format = AV_PIX_FMT_P016;
+        frames_ctx->sw_format = chroma_444 ? AV_PIX_FMT_YUV444P12_LSB : AV_PIX_FMT_P016;
         break;
     default:
         return AVERROR(EINVAL);
diff --git a/libavutil/hwcontext_cuda.c b/libavutil/hwcontext_cuda.c
index 3b1d53e799..094706db44 100644
--- a/libavutil/hwcontext_cuda.c
+++ b/libavutil/hwcontext_cuda.c
@@ -38,6 +38,8 @@ static const enum AVPixelFormat supported_formats[] = {
     AV_PIX_FMT_YUV444P,
     AV_PIX_FMT_P010,
     AV_PIX_FMT_P016,
+    AV_PIX_FMT_YUV444P10_LSB,
+    AV_PIX_FMT_YUV444P12_LSB,
     AV_PIX_FMT_YUV444P16,
     AV_PIX_FMT_0RGB32,
     AV_PIX_FMT_0BGR32,
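
Note on the plane-offset fix in nvdec_retrieve_data: the sketch below isolates the arithmetic so the effect of replacing the hard-coded shift of 1 with shift_v is easy to check by hand. It is a standalone illustration, not FFmpeg code. The helper name plane_offsets and its parameters are hypothetical, and the plane count is passed in explicitly, whereas the patch loops until frame->linesize[i] is zero.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of FFmpeg): compute the byte offset of each
 * plane in one contiguous mapped surface, mirroring the loop in the patch.
 * Plane 0 (luma) always has full height; every later plane has its height
 * shifted right by shift_v, the vertical chroma subsampling shift.
 * Returns the total size in bytes covered by all planes.
 */
static size_t plane_offsets(size_t pitch, unsigned height, int shift_v,
                            unsigned nb_planes, size_t *offsets)
{
    size_t offset = 0;
    unsigned i;

    assert(offsets != NULL);

    for (i = 0; i < nb_planes; i++) {
        offsets[i] = offset;
        /* Same expression as the patched line, with frame->height -> height. */
        offset += pitch * (height >> (i ? shift_v : 0));
    }
    return offset;
}
```

With a pitch of 2048 and a height of 1080, an NV12-style layout (two planes, shift_v = 1) puts the chroma plane at byte 2211840 and ends at 3317760. A YUV444P-style layout (three planes, shift_v = 0) puts the second and third planes at 2211840 and 4423680. With the old hard-coded shift of 1, the third plane would instead have been placed at 3317760, inside the second plane's data.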