From patchwork Thu Feb 14 04:03:37 2019
X-Patchwork-Submitter: Philip Langdale
X-Patchwork-Id: 12070
From: Philip Langdale
To: ffmpeg-devel@ffmpeg.org
Cc: Philip Langdale
Date: Wed, 13 Feb 2019 20:03:37 -0800
Message-Id: <20190214040339.13789-3-philipl@overt.org>
In-Reply-To: <20190214040339.13789-1-philipl@overt.org>
References: <20190214040339.13789-1-philipl@overt.org>
Subject: [FFmpeg-devel] [PATCH 2/4] avcodec/nvdec: Add support for decoding HEVC 4:4:4 content

The latest generation video decoder on the Turing chips supports decoding
HEVC 4:4:4.
Supporting this is relatively straightforward; we need to account for
the different chroma format and pick the right output and sw formats
at the right times.

There was one bug: a hard-coded assumption that the first chroma plane
would be half-height. I fixed this to use the actual vertical chroma
shift of the plane.

We also need to pass the SPS and PPS range extension flags.

Signed-off-by: Philip Langdale
---
 libavcodec/hevcdec.c    |  3 +++
 libavcodec/nvdec.c      | 42 +++++++++++++++++++++++++++++++++--------
 libavcodec/nvdec_hevc.c | 30 +++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/libavcodec/hevcdec.c b/libavcodec/hevcdec.c
index b2a87d55db..967f8f1def 100644
--- a/libavcodec/hevcdec.c
+++ b/libavcodec/hevcdec.c
@@ -409,6 +409,9 @@ static enum AVPixelFormat get_format(HEVCContext *s, const HEVCSPS *sps)
 #endif
         break;
     case AV_PIX_FMT_YUV420P12:
+    case AV_PIX_FMT_YUV444P:
+    case AV_PIX_FMT_YUV444P10:
+    case AV_PIX_FMT_YUV444P12:
 #if CONFIG_HEVC_NVDEC_HWACCEL
         *fmt++ = AV_PIX_FMT_CUDA;
 #endif
diff --git a/libavcodec/nvdec.c b/libavcodec/nvdec.c
index c7d5379770..72201a1123 100644
--- a/libavcodec/nvdec.c
+++ b/libavcodec/nvdec.c
@@ -35,6 +35,11 @@
 #include "nvdec.h"
 #include "internal.h"
 
+#if !NVDECAPI_CHECK_VERSION(9, 0)
+#define cudaVideoSurfaceFormat_YUV444 2
+#define cudaVideoSurfaceFormat_YUV444_16Bit 3
+#endif
+
 typedef struct NVDECDecoder {
     CUvideodecoder decoder;
 
@@ -274,7 +279,8 @@ int ff_nvdec_decode_init(AVCodecContext *avctx)
 
     CUVIDDECODECREATEINFO params = { 0 };
 
-    int cuvid_codec_type, cuvid_chroma_format;
+    cudaVideoSurfaceFormat output_format;
+    int cuvid_codec_type, cuvid_chroma_format, chroma_444;
     int ret = 0;
 
     sw_desc = av_pix_fmt_desc_get(avctx->sw_pix_fmt);
@@ -292,6 +298,7 @@ int ff_nvdec_decode_init(AVCodecContext *avctx)
         av_log(avctx, AV_LOG_ERROR, "Unsupported chroma format\n");
         return AVERROR(ENOSYS);
     }
+    chroma_444 = cuvid_chroma_format == cudaVideoChromaFormat_444;
 
     if (!avctx->hw_frames_ctx) {
         ret = ff_decode_get_hw_frames_ctx(avctx, AV_HWDEVICE_TYPE_CUDA);
@@ -299,6 +306,21 @@ int ff_nvdec_decode_init(AVCodecContext *avctx)
             return ret;
     }
 
+    switch (sw_desc->comp[0].depth) {
+    case 8:
+        output_format = chroma_444 ? cudaVideoSurfaceFormat_YUV444 :
+                                     cudaVideoSurfaceFormat_NV12;
+        break;
+    case 10:
+    case 12:
+        output_format = chroma_444 ? cudaVideoSurfaceFormat_YUV444_16Bit :
+                                     cudaVideoSurfaceFormat_P016;
+        break;
+    default:
+        av_log(avctx, AV_LOG_ERROR, "Unsupported bit depth\n");
+        return AVERROR(ENOSYS);
+    }
+
     frames_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
 
     params.ulWidth = avctx->coded_width;
@@ -306,8 +328,7 @@ int ff_nvdec_decode_init(AVCodecContext *avctx)
     params.ulTargetWidth = avctx->coded_width;
     params.ulTargetHeight = avctx->coded_height;
     params.bitDepthMinus8 = sw_desc->comp[0].depth - 8;
-    params.OutputFormat = params.bitDepthMinus8 ?
-                          cudaVideoSurfaceFormat_P016 : cudaVideoSurfaceFormat_NV12;
+    params.OutputFormat = output_format;
     params.CodecType = cuvid_codec_type;
     params.ChromaFormat = cuvid_chroma_format;
     params.ulNumDecodeSurfaces = frames_ctx->initial_pool_size;
@@ -386,6 +407,8 @@ static int nvdec_retrieve_data(void *logctx, AVFrame *frame)
     NVDECFrame *cf = (NVDECFrame*)fdd->hwaccel_priv;
     NVDECDecoder *decoder = (NVDECDecoder*)cf->decoder_ref->data;
 
+    AVHWFramesContext *hwctx = (AVHWFramesContext *)frame->hw_frames_ctx->data;
+
     CUVIDPROCPARAMS vpp = { 0 };
     NVDECFrame *unmap_data = NULL;
 
@@ -394,6 +417,7 @@ static int nvdec_retrieve_data(void *logctx, AVFrame *frame)
 
     unsigned int pitch, i;
     unsigned int offset = 0;
+    int shift_h = 0, shift_v = 0;
     int ret = 0;
 
     vpp.progressive_frame = 1;
@@ -427,10 +451,11 @@ static int nvdec_retrieve_data(void *logctx, AVFrame *frame)
     unmap_data->idx_ref = av_buffer_ref(cf->idx_ref);
     unmap_data->decoder_ref = av_buffer_ref(cf->decoder_ref);
 
+    av_pix_fmt_get_chroma_sub_sample(hwctx->sw_format, &shift_h, &shift_v);
     for (i = 0; frame->linesize[i]; i++) {
         frame->data[i] = (uint8_t*)(devptr + offset);
         frame->linesize[i] = pitch;
-        offset += pitch * (frame->height >> (i ? 1 : 0));
+        offset += pitch * (frame->height >> (i ? shift_v : 0));
     }
 
     goto finish;
@@ -566,7 +591,7 @@ int ff_nvdec_frame_params(AVCodecContext *avctx,
 {
     AVHWFramesContext *frames_ctx = (AVHWFramesContext*)hw_frames_ctx->data;
     const AVPixFmtDescriptor *sw_desc;
-    int cuvid_codec_type, cuvid_chroma_format;
+    int cuvid_codec_type, cuvid_chroma_format, chroma_444;
 
     sw_desc = av_pix_fmt_desc_get(avctx->sw_pix_fmt);
     if (!sw_desc)
@@ -583,6 +608,7 @@ int ff_nvdec_frame_params(AVCodecContext *avctx,
         av_log(avctx, AV_LOG_VERBOSE, "Unsupported chroma format\n");
         return AVERROR(EINVAL);
     }
+    chroma_444 = cuvid_chroma_format == cudaVideoChromaFormat_444;
 
     frames_ctx->format = AV_PIX_FMT_CUDA;
     frames_ctx->width = (avctx->coded_width + 1) & ~1;
@@ -601,13 +627,13 @@ int ff_nvdec_frame_params(AVCodecContext *avctx,
 
     switch (sw_desc->comp[0].depth) {
     case 8:
-        frames_ctx->sw_format = AV_PIX_FMT_NV12;
+        frames_ctx->sw_format = chroma_444 ? AV_PIX_FMT_YUV444P : AV_PIX_FMT_NV12;
         break;
     case 10:
-        frames_ctx->sw_format = AV_PIX_FMT_P010;
+        frames_ctx->sw_format = chroma_444 ? AV_PIX_FMT_YUV444P16 : AV_PIX_FMT_P010;
         break;
     case 12:
-        frames_ctx->sw_format = AV_PIX_FMT_P016;
+        frames_ctx->sw_format = chroma_444 ? AV_PIX_FMT_YUV444P16 : AV_PIX_FMT_P016;
         break;
     default:
         return AVERROR(EINVAL);
diff --git a/libavcodec/nvdec_hevc.c b/libavcodec/nvdec_hevc.c
index e04a701f3a..d11b5e8a38 100644
--- a/libavcodec/nvdec_hevc.c
+++ b/libavcodec/nvdec_hevc.c
@@ -131,6 +131,17 @@ static int nvdec_hevc_start_frame(AVCodecContext *avctx,
             .IdrPicFlag = IS_IDR(s),
             .bit_depth_luma_minus8 = sps->bit_depth - 8,
             .bit_depth_chroma_minus8 = sps->bit_depth - 8,
+#if NVDECAPI_CHECK_VERSION(9, 0)
+            .sps_range_extension_flag = sps->sps_range_extension_flag,
+            .transform_skip_rotation_enabled_flag = sps->transform_skip_rotation_enabled_flag,
+            .transform_skip_context_enabled_flag = sps->transform_skip_context_enabled_flag,
+            .implicit_rdpcm_enabled_flag = sps->implicit_rdpcm_enabled_flag,
+            .explicit_rdpcm_enabled_flag = sps->explicit_rdpcm_enabled_flag,
+            .extended_precision_processing_flag = sps->extended_precision_processing_flag,
+            .intra_smoothing_disabled_flag = sps->intra_smoothing_disabled_flag,
+            .persistent_rice_adaptation_enabled_flag = sps->persistent_rice_adaptation_enabled_flag,
+            .cabac_bypass_alignment_enabled_flag = sps->cabac_bypass_alignment_enabled_flag,
+#endif
 
             .dependent_slice_segments_enabled_flag = pps->dependent_slice_segments_enabled_flag,
             .slice_segment_header_extension_present_flag = pps->slice_header_extension_present_flag,
@@ -164,6 +175,13 @@ static int nvdec_hevc_start_frame(AVCodecContext *avctx,
             .uniform_spacing_flag = pps->uniform_spacing_flag,
             .num_tile_columns_minus1 = pps->num_tile_columns - 1,
             .num_tile_rows_minus1 = pps->num_tile_rows - 1,
+#if NVDECAPI_CHECK_VERSION(9, 0)
+            .pps_range_extension_flag = pps->pps_range_extensions_flag,
+            .cross_component_prediction_enabled_flag = pps->cross_component_prediction_enabled_flag,
+            .chroma_qp_offset_list_enabled_flag = pps->chroma_qp_offset_list_enabled_flag,
+            .diff_cu_chroma_qp_offset_depth = pps->diff_cu_chroma_qp_offset_depth,
+            .chroma_qp_offset_list_len_minus1 = pps->chroma_qp_offset_list_len_minus1,
+#endif
 
             .NumBitsForShortTermRPSInSlice = s->sh.short_term_rps ? s->sh.short_term_ref_pic_set_size : 0,
             .NumDeltaPocsOfRefRpsIdx = s->sh.short_term_rps ? s->sh.short_term_rps->rps_idx_num_delta_pocs : 0,
@@ -185,6 +203,18 @@ static int nvdec_hevc_start_frame(AVCodecContext *avctx,
     for (i = 0; i < pps->num_tile_rows; i++)
         ppc->row_height_minus1[i] = pps->row_height[i] - 1;
 
+#if NVDECAPI_CHECK_VERSION(9, 0)
+    if (pps->chroma_qp_offset_list_len_minus1 >= FF_ARRAY_ELEMS(ppc->cb_qp_offset_list) ||
+        pps->chroma_qp_offset_list_len_minus1 >= FF_ARRAY_ELEMS(ppc->cr_qp_offset_list)) {
+        av_log(avctx, AV_LOG_ERROR, "Too many chroma_qp_offsets\n");
+        return AVERROR(ENOSYS);
+    }
+    for (i = 0; i <= pps->chroma_qp_offset_list_len_minus1; i++) {
+        ppc->cb_qp_offset_list[i] = pps->cb_qp_offset_list[i];
+        ppc->cr_qp_offset_list[i] = pps->cr_qp_offset_list[i];
+    }
+#endif
+
     if (s->rps[LT_CURR].nb_refs > FF_ARRAY_ELEMS(ppc->RefPicSetLtCurr) ||
         s->rps[ST_CURR_BEF].nb_refs > FF_ARRAY_ELEMS(ppc->RefPicSetStCurrBefore) ||
         s->rps[ST_CURR_AFT].nb_refs > FF_ARRAY_ELEMS(ppc->RefPicSetStCurrAfter)) {
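
The plane-height fix in nvdec_retrieve_data() is easiest to see in isolation. Below is a minimal, self-contained sketch of that offset arithmetic, assuming (as the loop in the patch does) that all planes of the mapped surface share one pitch and are stored back to back. plane_offsets() is a hypothetical helper, not an FFmpeg function; in the patch, shift_v comes from av_pix_fmt_get_chroma_sub_sample() applied to hwctx->sw_format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the plane loop in nvdec_retrieve_data():
 * every plane uses the same pitch, and each plane starts where the
 * previous one ended. */
static void plane_offsets(size_t pitch, unsigned height, int shift_v,
                          int nb_planes, size_t offsets[])
{
    size_t offset = 0;

    /* Pixel formats describe at most 4 planes. */
    assert(nb_planes > 0 && nb_planes <= 4);
    for (int i = 0; i < nb_planes; i++) {
        offsets[i] = offset;
        /* Plane 0 (luma) is full height; chroma planes are height >> shift_v.
         * The pre-patch code hard-coded a shift of 1 here. */
        offset += pitch * (height >> (i ? shift_v : 0));
    }
}
```

For NV12 (two planes, shift_v = 1) the interleaved UV plane starts at pitch * height, exactly as before. For a 4:4:4 format such as YUV444P16 (three planes, shift_v = 0) the Cr plane starts at 2 * pitch * height, whereas the old hard-coded shift of 1 would have placed it at 1.5 * pitch * height, inside the Cb plane.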
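
The patch chooses formats in two places: ff_nvdec_decode_init() picks the decoder surface format (params.OutputFormat), and ff_nvdec_frame_params() picks the matching frames_ctx->sw_format. The sketch below folds both switches into one function so the full depth/chroma table is visible at once. It is illustrative only: the enums are hypothetical stand-ins for the cudaVideoSurfaceFormat and AVPixelFormat values named in the diff, and pick_formats() is not an FFmpeg function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for cudaVideoSurfaceFormat values. */
enum surface_fmt { SURF_NONE, SURF_NV12, SURF_P016, SURF_YUV444, SURF_YUV444_16BIT };
/* Hypothetical stand-ins for AVPixelFormat values. */
enum sw_fmt { SW_NONE, SW_NV12, SW_P010, SW_P016, SW_YUV444P, SW_YUV444P16 };

struct nvdec_formats {
    enum surface_fmt surface; /* what the decoder writes (params.OutputFormat) */
    enum sw_fmt sw;           /* how the frames are described (frames_ctx->sw_format) */
};

/* Returns false for bit depths the patch rejects. */
static bool pick_formats(int depth, bool chroma_444, struct nvdec_formats *out)
{
    switch (depth) {
    case 8:
        out->surface = chroma_444 ? SURF_YUV444 : SURF_NV12;
        out->sw      = chroma_444 ? SW_YUV444P  : SW_NV12;
        break;
    case 10:
        out->surface = chroma_444 ? SURF_YUV444_16BIT : SURF_P016;
        out->sw      = chroma_444 ? SW_YUV444P16      : SW_P010;
        break;
    case 12:
        out->surface = chroma_444 ? SURF_YUV444_16BIT : SURF_P016;
        out->sw      = chroma_444 ? SW_YUV444P16      : SW_P016;
        break;
    default:
        return false;
    }
    /* Invariant: content deeper than 8 bits always decodes to a 16-bit surface. */
    assert((depth > 8) == (out->surface == SURF_P016 ||
                           out->surface == SURF_YUV444_16BIT));
    return true;
}
```

Laid out this way, 10-bit and 12-bit content share a 16-bit decoder surface; for 4:2:0 the sw_format only records how many of those 16 bits are significant (P010 vs P016), and for 4:4:4 both depths map to YUV444P16.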