From patchwork Mon Sep 16 09:28:06 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anton Khirnov X-Patchwork-Id: 51619 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a59:9fc3:0:b0:48e:c0f8:d0de with SMTP id k3csp1160919vqy; Mon, 16 Sep 2024 02:32:40 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCX02vSnaBHtE+MyPcbYiuG5NUMCF8GZgMccbXjZ1Vvb7HoSccx7N3p2aA3uaVjtZ4X/+1Y1cOcIOm/yLEAlfj9N@gmail.com X-Google-Smtp-Source: AGHT+IHIB1zgJEmBoj0YPZuiqbEax5q4rmgvX45DBCdwDUSD89dnB0PqA333nr2ToCLVvMEsf/nE X-Received: by 2002:a17:907:e668:b0:a8a:822e:44c6 with SMTP id a640c23a62f3a-a90293dab80mr1668625466b.4.1726479159810; Mon, 16 Sep 2024 02:32:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1726479159; cv=none; d=google.com; s=arc-20240605; b=Pv3l6XLsbNZi9K/m7GSYWIs2QM5BrKhDT5OB11H1WmHiLCLwPVaetPpIOAzpiUqlAD 7tks/wPdqt3LYFyHtsLzQ1JqoYXg4137M1B54oDZIflCkj99QWRF2yP9J/OBHIlv7vtK k9hurhugmrqsW8bmbnJuOYT2n9DMydeKRA84dDpzdLTWCL+4u4MmSoaCy9yeEKfwTUyj l58wncommbT3sCMQRvLnOG+Yj/DapTQcHU89GCm1dfOu5UF180nFbX88LiGcv7B8HsvF prE43y1zX2XN3skttveK2YUj82DapHWVeLaxsY1RWcT2t6Rr8ZBCbmD7Y15bZhMHYsoR XhiQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:dkim-signature:delivered-to; bh=QJ7BsUrXlT2kphWX5gx52263uqLiJOwlFecxiIOLglQ=; fh=YOA8vD9MJZuwZ71F/05pj6KdCjf6jQRmzLS+CATXUQk=; b=cxh6/aRZfyIiyZHbJYBXNF9ki0d5f/UEDitYSxqhL/FiPo750Lt8C0Iq7/hmKExh0F Y6jforj45rQP5dTwIJp1L2TpWjmqaSEqtNcM9e2KJQNRmiVnmbdnfsJuhY51S8bfg9da 0DWtDFYAGEURPcgQKQeG92hgT+OvxH0clCHLon2++wzEZBdhRssRjIo5LCbjziRpJ1Q3 3GSYTMqY8tALv5e/0gigTxoZxDqlz9Itlivc7LhroLhMopU3syoG4Ql4tfWOg6oNFMvS b+/9jjJTi0JIufFGl76n9A87dtdCmQ22/cshZJ6Zxx8O00PKtIYxqtM+FcF+T2or2bLF 8Teg==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@khirnov.net header.s=mail header.b=E40aRjEE; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id a640c23a62f3a-a90612cab7bsi345759166b.372.2024.09.16.02.32.39; Mon, 16 Sep 2024 02:32:39 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@khirnov.net header.s=mail header.b=E40aRjEE; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id A9A9468DCB1; Mon, 16 Sep 2024 12:32:25 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail1.khirnov.net (quelana.khirnov.net [94.230.150.81]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id CFC7F68D221 for ; Mon, 16 Sep 2024 12:32:17 +0300 (EEST) Authentication-Results: mail1.khirnov.net; dkim=pass (2048-bit key; unprotected) header.d=khirnov.net header.i=@khirnov.net header.a=rsa-sha256 header.s=mail header.b=E40aRjEE; dkim-atps=neutral Received: from localhost (mail1.khirnov.net [IPv6:::1]) by mail1.khirnov.net (Postfix) with ESMTP id 6EB144DE2 for ; Mon, 16 Sep 2024 11:32:17 +0200 (CEST) Received: from mail1.khirnov.net ([IPv6:::1]) by localhost (mail1.khirnov.net [IPv6:::1]) (amavis, port 10024) with ESMTP id 9agJ_2ViMM0V for ; Mon, 16 Sep 2024 11:32:15 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=khirnov.net; s=mail; t=1726479135; bh=gf7L/723RgxGoS8GVAbAHOmmmSpbGsI4z8Rvj4y46Ns=; h=From:To:Subject:Date:In-Reply-To:References:From; b=E40aRjEErLI3Gedxj7O5WL97GEmPKanxgofwfJ8+d/88QNEes3i9Yvv7KD0l7VqZZ ys4OImaaj/Sd2nRIQmivRfsr/LUw4N9mtJKrCQgGcZRPR+RjPlOMu+nLWgwENCbiWW oe+8u3Zqd3aMcjl+Jl+F96aGq7YShLPNKI8wOU8jXHOzRjs5FqV+4dxRCthOxnVq0Z jWncc5lk1zwkoz0U1L8k2ZYvAnPTjip7ndXzqfpq6TU3peoisB0bmZiQoSOGs037PR QHojIgVNsg0ve6CznRV+B0PDe8kzpv/J8W3SFVJ/xDRD1Gip2o9UoiJ/Zw0i+0Svmh W4J9ZnqjVlELA== Received: from libav.khirnov.net (libav.khirnov.net [IPv6:2a00:c500:561:201::7]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "libav.khirnov.net", Issuer "smtp.khirnov.net SMTP CA" (verified OK)) by mail1.khirnov.net (Postfix) with ESMTPS id 4B60881C for ; Mon, 16 Sep 2024 11:32:15 +0200 (CEST) Received: from libav.khirnov.net (libav.khirnov.net [IPv6:::1]) by libav.khirnov.net (Postfix) with ESMTP id 1F9CD3A086F for ; Mon, 16 Sep 2024 11:32:15 +0200 (CEST) From: Anton Khirnov To: ffmpeg-devel@ffmpeg.org Date: Mon, 16 Sep 2024 11:28:06 +0200 Message-ID: <20240916093211.10441-2-anton@khirnov.net> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240916093211.10441-1-anton@khirnov.net> References: <20240916093211.10441-1-anton@khirnov.net> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 15/23] lavc/hevcdec: implement decoding MV-HEVC X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: yf+pKqzhzdwy At most two layers are supported. Aspects of this work were sponsored by Vimeo and Meta. --- Now not using AV_FRAME_FLAG_DISCARD to discard views that are decoded but not output. --- Changelog | 1 + doc/decoders.texi | 45 +++++ libavcodec/hevc/hevcdec.c | 334 ++++++++++++++++++++++++++++++++++---- libavcodec/hevc/hevcdec.h | 35 +++- libavcodec/hevc/refs.c | 71 ++++++-- 5 files changed, 429 insertions(+), 57 deletions(-) + ref->base_layer_frame = (l != &s->layers[0] && s->layers[0].cur_frame) ? + s->layers[0].cur_frame - s->layers[0].DPB : -1; + if (s->sh.pic_output_flag) ref->flags = HEVC_FRAME_FLAG_OUTPUT | HEVC_FRAME_FLAG_SHORT_REF; else @@ -176,33 +199,49 @@ static void unref_missing_refs(HEVCLayerContext *l) } } -int ff_hevc_output_frames(HEVCContext *s, HEVCLayerContext *l, +int ff_hevc_output_frames(HEVCContext *s, + unsigned layers_active_decode, unsigned layers_active_output, unsigned max_output, unsigned max_dpb, int discard) { while (1) { - int nb_dpb = 0; + int nb_dpb[HEVC_VPS_MAX_LAYERS] = { 0 }; int nb_output = 0; int min_poc = INT_MAX; - int i, min_idx, ret = 0; + int min_layer = -1; + int min_idx, ret = 0; - for (i = 0; i < FF_ARRAY_ELEMS(l->DPB); i++) { - HEVCFrame *frame = &l->DPB[i]; - if (frame->flags & HEVC_FRAME_FLAG_OUTPUT) { - nb_output++; - if (frame->poc < min_poc || nb_output == 1) { - min_poc = frame->poc; - min_idx = i; + for (int layer = 0; layer < FF_ARRAY_ELEMS(s->layers); layer++) { + HEVCLayerContext *l = &s->layers[layer]; + + if (!(layers_active_decode & (1 << layer))) + continue; + + for (int i = 0; i < FF_ARRAY_ELEMS(l->DPB); i++) { + HEVCFrame *frame = &l->DPB[i]; + if (frame->flags & HEVC_FRAME_FLAG_OUTPUT) { + // nb_output counts AUs with an output-pending frame + // in at least one layer + if (!(frame->base_layer_frame >= 0 && + (s->layers[0].DPB[frame->base_layer_frame].flags & HEVC_FRAME_FLAG_OUTPUT))) + nb_output++; + if (min_layer < 0 || frame->poc < min_poc) { + min_poc = frame->poc; + min_idx = i; + min_layer = layer; + } } + nb_dpb[layer] += !!frame->flags; } - nb_dpb += !!frame->flags; } if (nb_output > max_output || - (nb_output && nb_dpb > max_dpb)) { - HEVCFrame *frame = &l->DPB[min_idx]; + (nb_output && + (nb_dpb[0] > max_dpb || nb_dpb[1] > max_dpb))) { + HEVCFrame *frame = &s->layers[min_layer].DPB[min_idx]; AVFrame *f = frame->needs_fg ? frame->frame_grain : frame->f; + int output = !discard && (layers_active_output & (1 << min_layer)); - if (!discard) { + if (output) { f->pkt_dts = s->pkt_dts; ret = ff_container_fifo_write(s->output_fifo, f); } @@ -210,8 +249,8 @@ int ff_hevc_output_frames(HEVCContext *s, HEVCLayerContext *l, if (ret < 0) return ret; - av_log(s->avctx, AV_LOG_DEBUG, "%s frame with POC %d.\n", - discard ? "Discarded" : "Output", frame->poc); + av_log(s->avctx, AV_LOG_DEBUG, "%s frame with POC %d/%d.\n", + output ? "Output" : "Discarded", min_layer, frame->poc); continue; } return 0; diff --git a/Changelog b/Changelog index b6f91d7c8c..ff5d1b1bc8 100644 --- a/Changelog +++ b/Changelog @@ -21,6 +21,7 @@ version : - MediaCodec AAC/AMR-NB/AMR-WB/MP3 decoding - YUV colorspace negotiation for codecs and filters, obsoleting the YUVJ pixel format +- MV-HEVC decoding version 7.0: diff --git a/doc/decoders.texi b/doc/decoders.texi index 2fcc761d2f..17bb361ffa 100644 --- a/doc/decoders.texi +++ b/doc/decoders.texi @@ -38,6 +38,51 @@ Select an operating point of a scalable AV1 bitstream (0 - 31). Default is 0. @end table +@section hevc +HEVC (AKA ITU-T H.265 or ISO/IEC 23008-2) decoder. + +The decoder supports MV-HEVC multiview streams with at most two views. Views to +be output are selected by supplying a list of view IDs to the decoder (the +@option{view_ids} option). This option may be set either statically before +decoder init, or from the @code{get_format()} callback - useful for the case +when the view count or IDs change dynamically during decoding. + +Only the base layer is decoded by default. + +Note that if you are using the @code{ffmpeg} CLI tool, you should be using view +specifiers as documented in its manual, rather than the options documented here. + +@subsection Options + +@table @option + +@item view_ids (MV-HEVC) +Specify a list of view IDs that should be output. This option can also be set to +a single '-1', which will cause all views defined in the VPS to be decoded and +output. + +@item view_ids_available (MV-HEVC) +This option may be read by the caller to retrieve an array of view IDs available +in the active VPS. The array is empty for single-layer video. + +The value of this option is guaranteed to be accurate when read from the +@code{get_format()} callback. It may also be set at other times (e.g. after +opening the decoder), but the value is informational only and may be incorrect +(e.g. when the stream contains multiple distinct VPS NALUs). + +@item view_pos_available (MV-HEVC) +This option may be read by the caller to retrieve an array of view positions +(left, right, or unspecified) available in the active VPS, as +@code{AVStereo3DView} values. When the array is available, its elements apply to +the corresponding elements of @option{view_ids_available}, i.e. +@code{view_pos_available[i]} contains the position of view with ID +@code{view_ids_available[i]}. + +Same validity restrictions as for @option{view_ids_available} apply to +this option. + +@end table + @section rawvideo Raw video decoder. diff --git a/libavcodec/hevc/hevcdec.c b/libavcodec/hevc/hevcdec.c index cbf763b8be..15828ef9eb 100644 --- a/libavcodec/hevc/hevcdec.c +++ b/libavcodec/hevc/hevcdec.c @@ -34,6 +34,7 @@ #include "libavutil/mem.h" #include "libavutil/opt.h" #include "libavutil/pixdesc.h" +#include "libavutil/stereo3d.h" #include "libavutil/timecode.h" #include "aom_film_grain.h" @@ -417,6 +418,109 @@ static int export_stream_params_from_sei(HEVCContext *s) return 0; } +static int export_multilayer(HEVCContext *s, const HEVCVPS *vps) +{ + const HEVCSEITDRDI *tdrdi = &s->sei.tdrdi; + + av_freep(&s->view_ids_available); + s->nb_view_ids_available = 0; + av_freep(&s->view_pos_available); + s->nb_view_pos_available = 0; + + // don't export anything in the trivial case (1 layer, view id=0) + if (vps->nb_layers < 2 && !vps->view_id[0]) + return 0; + + s->view_ids_available = av_calloc(vps->nb_layers, sizeof(*s->view_ids_available)); + if (!s->view_ids_available) + return AVERROR(ENOMEM); + + if (tdrdi->num_ref_displays) { + s->view_pos_available = av_calloc(vps->nb_layers, sizeof(*s->view_pos_available)); + if (!s->view_pos_available) + return AVERROR(ENOMEM); + } + + for (int i = 0; i < vps->nb_layers; i++) { + s->view_ids_available[i] = vps->view_id[i]; + + if (s->view_pos_available) { + s->view_pos_available[i] = vps->view_id[i] == tdrdi->left_view_id[0] ? + AV_STEREO3D_VIEW_LEFT : + vps->view_id[i] == tdrdi->right_view_id[0] ? + AV_STEREO3D_VIEW_RIGHT : AV_STEREO3D_VIEW_UNSPEC; + } + } + s->nb_view_ids_available = vps->nb_layers; + s->nb_view_pos_available = s->view_pos_available ? vps->nb_layers : 0; + + return 0; +} + +static int setup_multilayer(HEVCContext *s, const HEVCVPS *vps) +{ + unsigned layers_active_output = 0, highest_layer; + + s->layers_active_output = 1; + s->layers_active_decode = 1; + + // nothing requested - decode base layer only + if (!s->nb_view_ids) + return 0; + + if (s->nb_view_ids == 1 && s->view_ids[0] == -1) { + layers_active_output = (1 << vps->nb_layers) - 1; + } else { + for (int i = 0; i < s->nb_view_ids; i++) { + int view_id = s->view_ids[i]; + int layer_idx = -1; + + if (view_id < 0) { + av_log(s->avctx, AV_LOG_ERROR, + "Invalid view ID requested: %d\n", view_id); + return AVERROR(EINVAL); + } + + for (int j = 0; j < vps->nb_layers; j++) { + if (vps->view_id[j] == view_id) { + layer_idx = j; + break; + } + } + if (layer_idx < 0) { + av_log(s->avctx, AV_LOG_ERROR, + "View ID %d not present in VPS\n", view_id); + return AVERROR(EINVAL); + } + layers_active_output |= 1 << layer_idx; + } + } + + if (!layers_active_output) { + av_log(s->avctx, AV_LOG_ERROR, "No layers selected\n"); + return AVERROR_BUG; + } + + highest_layer = ff_log2(layers_active_output); + if (highest_layer >= FF_ARRAY_ELEMS(s->layers)) { + av_log(s->avctx, AV_LOG_ERROR, + "Too many layers requested: %u\n", layers_active_output); + return AVERROR(EINVAL); + } + + /* Assume a higher layer depends on all the lower ones. + * This is enforced in VPS parsing currently, this logic will need + * to be changed if we want to support more complex dependency structures. + */ + s->layers_active_decode = (1 << (highest_layer + 1)) - 1; + s->layers_active_output = layers_active_output; + + av_log(s->avctx, AV_LOG_DEBUG, "decode/output layers: %x/%x\n", + s->layers_active_decode, s->layers_active_output); + + return 0; +} + static enum AVPixelFormat get_format(HEVCContext *s, const HEVCSPS *sps) { #define HWACCEL_MAX (CONFIG_HEVC_DXVA2_HWACCEL + \ @@ -428,6 +532,7 @@ static enum AVPixelFormat get_format(HEVCContext *s, const HEVCSPS *sps) CONFIG_HEVC_VDPAU_HWACCEL + \ CONFIG_HEVC_VULKAN_HWACCEL) enum AVPixelFormat pix_fmts[HWACCEL_MAX + 2], *fmt = pix_fmts; + int ret; switch (sps->pix_fmt) { case AV_PIX_FMT_YUV420P: @@ -547,7 +652,23 @@ static enum AVPixelFormat get_format(HEVCContext *s, const HEVCSPS *sps) *fmt++ = sps->pix_fmt; *fmt = AV_PIX_FMT_NONE; - return ff_get_format(s->avctx, pix_fmts); + // export multilayer information from active VPS to the caller, + // so it is available in get_format() + ret = export_multilayer(s, sps->vps); + if (ret < 0) + return ret; + + ret = ff_get_format(s->avctx, pix_fmts); + if (ret < 0) + return ret; + s->avctx->pix_fmt = ret; + + // set up multilayer decoding, if requested by caller + ret = setup_multilayer(s, sps->vps); + if (ret < 0) + return ret; + + return 0; } static int set_sps(HEVCContext *s, HEVCLayerContext *l, const HEVCSPS *sps) @@ -2948,13 +3069,60 @@ static int set_side_data(HEVCContext *s) return 0; } -static int hevc_frame_start(HEVCContext *s, HEVCLayerContext *l) +static int find_finish_setup_nal(const HEVCContext *s) +{ + int nal_idx = 0; + + for (int i = nal_idx; i < s->pkt.nb_nals; i++) { + const H2645NAL *nal = &s->pkt.nals[i]; + const int layer_id = nal->nuh_layer_id; + GetBitContext gb = nal->gb; + + if (layer_id > HEVC_MAX_NUH_LAYER_ID || s->vps->layer_idx[layer_id] < 0 || + !(s->layers_active_decode & (1 << s->vps->layer_idx[layer_id]))) + continue; + + switch (nal->type) { + case HEVC_NAL_TRAIL_R: + case HEVC_NAL_TRAIL_N: + case HEVC_NAL_TSA_N: + case HEVC_NAL_TSA_R: + case HEVC_NAL_STSA_N: + case HEVC_NAL_STSA_R: + case HEVC_NAL_BLA_W_LP: + case HEVC_NAL_BLA_W_RADL: + case HEVC_NAL_BLA_N_LP: + case HEVC_NAL_IDR_W_RADL: + case HEVC_NAL_IDR_N_LP: + case HEVC_NAL_CRA_NUT: + case HEVC_NAL_RADL_N: + case HEVC_NAL_RADL_R: + case HEVC_NAL_RASL_N: + case HEVC_NAL_RASL_R: + if (!get_bits1(&gb)) // first_slice_segment_in_pic_flag + continue; + case HEVC_NAL_VPS: + case HEVC_NAL_SPS: + case HEVC_NAL_PPS: + nal_idx = i; + break; + } + } + + return nal_idx; +} + +static int hevc_frame_start(HEVCContext *s, HEVCLayerContext *l, + unsigned nal_idx) { const HEVCPPS *const pps = s->ps.pps_list[s->sh.pps_id]; const HEVCSPS *const sps = pps->sps; int pic_size_in_ctb = ((sps->width >> sps->log2_min_cb_size) + 1) * ((sps->height >> sps->log2_min_cb_size) + 1); - int new_sequence = IS_IDR(s) || IS_BLA(s) || s->last_eos; + int new_sequence = (l == &s->layers[0]) && + (IS_IDR(s) || IS_BLA(s) || s->last_eos); + int prev_layers_active_decode = s->layers_active_decode; + int prev_layers_active_output = s->layers_active_output; int ret; if (sps->vps != s->vps && l != &s->layers[0]) { @@ -2965,7 +3133,32 @@ static int hevc_frame_start(HEVCContext *s, HEVCLayerContext *l) ff_refstruct_replace(&s->pps, pps); if (l->sps != sps) { - enum AVPixelFormat pix_fmt; + const HEVCSPS *sps_base = s->layers[0].sps; + enum AVPixelFormat pix_fmt = sps->pix_fmt; + + if (l != &s->layers[0]) { + if (!sps_base) { + av_log(s->avctx, AV_LOG_ERROR, + "Access unit starts with a non-base layer frame\n"); + return AVERROR_INVALIDDATA; + } + + // Files produced by Vision Pro lack VPS extension VUI, + // so the secondary layer has no range information. + // This check avoids failing in such a case. + if (sps_base->pix_fmt == AV_PIX_FMT_YUVJ420P && + sps->pix_fmt == AV_PIX_FMT_YUV420P && + !sps->vui.common.video_signal_type_present_flag) + pix_fmt = sps_base->pix_fmt; + + if (pix_fmt != sps_base->pix_fmt || + sps->width != sps_base->width || + sps->height != sps_base->height) { + av_log(s->avctx, AV_LOG_ERROR, + "Base/non-base layer SPS have unsupported parameter combination\n"); + return AVERROR(ENOSYS); + } + } ff_hevc_clear_refs(l); @@ -2973,14 +3166,17 @@ static int hevc_frame_start(HEVCContext *s, HEVCLayerContext *l) if (ret < 0) return ret; - export_stream_params(s, sps); + if (l == &s->layers[0]) { + export_stream_params(s, sps); - pix_fmt = get_format(s, sps); - if (pix_fmt < 0) - return pix_fmt; - s->avctx->pix_fmt = pix_fmt; + ret = get_format(s, sps); + if (ret < 0) { + set_sps(s, l, NULL); + return ret; + } - new_sequence = 1; + new_sequence = 1; + } } memset(l->horizontal_bs, 0, l->bs_width * l->bs_height); @@ -3015,7 +3211,8 @@ static int hevc_frame_start(HEVCContext *s, HEVCLayerContext *l) s->local_ctx[0].end_of_tiles_x = pps->column_width[0] << sps->log2_ctb_size; if (new_sequence) { - ret = ff_hevc_output_frames(s, l, 0, 0, s->sh.no_output_of_prior_pics_flag); + ret = ff_hevc_output_frames(s, prev_layers_active_decode, prev_layers_active_output, + 0, 0, s->sh.no_output_of_prior_pics_flag); if (ret < 0) return ret; } @@ -3072,7 +3269,8 @@ static int hevc_frame_start(HEVCContext *s, HEVCLayerContext *l) s->cur_frame->f->pict_type = 3 - s->sh.slice_type; - ret = ff_hevc_output_frames(s, l, sps->temporal_layer[sps->max_sub_layers - 1].num_reorder_pics, + ret = ff_hevc_output_frames(s, s->layers_active_decode, s->layers_active_output, + sps->temporal_layer[sps->max_sub_layers - 1].num_reorder_pics, sps->temporal_layer[sps->max_sub_layers - 1].max_dec_pic_buffering, 0); if (ret < 0) goto fail; @@ -3083,13 +3281,21 @@ static int hevc_frame_start(HEVCContext *s, HEVCLayerContext *l) goto fail; } - ff_thread_finish_setup(s->avctx); + // after starting the base-layer frame we know which layers will be decoded, + // so we can now figure out which NALUs to wait for before we can call + // ff_thread_finish_setup() + if (l == &s->layers[0]) + s->finish_setup_nal_idx = find_finish_setup_nal(s); + + if (nal_idx >= s->finish_setup_nal_idx) + ff_thread_finish_setup(s->avctx); return 0; fail: - if (s->cur_frame) - ff_hevc_unref_frame(s->cur_frame, ~0); + if (l->cur_frame) + ff_hevc_unref_frame(l->cur_frame, ~0); + l->cur_frame = NULL; s->cur_frame = s->collocated_ref = NULL; s->slice_initialized = 0; return ret; @@ -3164,9 +3370,9 @@ static int verify_md5(HEVCContext *s, AVFrame *frame) return err; } -static int hevc_frame_end(HEVCContext *s) +static int hevc_frame_end(HEVCContext *s, HEVCLayerContext *l) { - HEVCFrame *out = s->cur_frame; + HEVCFrame *out = l->cur_frame; const AVFilmGrainParams *fgp; av_unused int ret; @@ -3198,23 +3404,32 @@ static int hevc_frame_end(HEVCContext *s) } else { if (s->avctx->err_recognition & AV_EF_CRCCHECK && s->sei.picture_hash.is_md5) { - ret = verify_md5(s, s->cur_frame->f); + ret = verify_md5(s, out->f); if (ret < 0 && s->avctx->err_recognition & AV_EF_EXPLODE) return ret; } } s->sei.picture_hash.is_md5 = 0; - av_log(s->avctx, AV_LOG_DEBUG, "Decoded frame with POC %d.\n", s->poc); + av_log(s->avctx, AV_LOG_DEBUG, "Decoded frame with POC %zu/%d.\n", + l - s->layers, s->poc); return 0; } -static int decode_slice(HEVCContext *s, HEVCLayerContext *l, - const H2645NAL *nal, GetBitContext *gb) +static int decode_slice(HEVCContext *s, unsigned nal_idx, GetBitContext *gb) { + const int layer_idx = s->vps ? s->vps->layer_idx[s->nuh_layer_id] : 0; + HEVCLayerContext *l; int ret; + // skip layers not requested to be decoded + // layers_active_decode can only change while decoding a base-layer frame, + // so we can check it for non-base layers + if (layer_idx < 0 || + (s->nuh_layer_id > 0 && !(s->layers_active_decode & (1 << layer_idx)))) + return 0; + ret = hls_slice_header(&s->sh, s, gb); if (ret < 0) { // hls_slice_header() does not cleanup on failure thus the state now is inconsistant so we cannot use it on depandant slices @@ -3230,16 +3445,25 @@ static int decode_slice(HEVCContext *s, HEVCLayerContext *l, return 0; } + // switching to a new layer, mark previous layer's frame (if any) as done + if (s->cur_layer != layer_idx && + s->layers[s->cur_layer].cur_frame && + s->avctx->active_thread_type == FF_THREAD_FRAME) + ff_progress_frame_report(&s->layers[s->cur_layer].cur_frame->tf, INT_MAX); + + s->cur_layer = layer_idx; + l = &s->layers[s->cur_layer]; + if (s->sh.first_slice_in_pic_flag) { - if (s->cur_frame) { + if (l->cur_frame) { av_log(s->avctx, AV_LOG_ERROR, "Two slices reporting being the first in the same frame.\n"); return AVERROR_INVALIDDATA; } - ret = hevc_frame_start(s, l); + ret = hevc_frame_start(s, l, nal_idx); if (ret < 0) return ret; - } else if (!s->cur_frame) { + } else if (!l->cur_frame) { av_log(s->avctx, AV_LOG_ERROR, "First slice in a frame missing.\n"); return AVERROR_INVALIDDATA; } @@ -3251,16 +3475,16 @@ static int decode_slice(HEVCContext *s, HEVCLayerContext *l, return AVERROR_INVALIDDATA; } - ret = decode_slice_data(s, l, nal, gb); + ret = decode_slice_data(s, l, &s->pkt.nals[nal_idx], gb); if (ret < 0) return ret; return 0; } -static int decode_nal_unit(HEVCContext *s, const H2645NAL *nal) +static int decode_nal_unit(HEVCContext *s, unsigned nal_idx) { - HEVCLayerContext *l = &s->layers[0]; + H2645NAL *nal = &s->pkt.nals[nal_idx]; GetBitContext gb = nal->gb; int ret; @@ -3319,7 +3543,7 @@ static int decode_nal_unit(HEVCContext *s, const H2645NAL *nal) case HEVC_NAL_RADL_R: case HEVC_NAL_RASL_N: case HEVC_NAL_RASL_R: - ret = decode_slice(s, l, nal, &gb); + ret = decode_slice(s, nal_idx, &gb); if (ret < 0) goto fail; break; @@ -3420,11 +3644,10 @@ static int decode_nal_units(HEVCContext *s, const uint8_t *buf, int length) H2645NAL *nal = &s->pkt.nals[i]; if (s->avctx->skip_frame >= AVDISCARD_ALL || - (s->avctx->skip_frame >= AVDISCARD_NONREF - && ff_hevc_nal_is_nonref(nal->type)) || nal->nuh_layer_id > 0) + (s->avctx->skip_frame >= AVDISCARD_NONREF && ff_hevc_nal_is_nonref(nal->type))) continue; - ret = decode_nal_unit(s, nal); + ret = decode_nal_unit(s, i); if (ret < 0) { av_log(s->avctx, AV_LOG_WARNING, "Error parsing NAL unit #%d.\n", i); @@ -3433,12 +3656,17 @@ static int decode_nal_units(HEVCContext *s, const uint8_t *buf, int length) } fail: - if (s->cur_frame) { + for (int i = 0; i < FF_ARRAY_ELEMS(s->layers); i++) { + HEVCLayerContext *l = &s->layers[i]; + + if (!l->cur_frame) + continue; + if (ret >= 0) - ret = hevc_frame_end(s); + ret = hevc_frame_end(s, l); if (s->avctx->active_thread_type == FF_THREAD_FRAME) - ff_progress_frame_report(&s->cur_frame->tf, INT_MAX); + ff_progress_frame_report(&l->cur_frame->tf, INT_MAX); } return ret; @@ -3459,6 +3687,11 @@ static int hevc_decode_extradata(HEVCContext *s, uint8_t *buf, int length, int f if (first && s->ps.sps_list[i]) { const HEVCSPS *sps = s->ps.sps_list[i]; export_stream_params(s, sps); + + ret = export_multilayer(s, sps->vps); + if (ret < 0) + return ret; + break; } } @@ -3489,7 +3722,8 @@ static int hevc_receive_frame(AVCodecContext *avctx, AVFrame *frame) av_packet_unref(avpkt); ret = ff_decode_get_packet(avctx, avpkt); if (ret == AVERROR_EOF) { - ret = ff_hevc_output_frames(s, &s->layers[0], 0, 0, 0); + ret = ff_hevc_output_frames(s, s->layers_active_decode, + s->layers_active_output, 0, 0, 0); if (ret < 0) return ret; goto do_output; @@ -3555,6 +3789,8 @@ static int hevc_ref_frame(HEVCFrame *dst, const HEVCFrame *src) dst->ctb_count = src->ctb_count; dst->flags = src->flags; + dst->base_layer_frame = src->base_layer_frame; + ff_refstruct_replace(&dst->hwaccel_picture_private, src->hwaccel_picture_private); @@ -3690,9 +3926,24 @@ static int hevc_update_thread_context(AVCodecContext *dst, s->is_nalff = s0->is_nalff; s->nal_length_size = s0->nal_length_size; + s->layers_active_decode = s0->layers_active_decode; + s->layers_active_output = s0->layers_active_output; s->film_grain_warning_shown = s0->film_grain_warning_shown; + if (s->nb_view_ids != s0->nb_view_ids || + memcmp(s->view_ids, s0->view_ids, sizeof(*s->view_ids) * s->nb_view_ids)) { + av_freep(&s->view_ids); + s->nb_view_ids = 0; + + if (s0->nb_view_ids) { + s->view_ids = av_memdup(s0->view_ids, s0->nb_view_ids * sizeof(*s0->view_ids)); + if (!s->view_ids) + return AVERROR(ENOMEM); + s->nb_view_ids = s0->nb_view_ids; + } + } + ret = ff_h2645_sei_ctx_replace(&s->sei.common, &s0->sei.common); if (ret < 0) return ret; @@ -3787,6 +4038,19 @@ static const AVOption options[] = { AV_OPT_TYPE_BOOL, {.i64 = 0}, 0, 1, PAR }, { "strict-displaywin", "stricly apply default display window size", OFFSET(apply_defdispwin), AV_OPT_TYPE_BOOL, {.i64 = 0}, 0, 1, PAR }, + { "view_ids", "Array of view IDs that should be decoded and output; a single -1 to decode all views", + .offset = OFFSET(view_ids), .type = AV_OPT_TYPE_INT | AV_OPT_TYPE_FLAG_ARRAY, + .min = -1, .max = INT_MAX, .flags = PAR }, + { "view_ids_available", "Array of available view IDs is exported here", + .offset = OFFSET(view_ids_available), .type = AV_OPT_TYPE_UINT | AV_OPT_TYPE_FLAG_ARRAY, + .flags = PAR | AV_OPT_FLAG_EXPORT | AV_OPT_FLAG_READONLY }, + { "view_pos_available", "Array of view positions for view_ids_available is exported here, as AVStereo3DView", + .offset = OFFSET(view_pos_available), .type = AV_OPT_TYPE_UINT | AV_OPT_TYPE_FLAG_ARRAY, + .flags = PAR | AV_OPT_FLAG_EXPORT | AV_OPT_FLAG_READONLY, .unit = "view_pos" }, + { "unspecified", .type = AV_OPT_TYPE_CONST, .default_val = { .i64 = AV_STEREO3D_VIEW_UNSPEC }, .unit = "view_pos" }, + { "left", .type = AV_OPT_TYPE_CONST, .default_val = { .i64 = AV_STEREO3D_VIEW_LEFT }, .unit = "view_pos" }, + { "right", .type = AV_OPT_TYPE_CONST, .default_val = { .i64 = AV_STEREO3D_VIEW_RIGHT }, .unit = "view_pos" }, + { NULL }, }; diff --git a/libavcodec/hevc/hevcdec.h b/libavcodec/hevc/hevcdec.h index 57bf5aa599..6ba2ca3887 100644 --- a/libavcodec/hevc/hevcdec.h +++ b/libavcodec/hevc/hevcdec.h @@ -375,6 +375,10 @@ typedef struct HEVCFrame { void *hwaccel_picture_private; ///< RefStruct reference + // for secondary-layer frames, this is the DPB index of the base-layer frame + // from the same AU, if it exists, otherwise -1 + int base_layer_frame; + /** * A combination of HEVC_FRAME_FLAG_* */ @@ -487,9 +491,13 @@ typedef struct HEVCContext { HEVCLocalContext *local_ctx; unsigned nb_local_ctx; - HEVCLayerContext layers[1]; - // index in layers of the layer currently being decoded + // per-layer decoding state, addressed by VPS layer indices + HEVCLayerContext layers[HEVC_VPS_MAX_LAYERS]; + // VPS index of the layer currently being decoded unsigned cur_layer; + // bitmask of layer indices that are active for decoding/output + unsigned layers_active_decode; + unsigned layers_active_output; /** 1 if the independent slice segment header was successfully parsed */ uint8_t slice_initialized; @@ -539,11 +547,24 @@ typedef struct HEVCContext { H2645Packet pkt; // type of the first VCL NAL of the current frame enum HEVCNALUnitType first_nal_type; + // index in pkt.nals of the NAL unit after which we can call + // ff_thread_finish_setup() + unsigned finish_setup_nal_idx; int is_nalff; ///< this flag is != 0 if bitstream is encapsulated ///< as a format defined in 14496-15 int apply_defdispwin; + // multi-layer AVOptions + int *view_ids; + unsigned nb_view_ids; + + unsigned *view_ids_available; + unsigned nb_view_ids_available; + + unsigned *view_pos_available; + unsigned nb_view_pos_available; + int nal_length_size; ///< Number of bytes used for nal length (1, 2 or 4) int nuh_layer_id; @@ -644,12 +665,14 @@ static av_always_inline int ff_hevc_nal_is_nonref(enum HEVCNALUnitType type) * Find frames in the DPB that are ready for output and either write them to the * output FIFO or drop their output flag, depending on the value of discard. * - * @param max_output maximum number of output-pending frames that can be - * present in the DPB before output is triggered + * @param max_output maximum number of AUs with an output-pending frame in at + * least one layer that can be present in the DPB before output + * is triggered * @param max_dpb maximum number of any frames that can be present in the DPB - * before output is triggered + * for any layer before output is triggered */ -int ff_hevc_output_frames(HEVCContext *s, HEVCLayerContext *l, +int ff_hevc_output_frames(HEVCContext *s, + unsigned layers_active_decode, unsigned layers_active_output, unsigned max_output, unsigned max_dpb, int discard); void ff_hevc_unref_frame(HEVCFrame *frame, int flags); diff --git a/libavcodec/hevc/refs.c b/libavcodec/hevc/refs.c index 625ac68aaa..71f07c798c 100644 --- a/libavcodec/hevc/refs.c +++ b/libavcodec/hevc/refs.c @@ -80,12 +80,32 @@ void ff_hevc_flush_dpb(HEVCContext *s) static HEVCFrame *alloc_frame(HEVCContext *s, HEVCLayerContext *l) { + const HEVCVPS *vps = l->sps->vps; + const int view_id = vps->view_id[s->cur_layer]; int i, j, ret; for (i = 0; i < FF_ARRAY_ELEMS(l->DPB); i++) { HEVCFrame *frame = &l->DPB[i]; if (frame->f) continue; + ret = ff_progress_frame_alloc(s->avctx, &frame->tf); + if (ret < 0) + return NULL; + + // add view ID side data if it's nontrivial + if (vps->nb_layers > 1 || view_id) { + AVFrameSideData *sd = av_frame_side_data_new(&frame->f->side_data, + &frame->f->nb_side_data, + AV_FRAME_DATA_VIEW_ID, + sizeof(int), 0); + if (!sd) + goto fail; + *(int*)sd->data = view_id; + } + ret = ff_progress_frame_get_buffer(s->avctx, &frame->tf, AV_GET_BUFFER_FLAG_REF); if (ret < 0) @@ -152,6 +172,9 @@ int ff_hevc_set_new_ref(HEVCContext *s, HEVCLayerContext *l, int poc) l->cur_frame = ref; s->collocated_ref = NULL;