From patchwork Thu Feb 24 11:49:02 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?b?Q2zDqW1lbnQgQsWTc2No?=
X-Patchwork-Id: 34516
From: =?utf-8?b?Q2zDqW1lbnQgQsWTc2No?=
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 24 Feb 2022 12:49:02 +0100
Message-Id: <20220224114903.251006-5-u@pkh.me>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220224114903.251006-1-u@pkh.me>
References: <20220224114903.251006-1-u@pkh.me>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v2 4/5] avformat/mov: fix seeking with HEVC
 open GOP files
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: =?utf-8?b?Q2zDqW1lbnQgQsWTc2No?=
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: mOB4Wmzc+QyX

This was tested with media recorded from an iPhone XR and an iPhone 13.

Here is what a typical stream looks like in coding order:

    ┌────────┬─────┬─────┬──────────┐
    │ sample | PTS | DTS | keyframe |
    ├────────┼─────┼─────┼──────────┤
    ┊        ┊     ┊     ┊          ┊
    │   53   │ 560 │ 510 │    No    │
    │   54   │ 540 │ 520 │    No    │
    │   55   │ 530 │ 530 │    No    │
    │   56   │ 550 │ 540 │    No    │
    │   57   │ 600 │ 550 │    Yes   │
    │ * 58   │ 580 │ 560 │    No    │
    │ * 59   │ 570 │ 570 │    No    │
    │ * 60   │ 590 │ 580 │    No    │
    │   61   │ 640 │ 590 │    No    │
    │   62   │ 620 │ 600 │    No    │
    ┊        ┊     ┊     ┊          ┊

In composition/display order:

    ┌────────┬─────┬─────┬──────────┐
    │ sample | PTS | DTS | keyframe |
    ├────────┼─────┼─────┼──────────┤
    ┊        ┊     ┊     ┊          ┊
    │   55   │ 530 │ 530 │    No    │
    │   54   │ 540 │ 520 │    No    │
    │   56   │ 550 │ 540 │    No    │
    │   53   │ 560 │ 510 │    No    │
    │ * 59   │ 570 │ 570 │    No    │
    │ * 58   │ 580 │ 560 │    No    │
    │ * 60   │ 590 │ 580 │    No    │
    │   57   │ 600 │ 550 │    Yes   │
    │   63   │ 610 │ 610 │    No    │
    │   62   │ 620 │ 600 │    No    │
    ┊        ┊     ┊     ┊          ┊

Samples/frames 58, 59 and 60 are B-frames which actually depend on the
key frame (57). Here the key frame is not an IDR but a "CRA" (Clean
Random Access).
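
To make the pattern in the two tables concrete, here is a minimal sketch
that counts the samples decoded after a key frame but presented before
it. The Sample struct and count_leading_samples() are hypothetical
helpers written for this illustration, not FFmpeg code; the values are
copied from the coding-order table. The check only looks at timestamps,
it does not inspect the actual reference structure of the bitstream.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the coding-order table (hypothetical type, not an FFmpeg one). */
typedef struct Sample {
    int     id;
    int64_t pts;
    int64_t dts;
    int     keyframe;
} Sample;

/* Samples 53..62 in coding order, values copied from the table. */
static const Sample coding_order[] = {
    { 53, 560, 510, 0 },
    { 54, 540, 520, 0 },
    { 55, 530, 530, 0 },
    { 56, 550, 540, 0 },
    { 57, 600, 550, 1 },
    { 58, 580, 560, 0 },
    { 59, 570, 570, 0 },
    { 60, 590, 580, 0 },
    { 61, 640, 590, 0 },
    { 62, 620, 600, 0 },
};

/*
 * Count the samples that follow the key frame at array index `key` in
 * coding order (up to the next key frame) while having a lower PTS than
 * that key frame, i.e. samples decoded after it but displayed before it.
 */
static int count_leading_samples(const Sample *s, size_t n, size_t key)
{
    int count = 0;

    for (size_t i = key + 1; i < n && !s[i].keyframe; i++)
        if (s[i].pts < s[key].pts)
            count++;
    return count;
}
```

Called with the array index of sample 57 (index 4), the helper counts
samples 58, 59 and 60, which are the ones marked with a star in the
tables.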

Initially, I thought I could rely on the sdtp box (independent and
disposable samples), but unfortunately:

    sdtp[54] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
    sdtp[55] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
    sdtp[56] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
    sdtp[57] is_leading:0 sample_depends_on:2 sample_is_depended_on:0 sample_has_redundancy:0
    sdtp[58] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
    sdtp[59] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
    sdtp[60] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
    sdtp[61] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
    sdtp[62] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0

The information that might have been useful here would have been
is_leading, but all the samples are set to 0 so this was unusable.

Instead, we need to rely on sgpd/sbgp tables. In my case the video track
contained 3 sgpd tables with the following grouping types: tscl, sync
and tsas. In the sync table we have the following 2 entries (only):

    sgpd.sync[1]: sync nal_unit_type:0x14
    sgpd.sync[2]: sync nal_unit_type:0x15

(The count starts at 1 because 0 carries the undefined semantic, we'll
see that later in the reference table).

The NAL unit types presented here correspond to:

    libavcodec/hevc.h:    HEVC_NAL_IDR_N_LP       = 20,
    libavcodec/hevc.h:    HEVC_NAL_CRA_NUT        = 21,

In parallel, the sbgp sync table contains the following:

    ┌────┬───────┬─────┐
    │ id │ count │ gdi │
    ├────┼───────┼─────┤
    │  0 │     1 │   1 │
    │  1 │    56 │   0 │
    │  2 │     1 │   2 │
    │  3 │    59 │   0 │
    │  4 │     1 │   2 │
    │  5 │    59 │   0 │
    │  6 │     1 │   2 │
    │  7 │    59 │   0 │
    │  8 │     1 │   2 │
    │  9 │    59 │   0 │
    │ 10 │     1 │   2 │
    │ 11 │    11 │   0 │
    └────┴───────┴─────┘

The gdi column (group description index) directly refers to the index in
the sgpd.sync table.
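
As an illustration of how the run-length sbgp entries map to sample
numbers, the sketch below expands the table and collects the 0-based
sample ids attached to a given group description index. SbgpEntry and
collect_samples() are hypothetical names used only for this example
(the demuxer uses its own MOVSbgp type); the counts and gdi values are
copied from the table.

```c
#include <stddef.h>
#include <stdint.h>

/* One sbgp run: `count` consecutive samples sharing the same gdi. */
typedef struct SbgpEntry {
    uint32_t count;
    uint32_t gdi;
} SbgpEntry;

/* The sbgp sync table from the message, in order. */
static const SbgpEntry sbgp_sync[] = {
    {  1, 1 }, { 56, 0 }, {  1, 2 }, { 59, 0 },
    {  1, 2 }, { 59, 0 }, {  1, 2 }, { 59, 0 },
    {  1, 2 }, { 59, 0 }, {  1, 2 }, { 11, 0 },
};

/*
 * Store in `out` (at most `cap` entries) the 0-based ids of the samples
 * whose group description index equals `wanted`. Returns the total
 * number of matching samples, which may exceed `cap`.
 */
static size_t collect_samples(const SbgpEntry *e, size_t n, uint32_t wanted,
                              uint32_t *out, size_t cap)
{
    size_t found = 0;
    uint32_t sample_id = 0;

    for (size_t i = 0; i < n; i++) {
        if (e[i].gdi == wanted) {
            for (uint32_t j = 0; j < e[i].count; j++) {
                if (found < cap)
                    out[found] = sample_id + j;
                found++;
            }
        }
        sample_id += e[i].count; /* skip to the first sample of the next run */
    }
    return found;
}
```

With wanted set to 2 (the CRA entry in sgpd.sync), this yields samples
57, 117, 177, 237 and 297; with wanted set to 1 (the IDR entry) it
yields sample 0 only.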

This means the first frame is an IDR, then we have batches of undefined
frames interleaved with CRA frames. No IDR ever appears again (tried on
a 30+ second sample).

With that information, we can build a heuristic using the presentation
order.

A few things needed to be introduced in this commit:

1. min_sample_duration is extracted from the stts: we need the minimal
   step between samples in order to PTS-step backward to a valid point
2. In order to avoid a loop over the ctts table systematically during a
   seek, we build an expanded list of sample offsets which will be used
   to translate from DTS to PTS
3. An open_key_samples index to keep track of all the non-IDR key
   frames; for now it only supports HEVC CRA frames. We should probably
   add BLA frames as well, but I don't have any sample so I preferred to
   leave that for later

It is entirely possible I missed something obvious in my approach, but I
couldn't come up with a better solution. Also, as mentioned in the diff,
we could optimize is_open_key_sample(), but the linear scaling overhead
should be fine for now since it only happens in seek events.

Fixing this issue prevents sending broken packets to the decoder. With
the FFmpeg hevc decoder the frames are skipped; with VideoToolbox the
frames are glitching.
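
The backward stepping can be sketched with a simplified standalone
model. KeySample, search_key_backward() and seek_key() are hypothetical
names for this illustration and do not use the FFmpeg index API. The
second entry uses the timestamps of sample 57 from the tables; the
first entry (a closed key frame at timestamp 0) is an assumed value,
since the message does not list the timestamps of sample 0.

```c
#include <stdint.h>

/* A key sample with its timestamps; `open` is 1 for a CRA-like sample. */
typedef struct KeySample {
    int     sample;
    int64_t dts;
    int64_t pts;
    int     open;
} KeySample;

static const KeySample keys[] = {
    {  0,   0,   0, 0 }, /* assumed values for the initial IDR */
    { 57, 550, 600, 1 }, /* CRA, timestamps from the tables */
};

/* Index of the last key sample with dts <= ts, or -1 if there is none. */
static int search_key_backward(const KeySample *k, int n, int64_t ts)
{
    int found = -1;

    for (int i = 0; i < n && k[i].dts <= ts; i++)
        found = i;
    return found;
}

/*
 * Return the sample id to seek to for timestamp `ts`: an open key sample
 * is rejected when it is presented after `ts`, in which case we step the
 * timestamp back by `min_step` (at least 1) and search again. `n` must
 * be at least 1.
 */
static int seek_key(const KeySample *k, int n, int64_t ts, int64_t min_step)
{
    for (;;) {
        int i = search_key_backward(k, n, ts);

        if (i <= 0) /* first key sample, or timestamp before the start */
            return k[0].sample;
        if (!(k[i].open && k[i].pts > ts))
            return k[i].sample;
        ts -= min_step > 1 ? min_step : 1;
    }
}
```

In this model, seeking to 580 (the PTS of sample 58) with a step of 10
walks back to sample 0, while seeking to 600 or later lands on sample
57.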
---
 libavformat/isom.h |   5 ++
 libavformat/mov.c  | 120 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 125 insertions(+)

diff --git a/libavformat/isom.h b/libavformat/isom.h
index bbe395ee4b..5caf42b15d 100644
--- a/libavformat/isom.h
+++ b/libavformat/isom.h
@@ -222,6 +222,11 @@ typedef struct MOVStreamContext {
     MOVSbgp *sync_group;
     uint8_t *sgpd_sync;
     uint32_t sgpd_sync_count;
+    int32_t *sample_offsets;
+    int sample_offsets_count;
+    int *open_key_samples;
+    int open_key_samples_count;
+    uint32_t min_sample_duration;
 
     int nb_frames_for_fps;
     int64_t duration_for_fps;
diff --git a/libavformat/mov.c b/libavformat/mov.c
index 6b22d8c5f8..573ee8bda3 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -50,6 +50,7 @@
 #include "libavutil/dovi_meta.h"
 #include "libavcodec/ac3tab.h"
 #include "libavcodec/flac.h"
+#include "libavcodec/hevc.h"
 #include "libavcodec/mpegaudiodecheader.h"
 #include "libavcodec/mlp_parse.h"
 #include "avformat.h"
@@ -3882,6 +3883,69 @@ static void mov_fix_index(MOVContext *mov, AVStream *st)
     msc->current_index = msc->index_ranges[0].start;
 }
 
+static uint32_t get_sgpd_sync_index(const MOVStreamContext *sc, int nal_unit_type)
+{
+    for (uint32_t i = 0; i < sc->sgpd_sync_count; i++)
+        if (sc->sgpd_sync[i] == nal_unit_type)
+            return i + 1;
+    return 0;
+}
+
+static int build_open_gop_key_points(AVStream *st)
+{
+    int k;
+    int sample_id = 0;
+    uint32_t cra_index;
+    MOVStreamContext *sc = st->priv_data;
+
+    if (st->codecpar->codec_id != AV_CODEC_ID_HEVC || !sc->sync_group_count)
+        return 0;
+
+    /* Build an unrolled index of the samples */
+    sc->sample_offsets_count = 0;
+    for (uint32_t i = 0; i < sc->ctts_count; i++)
+        sc->sample_offsets_count += sc->ctts_data[i].count;
+    av_freep(&sc->sample_offsets);
+    sc->sample_offsets = av_calloc(sc->sample_offsets_count, sizeof(*sc->sample_offsets));
+    if (!sc->sample_offsets)
+        return AVERROR(ENOMEM);
+    k = 0;
+    for (uint32_t i = 0; i < sc->ctts_count; i++)
+        for (int j = 0; j < sc->ctts_data[i].count; j++)
+            sc->sample_offsets[k++] = sc->ctts_data[i].duration;
+
+    /* The following HEVC NAL types reveal the use of open GOP sync points
+     * (TODO: BLA types may also be concerned) */
+    cra_index = get_sgpd_sync_index(sc, HEVC_NAL_CRA_NUT); /* Clean Random Access */
+    if (!cra_index)
+        return 0;
+
+    /* Build a list of open-GOP key samples */
+    sc->open_key_samples_count = 0;
+    for (uint32_t i = 0; i < sc->sync_group_count; i++)
+        if (sc->sync_group[i].index == cra_index)
+            sc->open_key_samples_count += sc->sync_group[i].count;
+    av_freep(&sc->open_key_samples);
+    sc->open_key_samples = av_calloc(sc->open_key_samples_count, sizeof(*sc->open_key_samples));
+    if (!sc->open_key_samples)
+        return AVERROR(ENOMEM);
+    k = 0;
+    for (uint32_t i = 0; i < sc->sync_group_count; i++) {
+        const MOVSbgp *sg = &sc->sync_group[i];
+        if (sg->index == cra_index)
+            for (uint32_t j = 0; j < sg->count; j++)
+                sc->open_key_samples[k++] = sample_id;
+        sample_id += sg->count;
+    }
+
+    /* Identify the minimal time step between samples */
+    sc->min_sample_duration = UINT_MAX;
+    for (uint32_t i = 0; i < sc->stts_count; i++)
+        sc->min_sample_duration = FFMIN(sc->min_sample_duration, sc->stts_data[i].duration);
+
+    return 0;
+}
+
 static void mov_build_index(MOVContext *mov, AVStream *st)
 {
     MOVStreamContext *sc = st->priv_data;
@@ -3897,6 +3961,10 @@ static void mov_build_index(MOVContext *mov, AVStream *st)
     MOVCtts *ctts_data_old = sc->ctts_data;
     unsigned int ctts_count_old = sc->ctts_count;
 
+    int ret = build_open_gop_key_points(st);
+    if (ret < 0)
+        return;
+
     if (sc->elst_count) {
         int i, edit_start_index = 0, multiple_edits = 0;
         int64_t empty_duration = 0; // empty duration of the first edit list entry
@@ -7772,6 +7840,8 @@ static int mov_read_close(AVFormatContext *s)
         av_freep(&sc->rap_group);
         av_freep(&sc->sync_group);
         av_freep(&sc->sgpd_sync);
+        av_freep(&sc->sample_offsets);
+        av_freep(&sc->open_key_samples);
         av_freep(&sc->display_matrix);
         av_freep(&sc->index_ranges);
 
@@ -8444,6 +8514,49 @@ static int mov_seek_fragment(AVFormatContext *s, AVStream *st, int64_t timestamp
     return 0;
 }
 
+static int is_open_key_sample(const MOVStreamContext *sc, int sample)
+{
+    // TODO: a bisect search would scale much better
+    for (int i = 0; i < sc->open_key_samples_count; i++) {
+        const int oks = sc->open_key_samples[i];
+        if (oks == sample)
+            return 1;
+        if (oks > sample) /* list is monotonically increasing so we can stop early */
+            break;
+    }
+    return 0;
+}
+
+/*
+ * Some key samples may be key frames but not IDR frames, so a random access to
+ * them may not be allowed.
+ */
+static int can_seek_to_key_sample(AVStream *st, int sample, int64_t requested_pts)
+{
+    MOVStreamContext *sc = st->priv_data;
+    FFStream *const sti = ffstream(st);
+    int64_t key_sample_dts, key_sample_pts;
+
+    if (st->codecpar->codec_id != AV_CODEC_ID_HEVC)
+        return 1;
+
+    if (sample >= sc->sample_offsets_count)
+        return 1;
+
+    key_sample_dts = sti->index_entries[sample].timestamp;
+    key_sample_pts = key_sample_dts + sc->sample_offsets[sample] + sc->dts_shift;
+
+    /*
+     * If the requested frame needs to be presented before an open key sample,
+     * it may not be decodable properly, even though it comes after that key
+     * sample in decoding order.
+     */
+    if (is_open_key_sample(sc, sample) && key_sample_pts > requested_pts)
+        return 0;
+
+    return 1;
+}
+
 static int mov_seek_stream(AVFormatContext *s, AVStream *st, int64_t timestamp, int flags)
 {
     MOVStreamContext *sc = st->priv_data;
@@ -8459,12 +8572,19 @@ static int mov_seek_stream(AVFormatContext *s, AVStream *st, int64_t timestamp,
     if (ret < 0)
         return ret;
 
+    for (;;) {
     sample = av_index_search_timestamp(st, timestamp, flags);
     av_log(s, AV_LOG_TRACE, "stream %d, timestamp %"PRId64", sample %d\n", st->index, timestamp, sample);
     if (sample < 0 && sti->nb_index_entries && timestamp < sti->index_entries[0].timestamp)
         sample = 0;
     if (sample < 0) /* not sure what to do */
         return AVERROR_INVALIDDATA;
+
+    if (!sample || can_seek_to_key_sample(st, sample, timestamp))
+        break;
+    timestamp -= FFMAX(sc->min_sample_duration, 1);
+    }
+
     mov_current_sample_set(sc, sample);
     av_log(s, AV_LOG_TRACE, "stream %d, found sample %d\n", st->index, sc->current_sample);
     /* adjust ctts index */