From patchwork Fri Feb 18 23:20:00 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?b?Q2zDqW1lbnQgQsWTc2No?= <u@pkh.me>
X-Patchwork-Id: 34399
From: =?utf-8?b?Q2zDqW1lbnQgQsWTc2No?= <u@pkh.me>
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 19 Feb 2022 00:20:00 +0100
Message-Id: <20220218232001.345826-5-u@pkh.me>
X-Mailer: git-send-email 2.34.0
In-Reply-To: <20220218232001.345826-1-u@pkh.me>
References: <20220218232001.345826-1-u@pkh.me>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH 4/5] avformat/mov: fix seeking with HEVC open
 GOP files
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: =?utf-8?b?Q2zDqW1lbnQgQsWTc2No?= <u@pkh.me>
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
X-TUID: 5ej0ar1bA87H

This was tested with media files recorded on an iPhone XR and an iPhone 13.

Here is what a typical stream looks like in coding order:

    ┌────────┬─────┬─────┬──────────┐
    │ sample │ PTS │ DTS │ keyframe │
    ├────────┼─────┼─────┼──────────┤
    ┊        ┊     ┊     ┊          ┊
    │   53   │ 560 │ 510 │    No    │
    │   54   │ 540 │ 520 │    No    │
    │   55   │ 530 │ 530 │    No    │
    │   56   │ 550 │ 540 │    No    │
    │   57   │ 600 │ 550 │    Yes   │
    │ * 58   │ 580 │ 560 │    No    │
    │ * 59   │ 570 │ 570 │    No    │
    │ * 60   │ 590 │ 580 │    No    │
    │   61   │ 640 │ 590 │    No    │
    │   62   │ 620 │ 600 │    No    │
    ┊        ┊     ┊     ┊          ┊

In composition/display order:

    ┌────────┬─────┬─────┬──────────┐
    │ sample │ PTS │ DTS │ keyframe │
    ├────────┼─────┼─────┼──────────┤
    ┊        ┊     ┊     ┊          ┊
    │   55   │ 530 │ 530 │    No    │
    │   54   │ 540 │ 520 │    No    │
    │   56   │ 550 │ 540 │    No    │
    │   53   │ 560 │ 510 │    No    │
    │ * 59   │ 570 │ 570 │    No    │
    │ * 58   │ 580 │ 560 │    No    │
    │ * 60   │ 590 │ 580 │    No    │
    │   57   │ 600 │ 550 │    Yes   │
    │   63   │ 610 │ 610 │    No    │
    │   62   │ 620 │ 600 │    No    │
    ┊        ┊     ┊     ┊          ┊

Samples/frames 58, 59 and 60 are B-frames which actually depend on the
key frame (57). Here the key frame is not an IDR but a "CRA" (Clean
Random Access).
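
As a minimal sketch of what the two tables show, the starred samples can
be found mechanically: they follow the key frame in coding order but are
presented before it. The struct and function names below are
hypothetical and are not part of this patch; the data is copied from
the coding-order table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record mirroring the columns of the tables above. */
struct sample {
    int     id;
    int64_t pts;
    int64_t dts;
    int     keyframe;
};

/* Samples 53..62 in coding order, copied from the first table. */
static const struct sample coding_order[] = {
    {53, 560, 510, 0}, {54, 540, 520, 0}, {55, 530, 530, 0},
    {56, 550, 540, 0}, {57, 600, 550, 1}, {58, 580, 560, 0},
    {59, 570, 570, 0}, {60, 590, 580, 0}, {61, 640, 590, 0},
    {62, 620, 600, 0},
};

/*
 * Collect the ids of the samples that come after s[key] in coding order
 * but have an earlier PTS. Returns how many were found; at most cap ids
 * are written to out.
 */
static size_t find_leading_samples(const struct sample *s, size_t n,
                                   size_t key, int *out, size_t cap)
{
    size_t count = 0;

    assert(key < n);
    for (size_t i = key + 1; i < n; i++) {
        if (s[i].pts < s[key].pts) {
            if (count < cap)
                out[count] = s[i].id;
            count++;
        }
    }
    return count;
}
```

Called with key = 4 (the array position of sample 57), this reports
samples 58, 59 and 60, which are the starred rows.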

Initially, I thought I could rely on the sdtp box (independent and
disposable samples), but unfortunately:

    sdtp[54] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
    sdtp[55] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
    sdtp[56] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
    sdtp[57] is_leading:0 sample_depends_on:2 sample_is_depended_on:0 sample_has_redundancy:0
    sdtp[58] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
    sdtp[59] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
    sdtp[60] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
    sdtp[61] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
    sdtp[62] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0

The information that might have been useful here would have been
is_leading, but it is set to 0 for all the samples, so it was unusable.
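
For reference, here is a sketch of how one sdtp byte maps to the four
fields printed above, assuming the usual ISO BMFF layout of four 2-bit
fields with is_leading in the top bits. The helper names are
hypothetical, and the raw byte values are derived from the printed
fields rather than read from the file.

```c
#include <assert.h>
#include <stdint.h>

/* The four 2-bit fields of one sdtp entry (assumed layout, MSB first). */
struct sdtp_flags {
    unsigned is_leading;
    unsigned sample_depends_on;
    unsigned sample_is_depended_on;
    unsigned sample_has_redundancy;
};

/* Split one sdtp byte into its fields. */
static struct sdtp_flags sdtp_unpack(uint8_t b)
{
    struct sdtp_flags f;

    f.is_leading            = (b >> 6) & 3;
    f.sample_depends_on     = (b >> 4) & 3;
    f.sample_is_depended_on = (b >> 2) & 3;
    f.sample_has_redundancy =  b       & 3;
    return f;
}

/* Inverse of sdtp_unpack(); every field must fit in 2 bits. */
static uint8_t sdtp_pack(struct sdtp_flags f)
{
    assert(f.is_leading < 4 && f.sample_depends_on < 4);
    assert(f.sample_is_depended_on < 4 && f.sample_has_redundancy < 4);
    return (uint8_t)(f.is_leading << 6 | f.sample_depends_on << 4 |
                     f.sample_is_depended_on << 2 | f.sample_has_redundancy);
}
```

Under that layout, sdtp[57] above would be the byte 0x20 and sdtp[55]
the byte 0x18; in both the top two bits (is_leading) are 0.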

Instead, we need to rely on sgpd/sbgp tables. In my case the video track
contained 3 sgpd tables with the following grouping types: tscl, sync
and tsas. In the sync table we have the following 2 entries (only):

    sgpd.sync[1]: sync nal_unit_type:0x14
    sgpd.sync[2]: sync nal_unit_type:0x15

(The count starts at 1 because index 0 carries the "undefined" meaning;
we'll see that later in the sbgp table.)

The NAL unit types presented here correspond to:

    libavcodec/hevc.h:    HEVC_NAL_IDR_N_LP       = 20,
    libavcodec/hevc.h:    HEVC_NAL_CRA_NUT        = 21,

In parallel, the sbgp sync table contains the following:

    ┌────┬───────┬─────┐
    │ id │ count │ gdi │
    ├────┼───────┼─────┤
    │  0 │   1   │  1  │
    │  1 │   56  │  0  │
    │  2 │   1   │  2  │
    │  3 │   59  │  0  │
    │  4 │   1   │  2  │
    │  5 │   59  │  0  │
    │  6 │   1   │  2  │
    │  7 │   59  │  0  │
    │  8 │   1   │  2  │
    │  9 │   59  │  0  │
    │ 10 │   1   │  2  │
    │ 11 │   11  │  0  │
    └────┴───────┴─────┘

The gdi column (group description index) directly refers to the index in
the sgpd.sync table. This means the first frame is an IDR, then we have
batches of undefined frames interleaved with CRA frames. No IDR ever
appears again (tried on a 30+ second sample).

With that information, we can build a heuristic using the presentation
order.

A few things needed to be introduced in this commit:

1. min_sample_duration is extracted from the stts: we need the minimal
   step between samples in order to PTS-step backward to a valid point
2. In order to avoid a loop over the ctts table systematically during a
   seek, we build an expanded list of sample offsets which will be used
   to translate from DTS to PTS
3. An open_key_samples index to keep track of all the non-IDR key
   frames; for now it only supports HEVC CRA frames. We should probably
   add BLA frames as well, but I don't have any sample so I preferred to
   leave that for later
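
The three points above combine roughly as in the following
self-contained sketch. It is a simplified model and not the code from
the diff: the types, the toy stream and find_key_backward() (a stand-in
for a backward key frame search on DTS) are hypothetical, and dts_shift
is assumed to be 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ctts_run   { uint32_t count; int32_t offset; };
struct toy_sample { int64_t dts; int key; int open_key; };

/* Hypothetical stream: IDR at sample 0, CRA (open key) at sample 4. */
static const struct toy_sample toy_stream[] = {
    { 0, 1, 0}, {10, 0, 0}, {20, 0, 0}, {30, 0, 0}, {40, 1, 1},
    {50, 0, 0}, {60, 0, 0}, {70, 0, 0}, {80, 0, 0},
};
static const struct ctts_run toy_ctts[] = {
    {4, 20}, {1, 50}, {1, 20}, {1, 0}, {1, 10}, {1, 20},
};

/* Expand the run-length ctts into one offset per sample (0 if too big). */
static size_t expand_ctts(const struct ctts_run *runs, size_t nb_runs,
                          int32_t *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nb_runs; i++)
        for (uint32_t j = 0; j < runs[i].count; j++) {
            if (n == cap)
                return 0;
            out[n++] = runs[i].offset;
        }
    return n;
}

/* Last key sample whose DTS is <= ts, or -1 if there is none. */
static int find_key_backward(const struct toy_sample *s, size_t n, int64_t ts)
{
    int found = -1;

    for (size_t i = 0; i < n && s[i].dts <= ts; i++)
        if (s[i].key)
            found = (int)i;
    return found;
}

/* Step ts back by min_duration until the key sample found is usable. */
static int seek_sample(const struct toy_sample *s, const int32_t *offsets,
                       size_t n, int64_t ts, int64_t min_duration)
{
    assert(min_duration > 0);
    for (;;) {
        int sample = find_key_backward(s, n, ts);

        if (sample <= 0)            /* before the first key: clamp to 0 */
            return n ? 0 : -1;
        /* An open key sample presented after ts cannot be used. */
        if (!s[sample].open_key || s[sample].dts + offsets[sample] <= ts)
            return sample;
        ts -= min_duration;
    }
}
```

On this toy stream the CRA at sample 4 has DTS 40 and PTS 90. Seeking to
90 returns sample 4, while seeking to 70 or 85 rejects it and steps back
until the search falls on sample 0.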

It is entirely possible I missed something obvious in my approach, but I
couldn't come up with a better solution. Also, as mentioned in the diff,
we could optimize is_open_key_sample(), but the linear scaling overhead
should be fine for now since it only happens in seek events.

Fixing this issue prevents sending broken packets to the decoder. With
the FFmpeg hevc decoder the frames are skipped; with VideoToolbox the
frames glitch.
---
 libavformat/isom.h |   5 ++
 libavformat/mov.c  | 121 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 124 insertions(+), 2 deletions(-)

diff --git a/libavformat/isom.h b/libavformat/isom.h
index bbe395ee4b..7362c876be 100644
--- a/libavformat/isom.h
+++ b/libavformat/isom.h
@@ -169,11 +169,14 @@ typedef struct MOVStreamContext {
     int64_t *chunk_offsets;
     unsigned int stts_count;
     MOVStts *stts_data;
+    uint32_t min_sample_duration;
     unsigned int sdtp_count;
     uint8_t *sdtp_data;
     unsigned int ctts_count;
     unsigned int ctts_allocated_size;
     MOVCtts *ctts_data;
+    int32_t *sample_offsets;
+    int sample_offsets_count;
     unsigned int stsc_count;
     MOVStsc *stsc_data;
     unsigned int stsc_index;
@@ -222,6 +225,8 @@ typedef struct MOVStreamContext {
     MOVSbgp *sync_group;
     uint8_t *sgpd_sync;
     uint32_t sgpd_sync_count;
+    int *open_key_samples;
+    int open_key_samples_count;
 
     int nb_frames_for_fps;
     int64_t duration_for_fps;
diff --git a/libavformat/mov.c b/libavformat/mov.c
index 6b22d8c5f8..919dd940c0 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -50,6 +50,7 @@
 #include "libavutil/dovi_meta.h"
 #include "libavcodec/ac3tab.h"
 #include "libavcodec/flac.h"
+#include "libavcodec/hevc.h"
 #include "libavcodec/mpegaudiodecheader.h"
 #include "libavcodec/mlp_parse.h"
 #include "avformat.h"
@@ -2948,6 +2949,7 @@ static int mov_read_stts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
     if (entries >= INT_MAX / sizeof(*sc->stts_data))
         return AVERROR(ENOMEM);
 
+    sc->min_sample_duration = INT_MAX;
     for (i = 0; i < entries && !pb->eof_reached; i++) {
         unsigned int sample_duration;
         unsigned int sample_count;
@@ -2967,6 +2969,7 @@ static int mov_read_stts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
 
         sc->stts_data[i].count= sample_count;
         sc->stts_data[i].duration= sample_duration;
+        sc->min_sample_duration = FFMIN(sc->min_sample_duration, sample_duration);
 
         av_log(c->fc, AV_LOG_TRACE, "sample_count=%u, sample_duration=%u\n",
                 sample_count, sample_duration);
@@ -3069,6 +3072,7 @@ static int mov_read_ctts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
     AVStream *st;
     MOVStreamContext *sc;
     unsigned int i, entries, ctts_count = 0;
+    int32_t *sample_offsets;
 
     if (c->fc->nb_streams < 1)
         return 0;
@@ -3090,6 +3094,7 @@ static int mov_read_ctts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
     if (!sc->ctts_data)
         return AVERROR(ENOMEM);
 
+    sc->sample_offsets_count = 0;
     for (i = 0; i < entries && !pb->eof_reached; i++) {
         int count    = avio_rb32(pb);
         int duration = avio_rb32(pb);
@@ -3101,8 +3106,9 @@ static int mov_read_ctts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
             continue;
         }
 
-        add_ctts_entry(&sc->ctts_data, &ctts_count, &sc->ctts_allocated_size,
-                       count, duration);
+        if (add_ctts_entry(&sc->ctts_data, &ctts_count, &sc->ctts_allocated_size,
+                           count, duration) >= 0)
+            sc->sample_offsets_count += count;
 
         av_log(c->fc, AV_LOG_TRACE, "count=%d, duration=%d\n",
                 count, duration);
@@ -3127,6 +3133,15 @@ static int mov_read_ctts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
 
     av_log(c->fc, AV_LOG_TRACE, "dts shift %d\n", sc->dts_shift);
 
+    av_freep(&sc->sample_offsets);
+    sc->sample_offsets = av_calloc(sc->sample_offsets_count, sizeof(*sc->sample_offsets));
+    if (!sc->sample_offsets)
+        return AVERROR(ENOMEM);
+    sample_offsets = sc->sample_offsets;
+    for (int i = 0; i < sc->ctts_count; i++)
+        for (int j = 0; j < sc->ctts_data[i].count; j++)
+            *sample_offsets++ = sc->ctts_data[i].duration;
+
     return 0;
 }
 
@@ -3882,6 +3897,52 @@ static void mov_fix_index(MOVContext *mov, AVStream *st)
     msc->current_index = msc->index_ranges[0].start;
 }
 
+static uint32_t get_sgpd_sync_index(const MOVStreamContext *sc, int nal_unit_type)
+{
+    for (uint32_t i = 0; i < sc->sgpd_sync_count; i++)
+        if (sc->sgpd_sync[i] == nal_unit_type)
+            return i + 1;
+    return 0;
+}
+
+static int build_open_gop_key_points(AVStream *st)
+{
+    int k = 0;
+    int sample_id = 0;
+    int open_sample_count = 0;
+    uint32_t cra_index;
+    MOVStreamContext *sc = st->priv_data;
+
+    if (st->codecpar->codec_id != AV_CODEC_ID_HEVC || !sc->sync_group_count)
+        return 0;
+
+    /* The following HEVC NAL type reveals the use of open GOP sync points
+     * (TODO: BLA types may also be concerned) */
+    cra_index = get_sgpd_sync_index(sc, HEVC_NAL_CRA_NUT); /* Clean Random Access */
+    if (!cra_index)
+        return 0;
+
+    for (uint32_t i = 0; i < sc->sync_group_count; i++)
+        if (sc->sync_group[i].index == cra_index)
+            open_sample_count += sc->sync_group[i].count;
+
+    av_freep(&sc->open_key_samples);
+    sc->open_key_samples_count = open_sample_count;
+    sc->open_key_samples = av_calloc(open_sample_count, sizeof(*sc->open_key_samples));
+    if (!sc->open_key_samples)
+        return AVERROR(ENOMEM);
+
+    for (uint32_t i = 0; i < sc->sync_group_count; i++) {
+        const MOVSbgp *sg = &sc->sync_group[i];
+        if (sg->index == cra_index)
+            for (uint32_t j = 0; j < sg->count; j++)
+                sc->open_key_samples[k++] = sample_id + j;
+        sample_id += sg->count;
+    }
+
+    return 0;
+}
+
 static void mov_build_index(MOVContext *mov, AVStream *st)
 {
     MOVStreamContext *sc = st->priv_data;
@@ -3897,6 +3958,10 @@ static void mov_build_index(MOVContext *mov, AVStream *st)
     MOVCtts *ctts_data_old = sc->ctts_data;
     unsigned int ctts_count_old = sc->ctts_count;
 
+    int ret = build_open_gop_key_points(st);
+    if (ret < 0)
+        return;
+
     if (sc->elst_count) {
         int i, edit_start_index = 0, multiple_edits = 0;
         int64_t empty_duration = 0; // empty duration of the first edit list entry
@@ -7772,6 +7837,8 @@ static int mov_read_close(AVFormatContext *s)
         av_freep(&sc->rap_group);
         av_freep(&sc->sync_group);
         av_freep(&sc->sgpd_sync);
+        av_freep(&sc->open_key_samples);
+        av_freep(&sc->sample_offsets);
         av_freep(&sc->display_matrix);
         av_freep(&sc->index_ranges);
 
@@ -8444,6 +8511,49 @@ static int mov_seek_fragment(AVFormatContext *s, AVStream *st, int64_t timestamp
     return 0;
 }
 
+static int is_open_key_sample(const MOVStreamContext *sc, int sample)
+{
+    // TODO: a bisect search would scale much better
+    for (int i = 0; i < sc->open_key_samples_count; i++) {
+        const int oks = sc->open_key_samples[i];
+        if (oks == sample)
+            return 1;
+        if (oks > sample) /* list is monotonically increasing so we can stop early */
+            break;
+    }
+    return 0;
+}
+
+/*
+ * Some key samples may be key frames but not IDR frames, so a random access to
+ * them may not be allowed.
+ */
+static int can_seek_to_key_sample(AVStream *st, int sample, int64_t requested_pts)
+{
+    MOVStreamContext *sc = st->priv_data;
+    FFStream *const sti = ffstream(st);
+    int64_t key_sample_dts, key_sample_pts;
+
+    if (st->codecpar->codec_id != AV_CODEC_ID_HEVC)
+        return 1;
+
+    if (sample >= sc->sample_offsets_count)
+        return 1;
+
+    key_sample_dts = sti->index_entries[sample].timestamp;
+    key_sample_pts = key_sample_dts + sc->sample_offsets[sample] + sc->dts_shift;
+
+    /*
+     * If samples need to be presented before an open key sample, they may
+     * not be decodable properly, even though they come after in decoding
+     * order.
+     */
+    if (is_open_key_sample(sc, sample) && key_sample_pts > requested_pts)
+        return 0;
+
+    return 1;
+}
+
 static int mov_seek_stream(AVFormatContext *s, AVStream *st, int64_t timestamp, int flags)
 {
     MOVStreamContext *sc = st->priv_data;
@@ -8459,12 +8569,19 @@ static int mov_seek_stream(AVFormatContext *s, AVStream *st, int64_t timestamp,
     if (ret < 0)
         return ret;
 
+    for (;;) {
     sample = av_index_search_timestamp(st, timestamp, flags);
     av_log(s, AV_LOG_TRACE, "stream %d, timestamp %"PRId64", sample %d\n", st->index, timestamp, sample);
     if (sample < 0 && sti->nb_index_entries && timestamp < sti->index_entries[0].timestamp)
         sample = 0;
     if (sample < 0) /* not sure what to do */
         return AVERROR_INVALIDDATA;
+
+        if (!sample || can_seek_to_key_sample(st, sample, timestamp))
+            break;
+        timestamp -= FFMAX(sc->min_sample_duration, 1);
+    }
+
     mov_current_sample_set(sc, sample);
     av_log(s, AV_LOG_TRACE, "stream %d, found sample %d\n", st->index, sc->current_sample);
     /* adjust ctts index */