From patchwork Thu May 16 22:30:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Rheinhardt X-Patchwork-Id: 13147 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id 91E6F44964F for ; Fri, 17 May 2019 01:43:21 +0300 (EEST) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 76161689D14; Fri, 17 May 2019 01:43:21 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wr1-f66.google.com (mail-wr1-f66.google.com [209.85.221.66]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id CBE92689C5C for ; Fri, 17 May 2019 01:43:19 +0300 (EEST) Received: by mail-wr1-f66.google.com with SMTP id d18so5063301wrs.5 for ; Thu, 16 May 2019 15:43:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=R56nNBcu+9TQd98VcoZomJPvIAMC6+93xN5b7LuAeoU=; b=jhPtE4tOvvD0KQ73Er7pwVWSfUcc9IIISPmb5TDS0c+URTRtO8mFfco22+6s7UdIgP zpJqYfwkTDjtBMZE5mN8DrzSifXl+YSdvVHOMqR+Z6yI9HQGUz84T1HlEZCWYt9wtmbi dGyhJKaAqVta6fdaHIKfA9I3QK2d0p8YP7lRYlh4llQri7g99Me2SwJj+2cl+GDVciZ+ k/YL6V8YPTOWsvnCUmWB3b8RaepwEd/b3yezw+8XTX8WhWer6ubzhOn6KMKbWn6huPia Nn+usqSbRyBifNBLUIYpcdeSeb2Wz5rMMP7dTBhGMRZdka2Jr/LihVJoEHagj144yuZr T7mA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=R56nNBcu+9TQd98VcoZomJPvIAMC6+93xN5b7LuAeoU=; b=Q08mwr1V9+L9pWEVKjvN+zSL93oIWPu3/lkQ5vMr33ybwH+cPbDHXxkB04MDekM23W dJZHJoeFl8luC4h6D9FmhgxjD9IARoxI/1J7666pfkVw3Su2gF0/8Yto1cKcJ5GX1eCQ 84QtShNTTqyH4M+PDSWk+/z5cbBcxfXWJ2qvk045vP3509rGsRj86piNPqf2jWIoX8Er RAfR7b1VT0mdaEqaUCWLwYN6d/kKbJDPZmnOL1pPP+ikX0HKBL4JXz88naj2QUB/NoRA oFYy+Z1QrXtfyLritDBx2KFVEGVTc0nv5vW7lhmYmIjO1JNByxemPzvd4MqqO4EzNFdb pQ5Q== X-Gm-Message-State: APjAAAX1MV3ZCh7vpsUCtdjbTQqMiH80A2CoLU7bjg74D+WTDdiEMUMn eQfmFAsP1s5Gv2Pzn+7bf+3ka0PG X-Google-Smtp-Source: APXvYqzeB3NVe1usYy3zGJ3VBoysyI1IVt3sEAh3+Tcw3zmK0die9DsTQb4xkC2i72rNryuFCfedMQ== X-Received: by 2002:a5d:400b:: with SMTP id n11mr31171573wrp.123.1558046599054; Thu, 16 May 2019 15:43:19 -0700 (PDT) Received: from localhost.localdomain (ipbcc18715.dynamic.kabel-deutschland.de. [188.193.135.21]) by smtp.gmail.com with ESMTPSA id i185sm11168725wmg.32.2019.05.16.15.43.18 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 16 May 2019 15:43:18 -0700 (PDT) From: Andreas Rheinhardt To: ffmpeg-devel@ffmpeg.org Date: Fri, 17 May 2019 00:30:05 +0200 Message-Id: <20190516223018.30827-22-andreas.rheinhardt@gmail.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190516223018.30827-1-andreas.rheinhardt@gmail.com> References: <20190516223018.30827-1-andreas.rheinhardt@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 21/37] avformat/matroskadec: Introduce a "last known good" position X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Andreas Rheinhardt Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Currently, resyncing during reading packets works as follows: The current position is recorded, then a call to matroska_parse_cluster is made and if said call fails, the demuxer tries to resync from the earlier position. If the call doesn't fail, but also doesn't deliver a packet, then this is looped. There are two problems with this approach: 1. The Matroska file format aims to be forward-compatible; to achieve this, a demuxer should simply ignore and skip elements it doesn't know about. But it is not possible to reliably distinguish unknown elements from junk. If matroska_parse_cluster encounters an unknown element, it can therefore not simply error out; instead it returns zero and the loop is iterated which includes an update of the position that is intended to be used in case of errors, i.e. the element that is skipped is not searched for level 1 element ids to resync to at all if later calls to matroska_parse_cluster return an error. Notice that in case that sync has been lost there can be a chain of several unknown/possibly junk elements before an error is detected. 2. Even if a call to matroska_parse_cluster delivers a packet, this does not mean that everything is fine. E.g. it might be that some of the block's data is missing and that the data that was presumed to be from the block just read actually contains the beginning of the next element. This will only be apparent at the next call of matroska_read_packet, which uses the (false) end of the earlier block as resync position so that in the (not unlikely) case that the call to matroska_parse_cluster fails, the data believed to be part of the earlier block is not searched for a level 1 element to resync to. To counter this, a "last known good" position is introduced. When an element id that is known to be allowed at this position in the hierarchy (according to the syntax currently in use for parsing) is read and some further checks (regarding the length of the element and its containing master element) are passed, then the beginning of the current element is treated as a "good" position and recorded as such in the MatroskaDemuxContext. Because of 2., only the start of the element is treated as a "good" position, not the whole element. If an error occurs later during parsing of clusters, the resync process starts at the last known good position. Given that when the header is damaged the subsequent resync never skips over data and is therefore unaffected by both issues, the "last known good" concept is not used there. Signed-off-by: Andreas Rheinhardt --- libavformat/matroskadec.c | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c index deb01578d5..949169d708 100644 --- a/libavformat/matroskadec.c +++ b/libavformat/matroskadec.c @@ -335,6 +335,7 @@ typedef struct MatroskaDemuxContext { int num_levels; MatroskaLevel levels[EBML_MAX_DEPTH]; uint32_t current_id; + int64_t resync_pos; uint64_t time_scale; double duration; @@ -754,6 +755,9 @@ static int matroska_reset_status(MatroskaDemuxContext *matroska, matroska->current_id = id; matroska->num_levels = 1; matroska->current_cluster.pos = 0; + matroska->resync_pos = avio_tell(matroska->ctx->pb); + if (id) + matroska->resync_pos -= (av_log2(id) + 7) / 8; return 0; } @@ -1164,7 +1168,8 @@ static int ebml_parse(MatroskaDemuxContext *matroska, AVIOContext *pb = matroska->ctx->pb; uint32_t id; uint64_t length; - int res; + int64_t pos = avio_tell(pb); + int res, update_pos = 1; void *newelem; MatroskaLevel1Element *level1_elem; @@ -1177,7 +1182,8 @@ static int ebml_parse(MatroskaDemuxContext *matroska, res == AVERROR_EOF) ? 1 : res; } matroska->current_id = id | 1 << 7 * res; - } + } else + pos -= (av_log2(matroska->current_id) + 7) / 8; id = matroska->current_id; @@ -1188,6 +1194,7 @@ static int ebml_parse(MatroskaDemuxContext *matroska, return 0; // we reached the end of an unknown size cluster if (!syntax->id && id != EBML_ID_VOID && id != EBML_ID_CRC32) { av_log(matroska->ctx, AV_LOG_DEBUG, "Unknown entry 0x%"PRIX32"\n", id); + update_pos = 0; } data = (char *) data + syntax->data_offset; @@ -1242,6 +1249,13 @@ static int ebml_parse(MatroskaDemuxContext *matroska, return AVERROR_INVALIDDATA; } } + + if (update_pos) { + // We have found an element that is allowed at this place + // in the hierarchy and it passed all checks, so treat the beginning + // of the element as the "last known good" position. + matroska->resync_pos = pos; + } } switch (syntax->type) { @@ -3562,12 +3576,16 @@ static int matroska_read_packet(AVFormatContext *s, AVPacket *pkt) MatroskaDemuxContext *matroska = s->priv_data; int ret = 0; + if (matroska->resync_pos == -1) { + // This can only happen if generic seeking has been used. + matroska->resync_pos = avio_tell(s->pb); + } + while (matroska_deliver_packet(matroska, pkt)) { - int64_t pos = avio_tell(matroska->ctx->pb); if (matroska->done) return (ret < 0) ? ret : AVERROR_EOF; if (matroska_parse_cluster(matroska) < 0) - ret = matroska_resync(matroska, pos); + ret = matroska_resync(matroska, matroska->resync_pos); } return 0; @@ -3629,6 +3647,7 @@ err: // slightly hackish but allows proper fallback to // the generic seeking code. matroska_reset_status(matroska, 0, -1); + matroska->resync_pos = -1; matroska_clear_queue(matroska); st->skip_to_keyframe = matroska->skip_to_keyframe = 0;