From patchwork Thu May 16 22:30:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Rheinhardt X-Patchwork-Id: 13143 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id 4C39F44964F for ; Fri, 17 May 2019 01:43:31 +0300 (EEST) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 33C5D68A55B; Fri, 17 May 2019 01:43:31 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wm1-f67.google.com (mail-wm1-f67.google.com [209.85.128.67]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id B5FE668A3B2 for ; Fri, 17 May 2019 01:43:27 +0300 (EEST) Received: by mail-wm1-f67.google.com with SMTP id y3so5127806wmm.2 for ; Thu, 16 May 2019 15:43:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=SQuipFa3x7grKbmSn2LqaC0YZJuhM+E0IQC2C81KTj4=; b=tfA3KmEM60Xo2SvPW7wqMvHQDVF5zo4pmaMPERy2qF2t/tl++iiJqId3Jyh9W+Rd9z LyARBtH8RDLTf4Fxfrn34KDiVpuCKsmFKu6YKbiY3a83qcDG3/vgx5cW8km8ZTmYIszg //iurhg5w5ppv0BAG2ArfORVOOI8z6e2jyZCutajcsgNeIQD2kH9XNWDpA84iWJ+pNdo iA/WGEva6aYja5sY/1zy8QztteR2wFaz/vskbR9BUrx7aP/UAcZxaCXMG+/n0vjAjWpv Z29cd8BuJtTIeyLprl8JKyuEro6VvN6DRkLOR3Hx6LYcZMMAY7y+B24UN7ixFt4+pJOs DROA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=SQuipFa3x7grKbmSn2LqaC0YZJuhM+E0IQC2C81KTj4=; b=kSLzp0GRMuNP8Ej+geXiGFlDeeb43Q1jHtGtlYCqiNPGVvuwZJUFXbC5zBJ9rLmX0F 2I5OqHga8TtHjX780VKhqVKmoaKDQ7tosyJGTkYhOnKxJg1CRg2pn03b/dXGUlhC3neP tqZzRoRLV66buAHbe7diZ7tr9A0lMihPof5zDFPbuREASrWKNuksabCgg2uUlio5H59k 58RoqKF1VfbdQRI/9ZUQEHEL5oLdN9u0S/i+7uSkkHkWn7haBak2acJayTXStJI3iINR 593mJAf6fa2lUCGVxUOhH32tkQE6jm3lWa+kd1jjEy1E813or6LUI9pPOSFetNRl/XPA JgeQ== X-Gm-Message-State: APjAAAU7d0cdIchoSJyDzbJoAskgjYCx0vI1lPTqZtxWNt3EudRUp0ZH ej/zl/PR11I968SZ/2I+ev2TfYOd X-Google-Smtp-Source: APXvYqy6ST/Nu+hwVyEnAzs8Vu9Og1yv7eimYycqvbXOgSCRXnuULUCYt2MzHPsh8R6Hq813xWoqOw== X-Received: by 2002:a1c:f30e:: with SMTP id q14mr31427111wmq.31.1558046606978; Thu, 16 May 2019 15:43:26 -0700 (PDT) Received: from localhost.localdomain (ipbcc18715.dynamic.kabel-deutschland.de. [188.193.135.21]) by smtp.gmail.com with ESMTPSA id i185sm11168725wmg.32.2019.05.16.15.43.26 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 16 May 2019 15:43:26 -0700 (PDT) From: Andreas Rheinhardt To: ffmpeg-devel@ffmpeg.org Date: Fri, 17 May 2019 00:30:14 +0200 Message-Id: <20190516223018.30827-31-andreas.rheinhardt@gmail.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190516223018.30827-1-andreas.rheinhardt@gmail.com> References: <20190516223018.30827-1-andreas.rheinhardt@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 30/37] avformat/matroskadec: Don't skip too much when unseekable X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Andreas Rheinhardt Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" The Matroska (and WebM) file format achieves forward-compability by insisting that demuxers ignore and skip elements they don't know about. Unfortunately, this complicates the detection of errors as errors resulting from loosing sync can't be reliably distinguished from unknown elements that are part of a future version of the standard. Up until now, the strategy to deal with this situation was to skip all unknown elements that are not obviously erroneous; if an error happened, it was tried to seek to the last known good position to resync from (and resync to level 1 elements). This is working fine if the input is seekable, but if it is not, then the skipped data can usually not be rechecked lateron. This is particularly acute if unknown-length clusters are in use, as the check for whether a child element exceeds the containing master element is ineffective in this situation. To remedy this, a new heuristic has been introduced: If an unknown element is encountered in non-seekable mode, an error is presumed to have happened based upon a combination of the length of the row of the already encountered unknown elements and of how far away skipping this element would take us. Signed-off-by: Andreas Rheinhardt --- libavformat/matroskadec.c | 54 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 52 insertions(+), 2 deletions(-) diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c index 2fab6612fa..f19e0fc523 100644 --- a/libavformat/matroskadec.c +++ b/libavformat/matroskadec.c @@ -73,6 +73,12 @@ * still need to be performed */ #define LEVEL_ENDED 3 /* return value of ebml_parse when the * syntax level used for parsing ended. */ +#define SKIP_THRESHOLD 1024 * 1024 /* In non-seekable mode, if more than SKIP_THRESHOLD + * of unkown, potentially damaged data is encountered, + * it is considered an error. */ +#define UNKNOWN_EQUIV 50 * 1024 /* An unknown element is considered equivalent + * to this many bytes of unknown data for the + * SKIP_THRESHOLD check. */ typedef enum { EBML_NONE, @@ -338,6 +344,7 @@ typedef struct MatroskaDemuxContext { int num_levels; uint32_t current_id; int64_t resync_pos; + int unknown_count; uint64_t time_scale; double duration; @@ -763,8 +770,9 @@ static int matroska_reset_status(MatroskaDemuxContext *matroska, return err; } - matroska->current_id = id; - matroska->num_levels = 1; + matroska->current_id = id; + matroska->num_levels = 1; + matroska->unknown_count = 0; matroska->resync_pos = avio_tell(matroska->ctx->pb); if (id) matroska->resync_pos -= (av_log2(id) + 7) / 8; @@ -1285,6 +1293,48 @@ static int ebml_parse(MatroskaDemuxContext *matroska, } else level_check = 0; + if (!(pb->seekable & AVIO_SEEKABLE_NORMAL)) { + // Loosing sync will likely manifest itself as encountering unknown + // elements which are not reliably distinguishable from elements + // belonging to future extensions of the format. + // We use a heuristic to detect such situations: If the current + // element is not expected at the current syntax level and there + // were only a few unknown elements in a row, then the element is + // skipped or considered defective based upon the length of the + // current element (i.e. how much would be skipped); if there were + // more than a few skipped elements in a row and skipping the current + // element would lead us more than SKIP_THRESHOLD away from the last + // known good position, then it is inferred that an error occured. + // The dependency on the number of unknown elements in a row exists + // because the distance to the last known good position is + // automatically big if the last parsed element was big. + // In both cases, each unknown element is considered equivalent to + // UNKNOWN_EQUIV of skipped bytes for the check. + // The whole check is only done for non-seekable output, because + // in this situation skipped data can't simply be rechecked later. + // This is especially important when using unkown length elements + // as the check for whether a child exceeds its containing master + // element is not effective in this situation. + if (update_pos) { + matroska->unknown_count = 0; + } else { + int64_t dist = length + UNKNOWN_EQUIV * matroska->unknown_count++; + + if (matroska->unknown_count > 3) + dist += pos_alt - matroska->resync_pos; + + if (dist > SKIP_THRESHOLD) { + av_log(matroska->ctx, AV_LOG_ERROR, + "Unknown element %"PRIX32" at pos. 0x%"PRIx64" with " + "length 0x%"PRIx64" considered as invalid data. Last " + "known good position 0x%"PRIx64", %d unknown elements" + " in a row\n", id, pos, length, matroska->resync_pos, + matroska->unknown_count); + return AVERROR_INVALIDDATA; + } + } + } + if (update_pos) { // We have found an element that is allowed at this place // in the hierarchy and it passed all checks, so treat the beginning