From patchwork Sat Jun 1 22:47:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Rheinhardt X-Patchwork-Id: 13376 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id B0823446F95 for ; Sun, 2 Jun 2019 01:48:44 +0300 (EEST) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 9F4D468AAFA; Sun, 2 Jun 2019 01:48:44 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wr1-f68.google.com (mail-wr1-f68.google.com [209.85.221.68]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id C97BC68AAF0 for ; Sun, 2 Jun 2019 01:48:38 +0300 (EEST) Received: by mail-wr1-f68.google.com with SMTP id w13so8771969wru.11 for ; Sat, 01 Jun 2019 15:48:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=qSU61Z0ZPaTM7bF4htxVvUw4w3N04Dhqs0omBgP7TPY=; b=KM5Opko2fbCS7RkHng+W21gLoitTShCO33Tm60GeFujX7/jVaeKcRVxgWhzsB61+sG +JyG24e+J2VVoSc47IJEtqSIIDbI84ZVRHA9JEgf95FDB94VS+b9xE9ga/v2baiaSZKC TL/RCWtFzrF8nDEKGQ1oTU7VsegXFfXDRx3XYTlRJc0EjCu6RJT4RjPuRMrbQWyRqVNy 1riOtBawIrftsM4yn6tq/Ko7vi+cFDq2x5gkMozBBlAqWhgkELT6+u23wXBBoKoeIOzT We9kLWJ+SN2Qg0UvlG2+z0eutTjK9ohQgtvXqLvEdez1ywKOtbsMDnqKV0FjkMpsnvNm ixZw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=qSU61Z0ZPaTM7bF4htxVvUw4w3N04Dhqs0omBgP7TPY=; b=AyAUaZgvIyaN7r3URp2qGVN/XvX5rKAIqsn5ad5vmhMch9dfukxnhg17d86euQpSU1 YjJ8VW5JLxfGjUelbzSF+mOHFFHKjQrHTKTE5nGCK4SCjZUn+FIv0ItVv2R9qMVVbKOQ SzegmL5G0eojsNxyEPrFdYExYRcDkTZsHSBo5NgCSjLFM18BDvsA8tDgmFmYBqBapQod EPMTGjQTfuoo/fglHDnc3daGXU4DXjko4m8tHir3q+AD6bXqkqbt0vAEIv/y6sJ6Cd/V HUDOdaWib7/kIZbe5PBEvGQLk8cbQATGtlIkkTZ1iv/VGKe+FNo1CDiu7p16glikhJvW e0mA== X-Gm-Message-State: APjAAAVxk/gMcbFtrYTEp1/Qib6jFqoEVcLQ464SgFSyyI7xvLfIxP1D sLsRRm+77FI43PE9KvANhruV+7Fa X-Google-Smtp-Source: APXvYqx9hSqqbqZBEJp8w7ghuEjZ59q8555irqR5UJTow6EuJWXK+GYlQhrPNl1TO3CHpKjjpQonNw== X-Received: by 2002:a5d:5302:: with SMTP id e2mr10980068wrv.347.1559429318168; Sat, 01 Jun 2019 15:48:38 -0700 (PDT) Received: from localhost.localdomain (ipbcc063db.dynamic.kabel-deutschland.de. [188.192.99.219]) by smtp.gmail.com with ESMTPSA id c24sm7591892wmb.21.2019.06.01.15.48.37 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Sat, 01 Jun 2019 15:48:37 -0700 (PDT) From: Andreas Rheinhardt To: ffmpeg-devel@ffmpeg.org Date: Sun, 2 Jun 2019 00:47:19 +0200 Message-Id: <20190601224719.32872-6-andreas.rheinhardt@gmail.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190601224719.32872-1-andreas.rheinhardt@gmail.com> References: <20190601224719.32872-1-andreas.rheinhardt@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 5/5] startcode: Filter out non-startcodes earlier X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Andreas Rheinhardt Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Up until now, the bitmasks used to initially find out when one needs to take a closer look and search for startcodes were rather primitive: If a block (of four or eight bytes, depending on the system) contained a zero, it was treated as a target for closer inspection. This can be improved: Using the same technique of bitmasking as before, one can test whether an arbitrary continuous range of bits consists of zeros alone (and one can or the results of tests for non-overlapping ranges of bits). A simple improvement would consist in simply testing for whether every even byte does not vanish. But even better masks are possible, in particular for big endian systems. If all of the bits called 'a' or all of the bits called 'b' in the uint32_t aaaa aaaa aaaa aaax bbbb bbbb bbbb bbby vanish, then this is a candidate for closer inspection. If not, then it follows that the three least serious bytes can't be the position of the 0x01 byte of a 0x00 0x00 0x01 startcode. And if the most significant byte is the position of the 0x01 of a startcode and if the same test is performed on each block of four bytes in sequence, then this would have all the b bits (as well as the y bit) of the previous uint32_t read would vanish and it would be considered a candidate for closer inspection. In other words: All startcodes can be found with this test. The amount of candidates is thereby reduced by a factor of about 256 (from about 4/2^8 to about 2/2^15). For eight bytes at a time, the patterns are simply applied to the high and low 32 bits at the same time. Unfortunately, not so much is possible for little-endian systems because continuous blocks of bits in the input byte array won't be continuous blocks in the integer read. But one can nevertheless even improve upon the "check every even/odd byte" test using the following mask: aaaa aaaa aaaa bbbb bbbb bbbb cccc cccc If any of these three blocks of bits vanishes, the block is a candidate for further inspection. In the byte array, this mask corresponds to the following situation: cccc cccc bbbb bbbb aaaa bbbb aaaa aaaa If a startcode's 0x01 is at the second or third position, the c block vanishes; if it is at the fourth position, the b block vanishes and if it is at the first position, the a block of the previous four-byte block vanishes. (This derivation just made use of the fact that there must be two vanishing bytes in sequence; I couldn't find a way to utilize that the third byte also has nearly no bits set.) So all startcodes can be found with this mask. Doubling this mask yields a mask for eight bytes at a time. I haven't found a better mask for eight bytes than this one. Using this, one can detect reduce the amount of false positives by about 72% compared to the earlier check whether one of the bytes vanishes. Being more effective at initial filtering makes the whole process more efficient, of course. Here are some benchmarks for a 30.2 Mb/s transport stream A (10 iterations of 8192 runs each) and another 7.4 Mb/s transport stream B (10 iterations of 131072 runs each) on an x64; except for "vanilla" all versions tested didn't return false positives. | vanilla | aligned | aligned | | (unaligned) | bad masks | good masks | --|-------------|-----------|------------| A | 411578 | 323355 | 233494 | B | 55476 | 44147 | 31793 | "vanilla" is the version before this patchset, i.e. before the switch to aligned reads. "aligned bad masks" is the version before this commit. "aligned good masks" is the current version. Signed-off-by: Andreas Rheinhardt --- libavcodec/startcode.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/libavcodec/startcode.c b/libavcodec/startcode.c index f6105289f1..7588447bed 100644 --- a/libavcodec/startcode.c +++ b/libavcodec/startcode.c @@ -90,10 +90,18 @@ int ff_startcode_find_candidate_c(const uint8_t *buf, int size) #if HAVE_FAST_64BIT INITIALIZATION(8); - MAIN_LOOP(64, 0x0101010101010101ULL, 0x8080808080808080ULL); +#if HAVE_BIGENDIAN + MAIN_LOOP(64, 0x0002000200020002ULL, 0x8000800080008000ULL); +#else + MAIN_LOOP(64, 0x0010010100100101ULL, 0x8008008080080080ULL); +#endif #else INITIALIZATION(4); - MAIN_LOOP(32, 0x01010101U, 0x80808080U); +#if HAVE_BIGENDIAN + MAIN_LOOP(32, 0x00020002U, 0x80008000U); +#else + MAIN_LOOP(32, 0x00100101U, 0x80080080U); +#endif #endif /* For the end, buf is the earliest possible position