From patchwork Sun Jun 9 11:00:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Rheinhardt X-Patchwork-Id: 13476 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id 7064C4471A6 for ; Sun, 9 Jun 2019 14:18:00 +0300 (EEST) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 470C568AAF5; Sun, 9 Jun 2019 14:18:00 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wr1-f68.google.com (mail-wr1-f68.google.com [209.85.221.68]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 01B1B68AA53 for ; Sun, 9 Jun 2019 14:17:53 +0300 (EEST) Received: by mail-wr1-f68.google.com with SMTP id b17so6305651wrq.11 for ; Sun, 09 Jun 2019 04:17:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=at51zu9Cddxlp2nWw9BoJ9sG8BG+F+8NJ4GOmUPKMuY=; b=EKgFAl5AfW+mHQ8/5alXgEjS053Hp5rCNlkMpmH+3OyYWXCnvmjvHJWNYModDzV3dv onGQuUaV77XXig/DrknO5Uz1Th2J3JDnUNq3y7/4miVDjalWL3p/tLZ7HpY+uixpTxKb sfWTILCt7ZxOEtoXcFB2BIi5w1cwaOLKaaZdTq5Z3/OZ7yJhEMteipUQ36FYRMY3O95w kJ1zcl/lXr++AM4ejfTuxt9kkglyVVjmE0uMe5sa0l9xyjbtrZWxMRDvP5Vaab0gZXGR 7Ez2BLQuui8e/12FeeD7w6jhFBs/vmGUvgGv8mQ9mL1DA10+C4rj9Edb4OagCc85+LHo MIpA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=at51zu9Cddxlp2nWw9BoJ9sG8BG+F+8NJ4GOmUPKMuY=; b=j6aZ7kgBfjSHT5CM5AbAgf4GOLpkNi9RG386fO8bbJDQwo2GL+v3pPVHm7ZlYGxpvX dcfFD5shPXZuOoASWXlHZwYwLnjTIzLCvGL+eboekhjVGDTsUe/ETg6D8qL+8TjhswPa 3Aj2p9tEN3geOG5cchHycaL0VqzURe7nUmZnYalLYAWBw27Q7qXm6rXYgi/4pOabp7X9 wf1dWYjJxnkhQQjp6BAtK/RUpleN+IlPMzCxqomESaMcuX3zKnYIsJCXtlOCyia8ZNW/ FleMvOPBqvRFb7YilTKUZfMrBKUuUk9PIXyVdKCGT6vtE+tJbS1LPopR5Zsif5RFDcqF BxdQ== X-Gm-Message-State: APjAAAUUsNgx4TzMbZXJGs9BrJQaokE5YUjmqjz9oj43dcnBCqRB0/N2 NmHXwOqmNPtEx1e/H9UTOjVTt9F6 X-Google-Smtp-Source: APXvYqxXiVV6CVacgim+kAGiJtYQ1V+MHZKH8eMF5QcA9JM6PWOVwfyjox7raunQ1X2IWDhV95PV8Q== X-Received: by 2002:a5d:4087:: with SMTP id o7mr27277221wrp.277.1560078583044; Sun, 09 Jun 2019 04:09:43 -0700 (PDT) Received: from localhost.localdomain (ipbcc063db.dynamic.kabel-deutschland.de. [188.192.99.219]) by smtp.gmail.com with ESMTPSA id e7sm6055079wmd.0.2019.06.09.04.09.42 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Sun, 09 Jun 2019 04:09:42 -0700 (PDT) From: Andreas Rheinhardt To: ffmpeg-devel@ffmpeg.org Date: Sun, 9 Jun 2019 13:00:53 +0200 Message-Id: <20190609110053.4012-6-andreas.rheinhardt@gmail.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190609110053.4012-1-andreas.rheinhardt@gmail.com> References: <20190604111632.GZ3118@michaelspb> <20190609110053.4012-1-andreas.rheinhardt@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 5/5] startcode: Filter out non-startcodes earlier X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Andreas Rheinhardt Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Up until now, the bitmasks used to initially find out when one needs to take a closer look and search for startcodes were rather primitive: If a block (of four or eight bytes, depending on the system) contained a zero, it was treated as a target for closer inspection. This can be improved: Using the same technique of bitmasking as before, one can test whether an arbitrary continuous range of bits consists of zeros alone (and one can or the results of tests for non-overlapping ranges of bits). A simple improvement would consist in simply testing for whether every even byte does not vanish. But even better masks are possible, in particular for big endian systems: If all of the bits called 'a' or all of the bits called 'b' in the uint32_t aaaa aaaa aaaa aaax bbbb bbbb bbbb bbby vanish, then this is a candidate for closer inspection. If not, then it follows that the three least serious bytes can't be the position of the 0x01 byte of a 0x00 0x00 0x01 startcode. And if the most significant byte is the position of the 0x01 of a startcode and if the same test is performed on each block of four bytes in sequence, then all the b bits (as well as the y bit) of the previous uint32_t would vanish and it would have been considered a candidate for closer inspection. In other words: All startcodes can be found with this test. The amount of candidates is thereby reduced by a factor of about 256 (from about 4/2^8 to about 2/2^15). For eight bytes at a time, the patterns are simply applied to the high and low 32 bits at the same time. Unfortunately, not so much is possible for little-endian systems because continuous blocks of bits in the input byte array won't be continuous blocks in the integer read. But one can nevertheless even improve upon the "check every even/odd byte" test using the following mask: aaaa aaaa aaaa bbbb bbbb bbbb cccc cccc If any of these three blocks of bits vanishes, the block is a candidate for further inspection. In the byte array, this mask corresponds to the following situation: cccc cccc bbbb bbbb aaaa bbbb aaaa aaaa If a startcode's 0x01 is at the second or third position, the c block vanishes; if it is at the fourth position, the b block vanishes and if it is at the first position, the a block of the previous four-byte block vanishes. (This derivation just made use of the fact that there must be two vanishing bytes in sequence; I couldn't find a way to utilize that the third byte also has nearly no bits set.) So all startcodes can be found with this mask. Doubling this mask yields a mask for eight bytes at a time. I haven't found a better mask for eight bytes than this one. Using this, one can detect reduce the amount of false positives by about 72% compared to the earlier check whether one of the bytes vanishes. Being more effective at initial filtering makes the whole process more efficient, of course. Here are some benchmarks for a 30.2 Mb/s transport stream A (10 iterations of 8192 runs each) and another 7.4 Mb/s transport stream B (10 iterations of 131072 runs each) on an x64; except for "vanilla" all versions tested didn't return false positives. | vanilla | aligned | aligned | | (unaligned) | bad masks | good masks | --|-------------|-----------|------------| A | 411578 | 323355 | 233494 | B | 55476 | 44147 | 31793 | "vanilla" is the version before this patchset, i.e. before the switch to aligned reads. "aligned bad masks" is the version before this commit. "aligned good masks" is the current version. Signed-off-by: Andreas Rheinhardt --- libavcodec/startcode.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/libavcodec/startcode.c b/libavcodec/startcode.c index b29c1490c0..fa506a37d1 100644 --- a/libavcodec/startcode.c +++ b/libavcodec/startcode.c @@ -95,10 +95,18 @@ int ff_startcode_find_candidate_c(const uint8_t *buf, int size) #if HAVE_FAST_64BIT INITIALIZATION(8); - MAIN_LOOP(64, 0x0101010101010101ULL, 0x8080808080808080ULL); +#if HAVE_BIGENDIAN + MAIN_LOOP(64, 0x0002000200020002ULL, 0x8000800080008000ULL); +#else + MAIN_LOOP(64, 0x0010010100100101ULL, 0x8008008080080080ULL); +#endif #else INITIALIZATION(4); - MAIN_LOOP(32, 0x01010101U, 0x80808080U); +#if HAVE_BIGENDIAN + MAIN_LOOP(32, 0x00020002U, 0x80008000U); +#else + MAIN_LOOP(32, 0x00100101U, 0x80080080U); +#endif #endif /* For the end, buf is the earliest possible position