diff mbox

[FFmpeg-devel,5/5] startcode: Filter out non-startcodes earlier

Message ID 20190601224719.32872-6-andreas.rheinhardt@gmail.com
State Superseded
Headers show

Commit Message

Andreas Rheinhardt June 1, 2019, 10:47 p.m. UTC
Up until now, the bitmasks used to initially find out when one needs
to take a closer look and search for startcodes were rather primitive:
If a block (of four or eight bytes, depending on the system) contained a
zero, it was treated as a target for closer inspection.
This can be improved: Using the same technique of bitmasking as before,
one can test whether an arbitrary continuous range of bits consists of
zeros alone (and one can or the results of tests for non-overlapping
ranges of bits). A simple improvement would consist in simply testing
for whether every even byte does not vanish. But even better masks are
possible, in particular for big endian systems.
If all of the bits called 'a' or all of the bits called 'b' in the
uint32_t aaaa aaaa aaaa aaax bbbb bbbb bbbb bbby vanish, then this is
a candidate for closer inspection. If not, then it follows that
the three least serious bytes can't be the position of the 0x01 byte of
a 0x00 0x00 0x01 startcode. And if the most significant byte is the
position of the 0x01 of a startcode and if the same test is performed on
each block of four bytes in sequence, then this would have all the b
bits (as well as the y bit) of the previous uint32_t read would vanish
and it would be considered a candidate for closer inspection. In other
words: All startcodes can be found with this test. The amount of
candidates is thereby reduced by a factor of about 256 (from about 4/2^8
to about 2/2^15). For eight bytes at a time, the patterns are simply
applied to the high and low 32 bits at the same time.
Unfortunately, not so much is possible for little-endian systems because
continuous blocks of bits in the input byte array won't be continuous
blocks in the integer read. But one can nevertheless even improve upon
the "check every even/odd byte" test using the following mask:
aaaa aaaa aaaa bbbb bbbb bbbb cccc cccc
If any of these three blocks of bits vanishes, the block is a candidate
for further inspection. In the byte array, this mask corresponds to the
following situation:
cccc cccc bbbb bbbb aaaa bbbb aaaa aaaa
If a startcode's 0x01 is at the second or third position, the c block
vanishes; if it is at the fourth position, the b block vanishes and if
it is at the first position, the a block of the previous four-byte block
vanishes. (This derivation just made use of the fact that there must be
two vanishing bytes in sequence; I couldn't find a way to utilize that
the third byte also has nearly no bits set.) So all startcodes can be
found with this mask. Doubling this mask yields a mask for eight bytes
at a time. I haven't found a better mask for eight bytes than this one.
Using this, one can detect reduce the amount of false positives by about
72% compared to the earlier check whether one of the bytes vanishes.

Being more effective at initial filtering makes the whole process more
efficient, of course. Here are some benchmarks for a 30.2 Mb/s transport
stream A (10 iterations of 8192 runs each) and another 7.4 Mb/s transport
stream B (10 iterations of 131072 runs each) on an x64; except for
"vanilla" all versions tested didn't return false positives.

  |   vanilla   |  aligned  |  aligned   |
  | (unaligned) | bad masks | good masks |
A |    411578   |   323355  |    233494  |
B |     55476   |    44147  |     31793  |

"vanilla" is the version before this patchset, i.e. before the switch to
aligned reads. "aligned bad masks" is the version before this commit.
"aligned good masks" is the current version.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
 libavcodec/startcode.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
diff mbox


diff --git a/libavcodec/startcode.c b/libavcodec/startcode.c
index f6105289f1..7588447bed 100644
--- a/libavcodec/startcode.c
+++ b/libavcodec/startcode.c
@@ -90,10 +90,18 @@  int ff_startcode_find_candidate_c(const uint8_t *buf, int size)
-    MAIN_LOOP(64, 0x0101010101010101ULL, 0x8080808080808080ULL);
+    MAIN_LOOP(64, 0x0002000200020002ULL, 0x8000800080008000ULL);
+    MAIN_LOOP(64, 0x0010010100100101ULL, 0x8008008080080080ULL);
-    MAIN_LOOP(32, 0x01010101U, 0x80808080U);
+    MAIN_LOOP(32, 0x00020002U, 0x80008000U);
+    MAIN_LOOP(32, 0x00100101U, 0x80080080U);
     /* For the end, buf is the earliest possible position