[FFmpeg-devel,RFC] Improve and fix put_vc2_ue_uint() function.

Message ID CABA=pqd91-oZ+ivA4NL=ynRAYsORyS=cKhhwe75xLtS9RcQdfg@mail.gmail.com
State New

Commit Message

Ivan Kalvachev Feb. 28, 2018, 8:14 p.m. UTC
Replace the two bit-handling loops and the internal conditional branch
with a simple formula that uses a few logical operations.

The old function would generate wrong output
if the input does not fit into 15 bits.

Fix this by using 64-bit math and put_bits64().
This case should be quite rare, since the bug
has not asserted itself.

---
This is an attempt at a speed optimization, but in the
process it turned out that the function also needs a bug fix.

I only tested the old case of the code,
to confirm that I've implemented the correct function.
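
For anyone who wants to repeat that kind of check outside of FFmpeg, a
minimal standalone sketch follows. It assumes the codeword is returned as
an integer instead of being written through a PutBitContext. The names
ue_loop(), ue_spread() and count_mismatches() are hypothetical, and the
small shift loop stands in for ff_log2(). The leading 1 of val is masked
off before the bits are spread, because the loop version never copies it.

```c
#include <assert.h>
#include <stdint.h>

/* Codeword built by the original two loops (returned, not written).
 * Only valid while the codeword fits in 31 bits, i.e. val + 1 < 65536. */
static uint32_t ue_loop(uint32_t val, int *len)
{
    uint32_t topbit = 1, maxval = 1, pbits = 0;
    int i, bits = 0;

    if (!val++) {
        *len = 1;
        return 1;
    }
    while (val > maxval) {
        topbit <<= 1;
        maxval   = (maxval << 1) | 1;
        bits++;                       /* ends up equal to log2(topbit) */
    }
    for (i = 0; i < bits; i++) {
        topbit >>= 1;
        pbits  <<= 2;
        if (val & topbit)
            pbits |= 1;
    }
    *len = 2 * bits + 1;
    return (pbits << 1) | 1;
}

/* Same codeword built with the shift-and-mask bit spreading formula. */
static uint32_t ue_spread(uint32_t val, int *len)
{
    int bits = 0;

    if (!val++) {
        *len = 1;
        return 1;
    }
    while ((val >> bits) > 1)         /* floor(log2(val)), stand-in for ff_log2() */
        bits++;
    assert(bits <= 15);               /* larger values need the 64-bit path */

    val &= (1U << bits) - 1;          /* drop the leading 1 */
    val = ((val << 8) | val) & 0x00FF00FF;
    val = ((val << 4) | val) & 0x0F0F0F0F;
    val = ((val << 2) | val) & 0x33333333;
    val = ((val << 1) | val) & 0x55555555;

    *len = 2 * bits + 1;
    return (val << 1) | 1;
}

/* Compare both versions over every input the 32-bit path can handle. */
static int count_mismatches(void)
{
    int bad = 0;
    uint32_t v;

    for (v = 0; v < 65535; v++) {
        int la, lb;
        uint32_t a = ue_loop(v, &la);
        uint32_t b = ue_spread(v, &lb);
        if (a != b || la != lb)
            bad++;
    }
    return bad;
}
```

Calling count_mismatches() from a small test program and checking that it
returns 0 covers the whole range the 32-bit path is meant for.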

I haven't run any benchmarks or FATE.

It should be faster, especially because coefficients below 2048
are currently written using a lookup table and bypass this function.
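
To make the size limit concrete, here is a rough sketch of the codeword
length and of the 64-bit variant. With bits = log2(val + 1), the codeword
is 2*bits + 1 bits long, so any input of 65535 or more needs more than the
31 bits that put_bits() accepts per call. vc2_ue_len() and vc2_ue_code64()
are made-up helper names, ilog2_u32() stands in for ff_log2(), and val is
assumed to be below UINT32_MAX so that val + 1 does not wrap.

```c
#include <stdint.h>

/* floor(log2(v)) for v > 0; stand-in for FFmpeg's ff_log2(). */
static int ilog2_u32(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Length in bits of the interleaved exp-Golomb codeword for val. */
static int vc2_ue_len(uint32_t val)
{
    return 2 * ilog2_u32(val + 1) + 1;
}

/* Move bit k of x to bit 2k of the result (bits 0..31 -> 0..62). */
static uint64_t spread_bits64(uint32_t x)
{
    uint64_t p = x;

    p = ((p << 16) | p) & 0x0000FFFF0000FFFFULL;
    p = ((p <<  8) | p) & 0x00FF00FF00FF00FFULL;
    p = ((p <<  4) | p) & 0x0F0F0F0F0F0F0F0FULL;
    p = ((p <<  2) | p) & 0x3333333333333333ULL;
    p = ((p <<  1) | p) & 0x5555555555555555ULL;
    return p;
}

/* Codeword for val as a 64-bit integer; *len receives its length. */
static uint64_t vc2_ue_code64(uint32_t val, int *len)
{
    uint32_t v = val + 1;
    int bits   = ilog2_u32(v);

    v &= (1U << bits) - 1;            /* drop the leading 1 */
    *len = 2 * bits + 1;
    return (spread_bits64(v) << 1) | 1;
}
```

With these helpers, vc2_ue_len(2047) is 23, vc2_ue_len(65534) is 31 and
vc2_ue_len(65535) is 33, which is the point where a single put_bits() call
is no longer enough.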

If you like it, use it.

Best Regards
   Ivan Kalvachev.

Patch

From 1f7fd38fcb6c64281bc458c09c711fc567b3ef0f Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev <ikalvachev@gmail.com>
Date: Wed, 28 Feb 2018 17:48:40 +0200
Subject: [PATCH] Improve and fix put_vc2_ue_uint() function.

Replace the two bit-handling loops and the internal conditional branch
with a simple formula that uses a few logical operations.

The old function would generate wrong output
if the input does not fit into 15 bits.

Fix this by using 64-bit math and put_bits64().
This case should be quite rare, since the bug
has not asserted itself.

Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
---
 libavcodec/vc2enc.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/libavcodec/vc2enc.c b/libavcodec/vc2enc.c
index b7adcd3d36..b2f1611ea3 100644
--- a/libavcodec/vc2enc.c
+++ b/libavcodec/vc2enc.c
@@ -187,28 +187,33 @@  typedef struct VC2EncContext {
 
 static av_always_inline void put_vc2_ue_uint(PutBitContext *pb, uint32_t val)
 {
-    int i;
-    int pbits = 0, bits = 0, topbit = 1, maxval = 1;
+    int bits = 0;
+    uint64_t pbits = 0;
 
     if (!val++) {
         put_bits(pb, 1, 1);
         return;
     }
 
-    while (val > maxval) {
-        topbit <<= 1;
-        maxval <<= 1;
-        maxval |=  1;
-    }
+    bits = ff_log2(val);
+    val &= (1U << bits) - 1; // drop the leading 1, the code length already implies it
 
-    bits = ff_log2(topbit);
+    if (bits > 15) {
+        pbits = val;
 
-    for (i = 0; i < bits; i++) {
-        topbit >>= 1;
-        pbits <<= 2;
-        if (val & topbit)
-            pbits |= 0x1;
+        pbits = ((pbits<<16)|pbits)&0x0000FFFF0000FFFFULL;
+        pbits = ((pbits<< 8)|pbits)&0x00FF00FF00FF00FFULL;
+        pbits = ((pbits<< 4)|pbits)&0x0F0F0F0F0F0F0F0FULL;
+        pbits = ((pbits<< 2)|pbits)&0x3333333333333333ULL;
+        pbits = ((pbits<< 1)|pbits)&0x5555555555555555ULL;
+        put_bits64(pb, bits*2 + 1, (pbits << 1) | 1);
+        return;
     }
+                                               // ____'____ ____'____ ponm'lkji hgfe'dcba
+    val   = ( (val << 8) | val ) & 0x00FF00FF; // ____'____ ponm'lkji ____'____ hgfe'dcba
+    val   = ( (val << 4) | val ) & 0x0F0F0F0F; // ____'ponm ____'lkji ____'hgfe ____'dcba
+    val   = ( (val << 2) | val ) & 0x33333333; // __po'__nm __lk'__ji __hg'__fe __dc'__ba
+    pbits = ( (val << 1) | val ) & 0x55555555; // _p_o'_n_m _l_k'_j_i _h_g'_f_e _d_c'_b_a
 
     put_bits(pb, bits*2 + 1, (pbits << 1) | 1);
 }
-- 
2.16.2