diff mbox series

[FFmpeg-devel,109/114] avcodec/utvideodec: Avoid implicit qsort when creating Huffman tables

Message ID 20201110105836.321916-18-andreas.rheinhardt@gmail.com
State Superseded
Headers show
Series VLC, esp. init_vlc patches | expand

Checks

Context Check Description
andriy/x86_make success Make finished
andriy/x86_make_fate success Make fate finished

Commit Message

Andreas Rheinhardt Nov. 10, 2020, 10:58 a.m. UTC
The Huffman trees used by Ut Video have two important characteristics:
(i) Longer codes are on the left of the tree and (ii) for codes of the
same length, the symbol is descending from left to right in the tree.
Therefore all the information that needs to be transmitted is how long
the code corresponding to a given symbol is; and this is also all that
is transmitted.

Before 341914495e5c2f60bc920b0c6f660e5948a47f5a, the decoder used qsort
to sort the (length, symbol) pairs by ascending length and for equal
lengths by ascending symbol. Since said commit, the decoder uses the
first pass over the lengths table to count how many symbols of each
length there are; with (i) one can then easily calculate the code of
the left-most code with a given length in the tree and from there one
can calculate the codes for all entries, using one running counter for
each possible length. This eliminated the explicit qsort in
build_huff().

Yet ff_init_vlc_sparse() sorts the table itself as it has to ensure that
all the entries that will be placed in the same subtable are contiguous.
The tables created now are non-contiguous (they are ordered by symbol
and codes of different length aren't ordered at all; only codes of the
same length are ordered according to (ii)).

This commit therefore modifies the algorithm used to automatically create
tables whose codes are sorted from left to right in the tree. The key to
do so is the observation that the counts obtained in the first pass can
be used to contain the range of the codes of each length in the second
pass: If counts[i] is the count of codes with length i, then the first
counts[32] codes are of length 32, the next counts[31] codes are of
length 31 etc. So one knows the index of the lowest symbol whose code
has length 32 (if any): It is counts[32] - 1 due to (ii), whereas the
index of the lowest symbol whose code has length 31 (if any) is
counts[32] + counts[31] - 1; the index of the second-to-lowest symbol of
length 32 (if existing) is counts[32] - 2 etc.

If one follows the algorithm outlined above, one can switch to
ff_init_vlc_from_lengths() which has no implicit qsort; it also means
that one can offload the computation of the codes.

This turned out to be beneficial for performance: For the sample from
ticket #4044 it decreased the decicycles spent on one call to
build_huff() from 508480 to 340688 (GCC 9.3, looping 10 times over the
file to get enough runs and then repeating this ten times); for another
sample; for another sample (YUV420p, natural content, 5500 frames, also
ten iterations) the time went down from 382346 to 275533 decicycles.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
---
 libavcodec/utvideodec.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)
diff mbox series

Patch

diff --git a/libavcodec/utvideodec.c b/libavcodec/utvideodec.c
index 8b47c14d98..3d505ea2a3 100644
--- a/libavcodec/utvideodec.c
+++ b/libavcodec/utvideodec.c
@@ -40,10 +40,15 @@ 
 #include "thread.h"
 #include "utvideo.h"
 
+typedef struct HuffEntry {
+    uint8_t len;
+    uint16_t sym;
+} HuffEntry;
+
 static int build_huff(const uint8_t *src, VLC *vlc, int *fsym, unsigned nb_elems)
 {
     int i;
-    uint32_t codes[1024];
+    HuffEntry he[1024];
     uint8_t bits[1024];
     uint16_t codes_count[33] = { 0 };
 
@@ -64,23 +69,21 @@  static int build_huff(const uint8_t *src, VLC *vlc, int *fsym, unsigned nb_elems
     if (codes_count[0] == nb_elems)
         return AVERROR_INVALIDDATA;
 
-    for (unsigned i = 32, nb_codes = 0; i > 0; i--) {
-        uint16_t curr = codes_count[i];   // # of leafs of length i
-        codes_count[i] = nb_codes / 2;    // # of non-leaf nodes on level i
-        nb_codes = codes_count[i] + curr; // # of nodes on level i
-    }
+    /* For Ut Video, longer codes are to the left of the tree and
+     * for codes with the same length the symbol is descending from
+     * left to right. So after the next loop --codes_count[i] will
+     * be the index of the first (lowest) symbol of length i when
+     * indexed by the position in the tree with left nodes being first. */
+    for (int i = 31; i >= 0; i--)
+        codes_count[i] += codes_count[i + 1];
+
+    for (unsigned i = 0; i < nb_elems; i++)
+        he[--codes_count[bits[i]]] = (HuffEntry) { bits[i], i };
 
-    for (unsigned i = nb_elems; i-- > 0;) {
-        if (!bits[i]) {
-            codes[i] = 0;
-            continue;
-        }
-        codes[i] = codes_count[bits[i]]++;
-    }
 #define VLC_BITS 11
-    return init_vlc(vlc, VLC_BITS, nb_elems,
-                    bits,  sizeof(*bits),  sizeof(*bits),
-                    codes, sizeof(*codes), sizeof(*codes), 0);
+    return ff_init_vlc_from_lengths(vlc, VLC_BITS, codes_count[0],
+                                    &he[0].len, sizeof(*he),
+                                    &he[0].sym, sizeof(*he), 2, 0, 0);
 }
 
 static int decode_plane10(UtvideoContext *c, int plane_no,