From patchwork Sat Sep 26 10:27:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Rheinhardt X-Patchwork-Id: 22607 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id 2E3FD44A10F for ; Sat, 26 Sep 2020 13:30:12 +0300 (EEST) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 17C4B68B75A; Sat, 26 Sep 2020 13:30:12 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wm1-f67.google.com (mail-wm1-f67.google.com [209.85.128.67]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 524C968B71E for ; Sat, 26 Sep 2020 13:30:04 +0300 (EEST) Received: by mail-wm1-f67.google.com with SMTP id s13so1705180wmh.4 for ; Sat, 26 Sep 2020 03:30:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=VqAFODf/gGeYQyAKcFpb2YFN/8JuXzgnWP2opuuL+3A=; b=RtlRPjUpUKu++vr/YPw+PUTbOrjwBHrghT0Fzm167iNPqtbiDo0vEDyN6ZGeFSnPZU 1RqEmgbgBzkb9e/oLSjX/JMaq7LwQe8OLEr6/TViErylXgtbYHEt7ukbHBCPIFzA8BKj dy6ZW4R1MGAP5XsJ+3w1KE2++VQovehZ0BqkptCozyOO6BYjM4Oed7/XnjCVMSlmYWIh vx9G/JzqJntX86Q4hfrE7JTFDszT6QEsXvGTT6Er648ouB8K8S9mTUwKfXkGRQz/s+mb KDmXfLYZ79XwzI1gUSakLXMCsSL8iLL1YHt895ZZ/7UDTHiHFQToqRhnlOe7ytFiptoE HxfQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=VqAFODf/gGeYQyAKcFpb2YFN/8JuXzgnWP2opuuL+3A=; b=i4zfmlS3DabViO1EEJ3wRYqd+ySdQ04gZlI0EDtIiJRbIm2LFDJ2VAVVzBbCiK/GDU R0z0KxDY9zV+/Zx+bV7A+9hMn+lBDZ5P5ykzl4hPj50Y7x7OAkyVI10kLtdFyZPwfrGY Ne1Y82juasZFiRcBfapcA29TtHj7m/r78IuvMteArDXNfjOTOdC/5a3tllXu/S8PcrZI EKFoDv/j2zWnL7UcO1i39TkJfRPSOv7m43YjUfitUjt/Nh57p86QOWUXi366a/Aq7NT2 s5F2zOaCWJYlPKx/c8kQzOpVaNzUobzf4gDl7UThXTuYONHcGCs0k1bx2qmO6yvnx4zY IBQA== X-Gm-Message-State: AOAM531qCPdosnzcAt3kY02t6QHiyNuS9e4z+L0soy3PlI27i3Hey0hI Od5yBgraEoEVng3V775F7+Pbxfxpd/I= X-Google-Smtp-Source: ABdhPJzR7GpirCPrshPK0gqfYjhwBi4meyYcjoD/IwR4XaQuregaMfbWUtrPhBTOrG3ZMGd9FX+imQ== X-Received: by 2002:a1c:7215:: with SMTP id n21mr2175280wmc.154.1601116203462; Sat, 26 Sep 2020 03:30:03 -0700 (PDT) Received: from sblaptop.fritz.box (ipbcc1aa4b.dynamic.kabel-deutschland.de. [188.193.170.75]) by smtp.gmail.com with ESMTPSA id k8sm6064867wrl.42.2020.09.26.03.30.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 26 Sep 2020 03:30:02 -0700 (PDT) From: Andreas Rheinhardt To: ffmpeg-devel@ffmpeg.org Date: Sat, 26 Sep 2020 12:27:50 +0200 Message-Id: <20200926102804.228089-11-andreas.rheinhardt@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200926102804.228089-1-andreas.rheinhardt@gmail.com> References: <20200926102804.228089-1-andreas.rheinhardt@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 11/25] avcodec/magicyuv: Avoid AV_QSORT when creating Huffman table X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Andreas Rheinhardt Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" The MagicYUV format stores Huffman tables in its bitstream by coding the length of a given symbol; it does not code the actual code directly, instead this is to be inferred by the rule that a symbol is to the left of every shorter symbol in the Huffman tree and that for symbols of the same length the symbol is ascending from left to right. Our decoder implemented this by first sorting the array containing length and symbol of each element according to descending length and for equal length, according to ascending symbol. Afterwards, the current state in the tree got encoded in a variable code; if the next array entry had length len, then the len most significant bits of code contained the code of this entry. Whenever an entry of the array of length len was processed, code was incremented by 1U << (32 - len). So two entries of length len have the same effect as incrementing code by 1U << (32 - (len - 1)), which corresponds to the parent node of length len - 1 of the two nodes of length len etc. This commit modifies this to avoid sorting the entries before calculating the codes. This is done by calculating how many non-leaf nodes there are on each level of the tree before calculating the codes. Afterwards every leaf node on this level gets assigned the number of nodes already on this level as code. This of course works only because the entries are already sorted by their symbol initially, so that this algorithm indeed gives ascending symbols from left to right on every level. This offers both speed- as well as (obvious) codesize advantages. With Clang 10 the number of decicycles for build_huffman decreased from 1561987 to 1228405; for GCC 9 it went from 1825096 decicyles to 1429921. These tests were carried out with a sample with 150 frames that was looped 13 times; and this was iterated 10 times. The earlier reference point here is from the point when the loop generating the codes was traversed in reverse order (as the patch reversing the order led to performance penalties). Signed-off-by: Andreas Rheinhardt --- libavcodec/magicyuv.c | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/libavcodec/magicyuv.c b/libavcodec/magicyuv.c index 17dea69d76..7dded9b457 100644 --- a/libavcodec/magicyuv.c +++ b/libavcodec/magicyuv.c @@ -25,7 +25,6 @@ #define CACHED_BITSTREAM_READER !ARCH_X86_32 #include "libavutil/pixdesc.h" -#include "libavutil/qsort.h" #include "avcodec.h" #include "bytestream.h" @@ -74,26 +73,24 @@ typedef struct MagicYUVContext { LLVidDSPContext llviddsp; } MagicYUVContext; -static int huff_cmp_len(const void *a, const void *b) +static int huff_build(HuffEntry he[], uint16_t codes_count[33], + VLC *vlc, int nb_elems) { - const HuffEntry *aa = a, *bb = b; - return (bb->len - aa->len) * 4096 + aa->sym - bb->sym; -} - -static int huff_build(HuffEntry he[], VLC *vlc, int nb_elems) -{ - uint32_t code; - - AV_QSORT(he, nb_elems, HuffEntry, huff_cmp_len); + unsigned nb_codes = 0, max = 0; + + for (int i = 32; i > 0; i--) { + uint16_t curr = codes_count[i]; // # of leafs of length i + codes_count[i] = nb_codes / 2; // # of non-leaf nodes on level i + nb_codes = codes_count[i] + curr; // # of nodes on level i + if (curr && !max) + max = i; + } - code = 0; for (unsigned i = 0; i < nb_elems; i++) { - he[i].code = code >> (32 - he[i].len); - code += 0x80000000u >> (he[i].len - 1); + he[i].code = codes_count[he[i].len]; + codes_count[he[i].len]++; } - - ff_free_vlc(vlc); - return ff_init_vlc_sparse(vlc, FFMIN(he[0].len, 12), nb_elems, + return ff_init_vlc_sparse(vlc, FFMIN(max, 12), nb_elems, &he[0].len, sizeof(he[0]), sizeof(he[0].len), &he[0].code, sizeof(he[0]), sizeof(he[0].code), &he[0].sym, sizeof(he[0]), sizeof(he[0].sym), 0); @@ -389,6 +386,7 @@ static int build_huffman(AVCodecContext *avctx, const uint8_t *table, MagicYUVContext *s = avctx->priv_data; GetByteContext gb; HuffEntry he[4096]; + uint16_t length_count[33] = { 0 }; int i = 0, j = 0, k; bytestream2_init(&gb, table, table_size); @@ -409,6 +407,7 @@ static int build_huffman(AVCodecContext *avctx, const uint8_t *table, return AVERROR_INVALIDDATA; } + length_count[x] += l; for (; j < k; j++) { he[j].sym = j; he[j].len = x; @@ -416,7 +415,7 @@ static int build_huffman(AVCodecContext *avctx, const uint8_t *table, if (j == max) { j = 0; - if (huff_build(he, &s->vlc[i], max)) { + if (huff_build(he, length_count, &s->vlc[i], max)) { av_log(avctx, AV_LOG_ERROR, "Cannot build Huffman codes\n"); return AVERROR_INVALIDDATA; } @@ -424,6 +423,7 @@ static int build_huffman(AVCodecContext *avctx, const uint8_t *table, if (i == s->planes) { break; } + memset(length_count, 0, sizeof(length_count)); } }