From patchwork Fri Nov 20 07:33:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Rheinhardt X-Patchwork-Id: 23886 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id 5D76D449F85 for ; Fri, 20 Nov 2020 09:50:54 +0200 (EET) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 7DF1C68BD77; Fri, 20 Nov 2020 09:34:52 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-ed1-f68.google.com (mail-ed1-f68.google.com [209.85.208.68]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 1073568BCE0 for ; Fri, 20 Nov 2020 09:34:41 +0200 (EET) Received: by mail-ed1-f68.google.com with SMTP id e18so8513179edy.6 for ; Thu, 19 Nov 2020 23:34:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; bh=/EPxxp6P0fD8Er99p5ewIDMl1C59ZybSZ3ss59ZcTKU=; b=GpKE8FlOFdmjrZeQ8O0hOpUpuOCQ1nwIVRvPx3V57btfwgq8h906OO/tOgjguKOmC9 16xUUBR5zoTgUfryjsmucmx7DohPgy9CctdKb5KFVTPU4qSTp96Gv50yUQvYXxfFtqL3 /tWv0IZXatvff61O2Nb2kVGGiE7ookI3cpV4lrkoVIgObD6/quVGL5Ln5UcXjNg4MN5/ UL+a1Sa/nTMz8917pKtZZiNXPS+YB/H3r54E2PLPyNugbcaWn4XL1NOy0msVMmpTF9N6 y+utmcSmiYktxBzgIie42DldX3GANM1EwGSgEWzMGkb5YJR1DQmezae4TJCJye+SEgTl YztQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:reply-to:mime-version:content-transfer-encoding; bh=/EPxxp6P0fD8Er99p5ewIDMl1C59ZybSZ3ss59ZcTKU=; b=Mt3+OTkkoxzpFBhk97BK7TRpYB7wC1cY1gBt20IaQp4Vb2DWezoXG4I8vlrSXw2+H0 WmNriDycgeql7HiI7cmvwOCtuzTwXP/YneyK4pDF0/RWOW8bylcGQSguqvnvWbkHd/D+ suMk4qOl3xrs/3xsprEzgAxsF/YJ1s1G5KIuUM/WKyNCGir1HZ6JXebOSQYrbmPR9626 daewQTdmetCNY6UXNvmA0/MZB0Q0bXQc+P81gaIGAn4sMyaOM3NL6JaMf0jlDt6MRqc9 odm4dC6AV8l9sJp67+CYU4W+45VXWzSo7g5WjCS2Ntt7M1K8YH9wgKUFTVAoOHblh1dC Mbtw== X-Gm-Message-State: AOAM531R6gqr0u1bXoLDQ4AZDOEyVLaIWcrvY+wLI/FivkUKhNe48e0j Tw7bUxDxHJtqwZxEdazokFWSOFqEYW36MQ== X-Google-Smtp-Source: ABdhPJw61SZdAYtjN9Y4BRWY4MwAlDVHJuIajbA1gxjFMZSbWojvTl2jkV90nLF7oq9Hz+uEektVlw== X-Received: by 2002:a50:8a8e:: with SMTP id j14mr23099983edj.87.1605857679984; Thu, 19 Nov 2020 23:34:39 -0800 (PST) Received: from sblaptop.fritz.box (ipbcc1aa4b.dynamic.kabel-deutschland.de. [188.193.170.75]) by smtp.gmail.com with ESMTPSA id i13sm769110ejv.84.2020.11.19.23.34.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 19 Nov 2020 23:34:39 -0800 (PST) From: Andreas Rheinhardt To: ffmpeg-devel@ffmpeg.org Date: Fri, 20 Nov 2020 08:33:00 +0100 Message-Id: <20201120073327.820745-36-andreas.rheinhardt@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201120072116.818090-1-andreas.rheinhardt@gmail.com> References: <20201120072116.818090-1-andreas.rheinhardt@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 135/162] avcodec/magicyuv: Optimize creating Huffman tables X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Andreas Rheinhardt Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" MagicYUV transmits its Huffman trees by providing the length of the code corresponding to each symbol; then the decoder has to assemble the table in such a way that (i) longer codes are to the left of the tree and (ii) for codes of the same length the symbols are ascending from left to right. Up until now the decoder did this as follows: It counted the number of codes of each length and derived the first code of a given length via (ii). Then the array of lengths is traversed a second time to create the codes; there is one running counter for each length to do so. This process creates a default symbol table (that is omitted). This commit changes this as follows: Everything is indexed by the position in the tree (with codes to the left first); given (i), we can calculate the ranges occupied by the codes of each length; and with (ii) we can derive the actual symbols of each code; the running counters for each length are now used for the symbols and not for the codes. Doing so allows us to switch to ff_init_vlc_from_lengths(); this has the advantage that the codes table needs only be traversed once and that the codes need not be sorted any more (right now, the codes that are so long that they will be put into subtables need to be sorted so that codes that end up in the same subtable are contiguous). For a sample produced by our encoder (natural content, 4000 frames, YUV420p, ten iterations, GCC 9.3) this decreased the amount of decicycles for each call to build_huffman() from 1336049 to 1309401. Notice that our encoder restricts the code lengths to 12 and our decoder only uses subtables when the code is longer than 12 bits, so the sorting that can be avoided does not happen at the moment. If one reduces the decoder's tables to nine bits, the performance improvement becomes more apparent: The amount of decicycles for build_huffman() decreased from 1165210 to 654055. Signed-off-by: Andreas Rheinhardt --- libavcodec/magicyuv.c | 39 +++++++++++++++++---------------------- 1 file changed, 17 insertions(+), 22 deletions(-) diff --git a/libavcodec/magicyuv.c b/libavcodec/magicyuv.c index c7e2754c44..13cb346119 100644 --- a/libavcodec/magicyuv.c +++ b/libavcodec/magicyuv.c @@ -47,7 +47,7 @@ typedef enum Prediction { typedef struct HuffEntry { uint8_t len; - uint16_t code; + uint16_t sym; } HuffEntry; typedef struct MagicYUVContext { @@ -72,27 +72,22 @@ typedef struct MagicYUVContext { LLVidDSPContext llviddsp; } MagicYUVContext; -static int huff_build(HuffEntry he[], uint16_t codes_count[33], - VLC *vlc, int nb_elems) +static int huff_build(const uint8_t len[], uint16_t codes_pos[33], + VLC *vlc, int nb_elems, void *logctx) { - unsigned nb_codes = 0, max = 0; - - for (int i = 32; i > 0; i--) { - uint16_t curr = codes_count[i]; // # of leafs of length i - codes_count[i] = nb_codes / 2; // # of non-leaf nodes on level i - nb_codes = codes_count[i] + curr; // # of nodes on level i - if (curr && !max) - max = i; - } + HuffEntry he[4096]; + + for (int i = 31; i > 0; i--) + codes_pos[i] += codes_pos[i + 1]; + + for (unsigned i = nb_elems; i-- > 0;) + he[--codes_pos[len[i]]] = (HuffEntry){ len[i], i }; - for (unsigned i = 0; i < nb_elems; i++) { - he[i].code = codes_count[he[i].len]; - codes_count[he[i].len]++; - } ff_free_vlc(vlc); - return init_vlc(vlc, FFMIN(max, 12), nb_elems, - &he[0].len, sizeof(he[0]), sizeof(he[0].len), - &he[0].code, sizeof(he[0]), sizeof(he[0].code), 0); + return ff_init_vlc_from_lengths(vlc, FFMIN(he[0].len, 12), nb_elems, + &he[0].len, sizeof(he[0]), + &he[0].sym, sizeof(he[0]), sizeof(he[0].sym), + 0, 0, logctx); } static void magicyuv_median_pred16(uint16_t *dst, const uint16_t *src1, @@ -384,7 +379,7 @@ static int build_huffman(AVCodecContext *avctx, const uint8_t *table, { MagicYUVContext *s = avctx->priv_data; GetByteContext gb; - HuffEntry he[4096]; + uint8_t len[4096]; uint16_t length_count[33] = { 0 }; int i = 0, j = 0, k; @@ -408,11 +403,11 @@ static int build_huffman(AVCodecContext *avctx, const uint8_t *table, length_count[x] += l; for (; j < k; j++) - he[j].len = x; + len[j] = x; if (j == max) { j = 0; - if (huff_build(he, length_count, &s->vlc[i], max)) { + if (huff_build(len, length_count, &s->vlc[i], max, avctx)) { av_log(avctx, AV_LOG_ERROR, "Cannot build Huffman codes\n"); return AVERROR_INVALIDDATA; }