From patchwork Fri Nov 20 07:32:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Rheinhardt X-Patchwork-Id: 23885 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id E955C449F85 for ; Fri, 20 Nov 2020 09:50:39 +0200 (EET) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 1233668BD55; Fri, 20 Nov 2020 09:34:50 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-ed1-f65.google.com (mail-ed1-f65.google.com [209.85.208.65]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id F1FC468B95A for ; Fri, 20 Nov 2020 09:34:39 +0200 (EET) Received: by mail-ed1-f65.google.com with SMTP id t11so8467225edj.13 for ; Thu, 19 Nov 2020 23:34:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; bh=QwnmtLn2KfSDlZRJj2JaJbCJVNaR2D4ZJrfIqnB44MQ=; b=tVzPWtBxFHYVqc+Oey2v2xIIxH24bakzOqQxdGKDJ9Nek8EkqK8cnRYNmZJKh6boKM AKA4FinZveEj/e9BZHEMxM9lsDamCQxl+BPC/kMvaRI4qt3Pub17BD2SjrfuVFcvj60h XGkkgRVyiyYhjcrLG//fb9tdsrYs2rBLviaJ3+MT66JZ57zSC9Gb+LXtHLCJGAEj7+P/ nBAgZr6YQFW5ud5jt/PzWf6aB4E+AU67Yyz/tw4leX/v7aAv/bw3MFCzCMSEVRPY+26/ zQTKOc+R/OhJI7/dxxB77ABSrukT7VbsqrOSYfzGwA1JtcwW6vLKJ5nPTco+V1v0JZvF F15A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:reply-to:mime-version:content-transfer-encoding; bh=QwnmtLn2KfSDlZRJj2JaJbCJVNaR2D4ZJrfIqnB44MQ=; b=fiiB+huTL2lhgIKDwxAI8Vks9u9bHwxze4kcULjQwCkgriahzs41Zft12YLkhng4sp 9Bxj3aTmeaPFsI7RBETPNfpgzm52nm8I2OXbLmYaoLhLbV9TfjKSXL77ZPL4IpZz08E4 v17CqDX224j8HNuHFrlw5lZXpavlz1mSNx6tadi1A1ZWRsoDXW6+rMXbQveM146YSULX fZxEDdEORMdw5rqB7G3FzAyssq1K80gAFiMAG+ZPqIezrjAO0BC3k4oLSiRpGAhvLg0N eb9sgeqJAcnpbRde5UtzUkU6HRMmj4VoyLnoW01jZkx3BFdYx6uZW6LOGL9gjdz2Rpnp KqzQ== X-Gm-Message-State: AOAM533KYNjceBQEutg+HPZVu4a6YQfLeELh96JCZGhzbQnh1e0jKLCV Gzi5fZnQlBqO2ugsfQOrdO6O3CpsygCA0Q== X-Google-Smtp-Source: ABdhPJzP4opkj3x1Ir+tRbGI8WnOHLoWoC3uBX+2zRUDHSyyMKpkXUAUdbry7ae9btZ4p5jai5mcww== X-Received: by 2002:a05:6402:54c:: with SMTP id i12mr35749451edx.9.1605857678963; Thu, 19 Nov 2020 23:34:38 -0800 (PST) Received: from sblaptop.fritz.box (ipbcc1aa4b.dynamic.kabel-deutschland.de. [188.193.170.75]) by smtp.gmail.com with ESMTPSA id i13sm769110ejv.84.2020.11.19.23.34.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 19 Nov 2020 23:34:38 -0800 (PST) From: Andreas Rheinhardt To: ffmpeg-devel@ffmpeg.org Date: Fri, 20 Nov 2020 08:32:59 +0100 Message-Id: <20201120073327.820745-35-andreas.rheinhardt@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201120072116.818090-1-andreas.rheinhardt@gmail.com> References: <20201120072116.818090-1-andreas.rheinhardt@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 134/162] avcodec/utvideodec: Avoid implicit qsort when creating Huffman tables X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Andreas Rheinhardt Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" The Huffman trees used by Ut Video have two important characteristics: (i) Longer codes are on the left of the tree and (ii) for codes of the same length, the symbol is descending from left to right in the tree. Therefore all the information that needs to be transmitted is how long the code corresponding to a given symbol is; and this is also all that is transmitted. Before 341914495e5c2f60bc920b0c6f660e5948a47f5a, the decoder used qsort to sort the (length, symbol) pairs by ascending length and for equal lengths by ascending symbol. Since said commit, the decoder uses a first pass over the lengths table to count how many symbols of each length there are; with (i) one can then easily calculate the code of the left-most code with a given length in the tree and from there one can calculate the codes for all entries, using one running counter for each possible length. This eliminated the explicit qsort in build_huff(). Yet ff_init_vlc_sparse() sorts the table itself as it has to ensure that all the entries that will be placed in the same subtable are contiguous. The tables created now are non-contiguous (they are ordered by symbol and codes of different length aren't ordered at all; only codes of the same length are ordered according to (ii)). This commit therefore modifies the algorithm used to automatically create tables whose codes are sorted from left to right in the tree. The key to do so is the observation that the counts obtained in the first pass can be used to contain the range of the codes of each length in the second pass: If counts[i] is the count of codes with length i, then the first counts[32] codes are of length 32, the next counts[31] codes are of length 31 etc. So one knows the index of the lowest symbol whose code has length 32 (if any): It is counts[32] - 1 due to (ii), whereas the index of the lowest symbol whose code has length 31 (if any) is counts[32] + counts[31] - 1; the index of the second-to-lowest symbol of length 32 (if existing) is counts[32] - 2 etc. If one follows the algorithm outlined above, one can switch to ff_init_vlc_from_lengths() which has no implicit qsort; it also means that one can offload the computation of the codes. This turned out to be beneficial for performance: For the sample from ticket #4044 it decreased the decicycles spent on one call to build_huff() from 508480 to 340688 (GCC 9.3, looping 10 times over the file to get enough runs and then repeating this ten times); for another sample (YUV420p, natural content, 5500 frames, also ten iterations) the time went down from 382346 to 275533 decicycles. Signed-off-by: Andreas Rheinhardt --- libavcodec/utvideodec.c | 42 ++++++++++++++++++++++------------------- 1 file changed, 23 insertions(+), 19 deletions(-) diff --git a/libavcodec/utvideodec.c b/libavcodec/utvideodec.c index 8b47c14d98..1b10b3bd06 100644 --- a/libavcodec/utvideodec.c +++ b/libavcodec/utvideodec.c @@ -40,10 +40,16 @@ #include "thread.h" #include "utvideo.h" -static int build_huff(const uint8_t *src, VLC *vlc, int *fsym, unsigned nb_elems) +typedef struct HuffEntry { + uint8_t len; + uint16_t sym; +} HuffEntry; + +static int build_huff(UtvideoContext *c, const uint8_t *src, VLC *vlc, + int *fsym, unsigned nb_elems) { int i; - uint32_t codes[1024]; + HuffEntry he[1024]; uint8_t bits[1024]; uint16_t codes_count[33] = { 0 }; @@ -64,23 +70,21 @@ static int build_huff(const uint8_t *src, VLC *vlc, int *fsym, unsigned nb_elems if (codes_count[0] == nb_elems) return AVERROR_INVALIDDATA; - for (unsigned i = 32, nb_codes = 0; i > 0; i--) { - uint16_t curr = codes_count[i]; // # of leafs of length i - codes_count[i] = nb_codes / 2; // # of non-leaf nodes on level i - nb_codes = codes_count[i] + curr; // # of nodes on level i - } + /* For Ut Video, longer codes are to the left of the tree and + * for codes with the same length the symbol is descending from + * left to right. So after the next loop --codes_count[i] will + * be the index of the first (lowest) symbol of length i when + * indexed by the position in the tree with left nodes being first. */ + for (int i = 31; i >= 0; i--) + codes_count[i] += codes_count[i + 1]; + + for (unsigned i = 0; i < nb_elems; i++) + he[--codes_count[bits[i]]] = (HuffEntry) { bits[i], i }; - for (unsigned i = nb_elems; i-- > 0;) { - if (!bits[i]) { - codes[i] = 0; - continue; - } - codes[i] = codes_count[bits[i]]++; - } #define VLC_BITS 11 - return init_vlc(vlc, VLC_BITS, nb_elems, - bits, sizeof(*bits), sizeof(*bits), - codes, sizeof(*codes), sizeof(*codes), 0); + return ff_init_vlc_from_lengths(vlc, VLC_BITS, codes_count[0], + &he[0].len, sizeof(*he), + &he[0].sym, sizeof(*he), 2, 0, 0, c->avctx); } static int decode_plane10(UtvideoContext *c, int plane_no, @@ -95,7 +99,7 @@ static int decode_plane10(UtvideoContext *c, int plane_no, GetBitContext gb; int prev, fsym; - if ((ret = build_huff(huff, &vlc, &fsym, 1024)) < 0) { + if ((ret = build_huff(c, huff, &vlc, &fsym, 1024)) < 0) { av_log(c->avctx, AV_LOG_ERROR, "Cannot build Huffman codes\n"); return ret; } @@ -255,7 +259,7 @@ static int decode_plane(UtvideoContext *c, int plane_no, return 0; } - if (build_huff(src, &vlc, &fsym, 256)) { + if (build_huff(c, src, &vlc, &fsym, 256)) { av_log(c->avctx, AV_LOG_ERROR, "Cannot build Huffman codes\n"); return AVERROR_INVALIDDATA; }