From patchwork Fri Sep 8 08:15:04 2023
X-Patchwork-Submitter: Christophe Gisquet
X-Patchwork-Id: 43655
From: Christophe Gisquet
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 8 Sep 2023 10:15:04 +0200
Message-ID: <20230908081508.510-3-christophe.gisquet@gmail.com>
In-Reply-To: <20230908081508.510-1-christophe.gisquet@gmail.com>
References: <20230908081508.510-1-christophe.gisquet@gmail.com>
Subject: [FFmpeg-devel] [PATCH 3/7] proresdec2: use VLC for level instead of EC switch

x86/x64: 61/52 -> 55/46
Around 7-10% speedup.

Run and DC do not lend themselves to such changes, likely because their
distributions are less skewed and would need more VLC read iterations on
average.
---
 libavcodec/proresdec.h  |  1 +
 libavcodec/proresdec2.c | 77 ++++++++++++++++++++++++++++++++++-------
 2 files changed, 66 insertions(+), 12 deletions(-)

diff --git a/libavcodec/proresdec.h b/libavcodec/proresdec.h
index 1e48752e6f..7ebacaeb21 100644
--- a/libavcodec/proresdec.h
+++ b/libavcodec/proresdec.h
@@ -22,6 +22,7 @@
 #ifndef AVCODEC_PRORESDEC_H
 #define AVCODEC_PRORESDEC_H
 
+#define CACHED_BITSTREAM_READER 1
 #include "get_bits.h"
 #include "blockdsp.h"
 #include "proresdsp.h"
diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
index 65e8b01755..91c689d9ef 100644
--- a/libavcodec/proresdec2.c
+++ b/libavcodec/proresdec2.c
@@ -24,17 +24,17 @@
  * Known FOURCCs: 'apch' (HQ), 'apcn' (SD), 'apcs' (LT), 'apco' (Proxy), 'ap4h' (4444), 'ap4x' (4444 XQ)
  */
 
-#define CACHED_BITSTREAM_READER 1
+//#define DEBUG
 
 #include "config_components.h"
 
 #include "libavutil/internal.h"
 #include "libavutil/mem_internal.h"
+#include "libavutil/thread.h"
 
 #include "avcodec.h"
 #include "codec_internal.h"
 #include "decode.h"
-#include "get_bits.h"
 #include "hwaccel_internal.h"
 #include "hwconfig.h"
 #include "idctdsp.h"
@@ -129,8 +129,64 @@ static void unpack_alpha_12(GetBitContext *gb, uint16_t *dst, int num_coeffs,
     }
 }
 
+#define AC_BITS 12
+#define PRORES_LEV_BITS 9
+
+static const uint8_t ac_info[] = { 0x04, 0x0A, 0x05, 0x06, 0x28, 0x4C };
+static VLC ac_vlc[6];
+
+static av_cold void init_vlcs(void)
+{
+    int i;
+    for (i = 0; i < sizeof(ac_info); i++) {
+        uint32_t ac_codes[1<> 5; /* rice code order */
+        exp_order = (codebook >> 2) & 7; /* exp golomb code order */
+
+        switch_val = (switch_bits+1) << rice_order;
+
+        // Values are actually transformed, but this is more a wrapping
+        for (ac = 0; ac <1<= switch_val) {
+                val += (1 << exp_order) - switch_val;
+                exponent = av_log2(val);
+                bits = exponent+1+switch_bits-exp_order/*0*/ + exponent+1/*val*/;
+                code = val;
+            } else if (rice_order) {
+                bits = (val >> rice_order)/*0*/ + 1/*1*/ + rice_order/*val*/;
+                code = (1 << rice_order) | val;
+            } else {
+                bits = val/*0*/ + 1/*1*/;
+                code = 1;
+            }
+            if (bits > max_bits) max_bits = bits;
+            ac_bits [ac] = bits;
+            ac_codes[ac] = code;
+        }
+
+        ff_free_vlc(ac_vlc+i);
+
+        if (init_vlc(ac_vlc+i, PRORES_LEV_BITS, 1<priv_data;
     uint8_t idct_permutation[64];
@@ -184,6 +240,9 @@ static av_cold int decode_init(AVCodecContext *avctx)
 
     ctx->pix_fmt = AV_PIX_FMT_NONE;
 
+    // init dc_tables
+    ff_thread_once(&init_static_once, init_vlcs);
+
     if (avctx->bits_per_raw_sample == 10){
         ctx->unpack_alpha = unpack_alpha_10;
     } else if (avctx->bits_per_raw_sample == 12){
@@ -510,7 +569,7 @@ static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out,
     return 0;
 }
 
-// adaptive codebook switching lut according to previous run/level values
+// adaptive codebook switching lut according to previous run values
 static const char run_to_cb[16][4] = {
     { 2, 0, -1, 1 }, { 2, 0, -1, 1 }, { 1, 0, 0, 0 }, { 1, 0, 0, 0 }, { 0, 0, 1, -1 },
     { 1, 1, 1, 0 }, { 1, 1, 1, 0 }, { 1, 1, 1, 0 }, { 1, 1, 1, 0 },
@@ -518,12 +577,6 @@ static const char run_to_cb[16][4] = {
     { 0, 2, 3, -4 }
 };
 
-static const char lev_to_cb[10][4] = {
-    { 0, 0, 1, -1 }, { 2, 0, 0, -1 }, { 1, 0, 0, 0 }, { 2, 0, -1, 1 }, { 0, 0, 1, -1 },
-    { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 },
-    { 0, 2, 3, -4 }
-};
-
 static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContext *gb,
                                              int16_t *out, int blocks_per_slice)
 {
@@ -540,8 +593,9 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContex
     block_mask = blocks_per_slice - 1;
 
     for (pos = block_mask;;) {
+        static const uint8_t ctx_to_tbl[] = { 0, 1, 2, 3, 0, 4, 4, 4, 4, 5 };
+        const VLC* tbl = ac_vlc + ctx_to_tbl[FFMIN(level, 9)];
         unsigned int runcb = FFMIN(run, 15);
-        unsigned int levcb = FFMIN(level, 9);
         bits_rem = get_bits_left(gb);
         if (!bits_rem || (bits_rem < 16 && !show_bits(gb, bits_rem)))
             break;
@@ -554,8 +608,7 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContex
             return AVERROR_INVALIDDATA;
         }
 
-        DECODE_CODEWORD2(level, lev_to_cb[levcb][0], lev_to_cb[levcb][1],
-                         lev_to_cb[levcb][2], lev_to_cb[levcb][3]);
+        level = get_vlc2(gb, tbl->table, PRORES_LEV_BITS, 3);
         level += 1;
 
         i = pos >> log2_block_count;