From patchwork Fri Sep 8 08:15:03 2023
X-Patchwork-Submitter: Christophe Gisquet
X-Patchwork-Id: 43654
From: Christophe Gisquet
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 8 Sep 2023 10:15:03 +0200
Message-ID: <20230908081508.510-2-christophe.gisquet@gmail.com>
In-Reply-To: <20230908081508.510-1-christophe.gisquet@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/7] proresdec2: store precomputed EC parameters

Storing the Rice/exp-Golomb orders and offsets packed into a one-byte
codebook is compact, but they must be unpacked, and the exp-Golomb offset
recomputed, for every codeword. Using a table of precomputed parameters
instead gives around a 5% speedup, at the cost of ~132 bytes of tables.
---
 libavcodec/proresdec2.c | 54 +++++++++++++++++++++++++++++++++++------
 1 file changed, 47 insertions(+), 7 deletions(-)

diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
index 6e243cfc17..65e8b01755 100644
--- a/libavcodec/proresdec2.c
+++ b/libavcodec/proresdec2.c
@@ -427,6 +427,7 @@ static int decode_picture_header(AVCodecContext *avctx, const uint8_t *buf, cons
 # define READ_BITS get_bits
 #endif
 
+/* Kept for reference, and because it is clearer for the first DC */
 #define DECODE_CODEWORD(val, codebook)                                  \
     do {                                                                \
         unsigned int rice_order, exp_order, switch_bits;                \
@@ -454,18 +455,41 @@ static int decode_picture_header(AVCodecContext *avctx, const uint8_t *buf, cons
         }                                                               \
     } while (0)
 
+/* Precomputed variant: diff = exp_order - switch_bits, offset = ((switch_bits + 1) << rice_order) - (1 << exp_order) */
+#define DECODE_CODEWORD2(val, switch_bits, rice_order, diff, offset)   \
+    do {                                                                \
+        unsigned int q, buf, bits;                                      \
+                                                                        \
+        buf = show_bits(gb, 14);                                        \
+        q = 13 - av_log2(buf);                                          \
+                                                                        \
+        if (q > switch_bits) { /* exp golomb */                         \
+            bits = (q<<1) + (int)diff;                                  \
+            val = READ_BITS(gb, bits) + (int)offset;                    \
+        } else if (rice_order) {                                        \
+            skip_remaining(gb, q+1);                                    \
+            val = (q << rice_order) + get_bits(gb, rice_order);         \
+        } else {                                                        \
+            val = q;                                                    \
+            skip_remaining(gb, q+1);                                    \
+        }                                                               \
+    } while (0)
+
+
 #define TOSIGNED(x) (((x) >> 1) ^ (-((x) & 1)))
 
 #define FIRST_DC_CB 0xB8
 
-static const uint8_t dc_codebook[7] = { 0x04, 0x28, 0x28, 0x4D, 0x4D, 0x70, 0x70};
+static const int8_t dc_codebook[7][4] = {
+    { 0, 0, 1, -1 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 },
+    { 1, 2, 2, 0 }, { 1, 2, 2, 0 }, { 0, 3, 4, -8 }, { 0, 3, 4, -8 }
+};
 
 static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out,
                                               int blocks_per_slice)
 {
     int16_t prev_dc;
     int code, i, sign;
-
     DECODE_CODEWORD(code, FIRST_DC_CB);
     prev_dc = TOSIGNED(code);
     out[0] = prev_dc;
@@ -475,7 +499,9 @@ static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out,
     code = 5;
     sign = 0;
     for (i = 1; i < blocks_per_slice; i++, out += 64) {
-        DECODE_CODEWORD(code, dc_codebook[FFMIN(code, 6U)]);
+        unsigned int dccb = FFMIN(code, 6U);
+        DECODE_CODEWORD2(code, dc_codebook[dccb][0], dc_codebook[dccb][1],
+                               dc_codebook[dccb][2], dc_codebook[dccb][3]);
         if(code) sign ^= -(code & 1);
         else sign = 0;
         prev_dc += (((code + 1) >> 1) ^ sign) - sign;
@@ -485,8 +511,18 @@ static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out,
 }
 
 // adaptive codebook switching lut according to previous run/level values
-static const uint8_t run_to_cb[16] = { 0x06, 0x06, 0x05, 0x05, 0x04, 0x29, 0x29, 0x29, 0x29, 0x28, 0x28, 0x28, 0x28, 0x28, 0x28, 0x4C };
-static const uint8_t lev_to_cb[10] = { 0x04, 0x0A, 0x05, 0x06, 0x04, 0x28, 0x28, 0x28, 0x28, 0x4C };
+static const int8_t run_to_cb[16][4] = {
+    { 2, 0, -1, 1 }, { 2, 0, -1, 1 }, { 1, 0, 0, 0 }, { 1, 0, 0, 0 }, { 0, 0, 1, -1 },
+    { 1, 1, 1, 0 }, { 1, 1, 1, 0 }, { 1, 1, 1, 0 }, { 1, 1, 1, 0 },
+    { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 },
+    { 0, 2, 3, -4 }
+};
+
+static const int8_t lev_to_cb[10][4] = {
+    { 0, 0, 1, -1 }, { 2, 0, 0, -1 }, { 1, 0, 0, 0 }, { 2, 0, -1, 1 }, { 0, 0, 1, -1 },
+    { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 },
+    { 0, 2, 3, -4 }
+};
 
 static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContext *gb,
                                              int16_t *out, int blocks_per_slice)
@@ -504,18 +540,22 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContex
     block_mask = blocks_per_slice - 1;
 
     for (pos = block_mask;;) {
+        unsigned int runcb = FFMIN(run, 15);
+        unsigned int levcb = FFMIN(level, 9);
         bits_rem = get_bits_left(gb);
         if (!bits_rem || (bits_rem < 16 && !show_bits(gb, bits_rem)))
             break;
 
-        DECODE_CODEWORD(run, run_to_cb[FFMIN(run, 15)]);
+        DECODE_CODEWORD2(run, run_to_cb[runcb][0], run_to_cb[runcb][1],
+                              run_to_cb[runcb][2], run_to_cb[runcb][3]);
         pos += run + 1;
         if (pos >= max_coeffs) {
             av_log(avctx, AV_LOG_ERROR, "ac tex damaged %d, %d\n", pos, max_coeffs);
             return AVERROR_INVALIDDATA;
         }
 
-        DECODE_CODEWORD(level, lev_to_cb[FFMIN(level, 9)]);
+        DECODE_CODEWORD2(level, lev_to_cb[levcb][0], lev_to_cb[levcb][1],
+                                lev_to_cb[levcb][2], lev_to_cb[levcb][3]);
         level += 1;
 
         i = pos >> log2_block_count;
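The rows of the new tables can be cross-checked against the old packed bytes. The standalone sketch below is not part of the patch. It assumes the packed layout that the original DECODE_CODEWORD unpacks (switch_bits in bits 0-1, exp_order in bits 2-4, rice_order in bits 5-7), so that each row is { switch_bits, rice_order, exp_order - switch_bits, ((switch_bits + 1) << rice_order) - (1 << exp_order) }. The names ec_params, unpack_codebook, decode_packed, decode_precomputed and count_mismatches are hypothetical, and the bit reader is a simplified left-aligned 32-bit window rather than FFmpeg's GetBitContext.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for one precomputed row, in the order used by
 * the patch's tables: { switch_bits, rice_order, diff, offset }. */
struct ec_params {
    int switch_bits, rice_order, diff, offset;
};

/* Assumed packed layout (as unpacked by the original DECODE_CODEWORD):
 * bits 0-1 switch_bits, bits 2-4 exp_order, bits 5-7 rice_order. */
static struct ec_params unpack_codebook(uint8_t cb)
{
    struct ec_params p;
    int exp_order = (cb >> 2) & 7;

    p.switch_bits = cb & 3;
    p.rice_order  = cb >> 5;
    p.diff        = exp_order - p.switch_bits;
    p.offset      = ((p.switch_bits + 1) << p.rice_order) - (1 << exp_order);
    return p;
}

/* Count the leading zero bits of a nonzero, left-aligned 32-bit window. */
static int leading_zeros(uint32_t w)
{
    int q = 0;

    assert(w != 0);
    while (!(w & 0x80000000u)) {
        w <<= 1;
        q++;
    }
    return q;
}

/* Pre-patch style: unpack the codebook byte for every codeword. */
static int decode_packed(uint32_t w, uint8_t cb)
{
    int switch_bits = cb & 3, rice_order = cb >> 5, exp_order = (cb >> 2) & 7;
    int q = leading_zeros(w);

    if (q > switch_bits) {                       /* exp-Golomb */
        int bits = exp_order - switch_bits + 2 * q;
        return (int)(w >> (32 - bits)) - (1 << exp_order) +
               ((switch_bits + 1) << rice_order);
    }
    if (rice_order)                              /* Rice */
        return (q << rice_order) + (int)((w << (q + 1)) >> (32 - rice_order));
    return q;                                    /* unary */
}

/* Patched style: the same decode driven by a precomputed row. */
static int decode_precomputed(uint32_t w, struct ec_params p)
{
    int q = leading_zeros(w);

    if (q > p.switch_bits)
        return (int)(w >> (32 - (2 * q + p.diff))) + p.offset;
    if (p.rice_order)
        return (q << p.rice_order) + (int)((w << (q + 1)) >> (32 - p.rice_order));
    return q;
}

/* Compare each distinct packed byte of the old tables with its row in the
 * new tables, then compare both decoders on windows with 0..13 leading
 * zeros followed by pseudo-random bits. Returns the number of mismatches. */
int count_mismatches(void)
{
    static const uint8_t packed[9] = { 0x04, 0x28, 0x4D, 0x70, 0x06, 0x05, 0x29, 0x4C, 0x0A };
    static const int8_t rows[9][4] = {
        { 0, 0, 1, -1 }, { 0, 1, 2, -2 }, { 1, 2, 2, 0 }, { 0, 3, 4, -8 }, { 2, 0, -1, 1 },
        { 1, 0, 0, 0 }, { 1, 1, 1, 0 }, { 0, 2, 3, -4 }, { 2, 0, 0, -1 }
    };
    int mismatches = 0;

    for (int i = 0; i < 9; i++) {
        struct ec_params p = unpack_codebook(packed[i]);

        if (p.switch_bits != rows[i][0] || p.rice_order != rows[i][1] ||
            p.diff != rows[i][2] || p.offset != rows[i][3])
            mismatches++;
        for (int q = 0; q < 14; q++)
            for (uint32_t t = 0; t < 256; t++) {
                uint32_t lead = 1u << (31 - q);
                uint32_t w = lead | ((t * 0x9E3779B9u) & (lead - 1));

                if (decode_packed(w, packed[i]) != decode_precomputed(w, p))
                    mismatches++;
            }
    }
    return mismatches;
}
```

count_mismatches() covers the nine distinct codebook bytes used by dc_codebook, run_to_cb and lev_to_cb (FIRST_DC_CB still goes through DECODE_CODEWORD). Because the rows contain negative diff and offset values, they need a signed element type such as int8_t; plain char is unsigned on some ABIs (for example ARM and PowerPC Linux), where -1 would read back as 255.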