From patchwork Tue Feb 27 23:12:33 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rostislav Pehlivanov <atomnuker@gmail.com>
X-Patchwork-Id: 7761
From: Rostislav Pehlivanov <atomnuker@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 27 Feb 2018 23:12:33 +0000
Message-Id: <20180227231233.1372-1-atomnuker@gmail.com>
X-Mailer: git-send-email 2.16.2
Subject: [FFmpeg-devel] [PATCH] vc2enc: replace quantization LUT with a
	smaller division LUT
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <http://ffmpeg.org/mailman/options/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <http://ffmpeg.org/pipermail/ffmpeg-devel/>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <http://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches
	<ffmpeg-devel@ffmpeg.org>
Cc: Rostislav Pehlivanov <atomnuker@gmail.com>
MIME-Version: 1.0
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

This commit replaces the huge and impractical LUT that mapped a coefficient
and a quantizer directly to the bits to encode. Instead, the division by the
quantization factor is done with a multiplication, an add and a shift, using
constants from a small per-quantizer LUT, and the result is coded with the
regular Golomb coding functions.
I was unable to see a performance difference on my machine, but perhaps
someone else here can test. In any case, it's better than the old one if
only because it's smaller and less intrusive.

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
---
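Note for reviewers: the multiply/add/shift trick is easier to check in
isolation, so here is a minimal standalone sketch of it. It follows the same
three cases as the init loop in this patch, but it is not the encoder code.
The names ilog2_u32, DivMagic, div_magic and div_by_magic are made up for
this illustration, and the sketch uses a 64-bit literal for the 2^(m+32)
term so it does not depend on the width of unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): floor(log2(v)) for v > 0. */
static int ilog2_u32(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Hypothetical container for the per-divisor constants. */
typedef struct DivMagic {
    uint32_t mul;   /* multiplier                 */
    uint32_t add;   /* value added before shifting */
    int      shift; /* floor(log2(d)) + 32        */
} DivMagic;

/* Derive constants so that (mul*n + add) >> shift == n / d
 * for every 32-bit n. The divisor d must be non-zero. */
static DivMagic div_magic(uint32_t d)
{
    DivMagic k;
    const int m = ilog2_u32(d);
    /* t = floor(2^(m+32) / d); only meaningful when d is not a power of 2 */
    const uint32_t t = (uint32_t)((UINT64_C(1) << (m + 32)) / d);
    /* r = (t*d + d) mod 2^32: how far (t+1)*d overshoots 2^(m+32) */
    const uint32_t r = (uint32_t)((uint64_t)t * d + d);

    assert(d != 0);
    k.shift = m + 32;
    if (!(d & (d - 1))) {
        /* Power of two: reduces to a plain shift by m */
        k.mul = 0xFFFFFFFF;
        k.add = 0xFFFFFFFF;
    } else if (r <= (UINT32_C(1) << m)) {
        /* Rounded-up multiplier is exact, no add needed */
        k.mul = t + 1;
        k.add = 0;
    } else {
        /* Rounded-down multiplier, compensated by adding t */
        k.mul = t;
        k.add = t;
    }
    return k;
}

/* Division replaced by a 64-bit multiply, add and shift. */
static uint32_t div_by_magic(uint32_t n, DivMagic k)
{
    return (uint32_t)(((uint64_t)k.mul * n + k.add) >> k.shift);
}
```

With this sketch, div_by_magic(c << 2, div_magic(qf)) should give the same
result as the old QUANT(c, qf), i.e. ((c) << 2)/(qf), as long as c << 2 fits
in 32 bits. The patch itself folds the << 2 into the multiplier instead.
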
 libavcodec/vc2enc.c | 118 ++++++++++++++--------------------------------------
 1 file changed, 31 insertions(+), 87 deletions(-)

diff --git a/libavcodec/vc2enc.c b/libavcodec/vc2enc.c
index b7adcd3d36..74347b0b39 100644
--- a/libavcodec/vc2enc.c
+++ b/libavcodec/vc2enc.c
@@ -29,10 +29,6 @@
 #include "vc2enc_dwt.h"
 #include "diractab.h"
 
-/* Total range is -COEF_LUT_TAB to +COEFF_LUT_TAB, but total tab size is half
- * (COEF_LUT_TAB*DIRAC_MAX_QUANT_INDEX), as the sign is appended during encoding */
-#define COEF_LUT_TAB 2048
-
 /* The limited size resolution of each slice forces us to do this */
 #define SSIZE_ROUND(b) (FFALIGN((b), s->size_scaler) + 4 + s->prefix_bytes)
 
@@ -152,9 +148,8 @@ typedef struct VC2EncContext {
     uint8_t quant[MAX_DWT_LEVELS][4];
     int custom_quant_matrix;
 
-    /* Coefficient LUT */
-    uint32_t *coef_lut_val;
-    uint8_t  *coef_lut_len;
+    /* Division LUT */
+    uint32_t qmagic_lut[116][2];
 
     int num_x; /* #slices horizontally */
     int num_y; /* #slices vertically */
@@ -229,37 +224,6 @@ static av_always_inline int count_vc2_ue_uint(uint32_t val)
     return ff_log2(topbit)*2 + 1;
 }
 
-static av_always_inline void get_vc2_ue_uint(int val, uint8_t *nbits,
-                                             uint32_t *eval)
-{
-    int i;
-    int pbits = 0, bits = 0, topbit = 1, maxval = 1;
-
-    if (!val++) {
-        *nbits = 1;
-        *eval = 1;
-        return;
-    }
-
-    while (val > maxval) {
-        topbit <<= 1;
-        maxval <<= 1;
-        maxval |=  1;
-    }
-
-    bits = ff_log2(topbit);
-
-    for (i = 0; i < bits; i++) {
-        topbit >>= 1;
-        pbits <<= 2;
-        if (val & topbit)
-            pbits |= 0x1;
-    }
-
-    *nbits = bits*2 + 1;
-    *eval = (pbits << 1) | 1;
-}
-
 /* VC-2 10.4 - parse_info() */
 static void encode_parse_info(VC2EncContext *s, enum DiracParseCodes pcode)
 {
@@ -557,7 +521,7 @@ static void encode_picture_start(VC2EncContext *s)
     encode_wavelet_transform(s);
 }
 
-#define QUANT(c, qf) (((c) << 2)/(qf))
+#define QUANT(c, mul, add, shift) (((mul) * (c) + (add)) >> (shift))
 
 /* VC-2 13.5.5.2 - slice_band() */
 static void encode_subband(VC2EncContext *s, PutBitContext *pb, int sx, int sy,
@@ -570,24 +534,17 @@ static void encode_subband(VC2EncContext *s, PutBitContext *pb, int sx, int sy,
     const int top    = b->height * (sy+0) / s->num_y;
     const int bottom = b->height * (sy+1) / s->num_y;
 
-    const int qfactor = ff_dirac_qscale_tab[quant];
-    const uint8_t  *len_lut = &s->coef_lut_len[quant*COEF_LUT_TAB];
-    const uint32_t *val_lut = &s->coef_lut_val[quant*COEF_LUT_TAB];
-
     dwtcoef *coeff = b->buf + top * b->stride;
+    const uint64_t q_m = ((uint64_t)(s->qmagic_lut[quant][0])) << 2;
+    const uint64_t q_a = s->qmagic_lut[quant][1];
+    const int q_s = av_log2(ff_dirac_qscale_tab[quant]) + 32;
 
     for (y = top; y < bottom; y++) {
         for (x = left; x < right; x++) {
-            const int neg = coeff[x] < 0;
-            uint32_t c_abs = FFABS(coeff[x]);
-            if (c_abs < COEF_LUT_TAB) {
-                put_bits(pb, len_lut[c_abs], val_lut[c_abs] | neg);
-            } else {
-                c_abs = QUANT(c_abs, qfactor);
-                put_vc2_ue_uint(pb, c_abs);
-                if (c_abs)
-                    put_bits(pb, 1, neg);
-            }
+            uint32_t c_abs = QUANT(FFABS(coeff[x]), q_m, q_a, q_s);
+            put_vc2_ue_uint(pb, c_abs);
+            if (c_abs)
+                put_bits(pb, 1, coeff[x] < 0);
         }
         coeff += b->stride;
     }
@@ -619,8 +576,9 @@ static int count_hq_slice(SliceArgs *slice, int quant_idx)
                 SubBand *b = &s->plane[p].band[level][orientation];
 
                 const int q_idx = quants[level][orientation];
-                const uint8_t *len_lut = &s->coef_lut_len[q_idx*COEF_LUT_TAB];
-                const int qfactor = ff_dirac_qscale_tab[q_idx];
+                const uint64_t q_m = ((uint64_t)s->qmagic_lut[q_idx][0]) << 2;
+                const uint64_t q_a = s->qmagic_lut[q_idx][1];
+                const int q_s = av_log2(ff_dirac_qscale_tab[q_idx]) + 32;
 
                 const int left   = b->width  * slice->x    / s->num_x;
                 const int right  = b->width  *(slice->x+1) / s->num_x;
@@ -631,14 +589,9 @@ static int count_hq_slice(SliceArgs *slice, int quant_idx)
 
                 for (y = top; y < bottom; y++) {
                     for (x = left; x < right; x++) {
-                        uint32_t c_abs = FFABS(buf[x]);
-                        if (c_abs < COEF_LUT_TAB) {
-                            bits += len_lut[c_abs];
-                        } else {
-                            c_abs = QUANT(c_abs, qfactor);
-                            bits += count_vc2_ue_uint(c_abs);
-                            bits += !!c_abs;
-                        }
+                        uint32_t c_abs = QUANT(FFABS(buf[x]), q_m, q_a, q_s);
+                        bits += count_vc2_ue_uint(c_abs);
+                        bits += !!c_abs;
                     }
                     buf += b->stride;
                 }
@@ -1059,8 +1012,6 @@ static av_cold int vc2_encode_end(AVCodecContext *avctx)
     }
 
     av_freep(&s->slice_args);
-    av_freep(&s->coef_lut_len);
-    av_freep(&s->coef_lut_val);
 
     return 0;
 }
@@ -1069,7 +1020,7 @@ static av_cold int vc2_encode_init(AVCodecContext *avctx)
 {
     Plane *p;
     SubBand *b;
-    int i, j, level, o, shift, ret;
+    int i, level, o, shift, ret;
     const AVPixFmtDescriptor *fmt = av_pix_fmt_desc_get(avctx->pix_fmt);
     const int depth = fmt->comp[0].depth;
     VC2EncContext *s = avctx->priv_data;
@@ -1211,27 +1162,20 @@ static av_cold int vc2_encode_init(AVCodecContext *avctx)
     if (!s->slice_args)
         goto alloc_fail;
 
-    /* Lookup tables */
-    s->coef_lut_len = av_malloc(COEF_LUT_TAB*(s->q_ceil+1)*sizeof(*s->coef_lut_len));
-    if (!s->coef_lut_len)
-        goto alloc_fail;
-
-    s->coef_lut_val = av_malloc(COEF_LUT_TAB*(s->q_ceil+1)*sizeof(*s->coef_lut_val));
-    if (!s->coef_lut_val)
-        goto alloc_fail;
-
-    for (i = 0; i < s->q_ceil; i++) {
-        uint8_t  *len_lut = &s->coef_lut_len[i*COEF_LUT_TAB];
-        uint32_t *val_lut = &s->coef_lut_val[i*COEF_LUT_TAB];
-        for (j = 0; j < COEF_LUT_TAB; j++) {
-            get_vc2_ue_uint(QUANT(j, ff_dirac_qscale_tab[i]),
-                            &len_lut[j], &val_lut[j]);
-            if (len_lut[j] != 1) {
-                len_lut[j] += 1;
-                val_lut[j] <<= 1;
-            } else {
-                val_lut[j] = 1;
-            }
+    for (i = 0; i < 116; i++) {
+        const uint32_t qf = ff_dirac_qscale_tab[i];
+        const int m = av_log2(qf);
+        const uint32_t t = (1ULL << (m + 32)) / qf;
+        const uint32_t r = (t*qf + qf) & ((1ULL << 32) - 1);
+        if (!(qf & (qf - 1))) {
+            s->qmagic_lut[i][0] = 0xFFFFFFFF;
+            s->qmagic_lut[i][1] = 0xFFFFFFFF;
+        } else if (r <= 1UL << m) {
+            s->qmagic_lut[i][0] = t + 1;
+            s->qmagic_lut[i][1] = 0;
+        } else {
+            s->qmagic_lut[i][0] = t;
+            s->qmagic_lut[i][1] = t;
         }
     }