From patchwork Mon Sep 25 23:33:18 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Mateusz <mateuszb@poczta.onet.pl>
X-Patchwork-Id: 5267
Delivered-To: ffmpegpatchwork@gmail.com
Received: by 10.2.36.26 with SMTP id f26csp3180440jaa;
	Mon, 25 Sep 2017 16:34:25 -0700 (PDT)
X-Google-Smtp-Source: 
 AOwi7QCv1iXs19onX6OFkohJL2bEXIC0uTmuQwyBu1RnCxZdYTTZFA7bFV+WQWO5rhI3TljIm7WA
X-Received: by 10.28.210.72 with SMTP id j69mr1694433wmg.75.1506382465227;
	Mon, 25 Sep 2017 16:34:25 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1506382465; cv=none;
	d=google.com; s=arc-20160816;
	b=qEXK41lQpGRcpTfb+YPdQ8z3rV/heTQyTV5/X96Opkk25im5G/xLEJHCwo5x+QTRI1
	mKtwA/c959AjyYnyoOwHwfnP6dP0smmr2p9HNFwLKAAO5SOs7jMZAMaaRzOPEI6d35zy
	00DR2L1fUgd7F2HUnuCvQsgdSJkiownOvulWkIp3l4oSWLaoM0f4+7vJNvtJ8fbnKex/
	M7MInd4kak7mUCdr3PNgxmhMGJUFmNqQ1KqYlFVXYfpj3f+KcJOhqRmdBUFQI5qvUhPV
	vzxXGgQfGKqUOp4ahVxj3HZ6sZevoZ2RdKQxMVZj/CTA6bTVq3t+TAMeVWqwnPA8Pz0R
	mOlw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
	s=arc-20160816;
	h=sender:errors-to:reply-to:list-subscribe:list-help:list-post
	:list-archive:list-unsubscribe:list-id:precedence:subject
	:content-language:in-reply-to:mime-version:user-agent:date
	:message-id:from:references:to:dkim-signature:delivered-to
	:arc-authentication-results;
	bh=uw//ZFZzYLGVc5Cc0/8WjpbHJHDlYla/6KJafXfhxyc=;
	b=Dlk4cl2UY9ryAg3Ynefn8ppuAc1n+E6UkmOci6vFvd13qY+BNB1LgEo8Rclp/LJClb
	ntJUQjuyAntgKPyfqsUX/MYEGNXVUDyYl/RAJ4lv0gW6k5KHgdSq6+DldN5dpDWval+k
	ueWlRwqS+1SQiFVeXH7H8hTjeP7C+G35vEd0IzRUTKKorboVvz7eT0Y27deU77c8XAXu
	xrS2imK/YSjSoHZ00Ce8j+gP8NK9jjk0H5Z5oihYn9XATqXp7eum0Pm9yh/eV40msFK2
	5yU8pQ1hveaa8h33RMNCDQvaQmIcqjog8+X1lyIqEILFc4KPzaWZ3x1xU8tt4mDwwDML
	nLsA==
ARC-Authentication-Results: i=1; mx.google.com;
	dkim=neutral (body hash did not verify) header.i=@poczta.onet.pl
	header.s=2011 header.b=erq4d4ku;
	spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
	by mx.google.com with ESMTP id
	m189si426352wmd.221.2017.09.25.16.34.24;
	Mon, 25 Sep 2017 16:34:25 -0700 (PDT)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
	dkim=neutral (body hash did not verify) header.i=@poczta.onet.pl
	header.s=2011 header.b=erq4d4ku;
	spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 1730268920E;
	Tue, 26 Sep 2017 02:34:12 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from smtpo107.poczta.onet.pl (smtpo107.poczta.onet.pl
	[213.180.149.160])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id E916568043B
	for <ffmpeg-devel@ffmpeg.org>; Tue, 26 Sep 2017 02:34:04 +0300 (EEST)
Received: from [192.168.1.2] (aawo185.neoplus.adsl.tpnet.pl [83.6.74.185])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128
	bits)) (No client certificate requested)
	(Authenticated sender: mateuszb@poczta.onet.pl)
	by smtp.poczta.onet.pl (Onet) with ESMTPSA id 3y1L4H1nNMzxh5nN
	for <ffmpeg-devel@ffmpeg.org>; Tue, 26 Sep 2017 01:33:18 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=poczta.onet.pl;
	s=2011; t=1506382399;
	bh=ZWTSOy/Ii4/S8n9PwUOltcaeW6CoaSsTWnxRijhFbBw=;
	h=Subject:To:References:From:Date:In-Reply-To:From;
	b=erq4d4kuIdIKV8Dmi//dfV+YT1G3DFeQsKzMyv+C6ZxlYKwPN4nBrQ0Afk1Z/hPOL
	6MvdIBjo9NU+07XpQjJLOzpbBlyDtWaojgzOsY9voQDrKrDZl59vqrJT4/WwtkM811
	7n6GICzIhrvc6ofZHLo7F13n052sRibwXvzRhQdM=
To: ffmpeg-devel@ffmpeg.org
References: <a558ea5d-dd53-36ca-1408-1487f9057fe3@poczta.onet.pl>
	<20170923150153.GG7094@nb4>
	<cfab6df5-28ea-da92-cfc4-539fd1c66762@poczta.onet.pl>
	<CAB0OVGpzO_G04oOfk5hA-vP7HBp_xTE01t+_OT2s=XpfJLTBwg@mail.gmail.com>
From: Mateusz <mateuszb@poczta.onet.pl>
Message-ID: <cc130554-b831-d014-4565-c474c984a59f@poczta.onet.pl>
Date: Tue, 26 Sep 2017 01:33:18 +0200
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
	Thunderbird/52.3.0
MIME-Version: 1.0
In-Reply-To: 
 <CAB0OVGpzO_G04oOfk5hA-vP7HBp_xTE01t+_OT2s=XpfJLTBwg@mail.gmail.com>
Content-Language: en-US
Subject: Re: [FFmpeg-devel] [PATCH] swscale_unscaled: fix DITHER_COPY macro,
	use it only for dst_depth == 8
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <http://ffmpeg.org/mailman/options/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <http://ffmpeg.org/pipermail/ffmpeg-devel/>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <http://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches
	<ffmpeg-devel@ffmpeg.org>
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

W dniu 2017-09-25 o 22:53, Carl Eugen Hoyos pisze:
> 2017-09-23 19:18 GMT+02:00 Mateusz <mateuszb@poczta.onet.pl>:
>> W dniu 2017-09-23 o 17:01, Michael Niedermayer pisze:
>>> On Fri, Sep 22, 2017 at 02:10:01AM +0200, Mateusz wrote:
>>>> To reduce bit depth in planar YUV or gray pixel formats
>>>> ffmpeg uses DITHER_COPY macro.
>>>> Now it makes images greener and with visible dither pattern.
>>>>
>>>> In my opinion there is no point to use dither tables for
>>>> destination bit depth >= 9, we can use simple down-shift
>>>> which is neutral in full and limited range -- result images
>>>> are with the same brightness and with the same colors.
>>>
>>> Theres no reason why dither should mess up the average color tone.
>>
>> In theory -- yes, I agree.
>> In reality -- current version of DITHER_COPY mess up the
>> average color tone.
>> It's one of the reasons why I sending these patches.
>>
>>> And if the user asks for >= 9 bit depth and has >= 10 bit
>>> input the user likely wants to get the best quality.
>>> Thats more so in a world where computers get faster
>>> every few years, this isnt 1995 where shaving off a add
>>> or a multiply per pixel was actually making a difference
>>> in being able to play something in realtime
>>> More so coverting between bit depths might be memory
>>> speed limited and not limited by arithmetic computations
>>> once its done in SIMD
>>
>> Yes, I agree. Now I can't write patches in NASM syntax, but
>> I started reading and learning.
>> I hope I'll back in a few months...
> 
> I strongly suspect there should be agreement over the C code
> first, asm optimizations can be done once C code is agreed
> upon.
> 
> Thank you for the samples, Carl Eugen

I've sent C code patch 2017-09-06 (and nothing) so I thought that the
problem is with speed. For simplicity I've attached this patch.

This code full-fill all rules (for limited and for full range):
white -> white
black -> black
white-1 -> gray (it means white-1 -> white usually but sometimes white-1 -> white-1)
black+1 -> gray (it means black+1 -> black usually but sometimes black+1 -> black+1)
increase bitdepth & decrease bitdepth = identity

About code:
limited range (shiftonly == 1)
tmp = (src + dither)>>shift; dst = tmp - (tmp>>dst_depth);

In theory it is enough to make only dst = (src + dither)>>shift;
-- white in limited range has 0 on bits to remove (235*4 for example)
so overflow is impossible. For files with full range not marked as
full range overflow is possible (for dither > 0) and white goes
to black. tmp - (tmp>>dst_depth) undoing this overflow.

full range (shiftonly == 0)
dst = (src - (src>>dst_depth) + dither)>>shift;

If we want to remove 2 bits from source (for example 10-bit -> 8-bit)
white in full range: ...111|11
white-1 in full range: ...111|10
so we should subtract dither for rule 'white-1 -> gray'
black in full range: ...000|00
black+1 in full range: ...000|01
so we should add dither for rule 'black+1 -> gray'
(src>>dst_depth) is 0 for small values (close to black) and max possible
dither for big values (close to white)

For rule 'increase bitdepth & decrease bitdepth = identity' in full range,
we have increase bitdepth: (v<<(dst_depth-src_depth)) | (v>>(2*src_depth-dst_depth))
so (src - (src>>dst_depth))>>shift is exactly inverse operation when we
decreasing bitdepth.
For full range when 'src' is from increased 'v'
(src - (src>>dst_depth)) is equal 'v' shifted only, so we are as in limited range.
And overflow is impossible in operation (src - (src>>dst_depth) + dither).

Mateusz
From fd9e271ea531d25bc5a708d0dfeb1be5415b90d0 Mon Sep 17 00:00:00 2001
From: Mateusz <mateuszb@poczta.onet.pl>
Date: Wed, 6 Sep 2017 09:05:02 +0200
Subject: [PATCH] fix DITHER_COPY macro according to shiftonly state
---
 libswscale/swscale_unscaled.c | 73 ++++++++++++++++++++++---------------------
 1 file changed, 38 insertions(+), 35 deletions(-)

diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
index ef36aec..e3e375a 100644
--- a/libswscale/swscale_unscaled.c
+++ b/libswscale/swscale_unscaled.c
@@ -110,24 +110,6 @@ DECLARE_ALIGNED(8, static const uint8_t, dithers)[8][8][8]={
   { 112, 16,104,  8,118, 22,110, 14,},
 }};
 
-static const uint16_t dither_scale[15][16]={
-{    2,    3,    3,    5,    5,    5,    5,    5,    5,    5,    5,    5,    5,    5,    5,    5,},
-{    2,    3,    7,    7,   13,   13,   25,   25,   25,   25,   25,   25,   25,   25,   25,   25,},
-{    3,    3,    4,   15,   15,   29,   57,   57,   57,  113,  113,  113,  113,  113,  113,  113,},
-{    3,    4,    4,    5,   31,   31,   61,  121,  241,  241,  241,  241,  481,  481,  481,  481,},
-{    3,    4,    5,    5,    6,   63,   63,  125,  249,  497,  993,  993,  993,  993,  993, 1985,},
-{    3,    5,    6,    6,    6,    7,  127,  127,  253,  505, 1009, 2017, 4033, 4033, 4033, 4033,},
-{    3,    5,    6,    7,    7,    7,    8,  255,  255,  509, 1017, 2033, 4065, 8129,16257,16257,},
-{    3,    5,    6,    8,    8,    8,    8,    9,  511,  511, 1021, 2041, 4081, 8161,16321,32641,},
-{    3,    5,    7,    8,    9,    9,    9,    9,   10, 1023, 1023, 2045, 4089, 8177,16353,32705,},
-{    3,    5,    7,    8,   10,   10,   10,   10,   10,   11, 2047, 2047, 4093, 8185,16369,32737,},
-{    3,    5,    7,    8,   10,   11,   11,   11,   11,   11,   12, 4095, 4095, 8189,16377,32753,},
-{    3,    5,    7,    9,   10,   12,   12,   12,   12,   12,   12,   13, 8191, 8191,16381,32761,},
-{    3,    5,    7,    9,   10,   12,   13,   13,   13,   13,   13,   13,   14,16383,16383,32765,},
-{    3,    5,    7,    9,   10,   12,   14,   14,   14,   14,   14,   14,   14,   15,32767,32767,},
-{    3,    5,    7,    9,   11,   12,   14,   15,   15,   15,   15,   15,   15,   15,   16,65535,},
-};
-
 
 static void fillPlane(uint8_t *plane, int stride, int width, int height, int y,
                       uint8_t val)
@@ -1502,24 +1484,45 @@ static int packedCopyWrapper(SwsContext *c, const uint8_t *src[],
 }
 
 #define DITHER_COPY(dst, dstStride, src, srcStride, bswap, dbswap)\
-    uint16_t scale= dither_scale[dst_depth-1][src_depth-1];\
-    int shift= src_depth-dst_depth + dither_scale[src_depth-2][dst_depth-1];\
-    for (i = 0; i < height; i++) {\
-        const uint8_t *dither= dithers[src_depth-9][i&7];\
-        for (j = 0; j < length-7; j+=8){\
-            dst[j+0] = dbswap((bswap(src[j+0]) + dither[0])*scale>>shift);\
-            dst[j+1] = dbswap((bswap(src[j+1]) + dither[1])*scale>>shift);\
-            dst[j+2] = dbswap((bswap(src[j+2]) + dither[2])*scale>>shift);\
-            dst[j+3] = dbswap((bswap(src[j+3]) + dither[3])*scale>>shift);\
-            dst[j+4] = dbswap((bswap(src[j+4]) + dither[4])*scale>>shift);\
-            dst[j+5] = dbswap((bswap(src[j+5]) + dither[5])*scale>>shift);\
-            dst[j+6] = dbswap((bswap(src[j+6]) + dither[6])*scale>>shift);\
-            dst[j+7] = dbswap((bswap(src[j+7]) + dither[7])*scale>>shift);\
+    unsigned shift= src_depth-dst_depth, tmp;\
+    if (shiftonly) {\
+        for (i = 0; i < height; i++) {\
+            const uint8_t *dither= dithers[shift-1][i&7];\
+            for (j = 0; j < length-7; j+=8){\
+                tmp = (bswap(src[j+0]) + dither[0])>>shift; dst[j+0] = dbswap(tmp - (tmp>>dst_depth));\
+                tmp = (bswap(src[j+1]) + dither[1])>>shift; dst[j+1] = dbswap(tmp - (tmp>>dst_depth));\
+                tmp = (bswap(src[j+2]) + dither[2])>>shift; dst[j+2] = dbswap(tmp - (tmp>>dst_depth));\
+                tmp = (bswap(src[j+3]) + dither[3])>>shift; dst[j+3] = dbswap(tmp - (tmp>>dst_depth));\
+                tmp = (bswap(src[j+4]) + dither[4])>>shift; dst[j+4] = dbswap(tmp - (tmp>>dst_depth));\
+                tmp = (bswap(src[j+5]) + dither[5])>>shift; dst[j+5] = dbswap(tmp - (tmp>>dst_depth));\
+                tmp = (bswap(src[j+6]) + dither[6])>>shift; dst[j+6] = dbswap(tmp - (tmp>>dst_depth));\
+                tmp = (bswap(src[j+7]) + dither[7])>>shift; dst[j+7] = dbswap(tmp - (tmp>>dst_depth));\
+            }\
+            for (; j < length; j++){\
+                tmp = (bswap(src[j]) + dither[j&7])>>shift; dst[j] = dbswap(tmp - (tmp>>dst_depth));\
+            }\
+            dst += dstStride;\
+            src += srcStride;\
+        }\
+    } else {\
+        for (i = 0; i < height; i++) {\
+            const uint8_t *dither= dithers[shift-1][i&7];\
+            for (j = 0; j < length-7; j+=8){\
+                tmp = bswap(src[j+0]); dst[j+0] = dbswap((tmp - (tmp>>dst_depth) + dither[0])>>shift);\
+                tmp = bswap(src[j+1]); dst[j+1] = dbswap((tmp - (tmp>>dst_depth) + dither[1])>>shift);\
+                tmp = bswap(src[j+2]); dst[j+2] = dbswap((tmp - (tmp>>dst_depth) + dither[2])>>shift);\
+                tmp = bswap(src[j+3]); dst[j+3] = dbswap((tmp - (tmp>>dst_depth) + dither[3])>>shift);\
+                tmp = bswap(src[j+4]); dst[j+4] = dbswap((tmp - (tmp>>dst_depth) + dither[4])>>shift);\
+                tmp = bswap(src[j+5]); dst[j+5] = dbswap((tmp - (tmp>>dst_depth) + dither[5])>>shift);\
+                tmp = bswap(src[j+6]); dst[j+6] = dbswap((tmp - (tmp>>dst_depth) + dither[6])>>shift);\
+                tmp = bswap(src[j+7]); dst[j+7] = dbswap((tmp - (tmp>>dst_depth) + dither[7])>>shift);\
+            }\
+            for (; j < length; j++){\
+                tmp = bswap(src[j]); dst[j] = dbswap((tmp - (tmp>>dst_depth) + dither[j&7])>>shift);\
+            }\
+            dst += dstStride;\
+            src += srcStride;\
         }\
-        for (; j < length; j++)\
-            dst[j] = dbswap((bswap(src[j]) + dither[j&7])*scale>>shift);\
-        dst += dstStride;\
-        src += srcStride;\
     }
 
 static int planarCopyWrapper(SwsContext *c, const uint8_t *src[],