From patchwork Wed Sep 27 10:04:21 2017
X-Patchwork-Submitter: Mateusz
X-Patchwork-Id: 5305
To: ffmpeg-devel@ffmpeg.org
References: <20170923150153.GG7094@nb4>
From: Mateusz
Message-ID: <39349041-c48c-77ec-b29e-f1352ad84753@poczta.onet.pl>
Date: Wed, 27 Sep 2017 12:04:21 +0200
Subject: Re: [FFmpeg-devel] [PATCH] swscale_unscaled: fix DITHER_COPY macro, use it only for dst_depth == 8

On 2017-09-26 at 13:31, Carl Eugen Hoyos wrote:
> 2017-09-26 1:33 GMT+02:00 Mateusz :
>
>> I've sent C code patch 2017-09-06 (and nothing) so I thought that the
>> problem is with speed. For simplicity I've attached this patch.
>
> You could (wait a day or two and) either add an option to
> select your dithering code or put it under #ifdef so more
> people can test it.

I've attached a patch that does nothing unless you pass
--extra-cflags="-DNEW_DITHER_COPY" to configure
or
export CFLAGS="-DNEW_DITHER_COPY"

>> In theory it is enough to make only dst = (src + dither)>>shift;
>> -- white in limited range has 0 on bits to remove (235*4 for example)
>> so overflow is impossible. For files with full range not marked as
>> full range overflow is possible (for dither > 0) and white goes
>> to black. tmp - (tmp>>dst_depth) undoing this overflow.
>
> (Not necessarily related, sorry if I misunderstand:)
> Valid limited-range frames can contain some pixels with peak
> values outside of the defined range.
>
> Carl Eugen

OK, so guarding against possible overflow is needed even more.
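To make the overflow argument concrete, here is a minimal standalone sketch
of the per-sample arithmetic. It is not part of the patch: the helper names
(dither_naive, dither_shiftonly, dither_rescale) are made up for
illustration, and the dither value is passed in directly instead of being
read from the dithers[] table.

```c
#include <assert.h>

/* Hypothetical helper: plain (src + dither) >> shift, then stored into a
 * dst_depth-bit sample, which keeps only the low dst_depth bits. */
static unsigned dither_naive(unsigned src, unsigned dither,
                             unsigned src_depth, unsigned dst_depth)
{
    assert(src_depth > dst_depth);
    unsigned shift = src_depth - dst_depth;
    return ((src + dither) >> shift) & ((1u << dst_depth) - 1);
}

/* Hypothetical helper mirroring the shiftonly branch of the new macro:
 * if the sum reaches 1 << dst_depth, tmp >> dst_depth is 1 and pulls the
 * result back to the maximum value; otherwise it subtracts 0. */
static unsigned dither_shiftonly(unsigned src, unsigned dither,
                                 unsigned src_depth, unsigned dst_depth)
{
    assert(src_depth > dst_depth);
    unsigned shift = src_depth - dst_depth;
    unsigned tmp = (src + dither) >> shift;
    return tmp - (tmp >> dst_depth);
}

/* Hypothetical helper mirroring the other branch: the source value is
 * reduced by (src >> dst_depth) before the dither is added, so the sum
 * cannot exceed the source maximum when dither < (1 << shift). */
static unsigned dither_rescale(unsigned src, unsigned dither,
                               unsigned src_depth, unsigned dst_depth)
{
    assert(src_depth > dst_depth);
    unsigned shift = src_depth - dst_depth;
    return (src - (src >> dst_depth) + dither) >> shift;
}
```

For 10-bit to 8-bit with dither 3, full-range white 1023 wraps to 0 in
dither_naive but gives 255 in both dither_shiftonly and dither_rescale,
while limited-range white 940 (235*4) gives 235 in all three.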
Mateusz

From 2c0adb2d9a0fc0fbbffc643d27860fbb779c08fc Mon Sep 17 00:00:00 2001
From: Mateusz
Date: Tue, 26 Sep 2017 22:20:10 +0200
Subject: [PATCH] swscale: new precise DITHER_COPY macro (if "-DNEW_DITHER_COPY" CFLAGS)

---
 libswscale/swscale_unscaled.c | 47 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
index ef36aec..0d41695 100644
--- a/libswscale/swscale_unscaled.c
+++ b/libswscale/swscale_unscaled.c
@@ -110,6 +110,7 @@ DECLARE_ALIGNED(8, static const uint8_t, dithers)[8][8][8]={
   { 112, 16,104, 8,118, 22,110, 14,},
 }};
 
+#ifndef NEW_DITHER_COPY
 static const uint16_t dither_scale[15][16]={
 { 2, 3, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,},
 { 2, 3, 7, 7, 13, 13, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,},
@@ -127,7 +128,7 @@ static const uint16_t dither_scale[15][16]={
 { 3, 5, 7, 9, 10, 12, 14, 14, 14, 14, 14, 14, 14, 15,32767,32767,},
 { 3, 5, 7, 9, 11, 12, 14, 15, 15, 15, 15, 15, 15, 15, 16,65535,},
 };
-
+#endif
 
 static void fillPlane(uint8_t *plane, int stride, int width, int height, int y,
                       uint8_t val)
@@ -1501,6 +1502,7 @@ static int packedCopyWrapper(SwsContext *c, const uint8_t *src[],
     return srcSliceH;
 }
 
+#ifndef NEW_DITHER_COPY
 #define DITHER_COPY(dst, dstStride, src, srcStride, bswap, dbswap)\
     uint16_t scale= dither_scale[dst_depth-1][src_depth-1];\
     int shift= src_depth-dst_depth + dither_scale[src_depth-2][dst_depth-1];\
@@ -1521,6 +1523,49 @@ static int packedCopyWrapper(SwsContext *c, const uint8_t *src[],
         dst += dstStride;\
         src += srcStride;\
     }
+#else
+#define DITHER_COPY(dst, dstStride, src, srcStride, bswap, dbswap)\
+    unsigned shift= src_depth-dst_depth, tmp;\
+    if (shiftonly) {\
+        for (i = 0; i < height; i++) {\
+            const uint8_t *dither= dithers[shift-1][i&7];\
+            for (j = 0; j < length-7; j+=8){\
+                tmp = (bswap(src[j+0]) + dither[0])>>shift; dst[j+0] = dbswap(tmp - (tmp>>dst_depth));\
+                tmp = (bswap(src[j+1]) + dither[1])>>shift; dst[j+1] = dbswap(tmp - (tmp>>dst_depth));\
+                tmp = (bswap(src[j+2]) + dither[2])>>shift; dst[j+2] = dbswap(tmp - (tmp>>dst_depth));\
+                tmp = (bswap(src[j+3]) + dither[3])>>shift; dst[j+3] = dbswap(tmp - (tmp>>dst_depth));\
+                tmp = (bswap(src[j+4]) + dither[4])>>shift; dst[j+4] = dbswap(tmp - (tmp>>dst_depth));\
+                tmp = (bswap(src[j+5]) + dither[5])>>shift; dst[j+5] = dbswap(tmp - (tmp>>dst_depth));\
+                tmp = (bswap(src[j+6]) + dither[6])>>shift; dst[j+6] = dbswap(tmp - (tmp>>dst_depth));\
+                tmp = (bswap(src[j+7]) + dither[7])>>shift; dst[j+7] = dbswap(tmp - (tmp>>dst_depth));\
+            }\
+            for (; j < length; j++){\
+                tmp = (bswap(src[j]) + dither[j&7])>>shift; dst[j] = dbswap(tmp - (tmp>>dst_depth));\
+            }\
+            dst += dstStride;\
+            src += srcStride;\
+        }\
+    } else {\
+        for (i = 0; i < height; i++) {\
+            const uint8_t *dither= dithers[shift-1][i&7];\
+            for (j = 0; j < length-7; j+=8){\
+                tmp = bswap(src[j+0]); dst[j+0] = dbswap((tmp - (tmp>>dst_depth) + dither[0])>>shift);\
+                tmp = bswap(src[j+1]); dst[j+1] = dbswap((tmp - (tmp>>dst_depth) + dither[1])>>shift);\
+                tmp = bswap(src[j+2]); dst[j+2] = dbswap((tmp - (tmp>>dst_depth) + dither[2])>>shift);\
+                tmp = bswap(src[j+3]); dst[j+3] = dbswap((tmp - (tmp>>dst_depth) + dither[3])>>shift);\
+                tmp = bswap(src[j+4]); dst[j+4] = dbswap((tmp - (tmp>>dst_depth) + dither[4])>>shift);\
+                tmp = bswap(src[j+5]); dst[j+5] = dbswap((tmp - (tmp>>dst_depth) + dither[5])>>shift);\
+                tmp = bswap(src[j+6]); dst[j+6] = dbswap((tmp - (tmp>>dst_depth) + dither[6])>>shift);\
+                tmp = bswap(src[j+7]); dst[j+7] = dbswap((tmp - (tmp>>dst_depth) + dither[7])>>shift);\
+            }\
+            for (; j < length; j++){\
+                tmp = bswap(src[j]); dst[j] = dbswap((tmp - (tmp>>dst_depth) + dither[j&7])>>shift);\
+            }\
+            dst += dstStride;\
+            src += srcStride;\
+        }\
+    }
+#endif
 
 static int planarCopyWrapper(SwsContext *c, const uint8_t *src[],
                              int srcStride[], int srcSliceY, int srcSliceH,