From patchwork Sat Mar 18 20:28:10 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mateusz Brzostek X-Patchwork-Id: 3009 Delivered-To: ffmpegpatchwork@gmail.com Received: by 10.103.50.79 with SMTP id y76csp802198vsy; Sat, 18 Mar 2017 13:32:39 -0700 (PDT) X-Received: by 10.223.163.206 with SMTP id m14mr20726116wrb.34.1489869159453; Sat, 18 Mar 2017 13:32:39 -0700 (PDT) Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id y104si1375311wrc.154.2017.03.18.13.32.38; Sat, 18 Mar 2017 13:32:39 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@msystem.waw.pl; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id E7068688259; Sat, 18 Mar 2017 22:32:17 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from s36a.hekko.net.pl (s36a.hekko.net.pl [188.116.19.94]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 2BDF368060B for ; Sat, 18 Mar 2017 22:32:11 +0200 (EET) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=msystem.waw.pl; s=x; h=Content-Type:In-Reply-To:MIME-Version:Date: Message-ID:From:References:To:Subject:Sender:Reply-To:Cc: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=eAtsH8uDzj3Sxrx2izsavTEz4AWA0/pMmQt/vdd0TYU=; b=thw6EamF0FrLw/KnRLtNlFayY 7G0pEjHcXx6PQLgQmyvf2e8OoJR+sRoB0F1wXqzxmbDj/6EdeXdfCiAwXQpykdBEt+JxNr4kaYQT6 m1YJ7itgYGv7hyx08620Fdmv9iB52LCfjXm+h4tk1N0vcbUnxlJB+7JwcvRgu70S0gM5xceDwJTVY wvUkCeLTotvZmvw9UXOELD9iCsDMf1pRJDL6G7e7I/ZZJF0Z3hGCmaaDTgW05ZveFJYj+1WtTxXoF LGC1/xd4rTRLHXA4PZIfBX/L29BYLoExaN8QjbsYA61VKiR9/XE2VjYgeXsmem/6RhpwcIuHihGFA 329GOKYLA==; Received: from adpe40.neoplus.adsl.tpnet.pl ([79.185.112.40] helo=[192.168.1.3]) by s36.hekko.net.pl with esmtpsa (TLSv1.2:DHE-RSA-AES128-SHA:128) (Exim 4.87) (envelope-from ) id 1cpL1X-0001DO-Hl for ffmpeg-devel@ffmpeg.org; Sat, 18 Mar 2017 21:32:28 +0100 To: FFmpeg development discussions and patches References: <5de4b4a8-767b-5ddb-1b5b-78cb913e32d0@msystem.waw.pl> <20170316181738.GQ5776@nb4> From: Mateusz Brzostek Message-ID: <517fc771-37f0-e2ed-e2b9-9865888bfd85@msystem.waw.pl> Date: Sat, 18 Mar 2017 21:28:10 +0100 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <20170316181738.GQ5776@nb4> X-HEKKO: 79.185.112.40:mateusz@msystem.waw.pl Subject: Re: [FFmpeg-devel] [PATCH] libswscale/swscale_unscaled: fix DITHER_COPY macro X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" W dniu 2017-03-16 o 19:17, Michael Niedermayer pisze: > On Wed, Mar 15, 2017 at 10:52:29PM +0100, Mateusz Brzostek wrote: >> Hello! >> >> There are 3 problems with DITHER_COPY macro in libswscale/swscale_unscaled.c: >> 1) there is overflow in dithering from 12-bit to 10-bit (output value > 1023); >> 2) for limit range the lower limit is not respected, for example from 10-bit to 8-bit value 64 is converted to 15; >> 3) for many iteration of downscale/upscale of the same image the 200th iteration is significantly darker. >> >> The first bug is due to wrong dithers table (now it is OK only for 8-bit destination), fix is: >> - const uint8_t *dither= dithers[src_depth-9][i&7];\ >> + const uint8_t *dither= dithers[src_depth-dst_depth-1][i&7];\ >> >> For bugs 2) and 3) it is needed formula that do not make images darker (in attachment). So please review. > does your code maintain white and black levels ? > with 4 bits white is 15, with 7 bits white is 127 for example > white should stay white > black should stay black > in both directions > > [...] After speed tests I've prepared faster version of this patch (do exactly the same but faster). ffmpeg0 - clean 64-bit gcc 6.3 build for Windows from snapshot 2017-03-18, ffmpeg1 - the same + old patch, ffmpeg2 - the same + new patch. Average speed was (from 10 laps): ffmpeg0 -- 2.625 ffmpeg1 -- 2.540 ffmpeg2 -- 2.628 New patch is faster than old and is not slower than current (buggy) code. My speed test looks like this (copy from 324 frames movie 2400x1350, yuv444p12): f:\speed\ff2>for /L %n in (1 1 10) do (for /L %i in (0 1 2) do (ffmpeg%i -y -stats -i ../original_vc3.mxf -v error -strict -1 -p ix_fmt yuv444p10 -f yuv4mpegpipe NUL ) ) f:\speed\ff2>(for /L %i in (0 1 2) do (ffmpeg%i -y -stats -i ../original_vc3.mxf -v error -strict -1 -pix_fmt yuv444p10 -f yuv4m pegpipe NUL ) ) f:\speed\ff2>(ffmpeg0 -y -stats -i ../original_vc3.mxf -v error -strict -1 -pix_fmt yuv444p10 -f yuv4mpegpipe NUL ) frame= 324 fps= 63 q=-0.0 Lsize= 6150939kB time=00:00:13.50 bitrate=3732481.2kbits/s speed=2.64x f:\speed\ff2>(ffmpeg1 -y -stats -i ../original_vc3.mxf -v error -strict -1 -pix_fmt yuv444p10 -f yuv4mpegpipe NUL ) frame= 324 fps= 61 q=-0.0 Lsize= 6150939kB time=00:00:13.50 bitrate=3732481.2kbits/s speed=2.54x f:\speed\ff2>(ffmpeg2 -y -stats -i ../original_vc3.mxf -v error -strict -1 -pix_fmt yuv444p10 -f yuv4mpegpipe NUL ) frame= 324 fps= 63 q=-0.0 Lsize= 6150939kB time=00:00:13.50 bitrate=3732481.2kbits/s speed=2.62x New patch inline (+ in attachment for easy apply): libswscale/swscale_unscaled.c | 41 +++++++++++------------------------------ 1 file changed, 11 insertions(+), 30 deletions(-) libswscale/swscale_unscaled.c | 41 +++++++++++------------------------------ 1 file changed, 11 insertions(+), 30 deletions(-) diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c index ba3d688..a1d0a8d 100644 --- a/libswscale/swscale_unscaled.c +++ b/libswscale/swscale_unscaled.c @@ -110,24 +110,6 @@ DECLARE_ALIGNED(8, static const uint8_t, dithers)[8][8][8]={ { 112, 16,104, 8,118, 22,110, 14,}, }}; -static const uint16_t dither_scale[15][16]={ -{ 2, 3, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,}, -{ 2, 3, 7, 7, 13, 13, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,}, -{ 3, 3, 4, 15, 15, 29, 57, 57, 57, 113, 113, 113, 113, 113, 113, 113,}, -{ 3, 4, 4, 5, 31, 31, 61, 121, 241, 241, 241, 241, 481, 481, 481, 481,}, -{ 3, 4, 5, 5, 6, 63, 63, 125, 249, 497, 993, 993, 993, 993, 993, 1985,}, -{ 3, 5, 6, 6, 6, 7, 127, 127, 253, 505, 1009, 2017, 4033, 4033, 4033, 4033,}, -{ 3, 5, 6, 7, 7, 7, 8, 255, 255, 509, 1017, 2033, 4065, 8129,16257,16257,}, -{ 3, 5, 6, 8, 8, 8, 8, 9, 511, 511, 1021, 2041, 4081, 8161,16321,32641,}, -{ 3, 5, 7, 8, 9, 9, 9, 9, 10, 1023, 1023, 2045, 4089, 8177,16353,32705,}, -{ 3, 5, 7, 8, 10, 10, 10, 10, 10, 11, 2047, 2047, 4093, 8185,16369,32737,}, -{ 3, 5, 7, 8, 10, 11, 11, 11, 11, 11, 12, 4095, 4095, 8189,16377,32753,}, -{ 3, 5, 7, 9, 10, 12, 12, 12, 12, 12, 12, 13, 8191, 8191,16381,32761,}, -{ 3, 5, 7, 9, 10, 12, 13, 13, 13, 13, 13, 13, 14,16383,16383,32765,}, -{ 3, 5, 7, 9, 10, 12, 14, 14, 14, 14, 14, 14, 14, 15,32767,32767,}, -{ 3, 5, 7, 9, 11, 12, 14, 15, 15, 15, 15, 15, 15, 15, 16,65535,}, -}; - static void fillPlane(uint8_t *plane, int stride, int width, int height, int y, uint8_t val) @@ -1485,22 +1467,21 @@ static int packedCopyWrapper(SwsContext *c, const uint8_t *src[], } #define DITHER_COPY(dst, dstStride, src, srcStride, bswap, dbswap)\ - uint16_t scale= dither_scale[dst_depth-1][src_depth-1];\ - int shift= src_depth-dst_depth + dither_scale[src_depth-2][dst_depth-1];\ + unsigned shift= src_depth-dst_depth, dst_max= (1<>shift);\ - dst[j+1] = dbswap((bswap(src[j+1]) + dither[1])*scale>>shift);\ - dst[j+2] = dbswap((bswap(src[j+2]) + dither[2])*scale>>shift);\ - dst[j+3] = dbswap((bswap(src[j+3]) + dither[3])*scale>>shift);\ - dst[j+4] = dbswap((bswap(src[j+4]) + dither[4])*scale>>shift);\ - dst[j+5] = dbswap((bswap(src[j+5]) + dither[5])*scale>>shift);\ - dst[j+6] = dbswap((bswap(src[j+6]) + dither[6])*scale>>shift);\ - dst[j+7] = dbswap((bswap(src[j+7]) + dither[7])*scale>>shift);\ + tmp = (bswap(src[j+0]) + dither[0])>>shift; dst[j+0] = dbswap(FFMIN(tmp, dst_max));\ + tmp = (bswap(src[j+1]) + dither[1])>>shift; dst[j+1] = dbswap(FFMIN(tmp, dst_max));\ + tmp = (bswap(src[j+2]) + dither[2])>>shift; dst[j+2] = dbswap(FFMIN(tmp, dst_max));\ + tmp = (bswap(src[j+3]) + dither[3])>>shift; dst[j+3] = dbswap(FFMIN(tmp, dst_max));\ + tmp = (bswap(src[j+4]) + dither[4])>>shift; dst[j+4] = dbswap(FFMIN(tmp, dst_max));\ + tmp = (bswap(src[j+5]) + dither[5])>>shift; dst[j+5] = dbswap(FFMIN(tmp, dst_max));\ + tmp = (bswap(src[j+6]) + dither[6])>>shift; dst[j+6] = dbswap(FFMIN(tmp, dst_max));\ + tmp = (bswap(src[j+7]) + dither[7])>>shift; dst[j+7] = dbswap(FFMIN(tmp, dst_max));\ }\ for (; j < length; j++)\ - dst[j] = dbswap((bswap(src[j]) + dither[j&7])*scale>>shift);\ + tmp = (bswap(src[j]) + dither[j&7])>>shift; dst[j] = dbswap(FFMIN(tmp, dst_max));\ dst += dstStride;\ src += srcStride;\ } diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c index ba3d688..a1d0a8d 100644 --- a/libswscale/swscale_unscaled.c +++ b/libswscale/swscale_unscaled.c @@ -110,24 +110,6 @@ DECLARE_ALIGNED(8, static const uint8_t, dithers)[8][8][8]={ { 112, 16,104, 8,118, 22,110, 14,}, }}; -static const uint16_t dither_scale[15][16]={ -{ 2, 3, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,}, -{ 2, 3, 7, 7, 13, 13, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,}, -{ 3, 3, 4, 15, 15, 29, 57, 57, 57, 113, 113, 113, 113, 113, 113, 113,}, -{ 3, 4, 4, 5, 31, 31, 61, 121, 241, 241, 241, 241, 481, 481, 481, 481,}, -{ 3, 4, 5, 5, 6, 63, 63, 125, 249, 497, 993, 993, 993, 993, 993, 1985,}, -{ 3, 5, 6, 6, 6, 7, 127, 127, 253, 505, 1009, 2017, 4033, 4033, 4033, 4033,}, -{ 3, 5, 6, 7, 7, 7, 8, 255, 255, 509, 1017, 2033, 4065, 8129,16257,16257,}, -{ 3, 5, 6, 8, 8, 8, 8, 9, 511, 511, 1021, 2041, 4081, 8161,16321,32641,}, -{ 3, 5, 7, 8, 9, 9, 9, 9, 10, 1023, 1023, 2045, 4089, 8177,16353,32705,}, -{ 3, 5, 7, 8, 10, 10, 10, 10, 10, 11, 2047, 2047, 4093, 8185,16369,32737,}, -{ 3, 5, 7, 8, 10, 11, 11, 11, 11, 11, 12, 4095, 4095, 8189,16377,32753,}, -{ 3, 5, 7, 9, 10, 12, 12, 12, 12, 12, 12, 13, 8191, 8191,16381,32761,}, -{ 3, 5, 7, 9, 10, 12, 13, 13, 13, 13, 13, 13, 14,16383,16383,32765,}, -{ 3, 5, 7, 9, 10, 12, 14, 14, 14, 14, 14, 14, 14, 15,32767,32767,}, -{ 3, 5, 7, 9, 11, 12, 14, 15, 15, 15, 15, 15, 15, 15, 16,65535,}, -}; - static void fillPlane(uint8_t *plane, int stride, int width, int height, int y, uint8_t val) @@ -1485,22 +1467,21 @@ static int packedCopyWrapper(SwsContext *c, const uint8_t *src[], } #define DITHER_COPY(dst, dstStride, src, srcStride, bswap, dbswap)\ - uint16_t scale= dither_scale[dst_depth-1][src_depth-1];\ - int shift= src_depth-dst_depth + dither_scale[src_depth-2][dst_depth-1];\ + unsigned shift= src_depth-dst_depth, dst_max= (1<>shift);\ - dst[j+1] = dbswap((bswap(src[j+1]) + dither[1])*scale>>shift);\ - dst[j+2] = dbswap((bswap(src[j+2]) + dither[2])*scale>>shift);\ - dst[j+3] = dbswap((bswap(src[j+3]) + dither[3])*scale>>shift);\ - dst[j+4] = dbswap((bswap(src[j+4]) + dither[4])*scale>>shift);\ - dst[j+5] = dbswap((bswap(src[j+5]) + dither[5])*scale>>shift);\ - dst[j+6] = dbswap((bswap(src[j+6]) + dither[6])*scale>>shift);\ - dst[j+7] = dbswap((bswap(src[j+7]) + dither[7])*scale>>shift);\ + tmp = (bswap(src[j+0]) + dither[0])>>shift; dst[j+0] = dbswap(FFMIN(tmp, dst_max));\ + tmp = (bswap(src[j+1]) + dither[1])>>shift; dst[j+1] = dbswap(FFMIN(tmp, dst_max));\ + tmp = (bswap(src[j+2]) + dither[2])>>shift; dst[j+2] = dbswap(FFMIN(tmp, dst_max));\ + tmp = (bswap(src[j+3]) + dither[3])>>shift; dst[j+3] = dbswap(FFMIN(tmp, dst_max));\ + tmp = (bswap(src[j+4]) + dither[4])>>shift; dst[j+4] = dbswap(FFMIN(tmp, dst_max));\ + tmp = (bswap(src[j+5]) + dither[5])>>shift; dst[j+5] = dbswap(FFMIN(tmp, dst_max));\ + tmp = (bswap(src[j+6]) + dither[6])>>shift; dst[j+6] = dbswap(FFMIN(tmp, dst_max));\ + tmp = (bswap(src[j+7]) + dither[7])>>shift; dst[j+7] = dbswap(FFMIN(tmp, dst_max));\ }\ for (; j < length; j++)\ - dst[j] = dbswap((bswap(src[j]) + dither[j&7])*scale>>shift);\ + tmp = (bswap(src[j]) + dither[j&7])>>shift; dst[j] = dbswap(FFMIN(tmp, dst_max));\ dst += dstStride;\ src += srcStride;\ }