Message ID | 5de4b4a8-767b-5ddb-1b5b-78cb913e32d0@msystem.waw.pl |
---|---|
State | Superseded |
Headers | show |
On Wed, Mar 15, 2017 at 10:52:29PM +0100, Mateusz Brzostek wrote: > Hello! > > There are 3 problems with DITHER_COPY macro in libswscale/swscale_unscaled.c: > 1) there is overflow in dithering from 12-bit to 10-bit (output value > 1023); > 2) for limit range the lower limit is not respected, for example from 10-bit to 8-bit value 64 is converted to 15; > 3) for many iteration of downscale/upscale of the same image the 200th iteration is significantly darker. > > The first bug is due to wrong dithers table (now it is OK only for 8-bit destination), fix is: > - const uint8_t *dither= dithers[src_depth-9][i&7];\ > + const uint8_t *dither= dithers[src_depth-dst_depth-1][i&7];\ > > For bugs 2) and 3) it is needed formula that do not make images darker (in attachment). So please review. does your code maintain white and black levels ? with 4 bits white is 15, with 7 bits white is 127 for example white should stay white black should stay black in both directions [...]
W dniu 2017-03-16 o 19:17, Michael Niedermayer pisze: > On Wed, Mar 15, 2017 at 10:52:29PM +0100, Mateusz Brzostek wrote: >> Hello! >> >> There are 3 problems with DITHER_COPY macro in libswscale/swscale_unscaled.c: >> 1) there is overflow in dithering from 12-bit to 10-bit (output value > 1023); >> 2) for limit range the lower limit is not respected, for example from 10-bit to 8-bit value 64 is converted to 15; >> 3) for many iteration of downscale/upscale of the same image the 200th iteration is significantly darker. >> >> The first bug is due to wrong dithers table (now it is OK only for 8-bit destination), fix is: >> - const uint8_t *dither= dithers[src_depth-9][i&7];\ >> + const uint8_t *dither= dithers[src_depth-dst_depth-1][i&7];\ >> >> For bugs 2) and 3) it is needed formula that do not make images darker (in attachment). So please review. > does your code maintain white and black levels ? > with 4 bits white is 15, with 7 bits white is 127 for example > white should stay white > black should stay black > in both directions === short version === Yes, white goes to white, black goes to black. === long version === This DITHER_COPY macro is for case when destination bit-depth >= 8 and source bit-depth > destination bit-depth and the picture type is the same (for example yuv444p12 to yuv444p10). It is replacement for down-shift. For destination bit-depth < 8 it is needed real dither (not simplified like this DITHER_COPY). The formula before the patch was: dst = (src + dither) * scale >> shift where: dst - destination, src - source, dither - constant values from table 8x8, scale and shift are chosen to fulfill formula (in mathematics/floating point arithmetic): (max_value_src + max_value_dither) * scale / (2^shift) < max_value_dst + 1 This is due to lack of clip function (for speed reason). If you consider (scale / (2^shift)) number for example for conversion 10-bit to 8-bit, we have (scale / (2^shift)) < 1/4 because (1023 + 3) * 1/4 = 256.5 and 256.5 >= 255 + 1 (with 1/4 there will be overflow: 1023 go to 256 for dither > 0). So for src > 0 with 2 least significant bits = 0 and dither = 0 we have dst <= (src >> 2) - 1 which is the darker destination effect (it is unavoidable in this model). In my patch the number (scale / (2^shift)) == 1/4 for 10-bit to 8-bit example. This fix the darker destination effect but create possible overflow problem. In my patch the possible overflow go to sign bit of int32_t variable and is clipped to 0x7FFFFFFF. In my tests the new DITHER_COPY is a little bit slower than old but is much faster than real dither in x265. Mateusz
2017-03-15 22:52 GMT+01:00 Mateusz Brzostek <mateusz@msystem.waw.pl>: > Hello! > > There are 3 problems with DITHER_COPY macro in libswscale/swscale_unscaled.c: > 1) there is overflow in dithering from 12-bit to 10-bit (output value > 1023); > 2) for limit range the lower limit is not respected, for example from 10-bit to 8-bit value 64 is converted to 15; > 3) for many iteration of downscale/upscale of the same image the 200th iteration is significantly darker. > > The first bug is due to wrong dithers table (now it is OK only for 8-bit destination), fix is: > - const uint8_t *dither= dithers[src_depth-9][i&7];\ > + const uint8_t *dither= dithers[src_depth-dst_depth-1][i&7];\ I believe this implies that you could split your patch. Carl Eugen
diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c index ba3d688..429c98c 100644 --- a/libswscale/swscale_unscaled.c +++ b/libswscale/swscale_unscaled.c @@ -110,24 +110,6 @@ DECLARE_ALIGNED(8, static const uint8_t, dithers)[8][8][8]={ { 112, 16,104, 8,118, 22,110, 14,}, }}; -static const uint16_t dither_scale[15][16]={ -{ 2, 3, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,}, -{ 2, 3, 7, 7, 13, 13, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,}, -{ 3, 3, 4, 15, 15, 29, 57, 57, 57, 113, 113, 113, 113, 113, 113, 113,}, -{ 3, 4, 4, 5, 31, 31, 61, 121, 241, 241, 241, 241, 481, 481, 481, 481,}, -{ 3, 4, 5, 5, 6, 63, 63, 125, 249, 497, 993, 993, 993, 993, 993, 1985,}, -{ 3, 5, 6, 6, 6, 7, 127, 127, 253, 505, 1009, 2017, 4033, 4033, 4033, 4033,}, -{ 3, 5, 6, 7, 7, 7, 8, 255, 255, 509, 1017, 2033, 4065, 8129,16257,16257,}, -{ 3, 5, 6, 8, 8, 8, 8, 9, 511, 511, 1021, 2041, 4081, 8161,16321,32641,}, -{ 3, 5, 7, 8, 9, 9, 9, 9, 10, 1023, 1023, 2045, 4089, 8177,16353,32705,}, -{ 3, 5, 7, 8, 10, 10, 10, 10, 10, 11, 2047, 2047, 4093, 8185,16369,32737,}, -{ 3, 5, 7, 8, 10, 11, 11, 11, 11, 11, 12, 4095, 4095, 8189,16377,32753,}, -{ 3, 5, 7, 9, 10, 12, 12, 12, 12, 12, 12, 13, 8191, 8191,16381,32761,}, -{ 3, 5, 7, 9, 10, 12, 13, 13, 13, 13, 13, 13, 14,16383,16383,32765,}, -{ 3, 5, 7, 9, 10, 12, 14, 14, 14, 14, 14, 14, 14, 15,32767,32767,}, -{ 3, 5, 7, 9, 11, 12, 14, 15, 15, 15, 15, 15, 15, 15, 16,65535,}, -}; - static void fillPlane(uint8_t *plane, int stride, int width, int height, int y, uint8_t val) @@ -1485,22 +1467,22 @@ static int packedCopyWrapper(SwsContext *c, const uint8_t *src[], } #define DITHER_COPY(dst, dstStride, src, srcStride, bswap, dbswap)\ - uint16_t scale= dither_scale[dst_depth-1][src_depth-1];\ - int shift= src_depth-dst_depth + dither_scale[src_depth-2][dst_depth-1];\ + int scale= 31-src_depth, shift= 31-dst_depth;\ for (i = 0; i < height; i++) {\ - const uint8_t *dither= dithers[src_depth-9][i&7];\ + const uint8_t *dither= dithers[src_depth-dst_depth-1][i&7];\ for (j = 0; j < length-7; j+=8){\ - dst[j+0] = dbswap((bswap(src[j+0]) + dither[0])*scale>>shift);\ - dst[j+1] = dbswap((bswap(src[j+1]) + dither[1])*scale>>shift);\ - dst[j+2] = dbswap((bswap(src[j+2]) + dither[2])*scale>>shift);\ - dst[j+3] = dbswap((bswap(src[j+3]) + dither[3])*scale>>shift);\ - dst[j+4] = dbswap((bswap(src[j+4]) + dither[4])*scale>>shift);\ - dst[j+5] = dbswap((bswap(src[j+5]) + dither[5])*scale>>shift);\ - dst[j+6] = dbswap((bswap(src[j+6]) + dither[6])*scale>>shift);\ - dst[j+7] = dbswap((bswap(src[j+7]) + dither[7])*scale>>shift);\ + int32_t tmp = (bswap(src[j+0]) + (unsigned)dither[0])<<scale; tmp = (tmp|(tmp>>31)) & 0x7FFFFFFF; dst[j+0] = dbswap(tmp>>shift);\ + tmp = (bswap(src[j+1]) + (unsigned)dither[1])<<scale; tmp = (tmp|(tmp>>31)) & 0x7FFFFFFF; dst[j+1] = dbswap(tmp>>shift);\ + tmp = (bswap(src[j+2]) + (unsigned)dither[2])<<scale; tmp = (tmp|(tmp>>31)) & 0x7FFFFFFF; dst[j+2] = dbswap(tmp>>shift);\ + tmp = (bswap(src[j+3]) + (unsigned)dither[3])<<scale; tmp = (tmp|(tmp>>31)) & 0x7FFFFFFF; dst[j+3] = dbswap(tmp>>shift);\ + tmp = (bswap(src[j+4]) + (unsigned)dither[4])<<scale; tmp = (tmp|(tmp>>31)) & 0x7FFFFFFF; dst[j+4] = dbswap(tmp>>shift);\ + tmp = (bswap(src[j+5]) + (unsigned)dither[5])<<scale; tmp = (tmp|(tmp>>31)) & 0x7FFFFFFF; dst[j+5] = dbswap(tmp>>shift);\ + tmp = (bswap(src[j+6]) + (unsigned)dither[6])<<scale; tmp = (tmp|(tmp>>31)) & 0x7FFFFFFF; dst[j+6] = dbswap(tmp>>shift);\ + tmp = (bswap(src[j+7]) + (unsigned)dither[7])<<scale; tmp = (tmp|(tmp>>31)) & 0x7FFFFFFF; dst[j+7] = dbswap(tmp>>shift);\ + }\ + for (; j < length; j++){\ + int32_t tmp = (bswap(src[j]) + (unsigned)dither[j&7])<<scale; tmp = (tmp|(tmp>>31)) & 0x7FFFFFFF; dst[j] = dbswap(tmp>>shift);\ }\ - for (; j < length; j++)\ - dst[j] = dbswap((bswap(src[j]) + dither[j&7])*scale>>shift);\ dst += dstStride;\ src += srcStride;\ }