From patchwork Fri Jan 6 13:13:41 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: u-9iep@aetey.se X-Patchwork-Id: 2074 Delivered-To: ffmpegpatchwork@gmail.com Received: by 10.103.89.21 with SMTP id n21csp5828635vsb; Fri, 6 Jan 2017 05:14:11 -0800 (PST) X-Received: by 10.28.6.147 with SMTP id 141mr3939510wmg.98.1483708451787; Fri, 06 Jan 2017 05:14:11 -0800 (PST) Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id lc2si5608842wjb.18.2017.01.06.05.14.11; Fri, 06 Jan 2017 05:14:11 -0800 (PST) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@fripost.org; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id A18F468A24B; Fri, 6 Jan 2017 15:14:03 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from outgoing.fripost.org (giraff.fripost.org [178.16.208.44]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 8D81B689DE7 for ; Fri, 6 Jan 2017 15:13:57 +0200 (EET) Received: from localhost (localhost [127.0.0.1]) by outgoing.fripost.org (Postfix) with ESMTP id 1F922A06C6B for ; Fri, 6 Jan 2017 14:14:03 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=fripost.org; h= content-disposition:content-type:content-type:mime-version :message-id:subject:subject:from:from:date:date; s=20140703; t= 1483708442; x=1485522843; bh=iDWOOQ6wa+iROcdmKGiHBNy/oXRmWhdSuDV 0Um0pZ5s=; b=OC2ZuypxVUbwmiZ61z59C/UZGlRdqY2kB3J7lfw9aCJmuFNtDSM yIRp5Eyiq7yoOf76FYRMSMK1bl3ZwQ8vr5psUgus/SWH1IEcpfwSrXrxFUgjmwvF keH1+E3pRr3u+llgHYPJ8bQ3rQAMX4AEotWHvcRc5xcoqD9DL9QvTr4Q= X-Virus-Scanned: Debian amavisd-new at fripost.org Received: from outgoing.fripost.org ([127.0.0.1]) by localhost (giraff.fripost.org [127.0.0.1]) (amavisd-new, port 10040) with LMTP id r7zdkoXCUoEz for ; Fri, 6 Jan 2017 14:14:02 +0100 (CET) Received: from smtp.fripost.org (unknown [172.16.0.6]) by outgoing.fripost.org (Postfix) with ESMTP id EAE61A06C66 for ; Fri, 6 Jan 2017 14:14:02 +0100 (CET) Received: from [127.0.0.1] (localhost [127.0.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) by smtp.fripost.org (Postfix) with ESMTPSA id 6FF7028E028D for ; Fri, 6 Jan 2017 14:13:56 +0100 (CET) Received: (qmail 22815 invoked from network); 6 Jan 2017 12:53:50 -0000 Received: from localhost (HELO aetey.se) (eh1ba719@127.0.0.1) by mail with ESMTPA; 6 Jan 2017 12:53:50 -0000 Date: Fri, 6 Jan 2017 14:13:41 +0100 From: u-9iep@aetey.se To: FFmpeg development discussions and patches Message-ID: <20170106131341.GO5271@example.net> MIME-Version: 1.0 Content-Disposition: inline Subject: [FFmpeg-devel] patch: the fastest video decoder ever? :) X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Hello, On slow hardware a 16-bit framebuffer depth makes a remarkable difference in performance and still provides a good look. When videos are to be played on slow terminals there is a single best speed performer, cinepak (tell me if you know a better codec in this respect, the only comparable one I know of is VQA-15, but there is no reasonable encoder for it, nor a safe decoder). Unfortunately cinepak as present in ffmpeg yields rgb24 which has to be repacked depending on the framebuffer format. The attached patch makes cinepak to provide rgb565 in native endianness. This cuts in half (!) the mplayer CPU consumption on i686 with Xorg in the corresponding mode/depth, compared to decoding to rgb24. The optimization is possible because cinepak is a VC codec and pixel format packing at codebook fill is much more efficient than at a later stage. (Thanks Michael for the suggestion, long ago). Of course the picture quality is slightly affected if this decoder version is used with larger framebuffer depths. A run-time switch would be to prefer, if this optimization against all odds would be welcome upstream :) I am posting this as a proof of concept, lacking run-time format switching. As said, this patch yields half CPU consumption or double speed at decoding for rgb565 terminals. Regards, Rune --- libavcodec/cinepak.c.rgb24 2017-01-05 14:43:53.379965430 +0100 +++ libavcodec/cinepak.c 2017-01-05 21:54:04.333090225 +0100 @@ -30,7 +30,9 @@ * http://wiki.multimedia.cx/index.php?title=Sega_FILM * * Cinepak colorspace support (c) 2013 Rl, Aetey Global Technologies AB + * Cinepak rgb565 format (c) 2017 Rl, Aetey Global Technologies AB * @author Cinepak colorspace, Rl, Aetey Global Technologies AB + * @author Cinepak rgb565, Rl, Aetey Global Technologies AB */ #include @@ -42,8 +44,12 @@ #include "avcodec.h" #include "internal.h" +/* feel free to replace with a better mapping implementation + * (keeping in mind slow, not very "intelligent" hardware) + */ +#define PACK_RGB_RGB565(r,g,b) ((((r)>>3)<<11)|(((g)>>2)<<5)|(b>>3)) -typedef uint8_t cvid_codebook[12]; +typedef uint16_t cvid_codebook[4]; /* in the native endian rgb565 format */ #define MAX_STRIPS 32 @@ -73,13 +79,13 @@ uint32_t pal[256]; } CinepakContext; -static void cinepak_decode_codebook (cvid_codebook *codebook, +static void cinepak_decode_codebook (cvid_codebook *codebook, int palette_video, int chunk_id, int size, const uint8_t *data) { const uint8_t *eod = (data + size); uint32_t flag, mask; int i, n; - uint8_t *p; + uint16_t *p; /* check if this chunk contains 4- or 6-element vectors */ n = (chunk_id & 0x04) ? 4 : 6; @@ -98,33 +104,36 @@ } if (!(chunk_id & 0x01) || (flag & mask)) { - int k, kk; + int k; if ((data + n) > eod) break; - for (k = 0; k < 4; ++k) { - int r = *data++; - for (kk = 0; kk < 3; ++kk) - *p++ = r; - } - if (n == 6) { - int r, g, b, u, v; + if (n == 4) +/* 8-bit palette indexes or otherwise grayscale values, + * they need different handling */ + if (palette_video) + for (k = 0; k < 4; ++k) + *p++ = *data++; + else + for (k = 0; k < 4; ++k) { + int r = *data++; + *p++ = PACK_RGB_RGB565(r,r,r); + } + else { + int y[4], u, v; + for (k = 0; k < 4; ++k) +/* 8-bit luminance values */ + y[k] = *data++; u = *(int8_t *)data++; v = *(int8_t *)data++; - p -= 12; - for(k=0; k<4; ++k) { - r = *p++ + v*2; - g = *p++ - (u/2) - v; - b = *p + u*2; - p -= 2; - *p++ = av_clip_uint8(r); - *p++ = av_clip_uint8(g); - *p++ = av_clip_uint8(b); - } + for(k=0; k<4; ++k) + *p++ = PACK_RGB_RGB565(av_clip_uint8(y[k] + v*2), + av_clip_uint8(y[k] - (u/2) - v), + av_clip_uint8(y[k] + u*2)); } } else { - p += 12; + p += 4; } } } @@ -134,8 +143,8 @@ { const uint8_t *eod = (data + size); uint32_t flag, mask; - uint8_t *cb0, *cb1, *cb2, *cb3; - int x, y; + uint16_t *cb0, *cb1, *cb2, *cb3; + int x, y; char *ip0, *ip1, *ip2, *ip3; flag = 0; @@ -145,7 +154,7 @@ /* take care of y dimension not being multiple of 4, such streams exist */ ip0 = ip1 = ip2 = ip3 = s->frame->data[0] + - (s->palette_video?strip->x1:strip->x1*3) + (y * s->frame->linesize[0]); + (s->palette_video?strip->x1:strip->x1*2) + (y * s->frame->linesize[0]); if(s->avctx->height - y > 1) { ip1 = ip0 + s->frame->linesize[0]; if(s->avctx->height - y > 2) { @@ -181,29 +190,25 @@ } if ((chunk_id & 0x02) || (~flag & mask)) { - uint8_t *p; + uint16_t *p; if (data >= eod) return AVERROR_INVALIDDATA; p = strip->v1_codebook[*data++]; if (s->palette_video) { - ip3[0] = ip3[1] = ip2[0] = ip2[1] = p[6]; - ip3[2] = ip3[3] = ip2[2] = ip2[3] = p[9]; + ip3[0] = ip3[1] = ip2[0] = ip2[1] = p[2]; + ip3[2] = ip3[3] = ip2[2] = ip2[3] = p[3]; ip1[0] = ip1[1] = ip0[0] = ip0[1] = p[0]; - ip1[2] = ip1[3] = ip0[2] = ip0[3] = p[3]; + ip1[2] = ip1[3] = ip0[2] = ip0[3] = p[1]; } else { - p += 6; - memcpy(ip3 + 0, p, 3); memcpy(ip3 + 3, p, 3); - memcpy(ip2 + 0, p, 3); memcpy(ip2 + 3, p, 3); - p += 3; /* ... + 9 */ - memcpy(ip3 + 6, p, 3); memcpy(ip3 + 9, p, 3); - memcpy(ip2 + 6, p, 3); memcpy(ip2 + 9, p, 3); - p -= 9; /* ... + 0 */ - memcpy(ip1 + 0, p, 3); memcpy(ip1 + 3, p, 3); - memcpy(ip0 + 0, p, 3); memcpy(ip0 + 3, p, 3); - p += 3; /* ... + 3 */ - memcpy(ip1 + 6, p, 3); memcpy(ip1 + 9, p, 3); - memcpy(ip0 + 6, p, 3); memcpy(ip0 + 9, p, 3); + * (uint16_t *)ip3 = *((uint16_t *)ip3+1) = + * (uint16_t *)ip2 = *((uint16_t *)ip2+1) = p[2]; + *((uint16_t *)ip3+2) = *((uint16_t *)ip3+3) = + *((uint16_t *)ip2+2) = *((uint16_t *)ip2+3) = p[3]; + * (uint16_t *)ip1 = *((uint16_t *)ip1+1) = + * (uint16_t *)ip0 = *((uint16_t *)ip0+1) = p[0]; + *((uint16_t *)ip1+2) = *((uint16_t *)ip1+3) = + *((uint16_t *)ip0+2) = *((uint16_t *)ip0+3) = p[1]; } } else if (flag & mask) { @@ -217,34 +222,34 @@ if (s->palette_video) { uint8_t *p; p = ip3; - *p++ = cb2[6]; - *p++ = cb2[9]; - *p++ = cb3[6]; - *p = cb3[9]; + *p++ = cb2[2]; + *p++ = cb2[3]; + *p++ = cb3[2]; + *p = cb3[3]; p = ip2; *p++ = cb2[0]; - *p++ = cb2[3]; + *p++ = cb2[1]; *p++ = cb3[0]; - *p = cb3[3]; + *p = cb3[1]; p = ip1; - *p++ = cb0[6]; - *p++ = cb0[9]; - *p++ = cb1[6]; - *p = cb1[9]; + *p++ = cb0[2]; + *p++ = cb0[3]; + *p++ = cb1[2]; + *p = cb1[3]; p = ip0; *p++ = cb0[0]; - *p++ = cb0[3]; + *p++ = cb0[1]; *p++ = cb1[0]; - *p = cb1[3]; + *p = cb1[1]; } else { - memcpy(ip3 + 0, cb2 + 6, 6); - memcpy(ip3 + 6, cb3 + 6, 6); - memcpy(ip2 + 0, cb2 + 0, 6); - memcpy(ip2 + 6, cb3 + 0, 6); - memcpy(ip1 + 0, cb0 + 6, 6); - memcpy(ip1 + 6, cb1 + 6, 6); - memcpy(ip0 + 0, cb0 + 0, 6); - memcpy(ip0 + 6, cb1 + 0, 6); + memcpy(ip3 + 0, cb2 + 2, 4); + memcpy(ip3 + 4, cb3 + 2, 4); + memcpy(ip2 + 0, cb2 + 0, 4); + memcpy(ip2 + 4, cb3 + 0, 4); + memcpy(ip1 + 0, cb0 + 2, 4); + memcpy(ip1 + 4, cb1 + 2, 4); + memcpy(ip0 + 0, cb0 + 0, 4); + memcpy(ip0 + 4, cb1 + 0, 4); } } @@ -254,8 +259,8 @@ ip0 += 4; ip1 += 4; ip2 += 4; ip3 += 4; } else { - ip0 += 12; ip1 += 12; - ip2 += 12; ip3 += 12; + ip0 += 8; ip1 += 8; + ip2 += 8; ip3 += 8; } } } @@ -290,16 +295,16 @@ case 0x21: case 0x24: case 0x25: - cinepak_decode_codebook (strip->v4_codebook, chunk_id, - chunk_size, data); + cinepak_decode_codebook (strip->v4_codebook, s->palette_video, + chunk_id, chunk_size, data); break; case 0x22: case 0x23: case 0x26: case 0x27: - cinepak_decode_codebook (strip->v1_codebook, chunk_id, - chunk_size, data); + cinepak_decode_codebook (strip->v1_codebook, s->palette_video, + chunk_id, chunk_size, data); break; case 0x30: @@ -412,10 +417,16 @@ s->sega_film_skip_bytes = -1; /* uninitialized state */ - // check for paletted data + /* check for paletted data */ if (avctx->bits_per_coded_sample != 8) { s->palette_video = 0; - avctx->pix_fmt = AV_PIX_FMT_RGB24; +/* we choose to produce video data in a certain 16-bit format, + * for the sake of video output on slow hardware with this native format + * FIXME: this should be made selectable via an option, not hardcoded, + * FIXME: to make the same binary usable with different hardware, + * FIXME: for the moment we make this a compile-time choice by using + * FIXME: a certain version of the cinepak.c file */ + avctx->pix_fmt = AV_PIX_FMT_RGB565; } else { s->palette_video = 1; avctx->pix_fmt = AV_PIX_FMT_PAL8;