From patchwork Tue Feb 19 20:28:33 2019
X-Patchwork-Submitter: Tomas Härdin
X-Patchwork-Id: 12107
Message-ID: <1550608113.23660.4.camel@acc.umu.se>
From: Tomas Härdin
To: FFmpeg development discussions and patches
Date: Tue, 19 Feb 2019 21:28:33 +0100
In-Reply-To: <1549814670.27822.3.camel@acc.umu.se>
References: <20190209131021.9959-1-matthew.w.fearnley@gmail.com>
 <20190209131021.9959-2-matthew.w.fearnley@gmail.com>
 <1549814670.27822.3.camel@acc.umu.se>
Subject: Re: [FFmpeg-devel] [PATCH 2/2] libavcodec/zmbvenc: motion estimation improvements/bug fixes:

On Sun 2019-02-10 at 17:04 +0100, Tomas Härdin wrote:
> On Sat 2019-02-09 at 13:10 +0000, Matthew Fearnley wrote:
> > - Clamp ME range to -64..63 (prevents corruption when me_range is too high)
> > - Allow MV's up to *and including* the positive range limit
> > - Allow out-of-edge ME by padding the prev buffer with a border of 0's
> > - Try previous MV before checking the rest (improves speed in some cases)
> > - More robust logic in code - ensure *mx,*my,*xored are updated together
> > ---
> >  libavcodec/zmbvenc.c | 64 ++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 46 insertions(+), 18 deletions(-)
>
> Passes FATE
>
> The only maybe suspicious thing is
> this part:
>
> > -    c->pstride = FFALIGN(avctx->width, 16);
> > -    if (!(c->prev = av_malloc(c->pstride * avctx->height))) {
> > +
> > +    /* Allocate prev buffer - leave border around the outside for out of edge ME */
> > +    c->pstride = FFALIGN(avctx->width + c->lrange, 16);
>
> Shouldn't this be with + lrange + urange? I guess it works out fine due
> to wraparound and lrange >= urange, but it makes me feel slightly
> uneasy
>
> > +    prev_offset = FFALIGN(c->lrange + (c->pstride * c->lrange), 16);
> > +    prev_size = prev_offset + (c->pstride * (avctx->height + c->urange));
>
> The way I'd do this is compute the size first, then the offset. But I
> guess this works out the same way. Maybe someone else wants to chime
> in?

Pushed patch 1 and a version of this that I only just now noticed I got
off-list. Attaching here for posterity. We might want a test for
me_range >= 64 as well.

/Tomas

From 0ffbd9124e27bf15a66aa9e7b11004963423a9be Mon Sep 17 00:00:00 2001
From: Matthew Fearnley
Date: Thu, 7 Feb 2019 12:54:59 +0000
Subject: [PATCH] libavcodec/zmbvenc: motion estimation improvements/bug fixes:

- Clamp ME range to -64..63 (prevents corruption when me_range is too high)
- Allow MV's up to *and including* the positive range limit
- Allow out-of-edge ME by padding the prev buffer with a border of 0's
- Try previous MV before checking the rest (improves speed in some cases)
- More robust logic in code - ensure *mx,*my,*xored are updated together
---
 libavcodec/zmbvenc.c | 71 +++++++++++++++++++++++++++++++++------------
 1 file changed, 53 insertions(+), 18 deletions(-)

diff --git a/libavcodec/zmbvenc.c b/libavcodec/zmbvenc.c
index 3df6e724c8..c9d50b6adf 100644
--- a/libavcodec/zmbvenc.c
+++ b/libavcodec/zmbvenc.c
@@ -45,11 +45,11 @@
 typedef struct ZmbvEncContext {
     AVCodecContext *avctx;
 
-    int range;
+    int lrange, urange;
     uint8_t *comp_buf, *work_buf;
     uint8_t pal[768];
     uint32_t pal2[256]; //for quick comparisons
-    uint8_t *prev;
+    uint8_t *prev, *prev_buf;
     int pstride;
     int comp_size;
     int keyint, curfrm;
@@ -61,7 +61,6 @@ typedef struct ZmbvEncContext {
 
 /** Block comparing function
  * XXX should be optimized and moved to DSPContext
- * TODO handle out of edge ME
  */
 static inline int block_cmp(ZmbvEncContext *c, uint8_t *src, int stride,
                             uint8_t *src2, int stride2, int bw, int bh,
@@ -100,23 +99,42 @@ static inline int block_cmp(ZmbvEncContext *c, uint8_t *src, int stride,
 static int zmbv_me(ZmbvEncContext *c, uint8_t *src, int sstride, uint8_t *prev,
                    int pstride, int x, int y, int *mx, int *my, int *xored)
 {
-    int dx, dy, tx, ty, tv, bv, bw, bh;
+    int dx, dy, txored, tv, bv, bw, bh;
+    int mx0, my0;
 
-    *mx = *my = 0;
+    mx0 = *mx;
+    my0 = *my;
     bw = FFMIN(ZMBV_BLOCK, c->avctx->width - x);
     bh = FFMIN(ZMBV_BLOCK, c->avctx->height - y);
+
+    /* Try (0,0) */
     bv = block_cmp(c, src, sstride, prev, pstride, bw, bh, xored);
+    *mx = *my = 0;
     if(!bv) return 0;
-    for(ty = FFMAX(y - c->range, 0); ty < FFMIN(y + c->range, c->avctx->height - bh); ty++){
-        for(tx = FFMAX(x - c->range, 0); tx < FFMIN(x + c->range, c->avctx->width - bw); tx++){
-            if(tx == x && ty == y) continue; // we already tested this block
-            dx = tx - x;
-            dy = ty - y;
-            tv = block_cmp(c, src, sstride, prev + dx + dy * pstride, pstride, bw, bh, xored);
+
+    /* Try previous block's MV (if not 0,0) */
+    if (mx0 || my0){
+        tv = block_cmp(c, src, sstride, prev + mx0 + my0 * pstride, pstride, bw, bh, &txored);
+        if(tv < bv){
+            bv = tv;
+            *mx = mx0;
+            *my = my0;
+            *xored = txored;
+            if(!bv) return 0;
+        }
+    }
+
+    /* Try other MVs from top-to-bottom, left-to-right */
+    for(dy = -c->lrange; dy <= c->urange; dy++){
+        for(dx = -c->lrange; dx <= c->urange; dx++){
+            if(!dx && !dy) continue; // we already tested this block
+            if(dx == mx0 && dy == my0) continue; // this one too
+            tv = block_cmp(c, src, sstride, prev + dx + dy * pstride, pstride, bw, bh, &txored);
             if(tv < bv){
                 bv = tv;
                 *mx = dx;
                 *my = dy;
+                *xored = txored;
                 if(!bv) return 0;
             }
         }
@@ -181,7 +199,7 @@ FF_ENABLE_DEPRECATION_WARNINGS
         int x, y, bh2, bw2, xored;
         uint8_t *tsrc, *tprev;
         uint8_t *mv;
-        int mx, my;
+        int mx = 0, my = 0;
 
         bw = (avctx->width + ZMBV_BLOCK - 1) / ZMBV_BLOCK;
         bh = (avctx->height + ZMBV_BLOCK - 1) / ZMBV_BLOCK;
@@ -269,7 +287,7 @@ static av_cold int encode_end(AVCodecContext *avctx)
     av_freep(&c->work_buf);
 
     deflateEnd(&c->zstream);
-    av_freep(&c->prev);
+    av_freep(&c->prev_buf);
 
     return 0;
 }
@@ -283,6 +301,7 @@ static av_cold int encode_init(AVCodecContext *avctx)
     int zret; // Zlib return code
     int i;
     int lvl = 9;
+    int prev_size, prev_offset;
 
     /* Entropy-based score tables for comparing blocks.
      * Suitable for blocks up to (ZMBV_BLOCK * ZMBV_BLOCK) bytes.
@@ -295,9 +314,13 @@ static av_cold int encode_init(AVCodecContext *avctx)
 
     c->curfrm = 0;
     c->keyint = avctx->keyint_min;
-    c->range = 8;
-    if(avctx->me_range > 0)
-        c->range = FFMIN(avctx->me_range, 127);
+
+    /* Motion estimation range: maximum distance is -64..63 */
+    c->lrange = c->urange = 8;
+    if(avctx->me_range > 0){
+        c->lrange = FFMIN(avctx->me_range, 64);
+        c->urange = FFMIN(avctx->me_range, 63);
+    }
 
     if(avctx->compression_level >= 0)
         lvl = avctx->compression_level;
@@ -323,11 +346,23 @@ static av_cold int encode_init(AVCodecContext *avctx)
         av_log(avctx, AV_LOG_ERROR, "Can't allocate compression buffer.\n");
         return AVERROR(ENOMEM);
     }
-    c->pstride = FFALIGN(avctx->width, 16);
-    if (!(c->prev = av_malloc(c->pstride * avctx->height))) {
+
+    /* Allocate prev buffer - pad around the image to allow out-of-edge ME:
+     * - The image should be padded with `lrange` rows before and `urange` rows
+     *   after.
+     * - The stride should be padded with `lrange` pixels, then rounded up to a
+     *   multiple of 16 bytes.
+     * - The first row should also be padded with `lrange` pixels before, then
+     *   aligned up to a multiple of 16 bytes.
+     */
+    c->pstride = FFALIGN(avctx->width + c->lrange, 16);
+    prev_size = FFALIGN(c->lrange, 16) + c->pstride * (c->lrange + avctx->height + c->urange);
+    prev_offset = FFALIGN(c->lrange, 16) + c->pstride * c->lrange;
+    if (!(c->prev_buf = av_mallocz(prev_size))) {
         av_log(avctx, AV_LOG_ERROR, "Can't allocate picture.\n");
         return AVERROR(ENOMEM);
     }
+    c->prev = c->prev_buf + prev_offset;
 
     c->zstream.zalloc = Z_NULL;
     c->zstream.zfree = Z_NULL;
-- 
2.17.1