From patchwork Sat Sep 10 14:30:28 2016
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 532
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 10 Sep 2016 16:30:28 +0200
Message-Id: <1473517828-8893-2-git-send-email-onemda@gmail.com>
In-Reply-To: <1473517828-8893-1-git-send-email-onemda@gmail.com>
References: <1473517828-8893-1-git-send-email-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/2] avfilter/vf_overlay: inline yuv output formats

Overall speedup ~10-20%

Signed-off-by: Paul B Mahol
Tested-by: Michael on mingw32 mingw64 linux32 mips and arm
---
 libavfilter/vf_overlay.c | 251 +++++++++++++++++++++++++++--------------------
 1 file changed, 147 insertions(+), 104 deletions(-)

diff --git a/libavfilter/vf_overlay.c b/libavfilter/vf_overlay.c
index 177544e..78ced18 100644
--- a/libavfilter/vf_overlay.c
+++ b/libavfilter/vf_overlay.c
@@ -462,121 +462,160 @@ static void blend_image_packed_rgb(AVFilterContext *ctx,
     }
 }
 
-static void blend_image_yuv(AVFilterContext *ctx,
-                            AVFrame *dst, const AVFrame *src,
-                            int x, int y)
+static av_always_inline void blend_plane(AVFilterContext *ctx,
+                                         AVFrame *dst, const AVFrame *src,
+                                         int src_w, int src_h,
+                                         int dst_w, int dst_h,
+                                         int i, int hsub, int vsub,
+                                         int x, int y,
+                                         int main_has_alpha)
 {
-    OverlayContext *s = ctx->priv;
-    int i, imax, j, jmax, k, kmax;
-    const int src_w = src->width;
-    const int src_h = src->height;
-    const int dst_w = dst->width;
-    const int dst_h = dst->height;
-    const int main_has_alpha = s->main_has_alpha;
-
-    if (main_has_alpha) {
-        uint8_t alpha; ///< the amount of overlay to blend on to main
-        uint8_t *s, *sa, *d, *da;
-
-        i = FFMAX(-y, 0);
-        sa = src->data[3] + i * src->linesize[3];
-        da = dst->data[3] + (y+i) * dst->linesize[3];
-
-        for (imax = FFMIN(-y + dst_h, src_h); i < imax; i++) {
-            j = FFMAX(-x, 0);
-            s = sa + j;
-            d = da + x+j;
-
-            for (jmax = FFMIN(-x + dst_w, src_w); j < jmax; j++) {
-                alpha = *s;
-                if (alpha != 0 && alpha != 255) {
-                    uint8_t alpha_d = *d;
-                    alpha = UNPREMULTIPLY_ALPHA(alpha, alpha_d);
-                }
-                switch (alpha) {
-                case 0:
-                    break;
-                case 255:
-                    *d = *s;
-                    break;
-                default:
-                    // apply alpha compositing: main_alpha += (1-main_alpha) * overlay_alpha
-                    *d += FAST_DIV255((255 - *d) * *s);
-                }
-                d += 1;
-                s += 1;
-            }
-            da += dst->linesize[3];
-            sa += src->linesize[3];
-        }
-    }
-    for (i = 0; i < 3; i++) {
-        int hsub = i ? s->hsub : 0;
-        int vsub = i ? s->vsub : 0;
-        int src_wp = AV_CEIL_RSHIFT(src_w, hsub);
-        int src_hp = AV_CEIL_RSHIFT(src_h, vsub);
-        int dst_wp = AV_CEIL_RSHIFT(dst_w, hsub);
-        int dst_hp = AV_CEIL_RSHIFT(dst_h, vsub);
-        int yp = y>>vsub;
-        int xp = x>>hsub;
-        uint8_t *s, *sp, *d, *dp, *a, *ap;
-
-        j = FFMAX(-yp, 0);
-        sp = src->data[i] + j * src->linesize[i];
-        dp = dst->data[i] + (yp+j) * dst->linesize[i];
-        ap = src->data[3] + (j<<vsub) * src->linesize[3];
-
-        for (jmax = FFMIN(-yp + dst_hp, src_hp); j < jmax; j++) {
-            k = FFMAX(-xp, 0);
-            d = dp + xp+k;
-            s = sp + k;
-            a = ap + (k<<hsub);
-
-            for (kmax = FFMIN(-xp + dst_wp, src_wp); k < kmax; k++) {
-                int alpha_v, alpha_h, alpha;
-
+    int src_wp = AV_CEIL_RSHIFT(src_w, hsub);
+    int src_hp = AV_CEIL_RSHIFT(src_h, vsub);
+    int dst_wp = AV_CEIL_RSHIFT(dst_w, hsub);
+    int dst_hp = AV_CEIL_RSHIFT(dst_h, vsub);
+    int yp = y>>vsub;
+    int xp = x>>hsub;
+    uint8_t *s, *sp, *d, *dp, *a, *ap;
+    int jmax, j, k, kmax;
+
+    j = FFMAX(-yp, 0);
+    sp = src->data[i] + j * src->linesize[i];
+    dp = dst->data[i] + (yp+j) * dst->linesize[i];
+    ap = src->data[3] + (j<<vsub) * src->linesize[3];
+
+    for (jmax = FFMIN(-yp + dst_hp, src_hp); j < jmax; j++) {
+        k = FFMAX(-xp, 0);
+        d = dp + xp+k;
+        s = sp + k;
+        a = ap + (k<<hsub);
+
+        for (kmax = FFMIN(-xp + dst_wp, src_wp); k < kmax; k++) {
+            int alpha_v, alpha_h, alpha;
+
+            // average alpha for color components, improve quality
+            if (hsub && vsub && j+1 < src_hp && k+1 < src_wp) {
+                alpha = (a[0] + a[src->linesize[3]] +
+                         a[1] + a[src->linesize[3]+1]) >> 2;
+            } else if (hsub || vsub) {
+                alpha_h = hsub && k+1 < src_wp ?
+                    (a[0] + a[1]) >> 1 : a[0];
+                alpha_v = vsub && j+1 < src_hp ?
+                    (a[0] + a[src->linesize[3]]) >> 1 : a[0];
+                alpha = (alpha_v + alpha_h) >> 1;
+            } else
+                alpha = a[0];
+            // if the main channel has an alpha channel, alpha has to be calculated
+            // to create an un-premultiplied (straight) alpha value
+            if (main_has_alpha && alpha != 0 && alpha != 255) {
                 // average alpha for color components, improve quality
+                uint8_t alpha_d;
                 if (hsub && vsub && j+1 < src_hp && k+1 < src_wp) {
-                    alpha = (a[0] + a[src->linesize[3]] +
-                             a[1] + a[src->linesize[3]+1]) >> 2;
+                    alpha_d = (d[0] + d[src->linesize[3]] +
+                               d[1] + d[src->linesize[3]+1]) >> 2;
                 } else if (hsub || vsub) {
                     alpha_h = hsub && k+1 < src_wp ?
-                        (a[0] + a[1]) >> 1 : a[0];
+                        (d[0] + d[1]) >> 1 : d[0];
                     alpha_v = vsub && j+1 < src_hp ?
-                        (a[0] + a[src->linesize[3]]) >> 1 : a[0];
-                    alpha = (alpha_v + alpha_h) >> 1;
+                        (d[0] + d[src->linesize[3]]) >> 1 : d[0];
+                    alpha_d = (alpha_v + alpha_h) >> 1;
                 } else
-                    alpha = a[0];
-                // if the main channel has an alpha channel, alpha has to be calculated
-                // to create an un-premultiplied (straight) alpha value
-                if (main_has_alpha && alpha != 0 && alpha != 255) {
-                    // average alpha for color components, improve quality
-                    uint8_t alpha_d;
-                    if (hsub && vsub && j+1 < src_hp && k+1 < src_wp) {
-                        alpha_d = (d[0] + d[src->linesize[3]] +
-                                   d[1] + d[src->linesize[3]+1]) >> 2;
-                    } else if (hsub || vsub) {
-                        alpha_h = hsub && k+1 < src_wp ?
-                            (d[0] + d[1]) >> 1 : d[0];
-                        alpha_v = vsub && j+1 < src_hp ?
-                            (d[0] + d[src->linesize[3]]) >> 1 : d[0];
-                        alpha_d = (alpha_v + alpha_h) >> 1;
-                    } else
-                        alpha_d = d[0];
-                    alpha = UNPREMULTIPLY_ALPHA(alpha, alpha_d);
-                }
-                *d = FAST_DIV255(*d * (255 - alpha) + *s * alpha);
-                s++;
-                d++;
-                a += 1 << hsub;
+                    alpha_d = d[0];
+                alpha = UNPREMULTIPLY_ALPHA(alpha, alpha_d);
+            }
+            *d = FAST_DIV255(*d * (255 - alpha) + *s * alpha);
+            s++;
+            d++;
+            a += 1 << hsub;
+        }
+        dp += dst->linesize[i];
+        sp += src->linesize[i];
+        ap += (1 << vsub) * src->linesize[3];
+    }
+}
+
+static inline void alpha_composite(const AVFrame *src, const AVFrame *dst,
+                                   int src_w, int src_h,
+                                   int dst_w, int dst_h,
+                                   int x, int y)
+{
+    uint8_t alpha; ///< the amount of overlay to blend on to main
+    uint8_t *s, *sa, *d, *da;
+    int i, imax, j, jmax;
+
+    i = FFMAX(-y, 0);
+    sa = src->data[3] + i * src->linesize[3];
+    da = dst->data[3] + (y+i) * dst->linesize[3];
+
+    for (imax = FFMIN(-y + dst_h, src_h); i < imax; i++) {
+        j = FFMAX(-x, 0);
+        s = sa + j;
+        d = da + x+j;
+
+        for (jmax = FFMIN(-x + dst_w, src_w); j < jmax; j++) {
+            alpha = *s;
+            if (alpha != 0 && alpha != 255) {
+                uint8_t alpha_d = *d;
+                alpha = UNPREMULTIPLY_ALPHA(alpha, alpha_d);
+            }
+            switch (alpha) {
+            case 0:
+                break;
+            case 255:
+                *d = *s;
+                break;
+            default:
+                // apply alpha compositing: main_alpha += (1-main_alpha) * overlay_alpha
+                *d += FAST_DIV255((255 - *d) * *s);
             }
-            dp += dst->linesize[i];
-            sp += src->linesize[i];
-            ap += (1 << vsub) * src->linesize[3];
+            d += 1;
+            s += 1;
         }
+        da += dst->linesize[3];
+        sa += src->linesize[3];
     }
 }
 
+static av_always_inline void blend_image_yuv(AVFilterContext *ctx,
+                                             AVFrame *dst, const AVFrame *src,
+                                             int hsub, int vsub,
+                                             int main_has_alpha,
+                                             int x, int y)
+{
+    const int src_w = src->width;
+    const int src_h = src->height;
+    const int dst_w = dst->width;
+    const int dst_h = dst->height;
+
+    if (main_has_alpha)
+        alpha_composite(src, dst, src_w, src_h, dst_w, dst_h, x, y);
+
+    blend_plane(ctx, dst, src, src_w, src_h, dst_w, dst_h, 0, 0, 0, x, y, main_has_alpha);
+    blend_plane(ctx, dst, src, src_w, src_h, dst_w, dst_h, 1, hsub, vsub, x, y, main_has_alpha);
+    blend_plane(ctx, dst, src, src_w, src_h, dst_w, dst_h, 2, hsub, vsub, x, y, main_has_alpha);
+}
+
+static void blend_image_yuv420(AVFilterContext *ctx, AVFrame *dst, const AVFrame *src, int x, int y)
+{
+    OverlayContext *s = ctx->priv;
+
+    blend_image_yuv(ctx, dst, src, 1, 1, s->main_has_alpha, x, y);
+}
+
+static void blend_image_yuv422(AVFilterContext *ctx, AVFrame *dst, const AVFrame *src, int x, int y)
+{
+    OverlayContext *s = ctx->priv;
+
+    blend_image_yuv(ctx, dst, src, 1, 0, s->main_has_alpha, x, y);
+}
+
+static void blend_image_yuv444(AVFilterContext *ctx, AVFrame *dst, const AVFrame *src, int x, int y)
+{
+    OverlayContext *s = ctx->priv;
+
+    blend_image_yuv(ctx, dst, src, 0, 0, s->main_has_alpha, x, y);
+}
+
 static int config_input_main(AVFilterLink *inlink)
 {
     OverlayContext *s = inlink->dst->priv;
@@ -592,9 +631,13 @@ static int config_input_main(AVFilterLink *inlink)
     s->main_has_alpha = ff_fmt_is_in(inlink->format, alpha_pix_fmts);
     switch (s->format) {
     case OVERLAY_FORMAT_YUV420:
+        s->blend_image = blend_image_yuv420;
+        break;
     case OVERLAY_FORMAT_YUV422:
+        s->blend_image = blend_image_yuv422;
+        break;
     case OVERLAY_FORMAT_YUV444:
-        s->blend_image = blend_image_yuv;
+        s->blend_image = blend_image_yuv444;
         break;
     case OVERLAY_FORMAT_RGB:
         s->blend_image = blend_image_packed_rgb;
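Note on the technique: the reported speedup presumably comes from each per-format wrapper passing constant hsub/vsub values into an always-inlined body, so the compiler can resolve the subsampling branches of the inner loop at compile time. Below is a minimal standalone sketch of that pattern, not code from this patch. ALWAYS_INLINE, DIV255, blend_row, blend_row_444 and blend_row_422 are hypothetical names. DIV255 is assumed to mirror FAST_DIV255 from vf_overlay.c. Unlike blend_plane, the 4:2:2 variant here does no right-edge check and assumes the alpha row holds 2 * width samples.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for av_always_inline (GCC/Clang attribute,
 * plain inline elsewhere). */
#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

/* Rounded x / 255 for 0 <= x <= 255 * 255.
 * Assumed to mirror FAST_DIV255 in vf_overlay.c. */
#define DIV255(x) ((((x) + 128) * 257) >> 16)

/* Generic body: blends one row of a plane. 'alpha' is a full-resolution
 * alpha row; with hsub == 1 two neighbouring alpha samples are averaged
 * per output pixel, so 'alpha' must hold 2 * width samples. */
static ALWAYS_INLINE void blend_row(uint8_t *dst, const uint8_t *src,
                                    const uint8_t *alpha, int width, int hsub)
{
    assert(hsub == 0 || hsub == 1);
    for (int k = 0; k < width; k++) {
        const uint8_t *a = alpha + (k << hsub);
        /* hsub is a literal constant in every caller below, so after
         * inlining the compiler can drop the untaken side of this test. */
        int al = hsub ? (a[0] + a[1]) >> 1 : a[0];
        dst[k] = DIV255(dst[k] * (255 - al) + src[k] * al);
    }
}

/* Thin per-format wrappers, analogous to blend_image_yuv444/422. */
static void blend_row_444(uint8_t *dst, const uint8_t *src,
                          const uint8_t *alpha, int width)
{
    blend_row(dst, src, alpha, width, 0);
}

static void blend_row_422(uint8_t *dst, const uint8_t *src,
                          const uint8_t *alpha, int width)
{
    blend_row(dst, src, alpha, width, 1);
}
```

A caller would pick one wrapper once per stream, much as config_input_main assigns s->blend_image, so the per-pixel loop never re-tests the subsampling mode.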