From patchwork Tue May 1 08:02:21 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 8709
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 1 May 2018 10:02:21 +0200
Message-Id: <20180501080221.31362-1-onemda@gmail.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20180430161740.3688-1-onemda@gmail.com>
References: <20180430161740.3688-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH] avfilter/vf_overlay: add x86 SIMD

Specifically for the yuv444, yuv422 and yuv420 formats, when the main
stream has no alpha and the overlay alpha is straight.

Signed-off-by: Paul B Mahol
---
 libavfilter/vf_overlay.c          |  75 +++++-------------
 libavfilter/vf_overlay.h          |  85 +++++++++++++++++++++
 libavfilter/x86/Makefile          |   2 +
 libavfilter/x86/vf_overlay.asm    | 157 ++++++++++++++++++++++++++++++++++++++
 libavfilter/x86/vf_overlay_init.c |  63 +++++++++++++++
 5 files changed, 326 insertions(+), 56 deletions(-)
 create mode 100644 libavfilter/vf_overlay.h
 create mode 100644 libavfilter/x86/vf_overlay.asm
 create mode 100644 libavfilter/x86/vf_overlay_init.c

diff --git a/libavfilter/vf_overlay.c b/libavfilter/vf_overlay.c
index 8c1895cca4..c4d87306f1 100644
--- a/libavfilter/vf_overlay.c
+++ b/libavfilter/vf_overlay.c
@@ -39,6 +39,7 @@
 #include "drawutils.h"
 #include "framesync.h"
 #include "video.h"
+#include "vf_overlay.h"
 
 typedef struct ThreadData {
     AVFrame *dst, *src;
@@ -59,21 +60,6 @@ static const char *const var_names[] = {
     NULL
 };
 
-enum var_name {
-    VAR_MAIN_W,    VAR_MW,
-    VAR_MAIN_H,    VAR_MH,
-    VAR_OVERLAY_W, VAR_OW,
-    VAR_OVERLAY_H, VAR_OH,
-    VAR_HSUB,
-    VAR_VSUB,
-    VAR_X,
-    VAR_Y,
-    VAR_N,
-    VAR_POS,
-    VAR_T,
-    VAR_VARS_NB
-};
-
 #define MAIN    0
 #define OVERLAY 1
 
@@ -92,45 +78,6 @@ enum EvalMode {
     EVAL_MODE_NB
 };
 
-enum OverlayFormat {
-    OVERLAY_FORMAT_YUV420,
-    OVERLAY_FORMAT_YUV422,
-    OVERLAY_FORMAT_YUV444,
-    OVERLAY_FORMAT_RGB,
-    OVERLAY_FORMAT_GBRP,
-    OVERLAY_FORMAT_AUTO,
-    OVERLAY_FORMAT_NB
-};
-
-typedef struct OverlayContext {
-    const AVClass *class;
-    int x, y;                   ///< position of overlaid picture
-
-    uint8_t main_is_packed_rgb;
-    uint8_t main_rgba_map[4];
-    uint8_t main_has_alpha;
-    uint8_t overlay_is_packed_rgb;
-    uint8_t overlay_rgba_map[4];
-    uint8_t overlay_has_alpha;
-    int format;                 ///< OverlayFormat
-    int alpha_format;
-    int eval_mode;              ///< EvalMode
-
-    FFFrameSync fs;
-
-    int main_pix_step[4];       ///< steps per pixel for each plane of the main output
-    int overlay_pix_step[4];    ///< steps per pixel for each plane of the overlay
-    int hsub, vsub;             ///< chroma subsampling values
-    const AVPixFmtDescriptor *main_desc; ///< format descriptor for main input
-
-    double var_values[VAR_VARS_NB];
-    char *x_expr, *y_expr;
-
-    AVExpr *x_pexpr, *y_pexpr;
-
-    int (*blend_slice)(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs);
-} OverlayContext;
-
 static av_cold void uninit(AVFilterContext *ctx)
 {
     OverlayContext *s = ctx->priv;
@@ -509,6 +456,7 @@ static av_always_inline void blend_plane(AVFilterContext *ctx,
                                          int jobnr,
                                          int nb_jobs)
 {
+    OverlayContext *octx = ctx->priv;
     int src_wp = AV_CEIL_RSHIFT(src_w, hsub);
     int src_hp = AV_CEIL_RSHIFT(src_h, vsub);
     int dst_wp = AV_CEIL_RSHIFT(dst_w, hsub);
@@ -538,8 +486,18 @@ static av_always_inline void blend_plane(AVFilterContext *ctx,
             s = sp + k;
             a = ap + (k<blend_row[i]) {
+                int c = octx->blend_row[i](d, da, s, a, kmax - k, src->linesize[3]);
 
-            for (kmax = FFMIN(-xp + dst_wp, src_wp); k < kmax; k++) {
+                s += c;
+                d += dst_step * c;
+                da += (1 << hsub) * c;
+                a += (1 << hsub) * c;
+                k += c;
+            }
+            for (; k < kmax; k++) {
                 int alpha_v, alpha_h, alpha;
 
                 // average alpha for color components, improve quality
@@ -916,7 +874,7 @@ static int config_input_main(AVFilterLink *inlink)
     }
 
     if (!s->alpha_format)
-        return 0;
+        goto end;
 
     switch (s->format) {
     case OVERLAY_FORMAT_YUV420:
@@ -960,6 +918,11 @@ static int config_input_main(AVFilterLink *inlink)
         }
         break;
     }
+
+end:
+    if (ARCH_X86)
+        ff_overlay_init_x86(s, s->format, s->alpha_format, s->main_has_alpha);
+
     return 0;
 }
 
diff --git a/libavfilter/vf_overlay.h b/libavfilter/vf_overlay.h
new file mode 100644
index 0000000000..072ece358f
--- /dev/null
+++ b/libavfilter/vf_overlay.h
@@ -0,0 +1,85 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVFILTER_OVERLAY_H
+#define AVFILTER_OVERLAY_H
+
+#include "libavutil/eval.h"
+#include "libavutil/pixdesc.h"
+#include "framesync.h"
+#include "avfilter.h"
+
+enum var_name {
+    VAR_MAIN_W,    VAR_MW,
+    VAR_MAIN_H,    VAR_MH,
+    VAR_OVERLAY_W, VAR_OW,
+    VAR_OVERLAY_H, VAR_OH,
+    VAR_HSUB,
+    VAR_VSUB,
+    VAR_X,
+    VAR_Y,
+    VAR_N,
+    VAR_POS,
+    VAR_T,
+    VAR_VARS_NB
+};
+
+enum OverlayFormat {
+    OVERLAY_FORMAT_YUV420,
+    OVERLAY_FORMAT_YUV422,
+    OVERLAY_FORMAT_YUV444,
+    OVERLAY_FORMAT_RGB,
+    OVERLAY_FORMAT_GBRP,
+    OVERLAY_FORMAT_AUTO,
+    OVERLAY_FORMAT_NB
+};
+
+typedef struct OverlayContext {
+    const AVClass *class;
+    int x, y;                   ///< position of overlaid picture
+
+    uint8_t main_is_packed_rgb;
+    uint8_t main_rgba_map[4];
+    uint8_t main_has_alpha;
+    uint8_t overlay_is_packed_rgb;
+    uint8_t overlay_rgba_map[4];
+    uint8_t overlay_has_alpha;
+    int format;                 ///< OverlayFormat
+    int alpha_format;
+    int eval_mode;              ///< EvalMode
+
+    FFFrameSync fs;
+
+    int main_pix_step[4];       ///< steps per pixel for each plane of the main output
+    int overlay_pix_step[4];    ///< steps per pixel for each plane of the overlay
+    int hsub, vsub;             ///< chroma subsampling values
+    const AVPixFmtDescriptor *main_desc; ///< format descriptor for main input
+
+    double var_values[VAR_VARS_NB];
+    char *x_expr, *y_expr;
+
+    AVExpr *x_pexpr, *y_pexpr;
+
+    int (*blend_row[4])(uint8_t *d, uint8_t *da, uint8_t *s, uint8_t *a, int w,
+                        ptrdiff_t alinesize);
+    int (*blend_slice)(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs);
+} OverlayContext;
+
+void ff_overlay_init_x86(OverlayContext *s, int format, int alpha_format, int main_has_alpha);
+
+#endif /* AVFILTER_OVERLAY_H */
diff --git a/libavfilter/x86/Makefile b/libavfilter/x86/Makefile
index f60de3b73b..b484c8bd1c 100644
--- a/libavfilter/x86/Makefile
+++ b/libavfilter/x86/Makefile
@@ -13,6 +13,7 @@ OBJS-$(CONFIG_INTERLACE_FILTER)              += x86/vf_tinterlace_init.o
 OBJS-$(CONFIG_LIMITER_FILTER)                += x86/vf_limiter_init.o
 OBJS-$(CONFIG_MASKEDMERGE_FILTER)            += x86/vf_maskedmerge_init.o
 OBJS-$(CONFIG_NOISE_FILTER)                  += x86/vf_noise.o
+OBJS-$(CONFIG_OVERLAY_FILTER)                += x86/vf_overlay_init.o
 OBJS-$(CONFIG_PP7_FILTER)                    += x86/vf_pp7_init.o
 OBJS-$(CONFIG_PSNR_FILTER)                   += x86/vf_psnr_init.o
 OBJS-$(CONFIG_PULLUP_FILTER)                 += x86/vf_pullup_init.o
@@ -41,6 +42,7 @@ X86ASM-OBJS-$(CONFIG_IDET_FILTER)            += x86/vf_idet.o
 X86ASM-OBJS-$(CONFIG_INTERLACE_FILTER)       += x86/vf_interlace.o
 X86ASM-OBJS-$(CONFIG_LIMITER_FILTER)         += x86/vf_limiter.o
 X86ASM-OBJS-$(CONFIG_MASKEDMERGE_FILTER)     += x86/vf_maskedmerge.o
+X86ASM-OBJS-$(CONFIG_OVERLAY_FILTER)         += x86/vf_overlay.o
 X86ASM-OBJS-$(CONFIG_PP7_FILTER)             += x86/vf_pp7.o
 X86ASM-OBJS-$(CONFIG_PSNR_FILTER)            += x86/vf_psnr.o
 X86ASM-OBJS-$(CONFIG_PULLUP_FILTER)          += x86/vf_pullup.o
diff --git a/libavfilter/x86/vf_overlay.asm b/libavfilter/x86/vf_overlay.asm
new file mode 100644
index 0000000000..d639cce9e5
--- /dev/null
+++ b/libavfilter/x86/vf_overlay.asm
@@ -0,0 +1,157 @@
+;*****************************************************************************
+;* x86-optimized functions for overlay filter
+;*
+;* Copyright (C) 2018 Paul B Mahol
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;*****************************************************************************
+
+%include "libavutil/x86/x86util.asm"
+
+SECTION_RODATA
+
+pw_128:   times 8 dw 128
+pw_255:   times 8 dw 255
+pw_257:   times 8 dw 257
+pw_65280: times 8 dw 65280
+
+SECTION .text
+
+INIT_XMM sse4
+cglobal overlay_row_44, 6, 8, 6, 0, d, da, s, a, w, alinesize, r, x
+    xor          xq, xq
+    movsxdifnidn wq, wd
+    mov          rq, wq
+    and          rq, mmsize/2 - 1
+    cmp          wq, mmsize/2
+    jl .end
+    sub          wq, rq
+    mova         m3, [pw_255]
+    mova         m4, [pw_128]
+    mova         m5, [pw_257]
+    .loop0:
+        pmovzxbw    m0, [sq+xq]
+        pmovzxbw    m2, [aq+xq]
+        pmovzxbw    m1, [dq+xq]
+        pmullw      m0, m2
+        pxor        m2, m3
+        pmullw      m1, m2
+        paddw       m0, m4
+        paddw       m0, m1
+        pmulhuw     m0, m5
+        packuswb    m0, m0
+        movq   [dq+xq], m0
+        add         xq, mmsize/2
+        cmp         xq, wq
+        jl .loop0
+
+    .end:
+    mov    eax, xd
+    RET
+
+INIT_XMM sse4
+cglobal overlay_row_22, 6, 8, 8, 0, d, da, s, a, w, al, r, x
+    xor          xq, xq
+    movsxdifnidn wq, wd
+    sub          wq, 1
+    mov          rq, wq
+    and          rq, mmsize/2 - 1
+    cmp          wq, mmsize/2
+    jl .end
+    sub          wq, rq
+    mova         m3, [pw_255]
+    mova         m4, [pw_128]
+    mova         m5, [pw_257]
+    mova         m7, [pw_65280]
+    .loop0:
+        pmovzxbw    m0, [sq+xq]
+        movu        m2, [aq+2*xq]
+        pand        m2, m3
+        movu        m6, [aq+2*xq]
+        pand        m6, m7
+        psrlw       m6, 8
+        paddw       m2, m6
+        psrlw       m2, 1
+        movu        m6, [aq+2*xq]
+        pand        m6, m3
+        paddw       m2, m6
+        psrlw       m2, 1
+        pmovzxbw    m1, [dq+xq]
+        pmullw      m0, m2
+        pxor        m2, m3
+        pmullw      m1, m2
+        paddw       m0, m4
+        paddw       m0, m1
+        pmulhuw     m0, m5
+        packuswb    m0, m0
+        movq   [dq+xq], m0
+        add         xq, mmsize/2
+        cmp         xq, wq
+        jl .loop0
+
+    .end:
+    mov    eax, xd
+    RET
+
+INIT_XMM sse4
+cglobal overlay_row_20, 6, 8, 8, 0, d, da, s, a, w, al, r, x
+    xor          xq, xq
+    movsxdifnidn wq, wd
+    sub          wq, 1
+    mov          rq, wq
+    and          rq, mmsize/2 - 1
+    cmp          wq, mmsize/2
+    jl .end
+    sub          wq, rq
+    mov         daq, aq
+    add         daq, alq
+    mova         m3, [pw_255]
+    mova         m4, [pw_128]
+    mova         m5, [pw_257]
+    mova         m7, [pw_65280]
+    .loop0:
+        pmovzxbw    m0, [sq+xq]
+        movu        m2, [aq+2*xq]
+        pand        m2, m3
+        movu        m6, [aq+2*xq]
+        pand        m6, m7
+        psrlw       m6, 8
+        paddw       m2, m6
+        movu        m6, [daq+2*xq]
+        pand        m6, m3
+        paddw       m2, m6
+        movu        m6, [daq+2*xq]
+        pand        m6, m7
+        psrlw       m6, 8
+        paddw       m2, m6
+        psrlw       m2, 2
+        pmovzxbw    m1, [dq+xq]
+        pmullw      m0, m2
+        pxor        m2, m3
+        pmullw      m1, m2
+        paddw       m0, m4
+        paddw       m0, m1
+        pmulhuw     m0, m5
+        packuswb    m0, m0
+        movq   [dq+xq], m0
+        add         xq, mmsize/2
+        cmp         xq, wq
+        jl .loop0
+
+    .end:
+    mov    eax, xd
+    RET
diff --git a/libavfilter/x86/vf_overlay_init.c b/libavfilter/x86/vf_overlay_init.c
new file mode 100644
index 0000000000..865fd035f6
--- /dev/null
+++ b/libavfilter/x86/vf_overlay_init.c
@@ -0,0 +1,63 @@
+/*
+ * Copyright (c) 2018 Paul B Mahol
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavutil/x86/cpu.h"
+#include "libavfilter/vf_overlay.h"
+
+int ff_overlay_row_44_sse4(uint8_t *d, uint8_t *da, uint8_t *s, uint8_t *a,
+                           int w, ptrdiff_t alinesize);
+
+int ff_overlay_row_20_sse4(uint8_t *d, uint8_t *da, uint8_t *s, uint8_t *a,
+                           int w, ptrdiff_t alinesize);
+
+int ff_overlay_row_22_sse4(uint8_t *d, uint8_t *da, uint8_t *s, uint8_t *a,
+                           int w, ptrdiff_t alinesize);
+
+av_cold void ff_overlay_init_x86(OverlayContext *s, int format, int alpha_format, int main_has_alpha)
+{
+    int cpu_flags = av_get_cpu_flags();
+
+    if (ARCH_X86_64 && EXTERNAL_SSE4(cpu_flags) &&
+        (format == OVERLAY_FORMAT_YUV444 ||
+         format == OVERLAY_FORMAT_GBRP) &&
+        alpha_format == 0 && main_has_alpha == 0) {
+        s->blend_row[0] = ff_overlay_row_44_sse4;
+        s->blend_row[1] = ff_overlay_row_44_sse4;
+        s->blend_row[2] = ff_overlay_row_44_sse4;
+    }
+
+    if (ARCH_X86_64 && EXTERNAL_SSE4(cpu_flags) &&
+        (format == OVERLAY_FORMAT_YUV420) &&
+        alpha_format == 0 && main_has_alpha == 0) {
+        s->blend_row[0] = ff_overlay_row_44_sse4;
+        s->blend_row[1] = ff_overlay_row_20_sse4;
+        s->blend_row[2] = ff_overlay_row_20_sse4;
+    }
+
+    if (ARCH_X86_64 && EXTERNAL_SSE4(cpu_flags) &&
+        (format == OVERLAY_FORMAT_YUV422) &&
+        alpha_format == 0 && main_has_alpha == 0) {
+        s->blend_row[0] = ff_overlay_row_44_sse4;
+        s->blend_row[1] = ff_overlay_row_22_sse4;
+        s->blend_row[2] = ff_overlay_row_22_sse4;
+    }
+}
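
Review note: the arithmetic of the three SSE4 row kernels can be modelled in
scalar C, which may help when checking them against the C fallback loop. This
is only a sketch read off the assembly in this patch, not code from it. The
names ref_overlay_row, ref_div255 and enum RefRowKind are hypothetical, and
the da argument of the real prototype is left out because the kernels never
read destination alpha.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which alpha averaging to model; these names are illustrative only. */
enum RefRowKind { REF_ROW_44, REF_ROW_22, REF_ROW_20 };

/* Same rounding as the "paddw pw_128" / "pmulhuw pw_257" pair. */
static unsigned ref_div255(unsigned x)
{
    return ((x + 128) * 257) >> 16;
}

/* Hypothetical scalar model of overlay_row_44 / _22 / _20. */
static int ref_overlay_row(uint8_t *d, const uint8_t *s, const uint8_t *a,
                           int w, ptrdiff_t alinesize, enum RefRowKind kind)
{
    /* The 22 and 20 kernels drop one pixel up front ("sub wq, 1"). */
    int n = kind == REF_ROW_44 ? w : w - 1;
    int x;

    assert(w >= 0);
    /* Only whole groups of 8 pixels are handled; shorter rows return 0. */
    n = n >= 8 ? n & ~7 : 0;

    for (x = 0; x < n; x++) {
        unsigned alpha;

        switch (kind) {
        case REF_ROW_22: /* ((a0 + a1) / 2 + a0) / 2 */
            alpha = (((a[2 * x] + a[2 * x + 1]) >> 1) + a[2 * x]) >> 1;
            break;
        case REF_ROW_20: /* mean of a 2x2 block spanning two alpha rows */
            alpha = (a[2 * x] + a[2 * x + 1] +
                     a[alinesize + 2 * x] + a[alinesize + 2 * x + 1]) >> 2;
            break;
        default:         /* one alpha sample per pixel */
            alpha = a[x];
            break;
        }
        /* alpha ^ 255 equals 255 - alpha for 8-bit values (the pxor step). */
        d[x] = (uint8_t)ref_div255(s[x] * alpha + d[x] * (alpha ^ 255));
    }
    return n; /* pixels written; the caller's C loop finishes the rest */
}
```

The return value plays the same role as c in blend_plane(): the caller
advances its pointers by that many pixels and lets the existing per-pixel
loop handle the remainder of the row.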