From patchwork Sun Jan 26 18:28:49 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol <onemda@gmail.com>
X-Patchwork-Id: 17566
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
X-Original-To: patchwork@ffaux-bg.ffmpeg.org
Delivered-To: patchwork@ffaux-bg.ffmpeg.org
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by ffaux.localdomain (Postfix) with ESMTP id CAD0144B851
	for <patchwork@ffaux-bg.ffmpeg.org>; Sun, 26 Jan 2020 20:29:05 +0200 (EET)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id AE16668ACCE;
	Sun, 26 Jan 2020 20:29:05 +0200 (EET)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mail-wr1-f43.google.com (mail-wr1-f43.google.com
 [209.85.221.43])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 18E5868AC1A
 for <ffmpeg-devel@ffmpeg.org>; Sun, 26 Jan 2020 20:29:00 +0200 (EET)
Received: by mail-wr1-f43.google.com with SMTP id z7so8189737wrl.13
 for <ffmpeg-devel@ffmpeg.org>; Sun, 26 Jan 2020 10:29:00 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=from:to:subject:date:message-id;
 bh=rYi/q4gDDZH7DNfT6OMrNV4dlQHBGqSvr7raJv2aY8w=;
 b=E4PvX9qD/AApM1hcnyKVU5LApPAuWViiK29KV9/bjN/s5Y5j4S+kyt+alYj61YA67A
 4mlvS4NnfL1LHkJ9tqgiKiQ6J30BIIQVn1rqHnxZ09YCCWevTrk2FFGfhGvFGIdWW83q
 cMbWSft0H1De2GUTc1riu8pSeq8qMZyu+d73JhyMZwF+hMIH73bx85lfwR/Jls7KV0oN
 niKIjTy6n4Ok5LTNKhRDUOCMEmPUO065DHBfx099eRiR34KZ+L8UZvbofL7eQD6als7E
 vb/7WPiWD4/qtVNWO8RRM5bnq4ZlWN2D2r2sDm0tPqLnkGoSVpSfj/C3daGyiyQp8i/y
 nFNA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:subject:date:message-id;
 bh=rYi/q4gDDZH7DNfT6OMrNV4dlQHBGqSvr7raJv2aY8w=;
 b=pkT8cYSMd9hSNr/PJBCRT9K9nYXIs56FFyVNy47jXvUXXXV6XyynoIe3t2Sov3zfeQ
 XVSoH57penL2IEv5GhdqFpZ5JmibCwhi1UmdwXfoDk8yaURqx9/Vg7bBkbRrcn9Cxwpt
 pG3ArQA+Xhj+LXlGa00uPJ3tiCYfc9vCiuymdTjE4Xdlic+lhEBZGTU1TwcXIwfThmbU
 OVNJ0tskanb0ltfBpw9KS9P2waloh3Yqr00HKfbVASA1qKPBFlp+os6xJAQc+WktubCc
 OHSKqvjbWT4I7lZnZKwgvuCvMdo72QGcFpB6UgLCn038W6ieG9Gy5hjSJzgA65wdPdjq
 K9VQ==
X-Gm-Message-State: APjAAAVIaFmaH30jgqhsChgMtuiP2zBU2QZHNo2eWWfRGufdIorXtO8x
 Od5Lyo1rJkEESXKkGH1CSUs+4bLC8mw=
X-Google-Smtp-Source: 
 APXvYqx0LGr0wfoCRFhUlM4rmAI/602jeW3W3FIcAl2H3hSS+90EP8qvapDaknwj6yebzvslJtgIjA==
X-Received: by 2002:a5d:6b82:: with SMTP id n2mr18018408wrx.153.1580063338758;
 Sun, 26 Jan 2020 10:28:58 -0800 (PST)
Received: from localhost.localdomain ([94.250.182.166])
 by smtp.gmail.com with ESMTPSA id q130sm7525424wme.19.2020.01.26.10.28.57
 for <ffmpeg-devel@ffmpeg.org>
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Sun, 26 Jan 2020 10:28:57 -0800 (PST)
From: Paul B Mahol <onemda@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 26 Jan 2020 19:28:49 +0100
Message-Id: <20200126182849.24462-1-onemda@gmail.com>
X-Mailer: git-send-email 2.17.1
Subject: [FFmpeg-devel] [PATCH] avfilter: add xfade opencl filter
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
MIME-Version: 1.0
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

Signed-off-by: Paul B Mahol <onemda@gmail.com>
---
 configure                     |   1 +
 doc/filters.texi              |  97 ++++++++
 libavfilter/Makefile          |   1 +
 libavfilter/allfilters.c      |   1 +
 libavfilter/opencl/xfade.cl   | 136 +++++++++++
 libavfilter/opencl_source.h   |   1 +
 libavfilter/vf_xfade_opencl.c | 420 ++++++++++++++++++++++++++++++++++
 7 files changed, 657 insertions(+)
 create mode 100644 libavfilter/opencl/xfade.cl
 create mode 100644 libavfilter/vf_xfade_opencl.c

diff --git a/configure b/configure
index c02dbcc8b2..fbb1a86511 100755
--- a/configure
+++ b/configure
@@ -3596,6 +3596,7 @@ zscale_filter_deps="libzimg const_nan"
 scale_vaapi_filter_deps="vaapi"
 vpp_qsv_filter_deps="libmfx"
 vpp_qsv_filter_select="qsvvpp"
+xfade_opencl_filter_deps="opencl"
 yadif_cuda_filter_deps="ffnvcodec"
 yadif_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 
diff --git a/doc/filters.texi b/doc/filters.texi
index 3f40af8439..a5b19e7b94 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -21343,6 +21343,103 @@ Apply a strong blur of both luma and chroma parameters:
 @end example
 @end itemize
 
+@section xfade_opencl
+
+Cross fade two videos with custom transition effect by using OpenCL.
+
+It accepts the following options:
+
+@table @option
+@item transition
+Set one of possible transition effects.
+
+@table @option
+@item custom
+Select custom transition effect, the actual transition description
+will be picked from source and kernel options.
+
+@item fade
+@item wipeleft
+@item wiperight
+@item wipeup
+@item wipedown
+@item slideleft
+@item slideright
+@item slideup
+@item slidedown
+
+Default transtition is fade.
+@end table
+
+@item source
+OpenCL program source file for custom transition.
+
+@item kernel
+Set name of kernel to use for custom transition from program source file.
+
+@item duration
+Set duration of video transition.
+
+@item offset
+Set time of start of transition relative to first video.
+@end table
+
+The program source file must contain a kernel function with the given name,
+which will be run once for each plane of the output.  Each run on a plane
+gets enqueued as a separate 2D global NDRange with one work-item for each
+pixel to be generated.  The global ID offset for each work-item is therefore
+the coordinates of a pixel in the destination image.
+
+The kernel function needs to take the following arguments:
+@itemize
+@item
+Destination image, @var{__write_only image2d_t}.
+
+This image will become the output; the kernel should write all of it.
+
+@item
+First Source image, @var{__read_only image2d_t}.
+Second Source image, @var{__read_only image2d_t}.
+
+These are the most recent images on each input.  The kernel may read from
+them to generate the output, but they can't be written to.
+
+@item
+Transition progress, @var{float}. This value is always between 0 and 1 inclusive.
+@end itemize
+
+Example programs:
+
+@itemize
+@item
+Apply dots curtain transition effect:
+@verbatim
+__kernel void blend_images(__write_only image2d_t dst,
+                           __read_only  image2d_t src1,
+                           __read_only  image2d_t src2,
+                           float progress)
+{
+    const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
+                               CLK_FILTER_LINEAR);
+    int2  p = (int2)(get_global_id(0), get_global_id(1));
+    float2 rp = (float2)(get_global_id(0), get_global_id(1));
+    float2 dim = (float2)(get_image_dim(src1).x, get_image_dim(src1).y);
+    rp = rp / dim;
+
+    float2 dots = (float2)(20.0, 20.0);
+    float2 center = (float2)(0,0);
+    float2 unused;
+
+    float4 val1 = read_imagef(src1, sampler, p);
+    float4 val2 = read_imagef(src2, sampler, p);
+    bool next = distance(fract(rp * dots, &unused), (float2)(0.5, 0.5)) < (progress / distance(rp, center));
+
+    write_imagef(dst, p, next ? val1 : val2);
+}
+@end verbatim
+
+@end itemize
+
 @c man end OPENCL VIDEO FILTERS
 
 @chapter VAAPI Video Filters
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 58b3077dec..a5ee9c8e88 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -441,6 +441,7 @@ OBJS-$(CONFIG_W3FDIF_FILTER)                 += vf_w3fdif.o
 OBJS-$(CONFIG_WAVEFORM_FILTER)               += vf_waveform.o
 OBJS-$(CONFIG_WEAVE_FILTER)                  += vf_weave.o
 OBJS-$(CONFIG_XBR_FILTER)                    += vf_xbr.o
+OBJS-$(CONFIG_XFADE_OPENCL_FILTER)           += vf_xfade_opencl.o opencl.o opencl/xfade.o
 OBJS-$(CONFIG_XMEDIAN_FILTER)                += vf_xmedian.o framesync.o
 OBJS-$(CONFIG_XSTACK_FILTER)                 += vf_stack.o framesync.o
 OBJS-$(CONFIG_YADIF_FILTER)                  += vf_yadif.o yadif_common.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 6270c18ae2..8a7eac3757 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -420,6 +420,7 @@ extern AVFilter ff_vf_w3fdif;
 extern AVFilter ff_vf_waveform;
 extern AVFilter ff_vf_weave;
 extern AVFilter ff_vf_xbr;
+extern AVFilter ff_vf_xfade_opencl;
 extern AVFilter ff_vf_xmedian;
 extern AVFilter ff_vf_xstack;
 extern AVFilter ff_vf_yadif;
diff --git a/libavfilter/opencl/xfade.cl b/libavfilter/opencl/xfade.cl
new file mode 100644
index 0000000000..c52b68c5fd
--- /dev/null
+++ b/libavfilter/opencl/xfade.cl
@@ -0,0 +1,136 @@
+__kernel void fade(__write_only image2d_t dst,
+                   __read_only  image2d_t src1,
+                   __read_only  image2d_t src2,
+                   float progress)
+{
+    const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
+                               CLK_FILTER_NEAREST);
+    int2  p = (int2)(get_global_id(0), get_global_id(1));
+
+    float4 val1 = read_imagef(src1, sampler, p);
+    float4 val2 = read_imagef(src2, sampler, p);
+
+    write_imagef(dst, p, val1 * progress + val2 * (1.f - progress));
+}
+
+__kernel void wipeleft(__write_only image2d_t dst,
+                       __read_only  image2d_t src1,
+                       __read_only  image2d_t src2,
+                       float progress)
+{
+    int   s = (int)(get_image_dim(src1).x * progress);
+    int2  p = (int2)(get_global_id(0), get_global_id(1));
+    const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
+                               CLK_FILTER_NEAREST);
+
+    float4 val1 = read_imagef(src1, sampler, p);
+    float4 val2 = read_imagef(src2, sampler, p);
+
+    write_imagef(dst, p, p.x > s ? val2 : val1);
+}
+
+__kernel void wiperight(__write_only image2d_t dst,
+                        __read_only  image2d_t src1,
+                        __read_only  image2d_t src2,
+                        float progress)
+{
+    int   s = (int)(get_image_dim(src1).x * (1.f - progress));
+    int2  p = (int2)(get_global_id(0), get_global_id(1));
+    const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
+                               CLK_FILTER_NEAREST);
+
+    float4 val1 = read_imagef(src1, sampler, p);
+    float4 val2 = read_imagef(src2, sampler, p);
+
+    write_imagef(dst, p, p.x > s ? val1 : val2);
+}
+
+__kernel void wipeup(__write_only image2d_t dst,
+                     __read_only  image2d_t src1,
+                     __read_only  image2d_t src2,
+                     float progress)
+{
+    int   s = (int)(get_image_dim(src1).y * progress);
+    int2  p = (int2)(get_global_id(0), get_global_id(1));
+    const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
+                               CLK_FILTER_NEAREST);
+
+    float4 val1 = read_imagef(src1, sampler, p);
+    float4 val2 = read_imagef(src2, sampler, p);
+
+    write_imagef(dst, p, p.y > s ? val2 : val1);
+}
+
+__kernel void wipedown(__write_only image2d_t dst,
+                       __read_only  image2d_t src1,
+                       __read_only  image2d_t src2,
+                       float progress)
+{
+    int   s = (int)(get_image_dim(src1).y * (1.f - progress));
+    int2  p = (int2)(get_global_id(0), get_global_id(1));
+    const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
+                               CLK_FILTER_NEAREST);
+
+    float4 val1 = read_imagef(src1, sampler, p);
+    float4 val2 = read_imagef(src2, sampler, p);
+
+    write_imagef(dst, p, p.y > s ? val1 : val2);
+}
+
+void slide(__write_only image2d_t dst,
+           __read_only  image2d_t src1,
+           __read_only  image2d_t src2,
+           float progress,
+           int2 direction)
+{
+    const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |  \
+                               CLK_FILTER_NEAREST);
+    int   w = get_image_dim(src1).x;
+    int   h = get_image_dim(src1).y;
+    int2 wh = (int2)(w, h);
+    int2 uv = (int2)(get_global_id(0), get_global_id(1));
+    int2 pi = (int2)(progress * w, progress * h);
+    int2 p = uv + pi * direction;
+    int2 f = p % wh;
+
+    f = f + (int2)(w, h) * (int2)(f.x < 0, f.y < 0);
+    float4 val1 = read_imagef(src1, sampler, f);
+    float4 val2 = read_imagef(src2, sampler, f);
+    write_imagef(dst, uv, mix(val1, val2, (p.y >= 0) * (h > p.y) * (p.x >= 0) * (w > p.x)));
+}
+
+__kernel void slidedown(__write_only image2d_t dst,
+                        __read_only  image2d_t src1,
+                        __read_only  image2d_t src2,
+                        float progress)
+{
+    int2 direction = (int2)(0, 1);
+    slide(dst, src1, src2, progress, direction);
+}
+
+__kernel void slideup(__write_only image2d_t dst,
+                      __read_only  image2d_t src1,
+                      __read_only  image2d_t src2,
+                      float progress)
+{
+    int2 direction = (int2)(0, -1);
+    slide(dst, src1, src2, progress, direction);
+}
+
+__kernel void slideleft(__write_only image2d_t dst,
+                        __read_only  image2d_t src1,
+                        __read_only  image2d_t src2,
+                        float progress)
+{
+    int2 direction = (int2)(-1, 0);
+    slide(dst, src1, src2, progress, direction);
+}
+
+__kernel void slideright(__write_only image2d_t dst,
+                         __read_only  image2d_t src1,
+                         __read_only  image2d_t src2,
+                         float progress)
+{
+    int2 direction = (int2)(1, 0);
+    slide(dst, src1, src2, progress, direction);
+}
diff --git a/libavfilter/opencl_source.h b/libavfilter/opencl_source.h
index 225e7a49ea..4e262672ad 100644
--- a/libavfilter/opencl_source.h
+++ b/libavfilter/opencl_source.h
@@ -30,5 +30,6 @@ extern const char *ff_opencl_source_overlay;
 extern const char *ff_opencl_source_tonemap;
 extern const char *ff_opencl_source_transpose;
 extern const char *ff_opencl_source_unsharp;
+extern const char *ff_opencl_source_xfade;
 
 #endif /* AVFILTER_OPENCL_SOURCE_H */
diff --git a/libavfilter/vf_xfade_opencl.c b/libavfilter/vf_xfade_opencl.c
new file mode 100644
index 0000000000..7e9b6becf5
--- /dev/null
+++ b/libavfilter/vf_xfade_opencl.c
@@ -0,0 +1,420 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/log.h"
+#include "libavutil/mem.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+
+#include "avfilter.h"
+#include "filters.h"
+#include "framesync.h"
+#include "internal.h"
+#include "opencl.h"
+#include "opencl_source.h"
+#include "video.h"
+
+enum XFadeTransitions {
+    CUSTOM,
+    FADE,
+    WIPELEFT,
+    WIPERIGHT,
+    WIPEUP,
+    WIPEDOWN,
+    SLIDELEFT,
+    SLIDERIGHT,
+    SLIDEUP,
+    SLIDEDOWN,
+    NB_TRANSITIONS,
+};
+
+typedef struct XFadeOpenCLContext {
+    OpenCLFilterContext ocf;
+
+    int              transition;
+    const char      *source_file;
+    const char      *kernel_name;
+    int64_t          duration;
+    int64_t          offset;
+
+    int              initialised;
+    cl_kernel        kernel;
+    cl_command_queue command_queue;
+
+    int              nb_planes;
+
+    int64_t          duration_pts;
+    int64_t          offset_pts;
+    int64_t          first_pts;
+    int64_t          last_pts;
+    int64_t          pts;
+    int              xfade_is_over;
+    int              need_second;
+    int              eof[2];
+    AVFrame         *xf[2];
+} XFadeOpenCLContext;
+
+static int xfade_opencl_load(AVFilterContext *avctx,
+                             enum AVPixelFormat main_format,
+                             enum AVPixelFormat xfade_format)
+{
+    XFadeOpenCLContext *ctx = avctx->priv;
+    cl_int cle;
+    const AVPixFmtDescriptor *main_desc;
+    int err, main_planes;
+    const char *kernel_name;
+
+    main_desc = av_pix_fmt_desc_get(main_format);
+    if (main_format != xfade_format) {
+        av_log(avctx, AV_LOG_ERROR, "Input formats are not same.\n");
+        return AVERROR(EINVAL);
+    }
+
+    main_planes = 0;
+    for (int i = 0; i < main_desc->nb_components; i++)
+        main_planes = FFMAX(main_planes,
+                            main_desc->comp[i].plane + 1);
+
+    ctx->nb_planes = main_planes;
+
+    if (ctx->transition == CUSTOM) {
+        err = ff_opencl_filter_load_program_from_file(avctx, ctx->source_file);
+    } else {
+        err = ff_opencl_filter_load_program(avctx, &ff_opencl_source_xfade, 1);
+    }
+    if (err < 0)
+        return err;
+
+    ctx->command_queue = clCreateCommandQueue(ctx->ocf.hwctx->context,
+                                              ctx->ocf.hwctx->device_id,
+                                              0, &cle);
+    CL_FAIL_ON_ERROR(AVERROR(EIO), "Failed to create OpenCL "
+                     "command queue %d.\n", cle);
+
+    switch (ctx->transition) {
+    case CUSTOM:     kernel_name = ctx->kernel_name; break;
+    case FADE:       kernel_name = "fade";           break;
+    case WIPELEFT:   kernel_name = "wipeleft";       break;
+    case WIPERIGHT:  kernel_name = "wiperight";      break;
+    case WIPEUP:     kernel_name = "wipeup";         break;
+    case WIPEDOWN:   kernel_name = "wipedown";       break;
+    case SLIDELEFT:  kernel_name = "slideleft";      break;
+    case SLIDERIGHT: kernel_name = "slideright";     break;
+    case SLIDEUP:    kernel_name = "slideup";        break;
+    case SLIDEDOWN:  kernel_name = "slidedown";      break;
+    default:
+        err = AVERROR_BUG;
+        goto fail;
+    }
+
+    ctx->kernel = clCreateKernel(ctx->ocf.program, kernel_name, &cle);
+    CL_FAIL_ON_ERROR(AVERROR(EIO), "Failed to create kernel %d.\n", cle);
+
+    ctx->initialised = 1;
+
+    return 0;
+
+fail:
+    if (ctx->command_queue)
+        clReleaseCommandQueue(ctx->command_queue);
+    if (ctx->kernel)
+        clReleaseKernel(ctx->kernel);
+    return err;
+}
+
+static int xfade_frame(AVFilterContext *avctx, AVFrame *a, AVFrame *b)
+{
+    AVFilterLink *outlink = avctx->outputs[0];
+    XFadeOpenCLContext *ctx = avctx->priv;
+    AVFrame *output;
+    cl_int cle;
+    cl_float progress = av_clipf(1.f - ((cl_float)(ctx->pts - ctx->first_pts - ctx->offset_pts) / ctx->duration_pts), 0.f, 1.f);
+    size_t global_work[2];
+    int kernel_arg = 0;
+    int err, plane;
+
+    if (!ctx->initialised) {
+        AVHWFramesContext *main_fc =
+            (AVHWFramesContext*)a->hw_frames_ctx->data;
+        AVHWFramesContext *xfade_fc =
+            (AVHWFramesContext*)b->hw_frames_ctx->data;
+
+        err = xfade_opencl_load(avctx, main_fc->sw_format,
+                                xfade_fc->sw_format);
+        if (err < 0)
+            return err;
+    }
+
+    output = ff_get_video_buffer(outlink, outlink->w, outlink->h);
+    if (!output) {
+        err = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    for (plane = 0; plane < ctx->nb_planes; plane++) {
+        cl_mem mem;
+        kernel_arg = 0;
+
+        mem = (cl_mem)output->data[plane];
+        CL_SET_KERNEL_ARG(ctx->kernel, kernel_arg, cl_mem, &mem);
+        kernel_arg++;
+
+        mem = (cl_mem)ctx->xf[0]->data[plane];
+        CL_SET_KERNEL_ARG(ctx->kernel, kernel_arg, cl_mem, &mem);
+        kernel_arg++;
+
+        mem = (cl_mem)ctx->xf[1]->data[plane];
+        CL_SET_KERNEL_ARG(ctx->kernel, kernel_arg, cl_mem, &mem);
+        kernel_arg++;
+
+        CL_SET_KERNEL_ARG(ctx->kernel, kernel_arg, cl_float, &progress);
+        kernel_arg++;
+
+        err = ff_opencl_filter_work_size_from_image(avctx, global_work,
+                                                    output, plane, 0);
+        if (err < 0)
+            goto fail;
+
+        cle = clEnqueueNDRangeKernel(ctx->command_queue, ctx->kernel, 2, NULL,
+                                     global_work, NULL, 0, NULL, NULL);
+        CL_FAIL_ON_ERROR(AVERROR(EIO), "Failed to enqueue xfade kernel "
+                         "for plane %d: %d.\n", plane, cle);
+    }
+
+    cle = clFinish(ctx->command_queue);
+    CL_FAIL_ON_ERROR(AVERROR(EIO), "Failed to finish command queue: %d.\n", cle);
+
+    err = av_frame_copy_props(output, ctx->xf[0]);
+    if (err < 0)
+        goto fail;
+
+    output->pts = ctx->pts;
+
+    return ff_filter_frame(outlink, output);
+
+fail:
+    av_frame_free(&output);
+    return err;
+}
+
+static int xfade_opencl_config_output(AVFilterLink *outlink)
+{
+    AVFilterContext *avctx = outlink->src;
+    XFadeOpenCLContext *ctx = avctx->priv;
+    AVFilterLink *mainlink = avctx->inputs[0];
+    int err;
+
+    err = ff_opencl_filter_config_output(outlink);
+    if (err < 0)
+        return err;
+
+    ctx->first_pts = ctx->last_pts = ctx->pts = AV_NOPTS_VALUE;
+
+    outlink->time_base = mainlink->time_base;
+    outlink->sample_aspect_ratio = mainlink->sample_aspect_ratio;
+    outlink->frame_rate = mainlink->frame_rate;
+
+    if (ctx->duration)
+        ctx->duration_pts = av_rescale_q(ctx->duration, AV_TIME_BASE_Q, outlink->time_base);
+    if (ctx->offset)
+        ctx->offset_pts = av_rescale_q(ctx->offset, AV_TIME_BASE_Q, outlink->time_base);
+
+    return 0;
+}
+
+static int xfade_opencl_activate(AVFilterContext *avctx)
+{
+    XFadeOpenCLContext *ctx = avctx->priv;
+    AVFilterLink *outlink = avctx->outputs[0];
+    AVFrame *in = NULL;
+    int ret = 0, status;
+    int64_t pts;
+
+    FF_FILTER_FORWARD_STATUS_BACK_ALL(outlink, avctx);
+
+    if (ctx->xfade_is_over) {
+        ret = ff_inlink_consume_frame(avctx->inputs[1], &in);
+        if (ret < 0) {
+            return ret;
+        } else if (ff_inlink_acknowledge_status(avctx->inputs[1], &status, &pts)) {
+            ff_outlink_set_status(outlink, status, ctx->pts);
+            return 0;
+        } else if (!ret) {
+            if (ff_outlink_frame_wanted(outlink)) {
+                ff_inlink_request_frame(avctx->inputs[1]);
+                return 0;
+            }
+        } else {
+            in->pts = (in->pts - ctx->last_pts) + ctx->pts;
+            return ff_filter_frame(outlink, in);
+        }
+    }
+
+    if (ff_inlink_queued_frames(avctx->inputs[0]) > 0) {
+        ctx->xf[0] = ff_inlink_peek_frame(avctx->inputs[0], 0);
+        if (ctx->xf[0]) {
+            if (ctx->first_pts == AV_NOPTS_VALUE) {
+                ctx->first_pts = ctx->xf[0]->pts;
+            }
+            ctx->pts = ctx->xf[0]->pts;
+            if (ctx->first_pts + ctx->offset_pts > ctx->xf[0]->pts) {
+                ctx->xf[0] = NULL;
+                ctx->need_second = 0;
+                ff_inlink_consume_frame(avctx->inputs[0], &in);
+                return ff_filter_frame(outlink, in);
+            }
+
+            ctx->need_second = 1;
+        }
+    }
+
+    if (ctx->xf[0] && ff_inlink_queued_frames(avctx->inputs[1]) > 0) {
+        ff_inlink_consume_frame(avctx->inputs[0], &ctx->xf[0]);
+        ff_inlink_consume_frame(avctx->inputs[1], &ctx->xf[1]);
+
+        ctx->last_pts = ctx->xf[1]->pts;
+        ctx->pts = ctx->xf[0]->pts;
+        if (ctx->xf[0]->pts - (ctx->first_pts + ctx->offset_pts) > ctx->duration_pts)
+            ctx->xfade_is_over = 1;
+        ret = xfade_frame(avctx, ctx->xf[0], ctx->xf[1]);
+        av_frame_free(&ctx->xf[0]);
+        av_frame_free(&ctx->xf[1]);
+        return ret;
+    }
+
+    if (ff_inlink_queued_frames(avctx->inputs[0]) > 0 &&
+        ff_inlink_queued_frames(avctx->inputs[1]) > 0) {
+        ff_filter_set_ready(avctx, 100);
+        return 0;
+    }
+
+    if (ff_outlink_frame_wanted(outlink)) {
+        if (!ctx->eof[0] && ff_outlink_get_status(avctx->inputs[0])) {
+            ctx->eof[0] = 1;
+            ctx->xfade_is_over = 1;
+        }
+        if (!ctx->eof[1] && ff_outlink_get_status(avctx->inputs[1])) {
+            ctx->eof[1] = 1;
+        }
+        if (!ctx->eof[0] && !ctx->xf[0])
+            ff_inlink_request_frame(avctx->inputs[0]);
+        if (!ctx->eof[1] && (ctx->need_second || ctx->eof[0]))
+            ff_inlink_request_frame(avctx->inputs[1]);
+        if (ctx->eof[0] && ctx->eof[1] && (
+            ff_inlink_queued_frames(avctx->inputs[0]) <= 0 ||
+            ff_inlink_queued_frames(avctx->inputs[1]) <= 0))
+            ff_outlink_set_status(outlink, AVERROR_EOF, AV_NOPTS_VALUE);
+        return 0;
+    }
+
+    return FFERROR_NOT_READY;
+}
+
+static av_cold void xfade_opencl_uninit(AVFilterContext *avctx)
+{
+    XFadeOpenCLContext *ctx = avctx->priv;
+    cl_int cle;
+
+    if (ctx->kernel) {
+        cle = clReleaseKernel(ctx->kernel);
+        if (cle != CL_SUCCESS)
+            av_log(avctx, AV_LOG_ERROR, "Failed to release "
+                   "kernel: %d.\n", cle);
+    }
+
+    if (ctx->command_queue) {
+        cle = clReleaseCommandQueue(ctx->command_queue);
+        if (cle != CL_SUCCESS)
+            av_log(avctx, AV_LOG_ERROR, "Failed to release "
+                   "command queue: %d.\n", cle);
+    }
+
+    ff_opencl_filter_uninit(avctx);
+}
+
+static AVFrame *get_video_buffer(AVFilterLink *inlink, int w, int h)
+{
+    XFadeOpenCLContext *s = inlink->dst->priv;
+
+    return s->xfade_is_over || !s->need_second ?
+        ff_null_get_video_buffer   (inlink, w, h) :
+        ff_default_get_video_buffer(inlink, w, h);
+}
+
+#define OFFSET(x) offsetof(XFadeOpenCLContext, x)
+#define FLAGS (AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM)
+
+static const AVOption xfade_opencl_options[] = {
+    { "transition", "set cross fade transition", OFFSET(transition), AV_OPT_TYPE_INT, {.i64=1}, 0, NB_TRANSITIONS-1, FLAGS, "transition" },
+    {   "custom",    "custom transition",     0, AV_OPT_TYPE_CONST, {.i64=CUSTOM},    0, 0, FLAGS, "transition" },
+    {   "fade",      "fade transition",       0, AV_OPT_TYPE_CONST, {.i64=FADE},      0, 0, FLAGS, "transition" },
+    {   "wipeleft",  "wipe left transition",  0, AV_OPT_TYPE_CONST, {.i64=WIPELEFT},  0, 0, FLAGS, "transition" },
+    {   "wiperight", "wipe right transition", 0, AV_OPT_TYPE_CONST, {.i64=WIPERIGHT}, 0, 0, FLAGS, "transition" },
+    {   "wipeup",    "wipe up transition",    0, AV_OPT_TYPE_CONST, {.i64=WIPEUP},    0, 0, FLAGS, "transition" },
+    {   "wipedown",  "wipe down transition",  0, AV_OPT_TYPE_CONST, {.i64=WIPEDOWN},  0, 0, FLAGS, "transition" },
+    {   "slideleft",  "slide left transition",  0, AV_OPT_TYPE_CONST, {.i64=SLIDELEFT},  0, 0, FLAGS, "transition" },
+    {   "slideright", "slide right transition", 0, AV_OPT_TYPE_CONST, {.i64=SLIDERIGHT}, 0, 0, FLAGS, "transition" },
+    {   "slideup",    "slide up transition",    0, AV_OPT_TYPE_CONST, {.i64=SLIDEUP},    0, 0, FLAGS, "transition" },
+    {   "slidedown",  "slide down transition",  0, AV_OPT_TYPE_CONST, {.i64=SLIDEDOWN},  0, 0, FLAGS, "transition" },
+    { "source", "set OpenCL program source file for custom transition", OFFSET(source_file), AV_OPT_TYPE_STRING, {.str = NULL}, .flags = FLAGS },
+    { "kernel", "set kernel name in program file for custom transition", OFFSET(kernel_name), AV_OPT_TYPE_STRING, {.str = NULL}, .flags = FLAGS },
+    { "duration", "set cross fade duration", OFFSET(duration), AV_OPT_TYPE_DURATION, {.i64=1000000}, 0, 60000000, FLAGS },
+    { "offset",   "set cross fade start relative to first input stream", OFFSET(offset), AV_OPT_TYPE_DURATION, {.i64=0}, 0, 60000000, FLAGS },
+    { NULL }
+};
+
+AVFILTER_DEFINE_CLASS(xfade_opencl);
+
+static const AVFilterPad xfade_opencl_inputs[] = {
+    {
+        .name             = "main",
+        .type             = AVMEDIA_TYPE_VIDEO,
+        .get_video_buffer = get_video_buffer,
+        .config_props     = &ff_opencl_filter_config_input,
+    },
+    {
+        .name             = "xfade",
+        .type             = AVMEDIA_TYPE_VIDEO,
+        .get_video_buffer = get_video_buffer,
+        .config_props     = &ff_opencl_filter_config_input,
+    },
+    { NULL }
+};
+
+static const AVFilterPad xfade_opencl_outputs[] = {
+    {
+        .name          = "default",
+        .type          = AVMEDIA_TYPE_VIDEO,
+        .config_props  = &xfade_opencl_config_output,
+    },
+    { NULL }
+};
+
+AVFilter ff_vf_xfade_opencl = {
+    .name            = "xfade_opencl",
+    .description     = NULL_IF_CONFIG_SMALL("Cross fade one video with another video."),
+    .priv_size       = sizeof(XFadeOpenCLContext),
+    .priv_class      = &xfade_opencl_class,
+    .init            = &ff_opencl_filter_init,
+    .uninit          = &xfade_opencl_uninit,
+    .query_formats   = &ff_opencl_filter_query_formats,
+    .activate        = &xfade_opencl_activate,
+    .inputs          = xfade_opencl_inputs,
+    .outputs         = xfade_opencl_outputs,
+    .flags_internal  = FF_FILTER_FLAG_HWFRAME_AWARE,
+};