diff mbox

[FFmpeg-devel] libavfilter: Add OpenCL convolution filter v0.1

Message ID 1521020637-32648-1-git-send-email-danyaschenko@gmail.com
State New
Headers show

Commit Message

Danil Iashchenko March 14, 2018, 9:43 a.m. UTC
Behaves like the existing convolution filter, except working on OpenCL
hardware frames.
Takes exactly the same options: 4 convolution matrices, 4 rdiv values, 4 bias values.
If not specified, default parameters are applied.
Matrices can be different sizes.

NEW IN THIS PATCH:
-fixed bug, if matrices have different dims.
-renamed some variables due to readability.

filter applies:
matrix0, rdiv0, bias0 to image plane0.
matrix1, rdiv1, bias1 to image plane1.
matrix2, rdiv2, bias2 to image plane2.
matrix3, rdiv3, bias3 to image plane3.

About Kernel parameters:
dst - destination image
src - source image
coef_matrices_dims - stores dimensions of matrix{0..3} consecutively one after the other
coef_matrices - stores matrices{0..3} consecutively one after the other
rdivs - stores rdiv{0..3} parameters consecutively one after the other
biases - stores bias{0..3} parameters consecutively one after the other

About sscanf. I had (!err_code) condition, because I would never get empty line
as option(if not specified, I always have default matrix), but changed to (err_code != 1) due to read-ability.
Also, before sscanf I split matrix with spaces, so I process each single value
of matrix seperately from others and check if they are ok.

about rdiv_buffer, bias_buffer, dims_buffer objects: they should be buffer objects, because they store sequence of values, not a single value

---
 configure                           |   1 +
 libavfilter/Makefile                |   1 +
 libavfilter/allfilters.c            |   1 +
 libavfilter/opencl/convolution.cl   |  46 ++++
 libavfilter/opencl_source.h         |   1 +
 libavfilter/vf_convolution_opencl.c | 449 ++++++++++++++++++++++++++++++++++++
 6 files changed, 499 insertions(+)
 create mode 100644 libavfilter/opencl/convolution.cl
 create mode 100644 libavfilter/vf_convolution_opencl.c

Comments

Carl Eugen Hoyos March 14, 2018, 10:17 a.m. UTC | #1
2018-03-14 10:43 GMT+01:00, Danil Iashchenko <danyaschenko@gmail.com>:

> About sscanf. I had (!err_code) condition, because I would never
> get empty line as option (if not specified, I always have default
> matrix), but changed to (err_code != 1) due to read-ability.

Was this requested?
I ask because several developers believe the opposite is true...

[...]

> --- /dev/null
> +++ b/libavfilter/opencl/convolution.cl
> @@ -0,0 +1,46 @@
> +/*
> + * This file is part of FFmpeg.

Please add your name.

[...]

> +    for (i = 0; i < 4; i++) {
> +        p = ctx->matrix_str[i];
> +        while (ctx->matrix_size[i] < 49) {

> +            if (!(arg = av_strtok(p, " ", &saveptr)))

Please split this line.

And ask one of the mentors to add your name to the OpenCL project
on https://trac.ffmpeg.org/wiki/SponsoringPrograms/GSoC/2018 -
other potential students should know that you are interested in this
project and that you have (nearly) completed a qualification task.

Thank you, Carl Eugen
Mark Thompson March 14, 2018, 11:26 p.m. UTC | #2
On 14/03/18 09:43, Danil Iashchenko wrote:
> Behaves like the existing convolution filter, except working on OpenCL
> hardware frames.
> Takes exactly the same options: 4 convolution matrices, 4 rdiv values, 4 bias values.
> If not specified, default parameters are applied.
> Matrices can be different sizes.
> 
> NEW IN THIS PATCH:
> -fixed bug, if matrices have different dims.
> -renamed some variables due to readability.
> 
> filter applies:
> matrix0, rdiv0, bias0 to image plane0.
> matrix1, rdiv1, bias1 to image plane1.
> matrix2, rdiv2, bias2 to image plane2.
> matrix3, rdiv3, bias3 to image plane3.
> 
> About Kernel parameters:
> dst - destination image
> src - source image
> coef_matrices_dims - stores dimensions of matrix{0..3} consecutively one after the other
> coef_matrices - stores matrices{0..3} consecutively one after the other
> rdivs - stores rdiv{0..3} parameters consecutively one after the other
> biases - stores bias{0..3} parameters consecutively one after the other
> 
> About sscanf. I had (!err_code) condition, because I would never get empty line
> as option(if not specified, I always have default matrix), but changed to (err_code != 1) due to read-ability.
> Also, before sscanf I split matrix with spaces, so I process each single value
> of matrix seperately from others and check if they are ok.

My logic here is that you want to make sure that exactly one argument has been converted, so checking for != 1 is the right thing to do for errors.  Even if you can argue that == 0 is the only possible error case and other error cases can't possibly happen, someone reading the code also has to think about that in the same way to be able to convince themselves that there isn't an error, where if you wrote != 1 they wouldn't need to think about it at all.

(To offer a counterexample to the suggestion that you couldn't get -1 from sscanf() in the previous code, note that it only split for ' ' so it would fail on other whitespace.  E.g. "1 2 3 4 	 6 7 8 9" (tab in the middle) would be accepted as a matrix and then invoke undefined behaviour reading the uninitialised value in the middle of the that matrix.)

> about rdiv_buffer, bias_buffer, dims_buffer objects: they should be buffer objects, because they store sequence of values, not a single value

You only need to pass one value to each invocation of the kernel, since only one value is needed for each plane.

> ---
>  configure                           |   1 +
>  libavfilter/Makefile                |   1 +
>  libavfilter/allfilters.c            |   1 +
>  libavfilter/opencl/convolution.cl   |  46 ++++
>  libavfilter/opencl_source.h         |   1 +
>  libavfilter/vf_convolution_opencl.c | 449 ++++++++++++++++++++++++++++++++++++
>  6 files changed, 499 insertions(+)
>  create mode 100644 libavfilter/opencl/convolution.cl
>  create mode 100644 libavfilter/vf_convolution_opencl.c
> 
> diff --git a/configure b/configure
> index 6916b45..bf5c312 100755
> --- a/configure
> +++ b/configure
> @@ -3210,6 +3210,7 @@ blackframe_filter_deps="gpl"
>  boxblur_filter_deps="gpl"
>  bs2b_filter_deps="libbs2b"
>  colormatrix_filter_deps="gpl"
> +convolution_opencl_filter_deps="opencl"
>  convolve_filter_deps="avcodec"
>  convolve_filter_select="fft"
>  coreimage_filter_deps="coreimage appkit"
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index 6a60836..d005934 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -156,6 +156,7 @@ OBJS-$(CONFIG_COLORLEVELS_FILTER)            += vf_colorlevels.o
>  OBJS-$(CONFIG_COLORMATRIX_FILTER)            += vf_colormatrix.o
>  OBJS-$(CONFIG_COLORSPACE_FILTER)             += vf_colorspace.o colorspacedsp.o
>  OBJS-$(CONFIG_CONVOLUTION_FILTER)            += vf_convolution.o
> +OBJS-$(CONFIG_CONVOLUTION_OPENCL_FILTER)     += vf_convolution_opencl.o opencl.o opencl/convolution.o
>  OBJS-$(CONFIG_CONVOLVE_FILTER)               += vf_convolve.o framesync.o
>  OBJS-$(CONFIG_COPY_FILTER)                   += vf_copy.o
>  OBJS-$(CONFIG_COREIMAGE_FILTER)              += vf_coreimage.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index 9adb109..f2dc55e 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -166,6 +166,7 @@ static void register_all(void)
>      REGISTER_FILTER(COLORMATRIX,    colormatrix,    vf);
>      REGISTER_FILTER(COLORSPACE,     colorspace,     vf);
>      REGISTER_FILTER(CONVOLUTION,    convolution,    vf);
> +    REGISTER_FILTER(CONVOLUTION_OPENCL, convolution_opencl, vf);
>      REGISTER_FILTER(CONVOLVE,       convolve,       vf);
>      REGISTER_FILTER(COPY,           copy,           vf);
>      REGISTER_FILTER(COREIMAGE,      coreimage,      vf);
> diff --git a/libavfilter/opencl/convolution.cl b/libavfilter/opencl/convolution.cl
> new file mode 100644
> index 0000000..192f1ef
> --- /dev/null
> +++ b/libavfilter/opencl/convolution.cl
> @@ -0,0 +1,46 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +__kernel void convolution_global(__write_only image2d_t dst,
> +                                 __read_only  image2d_t src,
> +                                 __constant int *coef_matrices_dims,
> +                                 __constant float *coef_matrices,
> +                                 __constant float *rdivs,
> +                                 __constant float *biases)
> +{
> +    const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE | CLK_FILTER_NEAREST);
> +
> +    int2 loc = (int2)(get_global_id(0), get_global_id(1));
> +
> +    float4 convPix = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
> +    for (int i = 0; i < 4; i++) {
> +        int half_matrix_dim = coef_matrices_dims[i] / 2;
> +        int offset = 0;
> +        for (int j = 0; j < i; j++) {
> +            offset += coef_matrices_dims[j] * coef_matrices_dims[j];
> +        }
> +        for (int conv_i = -half_matrix_dim; conv_i <= half_matrix_dim; conv_i++) {
> +            for (int conv_j = -half_matrix_dim; conv_j <= half_matrix_dim; conv_j++) {
> +                float4 px = read_imagef(src, sampler, loc + (int2)(conv_j, conv_i));
> +                convPix[i] += px[i] * coef_matrices[offset + (conv_i+half_matrix_dim) * coef_matrices_dims[i] + (conv_j+half_matrix_dim)];
> +            }
> +        }
> +        convPix[i] = convPix[i] * rdivs[i] + biases[i];
> +    }
> +    write_imagef(dst, loc, convPix);
> +}

This is still doing the wrong thing by acting per-component rather than per-plane.

E.g. compare the effects of:

-vf 'format=yuv420p,hwupload,convolution_opencl=0m=0 1 0 1 -4 1 0 1 0:0rdiv=1,hwdownload,format=yuv420p'

and 

-vf 'format=yuv420p,convolution=0m=0 1 0 1 -4 1 0 1 0:0rdiv=1,format=yuv420p'

(This should find edges in the luma plane while keeping the chroma planes unchanged.)

> diff --git a/libavfilter/opencl_source.h b/libavfilter/opencl_source.h
> index 23cdfc6..158fb3e 100644
> --- a/libavfilter/opencl_source.h
> +++ b/libavfilter/opencl_source.h
> @@ -19,6 +19,7 @@
>  #ifndef AVFILTER_OPENCL_SOURCE_H
>  #define AVFILTER_OPENCL_SOURCE_H
>  
> +extern const char *ff_opencl_source_convolution;
>  extern const char *ff_opencl_source_overlay;
>  extern const char *ff_opencl_source_unsharp;
>  
> diff --git a/libavfilter/vf_convolution_opencl.c b/libavfilter/vf_convolution_opencl.c
> new file mode 100644
> index 0000000..96132a3
> --- /dev/null
> +++ b/libavfilter/vf_convolution_opencl.c
> @@ -0,0 +1,449 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "libavutil/common.h"
> +#include "libavutil/imgutils.h"
> +#include "libavutil/mem.h"
> +#include "libavutil/opt.h"
> +#include "libavutil/pixdesc.h"
> +#include "libavutil/avstring.h"
> +
> +
> +#include "avfilter.h"
> +#include "internal.h"
> +#include "opencl.h"
> +#include "opencl_source.h"
> +#include "video.h"
> +
> +typedef struct ConvolutionOpenCLContext {
> +    OpenCLFilterContext ocf;
> +
> +    int              initialised;
> +    cl_kernel        kernel;
> +    cl_command_queue command_queue;
> +
> +    char *matrix_str[4];
> +    float  rdivs[4];
> +    float  biases[4];
> +
> +    cl_int sum_matrices_sizes;
> +    cl_int matrix_size[4];
> +    cl_mem matrix;
> +    cl_int dims[4];
> +    cl_mem rdiv_buffer;
> +    cl_mem bias_buffer;
> +    cl_mem dims_buffer;
> +
> +} ConvolutionOpenCLContext;
> +
> +
> +static int convolution_opencl_init(AVFilterContext *avctx)
> +{
> +    ConvolutionOpenCLContext *ctx = avctx->priv;
> +    cl_int cle;
> +    int err;
> +
> +    err = ff_opencl_filter_load_program(avctx, &ff_opencl_source_convolution, 1);
> +    if (err < 0)
> +        goto fail;
> +
> +    ctx->command_queue = clCreateCommandQueue(ctx->ocf.hwctx->context,
> +                                              ctx->ocf.hwctx->device_id,
> +                                              0, &cle);
> +    if (!ctx->command_queue) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to create OpenCL "
> +               "command queue: %d.\n", cle);
> +        err = AVERROR(EIO);
> +        goto fail;
> +    }
> +
> +    ctx->kernel = clCreateKernel(ctx->ocf.program, "convolution_global", &cle);
> +    if (!ctx->kernel) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to create kernel: %d.\n", cle);
> +        err = AVERROR(EIO);
> +        goto fail;
> +    }
> +
> +    ctx->initialised = 1;
> +    return 0;
> +
> +fail:
> +    if (ctx->command_queue)
> +        clReleaseCommandQueue(ctx->command_queue);
> +    if (ctx->kernel)
> +        clReleaseKernel(ctx->kernel);
> +    return err;
> +}
> +
> +
> +static int convolution_opencl_make_filter_params(AVFilterContext *avctx)
> +{
> +    ConvolutionOpenCLContext *ctx = avctx->priv;
> +    float *matrix      = NULL;
> +    float *matrix_rdiv = NULL;
> +    float *matrix_bias = NULL;
> +    int   *matrix_dims = NULL;
> +    size_t matrix_bytes;
> +    cl_mem buffer, buffer_rdiv, buffer_bias, buffer_size;
> +    cl_int cle;
> +    int err, sscanf_err;
> +    int i, j, cnt;
> +    char *p, *arg, *saveptr = NULL;
> +    float input_matrix[4][49];
> +
> +    ctx->sum_matrices_sizes = 0;
> +    for (i = 0; i < 4; i++) {
> +        p = ctx->matrix_str[i];
> +        while (ctx->matrix_size[i] < 49) {
> +            if (!(arg = av_strtok(p, " ", &saveptr)))
> +                break;
> +            p = NULL;
> +            sscanf_err = sscanf(arg, "%f", &input_matrix[i][ctx->matrix_size[i]]);
> +            if (sscanf_err != 1) {
> +                av_log(ctx, AV_LOG_ERROR, "Matrix is sequence of 9, 25 or 49 signed numbers\n");
> +                return AVERROR(EINVAL);
> +            }
> +            ctx->matrix_size[i]++;
> +        }
> +        if (ctx->matrix_size[i] == 9) {
> +            ctx->dims[i] = 3;
> +        } else if (ctx->matrix_size[i] == 25) {
> +            ctx->dims[i] = 5;
> +        } else if (ctx->matrix_size[i] == 49) {
> +            ctx->dims[i] = 7;
> +        } else {
> +            av_log(ctx, AV_LOG_ERROR, "Invalid matrix size:%d\n", ctx->matrix_size[i]);
> +            return AVERROR(EINVAL);
> +        }
> +        ctx->sum_matrices_sizes += ctx->matrix_size[i];
> +
> +    }
> +    matrix_bytes = sizeof(float)*ctx->sum_matrices_sizes;
> +    matrix = av_malloc(matrix_bytes);
> +    if (!matrix) {
> +        err = AVERROR(ENOMEM);
> +        goto fail;
> +    }
> +    cnt = 0;
> +    for (i = 0; i < 4; i++) {
> +        for (j = 0; j < ctx->matrix_size[i]; j++) {
> +            matrix[cnt++] = input_matrix[i][j];
> +        }
> +    }
> +
> +    buffer = clCreateBuffer(ctx->ocf.hwctx->context,
> +                            CL_MEM_READ_ONLY |
> +                            CL_MEM_COPY_HOST_PTR |
> +                            CL_MEM_HOST_NO_ACCESS,
> +                            matrix_bytes, matrix, &cle);
> +    if (!buffer) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to create matrix buffer: "
> +               "%d.\n", cle);
> +        err = AVERROR(EIO);
> +        goto fail;
> +    }
> +    ctx->matrix = buffer;
> +
> +
> +    matrix_bytes = sizeof(float)*4;
> +    matrix_rdiv = av_malloc(matrix_bytes);
> +    if (!matrix_rdiv) {
> +        err = AVERROR(ENOMEM);
> +        goto fail;
> +    }
> +
> +    for (j = 0; j < 4; j++) {
> +        matrix_rdiv[j] = ctx->rdivs[j];
> +    }
> +    buffer_rdiv = clCreateBuffer(ctx->ocf.hwctx->context,
> +                            CL_MEM_READ_ONLY |
> +                            CL_MEM_COPY_HOST_PTR |
> +                            CL_MEM_HOST_NO_ACCESS,
> +                            matrix_bytes, matrix_rdiv, &cle);
> +    if (!buffer_rdiv) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to create rdiv buffer: "
> +               "%d.\n", cle);
> +        err = AVERROR(EIO);
> +        goto fail;
> +    }
> +    ctx->rdiv_buffer = buffer_rdiv;
> +
> +
> +    matrix_bytes = sizeof(float)*4;
> +    matrix_bias = av_malloc(matrix_bytes);
> +    if (!matrix_bias) {
> +        err = AVERROR(ENOMEM);
> +        goto fail;
> +    }
> +
> +    for (j = 0; j < 4; j++) {
> +        matrix_bias[j] = ctx->biases[j];
> +    }
> +    buffer_bias = clCreateBuffer(ctx->ocf.hwctx->context,
> +                            CL_MEM_READ_ONLY |
> +                            CL_MEM_COPY_HOST_PTR |
> +                            CL_MEM_HOST_NO_ACCESS,
> +                            matrix_bytes, matrix_bias, &cle);
> +    if (!buffer_bias) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to create bias buffer: "
> +               "%d.\n", cle);
> +        err = AVERROR(EIO);
> +        goto fail;
> +    }
> +    ctx->bias_buffer = buffer_bias;
> +
> +    matrix_bytes = sizeof(int)*4;
> +    matrix_dims = av_malloc(matrix_bytes);
> +    if (!matrix_dims) {
> +        err = AVERROR(ENOMEM);
> +        goto fail;
> +    }
> +
> +    for (j = 0; j < 4; j++) {
> +        matrix_dims[j] = ctx->dims[j];
> +    }
> +    buffer_size = clCreateBuffer(ctx->ocf.hwctx->context,
> +                            CL_MEM_READ_ONLY |
> +                            CL_MEM_COPY_HOST_PTR |
> +                            CL_MEM_HOST_NO_ACCESS,
> +                            matrix_bytes, matrix_dims, &cle);
> +    if (!buffer_size) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to create bias buffer: "
> +               "%d.\n", cle);
> +        err = AVERROR(EIO);
> +        goto fail;
> +    }
> +    ctx->dims_buffer = buffer_size;
> +
> +    err = 0;
> +fail:
> +    av_freep(&matrix_dims);
> +    av_freep(&matrix_bias);
> +    av_freep(&matrix_rdiv);
> +    av_freep(&matrix);
> +    return err;
> +}
> +
> +static int convolution_opencl_filter_frame(AVFilterLink *inlink, AVFrame *input)
> +{
> +    AVFilterContext    *avctx = inlink->dst;
> +    AVFilterLink     *outlink = avctx->outputs[0];
> +    ConvolutionOpenCLContext *ctx = avctx->priv;
> +    AVFrame *output = NULL;
> +    cl_int cle;
> +    size_t global_work[2];
> +    cl_mem src, dst;
> +    int err, p;
> +
> +    av_log(ctx, AV_LOG_DEBUG, "Filter input: %s, %ux%u (%"PRId64").\n",
> +           av_get_pix_fmt_name(input->format),
> +           input->width, input->height, input->pts);
> +
> +    if (!input->hw_frames_ctx)
> +        return AVERROR(EINVAL);
> +
> +    if (!ctx->initialised) {
> +        err = convolution_opencl_init(avctx);
> +        if (err < 0)
> +            goto fail;
> +
> +        err = convolution_opencl_make_filter_params(avctx);
> +        if (err < 0)
> +            goto fail;
> +    }
> +
> +    output = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> +    if (!output) {
> +        err = AVERROR(ENOMEM);
> +        goto fail;
> +    }
> +
> +    for (p = 0; p < FF_ARRAY_ELEMS(output->data); p++) {
> +        src = (cl_mem) input->data[p];
> +        dst = (cl_mem)output->data[p];
> +
> +
> +        if (!dst)
> +            break;
> +
> +        cle = clSetKernelArg(ctx->kernel, 0, sizeof(cl_mem), &dst);
> +        if (cle != CL_SUCCESS) {
> +            av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
> +                   "destination image argument: %d.\n", cle);
> +            goto fail;
> +        }
> +        cle = clSetKernelArg(ctx->kernel, 1, sizeof(cl_mem), &src);
> +        if (cle != CL_SUCCESS) {
> +            av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
> +                   "source image argument: %d.\n", cle);
> +            goto fail;
> +        }
> +        cle = clSetKernelArg(ctx->kernel, 2, sizeof(cl_mem), &ctx->dims_buffer);
> +        if (cle != CL_SUCCESS) {
> +            av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
> +                   "matrix size argument: %d.\n", cle);
> +            goto fail;
> +        }
> +        cle = clSetKernelArg(ctx->kernel, 3, sizeof(cl_mem), &ctx->matrix);
> +        if (cle != CL_SUCCESS) {
> +            av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
> +                   "matrix argument: %d.\n", cle);
> +            goto fail;
> +        }
> +        cle = clSetKernelArg(ctx->kernel, 4, sizeof(cl_mem), &ctx->rdiv_buffer);
> +        if (cle != CL_SUCCESS) {
> +            av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
> +                   "div argument: %d.\n", cle);
> +            goto fail;
> +        }
> +        cle = clSetKernelArg(ctx->kernel, 5, sizeof(cl_mem), &ctx->bias_buffer);
> +        if (cle != CL_SUCCESS) {
> +            av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
> +                   "bias argument: %d.\n", cle);
> +            goto fail;
> +        }
> +
> +
> +
> +        global_work[0] = output->width;
> +        global_work[1] = output->height;
> +
> +
> +        av_log(avctx, AV_LOG_DEBUG, "Run kernel on plane %d "
> +               "(%"SIZE_SPECIFIER"x%"SIZE_SPECIFIER").\n",
> +               p, global_work[0], global_work[1]);
> +
> +        cle = clEnqueueNDRangeKernel(ctx->command_queue, ctx->kernel, 2, NULL,
> +                                     global_work, NULL,
> +                                     0, NULL, NULL);
> +        if (cle != CL_SUCCESS) {
> +            av_log(avctx, AV_LOG_ERROR, "Failed to enqueue kernel: %d.\n",
> +                   cle);
> +            err = AVERROR(EIO);
> +            goto fail;
> +        }
> +    }
> +
> +    cle = clFinish(ctx->command_queue);
> +    if (cle != CL_SUCCESS) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to finish command queue: %d.\n",
> +               cle);
> +        err = AVERROR(EIO);
> +        goto fail;
> +    }
> +
> +    err = av_frame_copy_props(output, input);
> +    if (err < 0)
> +        goto fail;
> +
> +    av_frame_free(&input);
> +
> +    av_log(ctx, AV_LOG_DEBUG, "Filter output: %s, %ux%u (%"PRId64").\n",
> +           av_get_pix_fmt_name(output->format),
> +           output->width, output->height, output->pts);
> +
> +    return ff_filter_frame(outlink, output);
> +
> +fail:
> +    clFinish(ctx->command_queue);
> +    av_frame_free(&input);
> +    av_frame_free(&output);
> +    return err;
> +}
> +
> +static av_cold void convolution_opencl_uninit(AVFilterContext *avctx)
> +{
> +    ConvolutionOpenCLContext *ctx = avctx->priv;
> +    cl_int cle;
> +
> +    if (ctx->matrix) {
> +        clReleaseMemObject(ctx->matrix);
> +        clReleaseMemObject(ctx->rdiv_buffer);
> +        clReleaseMemObject(ctx->bias_buffer);
> +        clReleaseMemObject(ctx->dims_buffer);
> +    }
> +
> +    if (ctx->kernel) {
> +        cle = clReleaseKernel(ctx->kernel);
> +        if (cle != CL_SUCCESS)
> +            av_log(avctx, AV_LOG_ERROR, "Failed to release "
> +                   "kernel: %d.\n", cle);
> +    }
> +
> +    if (ctx->command_queue) {
> +        cle = clReleaseCommandQueue(ctx->command_queue);
> +        if (cle != CL_SUCCESS)
> +            av_log(avctx, AV_LOG_ERROR, "Failed to release "
> +                   "command queue: %d.\n", cle);
> +    }
> +
> +    ff_opencl_filter_uninit(avctx);
> +}
> +
> +#define OFFSET(x) offsetof(ConvolutionOpenCLContext, x)
> +#define FLAGS (AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM)
> +static const AVOption convolution_opencl_options[] = {
> +    { "0m", "set matrix for 1st plane", OFFSET(matrix_str[0]), AV_OPT_TYPE_STRING, {.str="0 0 0 0 1 0 0 0 0"}, 0, 0, FLAGS },
> +    { "1m", "set matrix for 2nd plane", OFFSET(matrix_str[1]), AV_OPT_TYPE_STRING, {.str="0 0 0 0 1 0 0 0 0"}, 0, 0, FLAGS },
> +    { "2m", "set matrix for 3rd plane", OFFSET(matrix_str[2]), AV_OPT_TYPE_STRING, {.str="0 0 0 0 1 0 0 0 0"}, 0, 0, FLAGS },
> +    { "3m", "set matrix for 4th plane", OFFSET(matrix_str[3]), AV_OPT_TYPE_STRING, {.str="0 0 0 0 1 0 0 0 0"}, 0, 0, FLAGS },
> +    { "0rdiv", "set rdiv for 1st plane", OFFSET(rdivs[0]), AV_OPT_TYPE_FLOAT, {.dbl=1.0}, 0.0, INT_MAX, FLAGS},
> +    { "1rdiv", "set rdiv for 2nd plane", OFFSET(rdivs[1]), AV_OPT_TYPE_FLOAT, {.dbl=1.0}, 0.0, INT_MAX, FLAGS},
> +    { "2rdiv", "set rdiv for 3rd plane", OFFSET(rdivs[2]), AV_OPT_TYPE_FLOAT, {.dbl=1.0}, 0.0, INT_MAX, FLAGS},
> +    { "3rdiv", "set rdiv for 4th plane", OFFSET(rdivs[3]), AV_OPT_TYPE_FLOAT, {.dbl=1.0}, 0.0, INT_MAX, FLAGS},
> +    { "0bias", "set bias for 1st plane", OFFSET(biases[0]), AV_OPT_TYPE_FLOAT, {.dbl=0.0}, 0.0, INT_MAX, FLAGS},
> +    { "1bias", "set bias for 2nd plane", OFFSET(biases[1]), AV_OPT_TYPE_FLOAT, {.dbl=0.0}, 0.0, INT_MAX, FLAGS},
> +    { "2bias", "set bias for 3rd plane", OFFSET(biases[2]), AV_OPT_TYPE_FLOAT, {.dbl=0.0}, 0.0, INT_MAX, FLAGS},
> +    { "3bias", "set bias for 4th plane", OFFSET(biases[3]), AV_OPT_TYPE_FLOAT, {.dbl=0.0}, 0.0, INT_MAX, FLAGS},
> +    { NULL }
> +};
> +
> +AVFILTER_DEFINE_CLASS(convolution_opencl);
> +
> +
> +static const AVFilterPad convolution_opencl_inputs[] = {
> +    {
> +        .name         = "default",
> +        .type         = AVMEDIA_TYPE_VIDEO,
> +        .filter_frame = &convolution_opencl_filter_frame,
> +        .config_props = &ff_opencl_filter_config_input,
> +    },
> +    { NULL }
> +};
> +
> +static const AVFilterPad convolution_opencl_outputs[] = {
> +    {
> +        .name         = "default",
> +        .type         = AVMEDIA_TYPE_VIDEO,
> +        .config_props = &ff_opencl_filter_config_output,
> +    },
> +    { NULL }
> +};
> +
> +AVFilter ff_vf_convolution_opencl = {
> +    .name           = "convolution_opencl",
> +    .description    = NULL_IF_CONFIG_SMALL("Apply convolution mask to input video"),
> +    .priv_size      = sizeof(ConvolutionOpenCLContext),
> +    .priv_class     = &convolution_opencl_class,
> +    .init           = &ff_opencl_filter_init,
> +    .uninit         = &convolution_opencl_uninit,
> +    .query_formats  = &ff_opencl_filter_query_formats,
> +    .inputs         = convolution_opencl_inputs,
> +    .outputs        = convolution_opencl_outputs,
> +    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
> +};
> 

Beyond the fact that it doesn't do the same thing as the convolution filter and it should, I don't have any further comments on the code in this.

Thanks,

- Mark
diff mbox

Patch

diff --git a/configure b/configure
index 6916b45..bf5c312 100755
--- a/configure
+++ b/configure
@@ -3210,6 +3210,7 @@  blackframe_filter_deps="gpl"
 boxblur_filter_deps="gpl"
 bs2b_filter_deps="libbs2b"
 colormatrix_filter_deps="gpl"
+convolution_opencl_filter_deps="opencl"
 convolve_filter_deps="avcodec"
 convolve_filter_select="fft"
 coreimage_filter_deps="coreimage appkit"
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 6a60836..d005934 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -156,6 +156,7 @@  OBJS-$(CONFIG_COLORLEVELS_FILTER)            += vf_colorlevels.o
 OBJS-$(CONFIG_COLORMATRIX_FILTER)            += vf_colormatrix.o
 OBJS-$(CONFIG_COLORSPACE_FILTER)             += vf_colorspace.o colorspacedsp.o
 OBJS-$(CONFIG_CONVOLUTION_FILTER)            += vf_convolution.o
+OBJS-$(CONFIG_CONVOLUTION_OPENCL_FILTER)     += vf_convolution_opencl.o opencl.o opencl/convolution.o
 OBJS-$(CONFIG_CONVOLVE_FILTER)               += vf_convolve.o framesync.o
 OBJS-$(CONFIG_COPY_FILTER)                   += vf_copy.o
 OBJS-$(CONFIG_COREIMAGE_FILTER)              += vf_coreimage.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 9adb109..f2dc55e 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -166,6 +166,7 @@  static void register_all(void)
     REGISTER_FILTER(COLORMATRIX,    colormatrix,    vf);
     REGISTER_FILTER(COLORSPACE,     colorspace,     vf);
     REGISTER_FILTER(CONVOLUTION,    convolution,    vf);
+    REGISTER_FILTER(CONVOLUTION_OPENCL, convolution_opencl, vf);
     REGISTER_FILTER(CONVOLVE,       convolve,       vf);
     REGISTER_FILTER(COPY,           copy,           vf);
     REGISTER_FILTER(COREIMAGE,      coreimage,      vf);
diff --git a/libavfilter/opencl/convolution.cl b/libavfilter/opencl/convolution.cl
new file mode 100644
index 0000000..192f1ef
--- /dev/null
+++ b/libavfilter/opencl/convolution.cl
@@ -0,0 +1,46 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+__kernel void convolution_global(__write_only image2d_t dst,
+                                 __read_only  image2d_t src,
+                                 __constant int *coef_matrices_dims,
+                                 __constant float *coef_matrices,
+                                 __constant float *rdivs,
+                                 __constant float *biases)
+{
+    const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE | CLK_FILTER_NEAREST);
+
+    int2 loc = (int2)(get_global_id(0), get_global_id(1));
+
+    float4 convPix = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
+    for (int i = 0; i < 4; i++) {
+        int half_matrix_dim = coef_matrices_dims[i] / 2;
+        int offset = 0;
+        for (int j = 0; j < i; j++) {
+            offset += coef_matrices_dims[j] * coef_matrices_dims[j];
+        }
+        for (int conv_i = -half_matrix_dim; conv_i <= half_matrix_dim; conv_i++) {
+            for (int conv_j = -half_matrix_dim; conv_j <= half_matrix_dim; conv_j++) {
+                float4 px = read_imagef(src, sampler, loc + (int2)(conv_j, conv_i));
+                convPix[i] += px[i] * coef_matrices[offset + (conv_i+half_matrix_dim) * coef_matrices_dims[i] + (conv_j+half_matrix_dim)];
+            }
+        }
+        convPix[i] = convPix[i] * rdivs[i] + biases[i];
+    }
+    write_imagef(dst, loc, convPix);
+}
diff --git a/libavfilter/opencl_source.h b/libavfilter/opencl_source.h
index 23cdfc6..158fb3e 100644
--- a/libavfilter/opencl_source.h
+++ b/libavfilter/opencl_source.h
@@ -19,6 +19,7 @@ 
 #ifndef AVFILTER_OPENCL_SOURCE_H
 #define AVFILTER_OPENCL_SOURCE_H
 
+extern const char *ff_opencl_source_convolution;
 extern const char *ff_opencl_source_overlay;
 extern const char *ff_opencl_source_unsharp;
 
diff --git a/libavfilter/vf_convolution_opencl.c b/libavfilter/vf_convolution_opencl.c
new file mode 100644
index 0000000..96132a3
--- /dev/null
+++ b/libavfilter/vf_convolution_opencl.c
@@ -0,0 +1,449 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/common.h"
+#include "libavutil/imgutils.h"
+#include "libavutil/mem.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+#include "libavutil/avstring.h"
+
+
+#include "avfilter.h"
+#include "internal.h"
+#include "opencl.h"
+#include "opencl_source.h"
+#include "video.h"
+
+typedef struct ConvolutionOpenCLContext {
+    OpenCLFilterContext ocf;
+
+    int              initialised;
+    cl_kernel        kernel;
+    cl_command_queue command_queue;
+
+    char *matrix_str[4];
+    float  rdivs[4];
+    float  biases[4];
+
+    cl_int sum_matrices_sizes;
+    cl_int matrix_size[4];
+    cl_mem matrix;
+    cl_int dims[4];
+    cl_mem rdiv_buffer;
+    cl_mem bias_buffer;
+    cl_mem dims_buffer;
+
+} ConvolutionOpenCLContext;
+
+
+static int convolution_opencl_init(AVFilterContext *avctx)
+{
+    ConvolutionOpenCLContext *ctx = avctx->priv;
+    cl_int cle;
+    int err;
+
+    err = ff_opencl_filter_load_program(avctx, &ff_opencl_source_convolution, 1);
+    if (err < 0)
+        goto fail;
+
+    ctx->command_queue = clCreateCommandQueue(ctx->ocf.hwctx->context,
+                                              ctx->ocf.hwctx->device_id,
+                                              0, &cle);
+    if (!ctx->command_queue) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create OpenCL "
+               "command queue: %d.\n", cle);
+        err = AVERROR(EIO);
+        goto fail;
+    }
+
+    ctx->kernel = clCreateKernel(ctx->ocf.program, "convolution_global", &cle);
+    if (!ctx->kernel) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create kernel: %d.\n", cle);
+        err = AVERROR(EIO);
+        goto fail;
+    }
+
+    ctx->initialised = 1;
+    return 0;
+
+fail:
+    if (ctx->command_queue)
+        clReleaseCommandQueue(ctx->command_queue);
+    if (ctx->kernel)
+        clReleaseKernel(ctx->kernel);
+    return err;
+}
+
+
+static int convolution_opencl_make_filter_params(AVFilterContext *avctx)
+{
+    ConvolutionOpenCLContext *ctx = avctx->priv;
+    float *matrix      = NULL;
+    float *matrix_rdiv = NULL;
+    float *matrix_bias = NULL;
+    int   *matrix_dims = NULL;
+    size_t matrix_bytes;
+    cl_mem buffer, buffer_rdiv, buffer_bias, buffer_size;
+    cl_int cle;
+    int err, sscanf_err;
+    int i, j, cnt;
+    char *p, *arg, *saveptr = NULL;
+    float input_matrix[4][49];
+
+    ctx->sum_matrices_sizes = 0;
+    for (i = 0; i < 4; i++) {
+        p = ctx->matrix_str[i];
+        while (ctx->matrix_size[i] < 49) {
+            if (!(arg = av_strtok(p, " ", &saveptr)))
+                break;
+            p = NULL;
+            sscanf_err = sscanf(arg, "%f", &input_matrix[i][ctx->matrix_size[i]]);
+            if (sscanf_err != 1) {
+                av_log(ctx, AV_LOG_ERROR, "Matrix is sequence of 9, 25 or 49 signed numbers\n");
+                return AVERROR(EINVAL);
+            }
+            ctx->matrix_size[i]++;
+        }
+        if (ctx->matrix_size[i] == 9) {
+            ctx->dims[i] = 3;
+        } else if (ctx->matrix_size[i] == 25) {
+            ctx->dims[i] = 5;
+        } else if (ctx->matrix_size[i] == 49) {
+            ctx->dims[i] = 7;
+        } else {
+            av_log(ctx, AV_LOG_ERROR, "Invalid matrix size:%d\n", ctx->matrix_size[i]);
+            return AVERROR(EINVAL);
+        }
+        ctx->sum_matrices_sizes += ctx->matrix_size[i];
+
+    }
+    matrix_bytes = sizeof(float)*ctx->sum_matrices_sizes;
+    matrix = av_malloc(matrix_bytes);
+    if (!matrix) {
+        err = AVERROR(ENOMEM);
+        goto fail;
+    }
+    cnt = 0;
+    for (i = 0; i < 4; i++) {
+        for (j = 0; j < ctx->matrix_size[i]; j++) {
+            matrix[cnt++] = input_matrix[i][j];
+        }
+    }
+
+    buffer = clCreateBuffer(ctx->ocf.hwctx->context,
+                            CL_MEM_READ_ONLY |
+                            CL_MEM_COPY_HOST_PTR |
+                            CL_MEM_HOST_NO_ACCESS,
+                            matrix_bytes, matrix, &cle);
+    if (!buffer) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create matrix buffer: "
+               "%d.\n", cle);
+        err = AVERROR(EIO);
+        goto fail;
+    }
+    ctx->matrix = buffer;
+
+
+    matrix_bytes = sizeof(float)*4;
+    matrix_rdiv = av_malloc(matrix_bytes);
+    if (!matrix_rdiv) {
+        err = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    for (j = 0; j < 4; j++) {
+        matrix_rdiv[j] = ctx->rdivs[j];
+    }
+    buffer_rdiv = clCreateBuffer(ctx->ocf.hwctx->context,
+                            CL_MEM_READ_ONLY |
+                            CL_MEM_COPY_HOST_PTR |
+                            CL_MEM_HOST_NO_ACCESS,
+                            matrix_bytes, matrix_rdiv, &cle);
+    if (!buffer_rdiv) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create rdiv buffer: "
+               "%d.\n", cle);
+        err = AVERROR(EIO);
+        goto fail;
+    }
+    ctx->rdiv_buffer = buffer_rdiv;
+
+
+    matrix_bytes = sizeof(float)*4;
+    matrix_bias = av_malloc(matrix_bytes);
+    if (!matrix_bias) {
+        err = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    for (j = 0; j < 4; j++) {
+        matrix_bias[j] = ctx->biases[j];
+    }
+    buffer_bias = clCreateBuffer(ctx->ocf.hwctx->context,
+                            CL_MEM_READ_ONLY |
+                            CL_MEM_COPY_HOST_PTR |
+                            CL_MEM_HOST_NO_ACCESS,
+                            matrix_bytes, matrix_bias, &cle);
+    if (!buffer_bias) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create bias buffer: "
+               "%d.\n", cle);
+        err = AVERROR(EIO);
+        goto fail;
+    }
+    ctx->bias_buffer = buffer_bias;
+
+    matrix_bytes = sizeof(int)*4;
+    matrix_dims = av_malloc(matrix_bytes);
+    if (!matrix_dims) {
+        err = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    for (j = 0; j < 4; j++) {
+        matrix_dims[j] = ctx->dims[j];
+    }
+    buffer_size = clCreateBuffer(ctx->ocf.hwctx->context,
+                            CL_MEM_READ_ONLY |
+                            CL_MEM_COPY_HOST_PTR |
+                            CL_MEM_HOST_NO_ACCESS,
+                            matrix_bytes, matrix_dims, &cle);
+    if (!buffer_size) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create bias buffer: "
+               "%d.\n", cle);
+        err = AVERROR(EIO);
+        goto fail;
+    }
+    ctx->dims_buffer = buffer_size;
+
+    err = 0;
+fail:
+    av_freep(&matrix_dims);
+    av_freep(&matrix_bias);
+    av_freep(&matrix_rdiv);
+    av_freep(&matrix);
+    return err;
+}
+
+static int convolution_opencl_filter_frame(AVFilterLink *inlink, AVFrame *input)
+{
+    AVFilterContext    *avctx = inlink->dst;
+    AVFilterLink     *outlink = avctx->outputs[0];
+    ConvolutionOpenCLContext *ctx = avctx->priv;
+    AVFrame *output = NULL;
+    cl_int cle;
+    size_t global_work[2];
+    cl_mem src, dst;
+    int err, p;
+
+    av_log(ctx, AV_LOG_DEBUG, "Filter input: %s, %ux%u (%"PRId64").\n",
+           av_get_pix_fmt_name(input->format),
+           input->width, input->height, input->pts);
+
+    if (!input->hw_frames_ctx)
+        return AVERROR(EINVAL);
+
+    if (!ctx->initialised) {
+        err = convolution_opencl_init(avctx);
+        if (err < 0)
+            goto fail;
+
+        err = convolution_opencl_make_filter_params(avctx);
+        if (err < 0)
+            goto fail;
+    }
+
+    output = ff_get_video_buffer(outlink, outlink->w, outlink->h);
+    if (!output) {
+        err = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    for (p = 0; p < FF_ARRAY_ELEMS(output->data); p++) {
+        src = (cl_mem) input->data[p];
+        dst = (cl_mem)output->data[p];
+
+
+        if (!dst)
+            break;
+
+        cle = clSetKernelArg(ctx->kernel, 0, sizeof(cl_mem), &dst);
+        if (cle != CL_SUCCESS) {
+            av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
+                   "destination image argument: %d.\n", cle);
+            goto fail;
+        }
+        cle = clSetKernelArg(ctx->kernel, 1, sizeof(cl_mem), &src);
+        if (cle != CL_SUCCESS) {
+            av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
+                   "source image argument: %d.\n", cle);
+            goto fail;
+        }
+        cle = clSetKernelArg(ctx->kernel, 2, sizeof(cl_mem), &ctx->dims_buffer);
+        if (cle != CL_SUCCESS) {
+            av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
+                   "matrix size argument: %d.\n", cle);
+            goto fail;
+        }
+        cle = clSetKernelArg(ctx->kernel, 3, sizeof(cl_mem), &ctx->matrix);
+        if (cle != CL_SUCCESS) {
+            av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
+                   "matrix argument: %d.\n", cle);
+            goto fail;
+        }
+        cle = clSetKernelArg(ctx->kernel, 4, sizeof(cl_mem), &ctx->rdiv_buffer);
+        if (cle != CL_SUCCESS) {
+            av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
+                   "div argument: %d.\n", cle);
+            goto fail;
+        }
+        cle = clSetKernelArg(ctx->kernel, 5, sizeof(cl_mem), &ctx->bias_buffer);
+        if (cle != CL_SUCCESS) {
+            av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
+                   "bias argument: %d.\n", cle);
+            goto fail;
+        }
+
+
+
+        global_work[0] = output->width;
+        global_work[1] = output->height;
+
+
+        av_log(avctx, AV_LOG_DEBUG, "Run kernel on plane %d "
+               "(%"SIZE_SPECIFIER"x%"SIZE_SPECIFIER").\n",
+               p, global_work[0], global_work[1]);
+
+        cle = clEnqueueNDRangeKernel(ctx->command_queue, ctx->kernel, 2, NULL,
+                                     global_work, NULL,
+                                     0, NULL, NULL);
+        if (cle != CL_SUCCESS) {
+            av_log(avctx, AV_LOG_ERROR, "Failed to enqueue kernel: %d.\n",
+                   cle);
+            err = AVERROR(EIO);
+            goto fail;
+        }
+    }
+
+    cle = clFinish(ctx->command_queue);
+    if (cle != CL_SUCCESS) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to finish command queue: %d.\n",
+               cle);
+        err = AVERROR(EIO);
+        goto fail;
+    }
+
+    err = av_frame_copy_props(output, input);
+    if (err < 0)
+        goto fail;
+
+    av_frame_free(&input);
+
+    av_log(ctx, AV_LOG_DEBUG, "Filter output: %s, %ux%u (%"PRId64").\n",
+           av_get_pix_fmt_name(output->format),
+           output->width, output->height, output->pts);
+
+    return ff_filter_frame(outlink, output);
+
+fail:
+    clFinish(ctx->command_queue);
+    av_frame_free(&input);
+    av_frame_free(&output);
+    return err;
+}
+
+static av_cold void convolution_opencl_uninit(AVFilterContext *avctx)
+{
+    ConvolutionOpenCLContext *ctx = avctx->priv;
+    cl_int cle;
+
+    if (ctx->matrix) {
+        clReleaseMemObject(ctx->matrix);
+        clReleaseMemObject(ctx->rdiv_buffer);
+        clReleaseMemObject(ctx->bias_buffer);
+        clReleaseMemObject(ctx->dims_buffer);
+    }
+
+    if (ctx->kernel) {
+        cle = clReleaseKernel(ctx->kernel);
+        if (cle != CL_SUCCESS)
+            av_log(avctx, AV_LOG_ERROR, "Failed to release "
+                   "kernel: %d.\n", cle);
+    }
+
+    if (ctx->command_queue) {
+        cle = clReleaseCommandQueue(ctx->command_queue);
+        if (cle != CL_SUCCESS)
+            av_log(avctx, AV_LOG_ERROR, "Failed to release "
+                   "command queue: %d.\n", cle);
+    }
+
+    ff_opencl_filter_uninit(avctx);
+}
+
+#define OFFSET(x) offsetof(ConvolutionOpenCLContext, x)
+#define FLAGS (AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM)
+static const AVOption convolution_opencl_options[] = {
+    { "0m", "set matrix for 1st plane", OFFSET(matrix_str[0]), AV_OPT_TYPE_STRING, {.str="0 0 0 0 1 0 0 0 0"}, 0, 0, FLAGS },
+    { "1m", "set matrix for 2nd plane", OFFSET(matrix_str[1]), AV_OPT_TYPE_STRING, {.str="0 0 0 0 1 0 0 0 0"}, 0, 0, FLAGS },
+    { "2m", "set matrix for 3rd plane", OFFSET(matrix_str[2]), AV_OPT_TYPE_STRING, {.str="0 0 0 0 1 0 0 0 0"}, 0, 0, FLAGS },
+    { "3m", "set matrix for 4th plane", OFFSET(matrix_str[3]), AV_OPT_TYPE_STRING, {.str="0 0 0 0 1 0 0 0 0"}, 0, 0, FLAGS },
+    { "0rdiv", "set rdiv for 1st plane", OFFSET(rdivs[0]), AV_OPT_TYPE_FLOAT, {.dbl=1.0}, 0.0, INT_MAX, FLAGS},
+    { "1rdiv", "set rdiv for 2nd plane", OFFSET(rdivs[1]), AV_OPT_TYPE_FLOAT, {.dbl=1.0}, 0.0, INT_MAX, FLAGS},
+    { "2rdiv", "set rdiv for 3rd plane", OFFSET(rdivs[2]), AV_OPT_TYPE_FLOAT, {.dbl=1.0}, 0.0, INT_MAX, FLAGS},
+    { "3rdiv", "set rdiv for 4th plane", OFFSET(rdivs[3]), AV_OPT_TYPE_FLOAT, {.dbl=1.0}, 0.0, INT_MAX, FLAGS},
+    { "0bias", "set bias for 1st plane", OFFSET(biases[0]), AV_OPT_TYPE_FLOAT, {.dbl=0.0}, 0.0, INT_MAX, FLAGS},
+    { "1bias", "set bias for 2nd plane", OFFSET(biases[1]), AV_OPT_TYPE_FLOAT, {.dbl=0.0}, 0.0, INT_MAX, FLAGS},
+    { "2bias", "set bias for 3rd plane", OFFSET(biases[2]), AV_OPT_TYPE_FLOAT, {.dbl=0.0}, 0.0, INT_MAX, FLAGS},
+    { "3bias", "set bias for 4th plane", OFFSET(biases[3]), AV_OPT_TYPE_FLOAT, {.dbl=0.0}, 0.0, INT_MAX, FLAGS},
+    { NULL }
+};
+
+AVFILTER_DEFINE_CLASS(convolution_opencl);
+
+
+static const AVFilterPad convolution_opencl_inputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .filter_frame = &convolution_opencl_filter_frame,
+        .config_props = &ff_opencl_filter_config_input,
+    },
+    { NULL }
+};
+
+static const AVFilterPad convolution_opencl_outputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .config_props = &ff_opencl_filter_config_output,
+    },
+    { NULL }
+};
+
+AVFilter ff_vf_convolution_opencl = {
+    .name           = "convolution_opencl",
+    .description    = NULL_IF_CONFIG_SMALL("Apply convolution mask to input video"),
+    .priv_size      = sizeof(ConvolutionOpenCLContext),
+    .priv_class     = &convolution_opencl_class,
+    .init           = &ff_opencl_filter_init,
+    .uninit         = &convolution_opencl_uninit,
+    .query_formats  = &ff_opencl_filter_query_formats,
+    .inputs         = convolution_opencl_inputs,
+    .outputs        = convolution_opencl_outputs,
+    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
+};