[FFmpeg-devel,RFC] lavfi: add opencl tonemap filter.

Message ID 1525419178-2419-1-git-send-email-ruiling.song@intel.com
State Superseded

Commit Message

Ruiling Song May 4, 2018, 7:32 a.m. UTC
It basically does HDR-to-SDR conversion with tone mapping.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
---
This patch tries to add a filter that does HDR-to-SDR conversion with tone mapping.
The filter does the whole job of tone mapping in one pass, which is quite different from vf_tonemap.c.
I chose this approach because I think it requires fewer memory accesses.

I also found that tone mapping shares a lot of code with colorspace conversion,
so I moved the colorspace-related code into separate files (both the OpenCL kernel and the host code).

I am not sure whether the design is OK.
Would anybody like to comment on the overall design or the implementation details?


Thanks!
Ruiling

 configure                              |   1 +
 libavfilter/Makefile                   |   2 +
 libavfilter/allfilters.c               |   1 +
 libavfilter/colorspace_basic.c         |  89 +++++++
 libavfilter/colorspace_basic.h         |  40 +++
 libavfilter/opencl/colorspace_basic.cl | 137 ++++++++++
 libavfilter/opencl/tonemap.cl          | 136 ++++++++++
 libavfilter/opencl_source.h            |   2 +
 libavfilter/vf_tonemap_opencl.c        | 472 +++++++++++++++++++++++++++++++++
 9 files changed, 880 insertions(+)
 create mode 100644 libavfilter/colorspace_basic.c
 create mode 100644 libavfilter/colorspace_basic.h
 create mode 100644 libavfilter/opencl/colorspace_basic.cl
 create mode 100644 libavfilter/opencl/tonemap.cl
 create mode 100644 libavfilter/vf_tonemap_opencl.c

Comments

Niklas Haas May 4, 2018, 6 p.m. UTC | #1
Hello Ruiling,

On Fri,  4 May 2018 15:32:58 +0800, Ruiling Song <ruiling.song@intel.com> wrote:
> It basically does HDR-to-SDR conversion with tone mapping.
> 
> Signed-off-by: Ruiling Song <ruiling.song@intel.com>
> ---
> This patch tries to add a filter that does HDR-to-SDR conversion with tone mapping.
> The filter does the whole job of tone mapping in one pass, which is quite different from vf_tonemap.c.
> I chose this approach because I think it requires fewer memory accesses.
> 
> I also found that tone mapping shares a lot of code with colorspace conversion,
> so I moved the colorspace-related code into separate files (both the OpenCL kernel and the host code).
> 
> I am not sure whether the design is OK.
> Would anybody like to comment on the overall design or the implementation details?
> 
> 
> Thanks!
> Ruiling

As the original author of the tone mapping code that inspired vf_tonemap
and (by proxy) vf_tonemap_opencl, I can provide a handful of comments.

> +float3 map_one_pixel_rgb(float3 rgb, float peak) {
> +    // de-saturate
> +    float luma = get_luma(rgb.x, rgb.y, rgb.z);
> +    float overbright = max(luma - 2.0f, 1e-6f) / max(luma, 1e-6f);
> +    rgb = mix(rgb, (float3)luma, (float3)overbright);
> +
> +    float sig = max(max(rgb.x, max(rgb.y, rgb.z)), 1e-6f);
> +    float sig_old = sig;
> +    sig = TONE_FUNC(sig, peak);
> +    rgb *= (sig/sig_old);
> +    return rgb;
> +}

I consider this desaturation algorithm outdated. It works, but I think
it produces lower quality results than this one:

  float sig = max(max(rgb.x, max(rgb.y, rgb.z)), 1e-6f);
  float luma = get_luma(rgb.x, rgb.y, rgb.z);
  float coeff = max(sig - 0.18f, 1e-6f) / max(sig, 1e-6f);

  const float desaturation_coefficient = 0.5f; // or configurable!
  coeff = pow(coeff, 10.0f / desaturation_coefficient);
  rgb = mix(rgb, (float3)luma, coeff);
  sig = mix(sig, luma, coeff);

  // do the rest of tone-mapping on `sig`
  float sig_old = sig;
  ...

Basically, I've done the following:

- Calculate the overbright coefficient on `sig` rather than `luma` alone
- Instead of using an offset of 2.0f, use an offset of 0.18f
- Take the coefficient to a high exponent (lower exponent = more
  desaturation, which is why I invert the user-configurable parameter)

Since the new code needs to read `sig`, we also have to move up the
`sig` calculation and then correctly adjust it afterwards as well.

----

More importantly, this algorithm is missing something that I now
consider very important, and which would align well with OpenCL: source
brightness detection. Just doing the tone-mapping "blind" like this
works to some extent, but I think the best results require also
adjusting the exposure in order to compensate for Hollywood's tendency
to ship poorly mastered, over-illuminated HDR material.

The basic premise is to calculate both the peak brightness as well as
the average brightness on a frame-by-frame basis, and incorporate those
measured values in the algorithm, in order to re-normalize overly bright
scenes to correspond to a typical SDR average of around 0.25. In
addition to this, calculating the peak explicitly allows us to exactly
tailor the hable() function to this specific frame, even if the
mastering metadata is missing or useless. (Which it almost always is)

Doing this in OpenCL would essentially require implementing a big
map-reduce to keep track of respectively the sum and max of each pixel's
brightness. In addition to this, I recommend averaging the results over
a few frames (I like to use around one second), with the caveat that
this is best paired with at least a naive scene change detection
heuristic to make sure this averaging window gets reset on a scene
change.
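
To illustrate what I mean, here is a minimal sketch of such a reduction
kernel (untested, all names hypothetical; it assumes a power-of-two work
group size and a single-plane linear-luminance input, and the host or a
tiny second kernel still has to fold the per-group partials together):

  __kernel void brightness_stats(__read_only image2d_t src,
                                 __global float *partial_sum, // one per work group
                                 __global float *partial_max, // one per work group
                                 __local  float *tmp_sum,     // local_size floats
                                 __local  float *tmp_max)     // local_size floats
  {
      const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_FILTER_NEAREST;
      int2 pos  = (int2)(get_global_id(0), get_global_id(1));
      int lid   = get_local_id(1) * get_local_size(0) + get_local_id(0);
      int lsize = get_local_size(0) * get_local_size(1);

      // out-of-range work items contribute neutral elements
      float v = 0.0f;
      if (pos.x < get_image_width(src) && pos.y < get_image_height(src))
          v = read_imagef(src, sampler, pos).x;

      tmp_sum[lid] = v;
      tmp_max[lid] = v;
      barrier(CLK_LOCAL_MEM_FENCE);

      // tree reduction within the work group
      for (int stride = lsize / 2; stride > 0; stride /= 2) {
          if (lid < stride) {
              tmp_sum[lid] += tmp_sum[lid + stride];
              tmp_max[lid]  = max(tmp_max[lid], tmp_max[lid + stride]);
          }
          barrier(CLK_LOCAL_MEM_FENCE);
      }

      if (lid == 0) {
          int group = get_group_id(1) * get_num_groups(0) + get_group_id(0);
          partial_sum[group] = tmp_sum[0];
          partial_max[group] = tmp_max[0];
      }
  }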

> +static double determine_signal_peak(AVFrame *in)
> +{
> +    AVFrameSideData *sd = av_frame_get_side_data(in, AV_FRAME_DATA_CONTENT_LIGHT_LEVEL);
> +    double peak = 0;
> +
> +    if (sd) {
> +        AVContentLightMetadata *clm = (AVContentLightMetadata *)sd->data;
> +        peak = clm->MaxCLL;
> +    }
> +
> +    sd = av_frame_get_side_data(in, AV_FRAME_DATA_MASTERING_DISPLAY_METADATA);
> +    if (!peak && sd) {
> +        AVMasteringDisplayMetadata *metadata = (AVMasteringDisplayMetadata *)sd->data;
> +        if (metadata->has_luminance)
> +            peak = av_q2d(metadata->max_luminance);
> +    }
> +
> +    /* smpte2084 needs the side data above to work correctly
> +     * if missing, assume that the original transfer was arib-std-b67 */
> +    if (!peak)
> +        peak = 1200;
> +
> +    return peak;
> +}

This seems strange. For a source without peak tagging, you should
probably be deciding based on the video frame's tagged transfer function
(ST.2084 or STD-B67). If it's the former, default to 10,000, rather than
1200.
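
Something along these lines (only a sketch; the enum values are the real
ones from libavutil/pixfmt.h, the surrounding logic is hypothetical):

      /* no usable side data: fall back based on the tagged transfer */
      if (!peak) {
          if (in->color_trc == AVCOL_TRC_SMPTE2084)
              peak = 10000; /* ST.2084 can signal up to 10,000 cd/m^2 */
          else if (in->color_trc == AVCOL_TRC_ARIB_STD_B67)
              peak = 1200;  /* a reasonable assumption for HLG */
      }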

> +    switch(input_frames_ctx->sw_format) {
> +    case AV_PIX_FMT_P010:
> +        err = launch_kernel(avctx, ctx->kernel, output, input, peak);
> +        if (err < 0) goto fail;
> +        break;
> +    default:
> +        av_log(ctx, AV_LOG_ERROR, "unsupported format in tonemap_opencl.\n");
> +        err = AVERROR(ENOSYS);
> +        goto fail;
> +    }

This seems a bit needlessly restrictive? At the very least, I would
expect high-bit RGB and ideally float formats to also be supported.

Finally, I'm not entirely sure how you ingest HLG, but in the case of
HLG it's important to run the HLG OOTF (to take the content from
scene-referred to display-referred light) *before* tone-mapping, rather
than after it. I would assume this is probably handled by other FFmpeg
components but it might be worth double checking just to be sure.

----

As a last note, you can find my GLSL(+Vulkan) implementations of the
algorithm changes described above, as well as all of the related
color-management code and various decision logic for what values to
infer/default here:
https://github.com/haasn/libplacebo/blob/master/src/shaders/colorspace.c

The documentation for the tunable parameters is here:
https://github.com/haasn/libplacebo/blob/master/src/include/libplacebo/shaders/colorspace.h

Of specific interest are the functions `pl_shader_tone_map` (the core tone
mapping logic), `hdr_update_peak` (peak/avg detection using compute
shaders + SSBOs) and `pl_shader_color_map` (the calling code that also
does color conversions, OOTF application, etc.)

The way I implemented the logic there is fully generalized and allows
going from any combination of source (space, peak, average) to any
destination (space, peak, average), doing only the operations necessary
while making sure to compensate for brightness differences. This also
works well when viewing HDR material on HDR devices with a lower dynamic
range than the original material, even when those devices are calibrated
to SDR curves (search for `hdr_simulation` in that file).

I hope that may provide some assistance in (ultimately) making the
ffmpeg tone mapping filter as good as it can be.

Thanks for reading,
Niklas Haas
Niklas Haas May 5, 2018, 4:45 p.m. UTC | #2
Another thing that came to my mind:

On Fri,  4 May 2018 15:32:58 +0800, Ruiling Song <ruiling.song@intel.com> wrote:
> +static int tonemap_opencl_filter_frame(AVFilterLink *inlink, AVFrame *input)
> +{
> +    AVFilterContext    *avctx = inlink->dst;
> +    AVFilterLink     *outlink = avctx->outputs[0];
> +    TonemapOpenCLContext *ctx = avctx->priv;
> +    AVFrame *output = NULL;
> +    cl_int cle;
> +    int err;
> +    double peak = ctx->peak;
> +
> +    AVHWFramesContext *input_frames_ctx =
> +        (AVHWFramesContext*)input->hw_frames_ctx->data;
> +
> +    av_log(ctx, AV_LOG_DEBUG, "Filter input: %s, %ux%u (%"PRId64").\n",
> +           av_get_pix_fmt_name(input->format),
> +           input->width, input->height, input->pts);
> +
> +    if (!input->hw_frames_ctx)
> +        return AVERROR(EINVAL);
> +
> +    output = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> +    if (!output) {
> +        err = AVERROR(ENOMEM);
> +        goto fail;
> +    }
> +
> +    err = av_frame_copy_props(output, input);
> +    if (err < 0)
> +        goto fail;
> +
> +    if (!peak)
> +        peak = determine_signal_peak(input);
> +
> +    if (ctx->trc != -1)
> +        output->color_trc = ctx->trc;
> +    if (ctx->primaries != -1)
> +        output->color_primaries = ctx->primaries;
> +    if (ctx->colorspace != -1)
> +        output->colorspace = ctx->colorspace;
> +
> +    ctx->trc_in = input->color_trc;
> +    ctx->trc_out = output->color_trc;
> +    ctx->colorspace_in = input->colorspace;
> +    ctx->colorspace_out = output->colorspace;
> +    ctx->primaries_in = input->color_primaries;
> +    ctx->primaries_out = output->color_primaries;
> +
> +    assert(output->sw_format == AV_PIX_FMT_NV12);
> +
> +    if (!ctx->initialised) {
> +        err = tonemap_opencl_init(avctx);
> +        if (err < 0)
> +            goto fail;
> +    }
> +
> +    switch(input_frames_ctx->sw_format) {
> +    case AV_PIX_FMT_P010:
> +        err = launch_kernel(avctx, ctx->kernel, output, input, peak);
> +        if (err < 0) goto fail;
> +        break;
> +    default:
> +        av_log(ctx, AV_LOG_ERROR, "unsupported format in tonemap_opencl.\n");
> +        err = AVERROR(ENOSYS);
> +        goto fail;
> +    }
> +
> +    cle = clFinish(ctx->command_queue);
> +    if (cle != CL_SUCCESS) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to finish command queue: %d.\n",
> +               cle);
> +        err = AVERROR(EIO);
> +        goto fail;
> +    }
> +
> +
> +    av_frame_free(&input);
> +
> +    av_log(ctx, AV_LOG_DEBUG, "Filter output: %s, %ux%u (%"PRId64").\n",
> +           av_get_pix_fmt_name(output->format),
> +           output->width, output->height, output->pts);
> +
> +    return ff_filter_frame(outlink, output);
> +
> +fail:
> +    clFinish(ctx->command_queue);
> +    av_frame_free(&input);
> +    av_frame_free(&output);
> +    return err;
> +}

Reading this logic, it seems like each frame is individually processed
by calling clEnqueueNDRangeKernel followed by clFinish, after which the
function returns. I'm not very familiar with OpenCL, but if it behaves
anything like OpenGL and Vulkan, this kind of design essentially forces
blocking until the frame is finished processing and fully resident in
host RAM again, before the next frame can be started?

Generally when batch processing large amounts of images, it's better to
do asynchronous processing to avoid pipeline stalls, so that the GPU is
free to resume work on the next image while the previous image is still
in the process of getting cleared from caches and DMA'd back into host
RAM, without having to stall for the transfer either way. Ideally,
PCIe bandwidth permitting, you want either the compute processors or the
DMA engine to be saturated. A blocking design permits neither.

I'm not entirely sure how that's done in OpenCL, if at all possible, but
normally it would revolve around using some kind of fence or signal
object to regularly poll in-flight frames and only emit them from the
filter once they're available. In general, this requires some kind of
filter API that allows ingesting and emitting frames at different times
- I'm sure this is possible with lavf somehow?
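
For reference, a rough sketch of what I have in mind using plain OpenCL
events (untested, all names hypothetical):

  static int enqueue_async(cl_command_queue queue, cl_kernel kernel,
                           const size_t global_work[2], const size_t local_work[2],
                           cl_event *done)
  {
      cl_int cle = clEnqueueNDRangeKernel(queue, kernel, 2, NULL,
                                          global_work, local_work, 0, NULL, done);
      if (cle != CL_SUCCESS)
          return AVERROR(EIO);
      clFlush(queue); /* submit the work without blocking on it */
      return 0;
  }

  /* later, when polling in-flight frames: */
  static int frame_is_ready(cl_event done)
  {
      cl_int status;
      clGetEventInfo(done, CL_EVENT_COMMAND_EXECUTION_STATUS,
                     sizeof(status), &status, NULL);
      return status == CL_COMPLETE; /* if so, release the event and emit the frame */
  }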

Correct me if I'm wrong,
Niklas
Daniel Oberhoff May 5, 2018, 5:40 p.m. UTC | #3
> 
> As a last note, you can find my GLSL(+Vulkan) implementations of the
> algorithm changes described above, as well as all of the related
> color-management code and various decision logic for what values to
> infer/default here:
> https://github.com/haasn/libplacebo/blob/master/src/shaders/colorspace.c

Again I am wondering if there is interest in having GL and/or Vulkan filters in ffmpeg, along with direct transfer support where the source/target is on the GPU, as with many of the hwaccel targets?

That is what we do in our pipeline to get high throughput, and I'd be happy to push some of that back into ffmpeg.

Best,

Daniel
Rostislav Pehlivanov May 5, 2018, 6:41 p.m. UTC | #4
On 5 May 2018 at 18:40, Daniel Oberhoff <danieloberhoff@googlemail.com>
wrote:

>
> >
> > As a last note, you can find my GLSL(+Vulkan) implementations of the
> > algorithm changes described above, as well as all of the related
> > color-management code and various decision logic for what values to
> > infer/default here:
> > https://github.com/haasn/libplacebo/blob/master/src/shaders/colorspace.c
>
> Again I am wondering if there is interest in having GL and/or Vulkan
> filters in ffmpeg, along with direct transfer support where the
> source/target is on the GPU, as with many of the hwaccel targets?
>
> That is what we do in our pipeline to get high throughput, and I'd be
> happy to push some of that back into ffmpeg.
>
> Best,
>
> Daniel
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>

There is already a Vulkan patchset alongside a bunch of filters. I posted
it 2 weeks ago. Yes, it does accept GLSL using libshaderc. Yes, it supports
mapping VAAPI and DRM frames. Yes, it's much faster than OpenCL on all
systems I've tested. No, there will not be an opengl-based filtering
system, only vulkan.
Daniel Oberhoff May 5, 2018, 10:44 p.m. UTC | #5
> Am 05.05.2018 um 20:41 schrieb Rostislav Pehlivanov <atomnuker@gmail.com>:
> 
> On 5 May 2018 at 18:40, Daniel Oberhoff <danieloberhoff@googlemail.com>
> wrote:
> 
>> 
>>> 
>>> As a last note, you can find my GLSL(+Vulkan) implementations of the
>>> algorithm changes described above, as well as all of the related
>>> color-management code and various decision logic for what values to
>>> infer/default here:
>>> https://github.com/haasn/libplacebo/blob/master/src/shaders/colorspace.c
>> 
>> Again I am wondering if there is interest in having GL and/or Vulkan
>> filters in ffmpeg, along with direct transfer support where the
>> source/target is on the GPU, as with many of the hwaccel targets?
>> 
>> That is what we do in our pipeline to get high throughput, and I'd be
>> happy to push some of that back into ffmpeg.
>> 
>> Best,
>> 
>> Daniel
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>> 
> 
> There is already a Vulkan patchset alongside a bunch of filters. I posted
> it 2 weeks ago. Yes, it does accept GLSL using libshaderc. Yes, it supports
> mapping VAAPI and DRM frames. Yes, it's much faster than OpenCL on all
> systems I've tested. No, there will not be an opengl-based filtering
> system, only vulkan.

That makes a lot of sense; GL is dying. We are just reluctant to switch (yet) for various reasons. Is there any chance for that to land? Can I have a look? :)

Best
Ruiling Song May 8, 2018, 7:37 a.m. UTC | #6
Hello Niklas,

Thanks so much for your valuable feedback.

> -----Original Message-----
> From: Niklas Haas [mailto:ffmpeg@haasn.xyz]
> Sent: Saturday, May 5, 2018 2:00 AM
> To: Song, Ruiling <ruiling.song@intel.com>
> Cc: ffmpeg-devel@ffmpeg.org; sw@jkqxz.net
> Subject: Re: [FFmpeg-devel,RFC] lavfi: add opencl tonemap filter.
> 
> Hello Ruiling,
> 
> On Fri,  4 May 2018 15:32:58 +0800, Ruiling Song <ruiling.song@intel.com> wrote:
> > It basically does HDR-to-SDR conversion with tone mapping.
> >
> > Signed-off-by: Ruiling Song <ruiling.song@intel.com>
> > ---
> > This patch tries to add a filter that does HDR-to-SDR conversion with tone mapping.
> > The filter does the whole job of tone mapping in one pass, which is quite different from vf_tonemap.c.
> > I chose this approach because I think it requires fewer memory accesses.
> >
> > I also found that tone mapping shares a lot of code with colorspace conversion,
> > so I moved the colorspace-related code into separate files (both the OpenCL kernel and the host code).
> >
> > I am not sure whether the design is OK.
> > Would anybody like to comment on the overall design or the implementation details?
> >
> >
> > Thanks!
> > Ruiling
> 
> As the original author of the tone mapping code that inspired vf_tonemap
> and (by proxy) vf_tonemap_opencl, I can provide a handful of comments.
> 
> > +float3 map_one_pixel_rgb(float3 rgb, float peak) {
> > +    // de-saturate
> > +    float luma = get_luma(rgb.x, rgb.y, rgb.z);
> > +    float overbright = max(luma - 2.0f, 1e-6f) / max(luma, 1e-6f);
> > +    rgb = mix(rgb, (float3)luma, (float3)overbright);
> > +
> > +    float sig = max(max(rgb.x, max(rgb.y, rgb.z)), 1e-6f);
> > +    float sig_old = sig;
> > +    sig = TONE_FUNC(sig, peak);
> > +    rgb *= (sig/sig_old);
> > +    return rgb;
> > +}
> 
> I consider this desaturation algorithm outdated. It works, but I think
> it produces lower quality results than this one:
> 
>   float sig = max(max(rgb.x, max(rgb.y, rgb.z)), 1e-6f);
>   float luma = get_luma(rgb.x, rgb.y, rgb.z);
>   float coeff = max(sig - 0.18f, 1e-6f) / max(sig, 1e-6f);
> 
>   const float desaturation_coefficient = 0.5f; // or configurable!
>   coeff = pow(coeff, 10.0f / desaturation_coefficient);
>   rgb = mix(rgb, (float3)luma, coeff);
>   sig = mix(sig, luma, coeff);
> 
>   // do the rest of tone-mapping on `sig`
>   float sig_old = sig;
>   ...
> 
> Basically, I've done the following:
> 
> - Calculate the overbright coefficient on `sig` rather than `luma` alone
> - Instead of using an offset of 2.0f, use an offset of 0.18f
> - Take the coefficient to a high exponent (lower exponent = more
>   desaturation, which is why I invert the user-configurable parameter)
> 
> Since the new code needs to read `sig`, we also have to move up the
> `sig` calculation and then correctly adjust it afterwards as well.

I will try your suggestion.

> 
> ----
> 
> More importantly, this algorithm is missing something that I now
> consider very important, and which would align well with OpenCL: source
> brightness detection. Just doing the tone-mapping "blind" like this
> works to some extent, but I think the best results require also
> adjusting the exposure in order to compensate for Hollywood's tendency
> to ship poorly mastered, over-illuminated HDR material.
> 
> The basic premise is to calculate both the peak brightness as well as
> the average brightness on a frame-by-frame basis, and incorporate those
> measured values in the algorithm, in order to re-normalize overly bright
> scenes to correspond to a typical SDR average of around 0.25. In
> addition to this, calculating the peak explicitly allows us to exactly
> tailor the hable() function to this specific frame, even if the
> mastering metadata is missing or useless. (Which it almost always is)
> 
> Doing this in OpenCL would essentially require implementing a big
> map-reduce to keep track of respectively the sum and max of each pixel's
> brightness. In addition to this, I recommend averaging the results over
> a few frames (I like to use around one second), with the caveat that
> this is best paired with at least a naive scene change detection
> heuristic to make sure this averaging window gets reset on a scene
> change.

Thanks for sharing your idea with me. I have also noticed poor quality tone mapping results for some HDR streams.
I will try your suggestion and see whether I can get it into good shape so I can include it in the next version.
In fact I have not thought through scene change detection very well. A question about your idea:
is it possible that your scene detection heuristic may still fail to detect some particular scene change and lead to poor tone mapping quality?

> 
> > +static double determine_signal_peak(AVFrame *in)
> > +{
> > +    AVFrameSideData *sd = av_frame_get_side_data(in, AV_FRAME_DATA_CONTENT_LIGHT_LEVEL);
> > +    double peak = 0;
> > +
> > +    if (sd) {
> > +        AVContentLightMetadata *clm = (AVContentLightMetadata *)sd->data;
> > +        peak = clm->MaxCLL;
> > +    }
> > +
> > +    sd = av_frame_get_side_data(in, AV_FRAME_DATA_MASTERING_DISPLAY_METADATA);
> > +    if (!peak && sd) {
> > +        AVMasteringDisplayMetadata *metadata = (AVMasteringDisplayMetadata *)sd->data;
> > +        if (metadata->has_luminance)
> > +            peak = av_q2d(metadata->max_luminance);
> > +    }
> > +
> > +    /* smpte2084 needs the side data above to work correctly
> > +     * if missing, assume that the original transfer was arib-std-b67 */
> > +    if (!peak)
> > +        peak = 1200;
> > +
> > +    return peak;
> > +}
> 
> This seems strange. For a source without peak tagging, you should
> probably be deciding based on the video frame's tagged transfer function
> (ST.2084 or STD-B67). If it's the former, default to 10,000, rather than
> 1200.

I just copied this piece of code from vf_tonemap.c. I think we need to fix it there first if this is wrong.
I guess the code was like this because we think that all video streams that truly use ST.2084 should
include the mastering display metadata; if it is absent, then the transfer function should be arib-std-b67.

> 
> > +    switch(input_frames_ctx->sw_format) {
> > +    case AV_PIX_FMT_P010:
> > +        err = launch_kernel(avctx, ctx->kernel, output, input, peak);
> > +        if (err < 0) goto fail;
> > +        break;
> > +    default:
> > +        av_log(ctx, AV_LOG_ERROR, "unsupported format in tonemap_opencl.\n");
> > +        err = AVERROR(ENOSYS);
> > +        goto fail;
> > +    }
> 
> This seems a bit needlessly restrictive? At the very least, I would
> expect high-bit RGB and ideally float formats to also be supported.

Basically I am targeting a full GPU transcoding pipeline (GPU decoding + GPU filtering + GPU encoding),
and most video streams I have encountered are YUV streams.
Could you show me what kind of use case needs RGB support?
Then I will see whether to add it in this patch or do it later.

> 
> Finally, I'm not entirely sure how you ingest HLG, but in the case of
> HLG it's important to run the HLG OOTF (to take the content from
> scene-referred to display-referred light) *before* tone-mapping, rather
> than after it. I would assume this is probably handled by other FFmpeg
> components but it might be worth double checking just to be sure.

In fact, I have not studied HLG in depth yet. I still need some time to add HLG support.
> 
> ----
> 
> As a last note, you can find my GLSL(+Vulkan) implementations of the
> algorithm changes described above, as well as all of the related
> color-management code and various decision logic for what values to
> infer/default here:
> https://github.com/haasn/libplacebo/blob/master/src/shaders/colorspace.c
> 
> The documentation for the tunable parameters is here:
> https://github.com/haasn/libplacebo/blob/master/src/include/libplacebo/shaders/colorspace.h
> 
> Of specific interest are the functions `pl_shader_tone_map` (the core tone
> mapping logic), `hdr_update_peak` (peak/avg detection using compute
> shaders + SSBOs) and `pl_shader_color_map` (the calling code that also
> does color conversions, OOTF application, etc.)
> 
> The way I implemented the logic there is fully generalized and allows
> going from any combination of source (space, peak, average) to any
> destination (space, peak, average), doing only the operations necessary
> while making sure to compensate for brightness differences. This also
> works well when viewing HDR material on HDR devices with a lower dynamic
> range than the original material, even when those devices are calibrated
> to SDR curves (search for `hdr_simulation` in that file).


Well, this is a very useful reference for me. I need some time to digest it.
Thanks a lot.

> 
> I hope that may provide some assistance in (ultimately) making the
> ffmpeg tone mapping filter as good as it can be.
> 
> Thanks for reading,
> Niklas Haas
Niklas Haas May 8, 2018, 1:50 p.m. UTC | #7
Hello Ruiling,

> Thanks for sharing your idea with me. I have also noticed poor quality tone mapping results for some HDR streams.
> I will try your suggestion and see whether I can get it into good shape so I can include it in the next version.
> In fact I have not thought through scene change detection very well. A question about your idea:
> is it possible that your scene detection heuristic may still fail to detect some particular scene change and lead to poor tone mapping quality?

My scene change detection heuristic works like this: I trigger a
scene change (and therefore discard the frame averaging buffer) if the
distance between the current frame average brightness and the current
running average exceeds a threshold value, that threshold being (by
default) 20 cd/m².

So we can divide the failures of this algorithm into two categories:

1. False negative (scene change without resetting the buffer): This can
   only happen if there was an actual scene change but the average
   brightness change did not exceed 20 cd/m², i.e. the scenes are
   similar in brightness. I consider this a fairly harmless failure
   because that also means there's no visual discontinuity since the
   scenes are so similar to begin with.

2. False positive (buffer got reset without a scene change). This is the
   more worrying failure of the algorithm, since it can happen in the
   middle of a scene (e.g. as the result of a bright flash of light
   on-screen), which will manifest itself as a sudden decrease in the
   total frame brightness coinciding with the new source of light. (Or
   vice versa, a sudden increase in brightness coinciding with the
   sudden absence of a light source).

The scene change threshold is a trade-off. Lowering the value decreases
the likelihood of #1 but increases the likelihood of #2. Increasing the
value decreases the likelihood of #2, but increases the likelihood (and
apparent effect) of #1.

If you want to optimize or improve the algorithm, the case #2 is the one
I would be most interested in, i.e. reducing the rate of false
positives. This can surely be done in a smarter way, e.g. by comparing
more than just the scene average but also other easily obtained metrics.

If you have access to low-level frame information, you could do
something like increasing the threshold for non-keyframes significantly;
since keyframes are likely to coincide with scene changes or cuts, this
might help the algorithm out.
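
In code, the averaging window plus reset could look roughly like this (a
sketch only, needs <math.h>; all names and the 25 fps assumption are
hypothetical):

  #define SCENE_THRESHOLD 20.0 /* cd/m^2, the default mentioned above */
  #define AVG_FRAMES      25   /* ~1 second at 25 fps */

  struct BrightnessAvg {
      double buf[AVG_FRAMES];
      int len, pos;
  };

  /* feed one frame's average brightness, get the smoothed value back;
   * the window is discarded on a (heuristically detected) scene change */
  static double update_running_average(struct BrightnessAvg *a, double frame_avg)
  {
      double running = frame_avg;
      int i;

      if (a->len) {
          running = 0.0;
          for (i = 0; i < a->len; i++)
              running += a->buf[i];
          running /= a->len;
      }

      if (fabs(frame_avg - running) > SCENE_THRESHOLD)
          a->len = a->pos = 0; /* scene change: reset the window */

      a->buf[a->pos] = frame_avg;
      a->pos = (a->pos + 1) % AVG_FRAMES;
      if (a->len < AVG_FRAMES)
          a->len++;

      /* recompute over the (possibly reset) window */
      running = 0.0;
      for (i = 0; i < a->len; i++)
          running += a->buf[i];
      return running / a->len;
  }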

> I just copied this piece of code from vf_tonemap.c. I think we need to fix it there first if this is wrong.
> I guess the code was like this because we think that all video streams that truly use ST.2084 should
> include the mastering display metadata; if it is absent, then the transfer function should be arib-std-b67.

Yeah, good point. And ideally, maybe those two should share this logic
to avoid code duplication.

> Basically I am targeting a full GPU transcoding pipeline (GPU decoding + GPU filtering + GPU encoding),
> and most video streams I have encountered are YUV streams.
> Could you show me what kind of use case needs RGB support?
> Then I will see whether to add it in this patch or do it later.

One particular use case I have in mind that would be good to support is
OpenEXR formats, which are typically RGB and high depth or floating point.

> In fact, I have not studied HLG in depth yet. I still need some time to add HLG support.

Well, the one key thing that's interesting about HLG is that the OOTF
(which forms part of the EOTF) is parametrized by the peak brightness of
the display. So when tone mapping (which is done in display-referred
space) an HLG signal to a lower brightness, the inverse OOTF you need to
apply to go back to HLG afterwards needs to be done on the new peak
brightness, to reflect the changes made to the signal. As said, I'm not
sure which component handles the OOTF in FFmpeg land, so if you
implement it yourself (which may be necessary) that would be a thing
to keep in mind.
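
For reference, the BT.2100 HLG OOTF written out in OpenCL-style C, with
the display peak Lw as the parameter that has to change after tone
mapping (a sketch; the overall alpha scaling is left out, and this is
not necessarily the exact formulation any particular FFmpeg component
uses):

  float3 hlg_ootf(float3 rgb_scene, float lw) {
      /* BT.2100 system gamma, parametrized by nominal peak luminance in cd/m^2 */
      float gamma = 1.2f + 0.42f * log10(lw / 1000.0f);
      /* scene luminance from the BT.2020 luma coefficients */
      float ys = 0.2627f * rgb_scene.x + 0.6780f * rgb_scene.y + 0.0593f * rgb_scene.z;
      /* display light = scene light scaled by Ys^(gamma - 1) */
      return rgb_scene * pow(ys, gamma - 1.0f);
  }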

> Well, this is very useful reference for me. I need some time to digest it.
> Thanks a lot.

No problem,
Niklas
Niklas Haas May 11, 2018, 8:10 a.m. UTC | #8
> My scene change detection heuristic works like this: I trigger a
> scene change (and therefore discard the frame averaging buffer) if the
> distance between the current frame average brightness and the current
> running average exceeds a threshold value, that threshold being (by
> default) 20 cd/m².
> 
> So we can divide the failures of this algorithm into two categories:
> 
> 1. False negative (scene change without resetting the buffer): This can
>    only happen if there was an actual scene change but the average
>    brightness change did not exceed 20 cd/m², i.e. the scenes are
>    similar in brightness. I consider this a fairly harmless failure
>    because that also means there's no visual discontinuity since the
>    scenes are so similar to begin with.
> 
> 2. False positive (buffer got reset without a scene change). This is the
>    more worrying failure of the algorithm, since it can happen in the
>    middle of a scene (e.g. as the result of a bright flash of light
>    on-screen), which will manifest itself as a sudden decrease in the
>    total frame brightness coinciding with the new source of light. (Or
>    vice versa, a sudden increase in brightness coinciding with the
>    sudden absence of a light source).
> 
> The scene change threshold is a trade-off. Lowering the value decreases
> the likelihood of #1 but increases the likelihood of #2. Increasing the
> value decreases the likelihood of #2, but increases the likelihood (and
> apparent effect) of #1.
> 
> If you want to optimize or improve the algorithm, the case #2 is the one
> I would be most interested in, i.e. reducing the rate of false
> positives. This can surely be done in a smarter way, e.g. by comparing
> more than just the scene average but also other easily obtained metrics.
> 
> If you have access to low-level frame information, you could do
> something like increasing the threshold for non-keyframes significantly;
> since keyframes are likely to coincide with scene changes or cuts, this
> might help the algorithm out.

Thinking about this logic again, I came to realize that a different
strategy might be to check instead for a minimum threshold brightness
difference in a critical number of different areas of the screen. This
way, a very bright light source appearing or becoming occluded in one
local part of the frame will not trigger a scene change, while a sudden
change in brightness of a large part of the frame will.

I will try it if I get the opportunity to.
Moritz Barsnick May 11, 2018, 10:16 a.m. UTC | #9
On Fri, May 11, 2018 at 10:10:42 +0200, Niklas Haas wrote:
> Thinking about this logic again, I came to realize that a different
> strategy might be to check instead for a minimum threshold brightness
> difference in a critical number of different areas of the screen. This
> way, a very bright light source appearing or becoming occluded in one
> local part of the frame will not trigger a scene change, while a sudden
> change in brightness of a large part of the frame will.

Yes, when I read your previous email, that was my thought regarding
scene change detection as well. There are other detectors already in
ffmpeg (or perhaps I was looking at the one in the tool "motion"?), I
believe. Ideally they don't just check for an overall brightness change,
but check over several squares/blocks and take that into consideration;
a rough sketch follows the reasoning below.

Reasoning:

- There's a quite high chance that a scene change with a totally
  different picture layout has approx. the same average brightness as
  the scene before.

- A strong change in only one or two of e.g. 100 rectangles should not
  indicate a scene change (while it might indicate motion, which is a
  different concept).
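
A rough sketch of that block-based check (hypothetical names, needs
<math.h>):

  #define BLOCKS       100   /* e.g. a 10x10 grid */
  #define BLOCK_THRESH 20.0  /* per-block brightness delta, cd/m^2 */
  #define MIN_CHANGED  (BLOCKS / 2)

  /* cur/prev hold the per-block average brightness of two frames */
  static int is_scene_change(const double *cur, const double *prev)
  {
      int i, changed = 0;
      for (i = 0; i < BLOCKS; i++)
          if (fabs(cur[i] - prev[i]) > BLOCK_THRESH)
              changed++;
      return changed >= MIN_CHANGED; /* much of the frame changed at once */
  }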

Moritz

Patch

diff --git a/configure b/configure
index 7f199c6..b9e464d 100755
--- a/configure
+++ b/configure
@@ -3395,6 +3395,7 @@  tinterlace_filter_deps="gpl"
 tinterlace_merge_test_deps="tinterlace_filter"
 tinterlace_pad_test_deps="tinterlace_filter"
 tonemap_filter_deps="const_nan"
+tonemap_opencl_filter_deps="opencl"
 unsharp_opencl_filter_deps="opencl"
 uspp_filter_deps="gpl avcodec"
 vaguedenoiser_filter_deps="gpl"
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 3454f25..7a1b0e8 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -348,6 +348,8 @@  OBJS-$(CONFIG_TINTERLACE_FILTER)             += vf_tinterlace.o
 OBJS-$(CONFIG_TLUT2_FILTER)                  += vf_lut2.o framesync.o
 OBJS-$(CONFIG_TMIX_FILTER)                   += vf_mix.o framesync.o
 OBJS-$(CONFIG_TONEMAP_FILTER)                += vf_tonemap.o
+OBJS-$(CONFIG_TONEMAP_OPENCL_FILTER)         += vf_tonemap_opencl.o colorspace_basic.o opencl.o \
+                                                opencl/tonemap.o opencl/colorspace_basic.o
 OBJS-$(CONFIG_TRANSPOSE_FILTER)              += vf_transpose.o
 OBJS-$(CONFIG_TRIM_FILTER)                   += trim.o
 OBJS-$(CONFIG_UNPREMULTIPLY_FILTER)          += vf_premultiply.o framesync.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index d958f9b..759097a 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -339,6 +339,7 @@  extern AVFilter ff_vf_tinterlace;
 extern AVFilter ff_vf_tlut2;
 extern AVFilter ff_vf_tmix;
 extern AVFilter ff_vf_tonemap;
+extern AVFilter ff_vf_tonemap_opencl;
 extern AVFilter ff_vf_transpose;
 extern AVFilter ff_vf_trim;
 extern AVFilter ff_vf_unpremultiply;
diff --git a/libavfilter/colorspace_basic.c b/libavfilter/colorspace_basic.c
new file mode 100644
index 0000000..93f9f08
--- /dev/null
+++ b/libavfilter/colorspace_basic.c
@@ -0,0 +1,89 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "colorspace_basic.h"
+
+
+void invert_matrix3x3(const double in[3][3], double out[3][3])
+{
+    double m00 = in[0][0], m01 = in[0][1], m02 = in[0][2],
+           m10 = in[1][0], m11 = in[1][1], m12 = in[1][2],
+           m20 = in[2][0], m21 = in[2][1], m22 = in[2][2];
+    int i, j;
+    double det;
+
+    out[0][0] =  (m11 * m22 - m21 * m12);
+    out[0][1] = -(m01 * m22 - m21 * m02);
+    out[0][2] =  (m01 * m12 - m11 * m02);
+    out[1][0] = -(m10 * m22 - m20 * m12);
+    out[1][1] =  (m00 * m22 - m20 * m02);
+    out[1][2] = -(m00 * m12 - m10 * m02);
+    out[2][0] =  (m10 * m21 - m20 * m11);
+    out[2][1] = -(m00 * m21 - m20 * m01);
+    out[2][2] =  (m00 * m11 - m10 * m01);
+
+    det = m00 * out[0][0] + m10 * out[0][1] + m20 * out[0][2];
+    det = 1.0 / det;
+
+    for (i = 0; i < 3; i++) {
+        for (j = 0; j < 3; j++)
+            out[i][j] *= det;
+    }
+}
+
+void mul3x3(double dst[3][3], const double src1[3][3], const double src2[3][3])
+{
+    int m, n;
+
+    for (m = 0; m < 3; m++)
+        for (n = 0; n < 3; n++)
+            dst[m][n] = src2[m][0] * src1[0][n] +
+                        src2[m][1] * src1[1][n] +
+                        src2[m][2] * src1[2][n];
+}
+/*
+ * see e.g. http://www.brucelindbloom.com/index.html?Eqn_RGB_XYZ_Matrix.html
+ */
+void fill_rgb2xyz_table(const struct ColorPrimaries *coeffs,
+                        const struct WhitePoint *wp,
+                        double rgb2xyz[3][3])
+{
+    double i[3][3], sr, sg, sb, zw;
+
+    rgb2xyz[0][0] = coeffs->xr / coeffs->yr;
+    rgb2xyz[0][1] = coeffs->xg / coeffs->yg;
+    rgb2xyz[0][2] = coeffs->xb / coeffs->yb;
+    rgb2xyz[1][0] = rgb2xyz[1][1] = rgb2xyz[1][2] = 1.0;
+    rgb2xyz[2][0] = (1.0 - coeffs->xr - coeffs->yr) / coeffs->yr;
+    rgb2xyz[2][1] = (1.0 - coeffs->xg - coeffs->yg) / coeffs->yg;
+    rgb2xyz[2][2] = (1.0 - coeffs->xb - coeffs->yb) / coeffs->yb;
+    invert_matrix3x3(rgb2xyz, i);
+    zw = 1.0 - wp->xw - wp->yw;
+    sr = i[0][0] * wp->xw + i[0][1] * wp->yw + i[0][2] * zw;
+    sg = i[1][0] * wp->xw + i[1][1] * wp->yw + i[1][2] * zw;
+    sb = i[2][0] * wp->xw + i[2][1] * wp->yw + i[2][2] * zw;
+    rgb2xyz[0][0] *= sr;
+    rgb2xyz[0][1] *= sg;
+    rgb2xyz[0][2] *= sb;
+    rgb2xyz[1][0] *= sr;
+    rgb2xyz[1][1] *= sg;
+    rgb2xyz[1][2] *= sb;
+    rgb2xyz[2][0] *= sr;
+    rgb2xyz[2][1] *= sg;
+    rgb2xyz[2][2] *= sb;
+}
diff --git a/libavfilter/colorspace_basic.h b/libavfilter/colorspace_basic.h
new file mode 100644
index 0000000..5647ca6
--- /dev/null
+++ b/libavfilter/colorspace_basic.h
@@ -0,0 +1,40 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVFILTER_COLORSPACE_BASIC_H
+#define AVFILTER_COLORSPACE_BASIC_H
+
+#include "libavutil/common.h"
+
+struct LumaCoefficients {
+    double cr, cg, cb;
+};
+
+struct ColorPrimaries {
+    double xr, yr, xg, yg, xb, yb;
+};
+
+struct WhitePoint {
+    double xw, yw;
+};
+
+void invert_matrix3x3(const double in[3][3], double out[3][3]);
+void mul3x3(double dst[3][3], const double src1[3][3], const double src2[3][3]);
+void fill_rgb2xyz_table(const struct ColorPrimaries *coeffs,
+                        const struct WhitePoint *wp, double rgb2xyz[3][3]);
+#endif
diff --git a/libavfilter/opencl/colorspace_basic.cl b/libavfilter/opencl/colorspace_basic.cl
new file mode 100644
index 0000000..478a4f3
--- /dev/null
+++ b/libavfilter/opencl/colorspace_basic.cl
@@ -0,0 +1,137 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+constant const float ST2084_M1 = 0.1593017578125f;
+constant const float ST2084_M2 = 78.84375f;
+constant const float ST2084_C1 = 0.8359375f;
+constant const float ST2084_C2 = 18.8515625f;
+constant const float ST2084_C3 = 18.6875f;
+
+__constant float yuv2rgb_bt2020[] = {
+    1.0f, 0.0f, 1.4746f,
+    1.0f, -0.16455f, -0.57135f,
+    1.0f, 1.8814f, 0.0f
+};
+
+__constant float yuv2rgb_bt709[] = {
+    1.0f, 0.0f, 1.5748f,
+    1.0f, -0.18732f, -0.46812f,
+    1.0f, 1.8556f, 0.0f
+};
+
+__constant float rgb2yuv_bt709[] = {
+    0.2126f, 0.7152f, 0.0722f,
+    -0.11457f, -0.38543f, 0.5f,
+    0.5f, -0.45415f, -0.04585f
+};
+
+__constant float rgb2yuv_bt2020[] ={
+    0.2627f, 0.678f, 0.0593f,
+    -0.1396f, -0.36037f, 0.5f,
+    0.5f, -0.4598f, -0.0402f,
+};
+
+float eotf_st2084(float x) {
+    float p = pow(x, 1.0f / ST2084_M2);
+    float a = max(p - ST2084_C1, 0.0f);
+    float b = max(ST2084_C2 - ST2084_C3 * p, 1e-6f);
+    float c = pow(a / b, 1.0f / ST2084_M1);
+    return x > 0.0f ? c : 0.0f;
+}
+
+float inverse_eotf_bt1886(float c) {
+  return c < 0.0f ? 0.0f : pow(c, 1.0f / 2.4f);
+}
+
+float oetf_bt709(float c) {
+  c = c < 0.0f ? 0.0f : c;
+  float r1 = 4.5f * c;
+  float r2 = 1.099f * pow(c, 0.45f) - 0.099f;
+  return c < 0.018f ? r1 : r2;
+}
+float inverse_oetf_bt709(float c) {
+  float r1 = c / 4.5f;
+  float r2 = pow((c + 0.099f) / 1.099f, 1.0f / 0.45f);
+  return c < 0.081f ? r1 : r2;
+}
+
+float get_luma(float r, float g, float b) {
+  return r * YUV_COFF[0] + g * YUV_COFF[1] + b * YUV_COFF[2];
+}
+
+float3 yuv2rgb(float y, float u, float v) {
+#ifdef FULL_RANGE_IN
+    u -= 0.5f; v -= 0.5f;
+#else
+    y = (y * 255.0f -  16.0f) / 219.0f;
+    u = (u * 255.0f - 128.0f) / 224.0f;
+    v = (v * 255.0f - 128.0f) / 224.0f;
+#endif
+    float r = y*RGB_COFF[0] + u*RGB_COFF[1] + v*RGB_COFF[2];
+    float g = y*RGB_COFF[3] + u*RGB_COFF[4] + v*RGB_COFF[5];
+    float b = y*RGB_COFF[6] + u*RGB_COFF[7] + v*RGB_COFF[8];
+    return (float3)(r, g, b);
+}
+
+float3 yuv2lrgb(float y, float u, float v, float post_scale) {
+    float3 rgb = yuv2rgb(y, u, v);
+    float r = linearize(rgb.x);
+    float g = linearize(rgb.y);
+    float b = linearize(rgb.z);
+    r *= post_scale;
+    g *= post_scale;
+    b *= post_scale;
+    return (float3)(r, g, b);
+}
+
+float3 rgb2yuv(float r, float g, float b) {
+    float y = r*YUV_COFF[0] + g*YUV_COFF[1] + b*YUV_COFF[2];
+    float u = r*YUV_COFF[3] + g*YUV_COFF[4] + b*YUV_COFF[5];
+    float v = r*YUV_COFF[6] + g*YUV_COFF[7] + b*YUV_COFF[8];
+#ifdef FULL_RANGE_OUT
+    u += 0.5f; v += 0.5f;
+#else
+    y = (219.0f * y + 16.0f) / 255.0f;
+    u = (224.0f * u + 128.0f) / 255.0f;
+    v = (224.0f * v + 128.0f) / 255.0f;
+#endif
+    return (float3)(y, u, v);
+}
+
+float3 lrgb2yuv(float r, float g, float b, float pre_scale) {
+    r *= pre_scale;
+    g *= pre_scale;
+    b *= pre_scale;
+
+    r = delinearize(r);
+    g = delinearize(g);
+    b = delinearize(b);
+
+    return rgb2yuv(r, g, b);
+}
+
+float3 lrgb2lrgb(float r, float g, float b) {
+#ifdef RGB2RGB_PASSTHROUGH
+    return (float3)(r, g, b);
+#else
+    float rr = rgb2rgb[0] * r + rgb2rgb[1] * g + rgb2rgb[2] * b;
+    float gg = rgb2rgb[3] * r + rgb2rgb[4] * g + rgb2rgb[5] * b;
+    float bb = rgb2rgb[6] * r + rgb2rgb[7] * g + rgb2rgb[8] * b;
+    return (float3)(rr, gg, bb);
+#endif
+}
diff --git a/libavfilter/opencl/tonemap.cl b/libavfilter/opencl/tonemap.cl
new file mode 100644
index 0000000..e0aca27
--- /dev/null
+++ b/libavfilter/opencl/tonemap.cl
@@ -0,0 +1,136 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+
+#define ST2084_MAX_LUMINANCE 10000.0f
+#define REFERENCE_WHITE 100.0f
+
+extern float3 lrgb2yuv(float, float, float, float);
+extern float3 yuv2lrgb(float, float, float, float);
+extern float get_luma(float, float, float);
+
+float hable_f(float in) {
+    float a = 0.15f, b = 0.50f, c = 0.10f, d = 0.20f, e = 0.02f, f = 0.30f;
+    return (in * (in * a + b * c) + d * e) / (in * (in * a + b) + d * f) - e / f;
+}
+
+float direct(float s, float peak) {
+    return s;
+}
+
+float linear(float s, float peak) {
+    return s * tone_param / peak;
+}
+
+float gamma(float s, float peak) {
+    float p = s > 0.05f ? s / peak : 0.05f / peak;
+    float v = pow(p, 1.0f / tone_param);
+    return s > 0.05f ? v : (s * v / 0.05f);
+}
+
+float clip(float s, float peak) {
+    return clamp(s * tone_param, 0.0f, 1.0f);
+}
+
+float reinhard(float s, float peak) {
+    return s / (s + tone_param) * (peak + tone_param) / peak;
+}
+
+float hable(float s, float peak) {
+    return hable_f(s) / hable_f(peak);
+}
+
+float mobius(float s, float peak) {
+    float j = tone_param;
+    float a, b;
+
+    if (s <= j)
+        return s;
+
+    a = -j * j * (peak - 1.0f) / (j * j - 2.0f * j + peak);
+    b = (j * j - 2.0f * j * peak + peak) / max(peak - 1.0f, 1e-6f);
+
+    return (b * b + 2.0f * b * j + j * j) / (b - a) * (s + a) / (s + b);
+}
+
+float3 map_one_pixel_rgb(float3 rgb, float peak) {
+    // de-saturate
+    float luma = get_luma(rgb.x, rgb.y, rgb.z);
+    float overbright = max(luma - 2.0f, 1e-6f) / max(luma, 1e-6f);
+    rgb = mix(rgb, (float3)luma, (float3)overbright);
+
+    float sig = max(max(rgb.x, max(rgb.y, rgb.z)), 1e-6f);
+    float sig_old = sig;
+    sig = TONE_FUNC(sig, peak);
+    rgb *= (sig/sig_old);
+    return rgb;
+}
+
+float3 map_one_pixel_yuv(float y, float u, float v, float peak, int m, int n) {
+    float3 c = yuv2lrgb(y, u, v, ST2084_MAX_LUMINANCE / peak);
+    c = map_one_pixel_rgb(c, peak / REFERENCE_WHITE);
+    return lrgb2yuv(c.x, c.y, c.z, 1.0f);
+}
+
+__kernel void tonemap(__write_only image2d_t dst1,
+                      __write_only image2d_t dst2,
+                      __read_only  image2d_t src1,
+                      __read_only  image2d_t src2,
+#ifdef THIRD_PLANE
+                      __write_only image2d_t dst3,
+                      __read_only  image2d_t src3,
+#endif
+                      float peak
+                      )
+{
+    const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
+                               CLK_FILTER_NEAREST);
+    int xi = get_global_id(0);
+    int yi = get_global_id(1);
+    // each work item processes four pixels
+    int x = 2 * xi;
+    int y = 2 * yi;
+
+    float y0 = read_imagef(src1, sampler, (int2)(x,     y)).x;
+    float y1 = read_imagef(src1, sampler, (int2)(x + 1, y)).x;
+    float y2 = read_imagef(src1, sampler, (int2)(x,     y + 1)).x;
+    float y3 = read_imagef(src1, sampler, (int2)(x + 1, y + 1)).x;
+#ifdef THIRD_PLANE
+    float u = read_imagef(src2, sampler, (int2)(xi, yi)).x;
+    float v = read_imagef(src3, sampler, (int2)(xi, yi)).x;
+    float2 uv = (float2)(u, v);
+#else
+    float2 uv = read_imagef(src2, sampler, (int2)(xi,     yi)).xy;
+#endif
+
+    float3 yuv0 = map_one_pixel_yuv(y0, uv.x, uv.y, peak, x, y);
+    float3 yuv1 = map_one_pixel_yuv(y1, uv.x, uv.y, peak, x+1, y);
+    float3 yuv2 = map_one_pixel_yuv(y2, uv.x, uv.y, peak, x, y+1);
+    float3 yuv3 = map_one_pixel_yuv(y3, uv.x, uv.y, peak, x+1,y+1);
+
+    write_imagef(dst1, (int2)(x, y), (float4)(yuv0.x, 0.0f, 0.0f, 1.0f));
+    write_imagef(dst1, (int2)(x+1, y), (float4)(yuv1.x, 0.0f, 0.0f, 1.0f));
+    write_imagef(dst1, (int2)(x, y+1), (float4)(yuv2.x, 0.0f, 0.0f, 1.0f));
+    write_imagef(dst1, (int2)(x+1, y+1), (float4)(yuv3.x, 0.0f, 0.0f, 1.0f));
+#ifdef THIRD_PLANE
+    write_imagef(dst2, (int2)(xi, yi), (float4)(yuv0.y, 0.0f, 0.0f, 1.0f));
+    write_imagef(dst3, (int2)(xi, yi), (float4)(yuv0.z, 0.0f, 0.0f, 1.0f));
+#else
+    write_imagef(dst2, (int2)(xi, yi), (float4)(yuv0.y, yuv0.z, 0.0f, 1.0f));
+#endif
+}
diff --git a/libavfilter/opencl_source.h b/libavfilter/opencl_source.h
index 4bb9969..c5b3f37 100644
--- a/libavfilter/opencl_source.h
+++ b/libavfilter/opencl_source.h
@@ -21,7 +21,9 @@ 
 
 extern const char *ff_opencl_source_avgblur;
 extern const char *ff_opencl_source_convolution;
+extern const char *ff_opencl_source_colorspace_basic;
 extern const char *ff_opencl_source_overlay;
+extern const char *ff_opencl_source_tonemap;
 extern const char *ff_opencl_source_unsharp;
 
 #endif /* AVFILTER_OPENCL_SOURCE_H */
diff --git a/libavfilter/vf_tonemap_opencl.c b/libavfilter/vf_tonemap_opencl.c
new file mode 100644
index 0000000..72676e5
--- /dev/null
+++ b/libavfilter/vf_tonemap_opencl.c
@@ -0,0 +1,472 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include <float.h>
+
+#include "libavutil/bprint.h"
+#include "libavutil/common.h"
+#include "libavutil/imgutils.h"
+#include "libavutil/mastering_display_metadata.h"
+#include "libavutil/mem.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+
+#include "avfilter.h"
+#include "internal.h"
+#include "opencl.h"
+#include "opencl_source.h"
+#include "video.h"
+#include "colorspace_basic.h"
+
+enum TonemapAlgorithm {
+    TONEMAP_NONE,
+    TONEMAP_LINEAR,
+    TONEMAP_GAMMA,
+    TONEMAP_CLIP,
+    TONEMAP_REINHARD,
+    TONEMAP_HABLE,
+    TONEMAP_MOBIUS,
+    TONEMAP_MAX,
+};
+
+typedef struct TonemapOpenCLContext {
+    OpenCLFilterContext ocf;
+
+    enum AVColorSpace colorspace, colorspace_in, colorspace_out;
+    enum AVColorTransferCharacteristic trc, trc_in, trc_out;
+    enum AVColorPrimaries primaries, primaries_in, primaries_out;
+
+    enum TonemapAlgorithm tonemap;
+    double           peak;
+    double           param;
+    int              initialised;
+    cl_kernel        kernel;
+    cl_command_queue command_queue;
+} TonemapOpenCLContext;
+
+const char *yuv_coff[AVCOL_SPC_NB] = {
+    [AVCOL_SPC_BT709] = "rgb2yuv_bt709",
+    [AVCOL_SPC_BT2020_NCL] = "rgb2yuv_bt2020",
+};
+
+const char *rgb_coff[AVCOL_SPC_NB] = {
+    [AVCOL_SPC_BT709] = "yuv2rgb_bt709",
+    [AVCOL_SPC_BT2020_NCL] = "yuv2rgb_bt2020",
+};
+
+const char *linearize_funcs[AVCOL_TRC_NB] = {
+    [AVCOL_TRC_SMPTE2084] = "eotf_st2084",
+};
+
+const char *delinearize_funcs[AVCOL_TRC_NB] = {
+    [AVCOL_TRC_BT709]     = "inverse_eotf_bt1886",
+    [AVCOL_TRC_BT2020_10] = "inverse_eotf_bt1886",
+};
+
+struct ColorPrimaries primaries_table[AVCOL_PRI_NB] = {
+    [AVCOL_PRI_BT709]  = { 0.640, 0.330, 0.300, 0.600, 0.150, 0.060 },
+    [AVCOL_PRI_BT2020] = { 0.708, 0.292, 0.170, 0.797, 0.131, 0.046 },
+};
+
+struct WhitePoint whitepoint_table[AVCOL_PRI_NB] = {
+    [AVCOL_PRI_BT709]  = { 0.3127, 0.3290 },
+    [AVCOL_PRI_BT2020] = { 0.3127, 0.3290 },
+};
+
+const char *tonemap_func[TONEMAP_MAX] = {
+    [TONEMAP_NONE]     = "direct",
+    [TONEMAP_LINEAR]   = "linear",
+    [TONEMAP_GAMMA]    = "gamma",
+    [TONEMAP_CLIP]     = "clip",
+    [TONEMAP_REINHARD] = "reinhard",
+    [TONEMAP_HABLE]    = "hable",
+    [TONEMAP_MOBIUS]   = "mobius",
+};
+
+static void get_rgb2rgb_matrix(enum AVColorPrimaries in, enum AVColorPrimaries out,
+                               double rgb2rgb[3][3]) {
+    double rgb2xyz[3][3], xyz2rgb[3][3];
+
+    fill_rgb2xyz_table(&primaries_table[out], &whitepoint_table[out], rgb2xyz);
+    invert_matrix3x3(rgb2xyz, xyz2rgb);
+    fill_rgb2xyz_table(&primaries_table[in], &whitepoint_table[in], rgb2xyz);
+    mul3x3(rgb2rgb, rgb2xyz, xyz2rgb);
+}
+
+#define OPENCL_SOURCE_NB 3
+static int tonemap_opencl_init(AVFilterContext *avctx)
+{
+    TonemapOpenCLContext *ctx = avctx->priv;
+    int rgb2rgb_passthrough = 1;
+    double rgb2rgb[3][3];
+    cl_int cle;
+    int err;
+    AVBPrint header;
+    const char *opencl_sources[OPENCL_SOURCE_NB];
+
+    av_bprint_init(&header, 256, AV_BPRINT_SIZE_AUTOMATIC);
+
+    switch(ctx->tonemap) {
+    case TONEMAP_GAMMA:
+        if (isnan(ctx->param))
+            ctx->param = 1.8f;
+        break;
+    case TONEMAP_REINHARD:
+        if (!isnan(ctx->param))
+            ctx->param = (1.0f - ctx->param) / ctx->param;
+        break;
+    case TONEMAP_MOBIUS:
+        if (isnan(ctx->param))
+            ctx->param = 0.3f;
+        break;
+    }
+
+    if (isnan(ctx->param))
+        ctx->param = 1.0f;
+
+    av_bprintf(&header, "__constant const tone_param = %.4f;\n", ctx->param);
+    av_bprintf(&header, "#define TONE_FUNC %s\n", tonemap_func[ctx->tonemap]);
+
+    if (ctx->primaries_out != ctx->primaries_in) {
+      get_rgb2rgb_matrix(ctx->primaries_in, ctx->primaries_out, rgb2rgb);
+      rgb2rgb_passthrough = 0;
+    }
+
+    if (rgb2rgb_passthrough)
+      av_bprintf(&header, "#define RGB2RGB_PASSTHROUGH\n");
+    av_bprintf(&header, "#define RGB_COFF %s\n",    rgb_coff[ctx->colorspace_in]);
+    av_bprintf(&header, "#define YUV_COFF %s\n",    yuv_coff[ctx->colorspace_out]);
+    av_bprintf(&header, "#define linearize %s\n",   linearize_funcs[ctx->trc_in]);
+    av_bprintf(&header, "#define delinearize %s\n", delinearize_funcs[ctx->trc_out]);
+
+    av_bprintf(&header, "__constant float rgb2rgb[9] = {\n");
+    av_bprintf(&header, "    %.4ff, %.4ff, %.4ff,\n",  rgb2rgb[0][0], rgb2rgb[0][1], rgb2rgb[0][2]);
+    av_bprintf(&header, "    %.4ff, %.4ff, %.4ff,\n",  rgb2rgb[1][0], rgb2rgb[1][1], rgb2rgb[1][2]);
+    av_bprintf(&header, "    %.4ff, %.4ff, %.4ff};\n", rgb2rgb[2][0], rgb2rgb[2][1], rgb2rgb[2][2]);
+
+    opencl_sources[0] = header.str;
+    opencl_sources[1] = ff_opencl_source_tonemap;
+    opencl_sources[2] = ff_opencl_source_colorspace_basic;
+    err = ff_opencl_filter_load_program(avctx, opencl_sources, OPENCL_SOURCE_NB);
+
+    av_bprint_finalize(&header, NULL);
+    if (err < 0)
+        goto fail;
+
+    ctx->command_queue = clCreateCommandQueue(ctx->ocf.hwctx->context,
+                                              ctx->ocf.hwctx->device_id,
+                                              0, &cle);
+    if (!ctx->command_queue) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create OpenCL "
+               "command queue: %d.\n", cle);
+        err = AVERROR(EIO);
+        goto fail;
+    }
+
+    ctx->kernel = clCreateKernel(ctx->ocf.program, "tonemap", &cle);
+    if (!ctx->kernel) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create kernel: %d.\n", cle);
+        err = AVERROR(EIO);
+        goto fail;
+    }
+
+    ctx->initialised = 1;
+    return 0;
+
+fail:
+    if (ctx->command_queue)
+        clReleaseCommandQueue(ctx->command_queue);
+    if (ctx->kernel)
+        clReleaseKernel(ctx->kernel);
+    return err;
+}
+
+static int tonemap_opencl_config_output(AVFilterLink *outlink)
+{
+    AVFilterContext *avctx = outlink->src;
+    TonemapOpenCLContext *s = avctx->priv;
+    int ret;
+
+    s->ocf.output_format = AV_PIX_FMT_NV12;
+    ret = ff_opencl_filter_config_output(outlink);
+    if (ret < 0)
+        return ret;
+
+    return 0;
+}
+
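+/* Bind the destination/source planes and the signal peak as kernel
+ * arguments, then enqueue a single 2D NDRange over the image. */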
+static int launch_kernel(AVFilterContext *avctx, cl_kernel kernel,
+                         AVFrame *output, AVFrame *input, float peak) {
+    TonemapOpenCLContext *ctx = avctx->priv;
+    int err;
+    size_t global_work[2];
+    size_t local_work[2];
+    cl_int cle;
+
+    cle = clSetKernelArg(kernel, 0, sizeof(cl_mem), &output->data[0]);
+    if (cle != CL_SUCCESS) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
+               "destination image 1st plane: %d.\n", cle);
+        return AVERROR(EINVAL);
+    }
+
+    cle = clSetKernelArg(kernel, 1, sizeof(cl_mem), &output->data[1]);
+    if (cle != CL_SUCCESS) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
+               "destination image 2nd plane: %d.\n", cle);
+        return AVERROR(EINVAL);
+    }
+
+    cle = clSetKernelArg(kernel, 2, sizeof(cl_mem), &input->data[0]);
+    if (cle != CL_SUCCESS) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
+               "source image 1st plane: %d.\n", cle);
+        return AVERROR(EINVAL);
+    }
+
+    cle = clSetKernelArg(kernel, 3, sizeof(cl_mem), &input->data[1]);
+    if (cle != CL_SUCCESS) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
+               "source image 2nd plane: %d.\n", cle);
+        return AVERROR(EINVAL);
+    }
+
+    cle = clSetKernelArg(kernel, 4, sizeof(cl_float), &peak);
+    if (cle != CL_SUCCESS) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to set kernel "
+               "peak luma: %d.\n", cle);
+        return AVERROR(EINVAL);
+    }
+
+    local_work[0]  = 16;
+    local_work[1]  = 16;
+    // Note: the global work size is based on the UV plane, since each
+    // work-item processes a 2x2 quad of luma samples plus one UV pair.
+    err = ff_opencl_filter_work_size_from_image(avctx, global_work, output,
+                                                1, 16);
+    if (err < 0)
+        return err;
+
+    cle = clEnqueueNDRangeKernel(ctx->command_queue, kernel, 2, NULL,
+                                 global_work, local_work,
+                                 0, NULL, NULL);
+    if (cle != CL_SUCCESS) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to enqueue kernel: %d.\n",
+               cle);
+        return AVERROR(EIO);
+    }
+    return 0;
+}
+
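+/* Derive the signal peak (in nits) from frame side data, preferring
+ * content light level metadata over mastering display metadata. */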
+static double determine_signal_peak(AVFrame *in)
+{
+    AVFrameSideData *sd = av_frame_get_side_data(in, AV_FRAME_DATA_CONTENT_LIGHT_LEVEL);
+    double peak = 0;
+
+    if (sd) {
+        AVContentLightMetadata *clm = (AVContentLightMetadata *)sd->data;
+        peak = clm->MaxCLL;
+    }
+
+    sd = av_frame_get_side_data(in, AV_FRAME_DATA_MASTERING_DISPLAY_METADATA);
+    if (!peak && sd) {
+        AVMasteringDisplayMetadata *metadata = (AVMasteringDisplayMetadata *)sd->data;
+        if (metadata->has_luminance)
+            peak = av_q2d(metadata->max_luminance);
+    }
+
+    /* SMPTE ST 2084 needs the side data above to work correctly;
+     * if it is missing, assume the original transfer was ARIB STD-B67
+     * (HLG), whose nominal peak is 1200 nits. */
+    if (!peak)
+        peak = 1200;
+
+    return peak;
+}
+
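+/* Per-frame entry point: allocate the output frame, lazily build the
+ * OpenCL program once the input color properties are known, then run
+ * the kernel and wait for it to complete. */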
+static int tonemap_opencl_filter_frame(AVFilterLink *inlink, AVFrame *input)
+{
+    AVFilterContext    *avctx = inlink->dst;
+    AVFilterLink     *outlink = avctx->outputs[0];
+    TonemapOpenCLContext *ctx = avctx->priv;
+    AVFrame *output = NULL;
+    cl_int cle;
+    int err;
+    double peak = ctx->peak;
+    AVHWFramesContext *input_frames_ctx;
+
+    av_log(ctx, AV_LOG_DEBUG, "Filter input: %s, %ux%u (%"PRId64").\n",
+           av_get_pix_fmt_name(input->format),
+           input->width, input->height, input->pts);
+
+    /* Validate the hardware frames context before dereferencing it. */
+    if (!input->hw_frames_ctx)
+        return AVERROR(EINVAL);
+    input_frames_ctx = (AVHWFramesContext*)input->hw_frames_ctx->data;
+
+    output = ff_get_video_buffer(outlink, outlink->w, outlink->h);
+    if (!output) {
+        err = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    err = av_frame_copy_props(output, input);
+    if (err < 0)
+        goto fail;
+
+    if (!peak)
+        peak = determine_signal_peak(input);
+
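+    /* Apply user-requested overrides to the output color properties. */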
+    if (ctx->trc != -1)
+        output->color_trc = ctx->trc;
+    if (ctx->primaries != -1)
+        output->color_primaries = ctx->primaries;
+    if (ctx->colorspace != -1)
+        output->colorspace = ctx->colorspace;
+
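+    /* Record the effective input/output color properties; these select
+     * the compile-time configuration in tonemap_opencl_init() below. */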
+    ctx->trc_in = input->color_trc;
+    ctx->trc_out = output->color_trc;
+    ctx->colorspace_in = input->colorspace;
+    ctx->colorspace_out = output->colorspace;
+    ctx->primaries_in = input->color_primaries;
+    ctx->primaries_out = output->color_primaries;
+
+    assert(((AVHWFramesContext*)output->hw_frames_ctx->data)->sw_format ==
+           AV_PIX_FMT_NV12);
+
+    if (!ctx->initialised) {
+        err = tonemap_opencl_init(avctx);
+        if (err < 0)
+            goto fail;
+    }
+
+    switch (input_frames_ctx->sw_format) {
+    case AV_PIX_FMT_P010:
+        err = launch_kernel(avctx, ctx->kernel, output, input, peak);
+        if (err < 0)
+            goto fail;
+        break;
+    default:
+        av_log(ctx, AV_LOG_ERROR, "Unsupported input format in tonemap_opencl.\n");
+        err = AVERROR(ENOSYS);
+        goto fail;
+    }
+
+    cle = clFinish(ctx->command_queue);
+    if (cle != CL_SUCCESS) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to finish command queue: %d.\n",
+               cle);
+        err = AVERROR(EIO);
+        goto fail;
+    }
+
+    av_frame_free(&input);
+
+    av_log(ctx, AV_LOG_DEBUG, "Filter output: %s, %ux%u (%"PRId64").\n",
+           av_get_pix_fmt_name(output->format),
+           output->width, output->height, output->pts);
+
+    return ff_filter_frame(outlink, output);
+
+fail:
+    if (ctx->command_queue)
+        clFinish(ctx->command_queue);
+    av_frame_free(&input);
+    av_frame_free(&output);
+    return err;
+}
+
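+/* Release the OpenCL objects created in tonemap_opencl_init(), then run
+ * the generic OpenCL filter cleanup. */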
+static av_cold void tonemap_opencl_uninit(AVFilterContext *avctx)
+{
+    TonemapOpenCLContext *ctx = avctx->priv;
+    cl_int cle;
+
+    if (ctx->kernel) {
+        cle = clReleaseKernel(ctx->kernel);
+        if (cle != CL_SUCCESS)
+            av_log(avctx, AV_LOG_ERROR, "Failed to release "
+                   "kernel: %d.\n", cle);
+    }
+
+    if (ctx->command_queue) {
+        cle = clReleaseCommandQueue(ctx->command_queue);
+        if (cle != CL_SUCCESS)
+            av_log(avctx, AV_LOG_ERROR, "Failed to release "
+                   "command queue: %d.\n", cle);
+    }
+
+    ff_opencl_filter_uninit(avctx);
+}
+
+#define OFFSET(x) offsetof(TonemapOpenCLContext, x)
+#define FLAGS (AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM)
+static const AVOption tonemap_opencl_options[] = {
+    { "tonemap",      "tonemap algorithm selection", OFFSET(tonemap), AV_OPT_TYPE_INT, {.i64 = TONEMAP_NONE}, TONEMAP_NONE, TONEMAP_MAX - 1, FLAGS, "tonemap" },
+    {     "none",     0, 0, AV_OPT_TYPE_CONST, {.i64 = TONEMAP_NONE},              0, 0, FLAGS, "tonemap" },
+    {     "linear",   0, 0, AV_OPT_TYPE_CONST, {.i64 = TONEMAP_LINEAR},            0, 0, FLAGS, "tonemap" },
+    {     "gamma",    0, 0, AV_OPT_TYPE_CONST, {.i64 = TONEMAP_GAMMA},             0, 0, FLAGS, "tonemap" },
+    {     "clip",     0, 0, AV_OPT_TYPE_CONST, {.i64 = TONEMAP_CLIP},              0, 0, FLAGS, "tonemap" },
+    {     "reinhard", 0, 0, AV_OPT_TYPE_CONST, {.i64 = TONEMAP_REINHARD},          0, 0, FLAGS, "tonemap" },
+    {     "hable",    0, 0, AV_OPT_TYPE_CONST, {.i64 = TONEMAP_HABLE},             0, 0, FLAGS, "tonemap" },
+    {     "mobius",   0, 0, AV_OPT_TYPE_CONST, {.i64 = TONEMAP_MOBIUS},            0, 0, FLAGS, "tonemap" },
+    { "transfer", "set transfer characteristic", OFFSET(trc), AV_OPT_TYPE_INT, {.i64 = -1}, -1, INT_MAX, FLAGS, "transfer" },
+    { "t",        "set transfer characteristic", OFFSET(trc), AV_OPT_TYPE_INT, {.i64 = -1}, -1, INT_MAX, FLAGS, "transfer" },
+    {     "bt709",            0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_TRC_BT709},         0, 0, FLAGS, "transfer" },
+    {     "bt2020",           0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_TRC_BT2020_10},     0, 0, FLAGS, "transfer" },
+    { "matrix", "set colorspace matrix", OFFSET(colorspace), AV_OPT_TYPE_INT, {.i64 = -1}, -1, INT_MAX, FLAGS, "matrix" },
+    { "m",      "set colorspace matrix", OFFSET(colorspace), AV_OPT_TYPE_INT, {.i64 = -1}, -1, INT_MAX, FLAGS, "matrix" },
+    {     "bt709",            0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_SPC_BT709},         0, 0, FLAGS, "matrix" },
+    {     "bt2020",           0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_SPC_BT2020_NCL},    0, 0, FLAGS, "matrix" },
+    { "primaries", "set color primaries", OFFSET(primaries), AV_OPT_TYPE_INT, {.i64 = -1}, -1, INT_MAX, FLAGS, "primaries" },
+    { "p",         "set color primaries", OFFSET(primaries), AV_OPT_TYPE_INT, {.i64 = -1}, -1, INT_MAX, FLAGS, "primaries" },
+    {     "bt709",            0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_PRI_BT709},         0, 0, FLAGS, "primaries" },
+    {     "bt2020",           0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_PRI_BT2020},        0, 0, FLAGS, "primaries" },
+    { "peak",      "signal peak override", OFFSET(peak), AV_OPT_TYPE_DOUBLE, {.dbl = 0}, 0, DBL_MAX, FLAGS },
+    { "param",     "tonemap parameter",   OFFSET(param), AV_OPT_TYPE_DOUBLE, {.dbl = NAN}, DBL_MIN, DBL_MAX, FLAGS },
+    { NULL }
+};
+
+AVFILTER_DEFINE_CLASS(tonemap_opencl);
+
+static const AVFilterPad tonemap_opencl_inputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .filter_frame = &tonemap_opencl_filter_frame,
+        .config_props = &ff_opencl_filter_config_input,
+    },
+    { NULL }
+};
+
+static const AVFilterPad tonemap_opencl_outputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .config_props = &tonemap_opencl_config_output,
+    },
+    { NULL }
+};
+
+AVFilter ff_vf_tonemap_opencl = {
+    .name           = "tonemap_opencl",
+    .description    = NULL_IF_CONFIG_SMALL("Perform HDR to SDR conversion with tonemapping."),
+    .priv_size      = sizeof(TonemapOpenCLContext),
+    .priv_class     = &tonemap_opencl_class,
+    .init           = &ff_opencl_filter_init,
+    .uninit         = &tonemap_opencl_uninit,
+    .query_formats  = &ff_opencl_filter_query_formats,
+    .inputs         = tonemap_opencl_inputs,
+    .outputs        = tonemap_opencl_outputs,
+    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
+};