diff mbox

[FFmpeg-devel,v4,1/2] lavfi: add opencl tonemap filter.

Message ID 1529373452-376-1-git-send-email-ruiling.song@intel.com
State New
Headers show

Commit Message

Ruiling Song June 19, 2018, 1:57 a.m. UTC
This filter does HDR(HDR10/HLG) to SDR conversion with tone-mapping.

An example command to use this filter with vaapi codecs:
FFMPEG -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device \
opencl=ocl@va -hwaccel vaapi -hwaccel_device va -hwaccel_output_format \
vaapi -i INPUT -filter_hw_device ocl -filter_complex \
'[0:v]hwmap,tonemap_opencl=t=bt2020:tonemap=linear:format=p010[x1]; \
[x1]hwmap=derive_device=vaapi:reverse=1' -c:v hevc_vaapi -profile 2 OUTPUT

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
---
As I didn't receive any other comment on v3, this version only fix the comment from Michael.
And also include some little change to leverage CL_SET_KERNEL_ARG() macro.

Thanks!
Ruiling

 configure                               |   1 +
 libavfilter/Makefile                    |   2 +
 libavfilter/allfilters.c                |   1 +
 libavfilter/colorspace.c                |  90 +++++
 libavfilter/colorspace.h                |  41 +++
 libavfilter/opencl/colorspace_common.cl | 220 +++++++++++
 libavfilter/opencl/tonemap.cl           | 272 ++++++++++++++
 libavfilter/opencl_source.h             |   2 +
 libavfilter/vf_tonemap_opencl.c         | 624 ++++++++++++++++++++++++++++++++
 9 files changed, 1253 insertions(+)
 create mode 100644 libavfilter/colorspace.c
 create mode 100644 libavfilter/colorspace.h
 create mode 100644 libavfilter/opencl/colorspace_common.cl
 create mode 100644 libavfilter/opencl/tonemap.cl
 create mode 100644 libavfilter/vf_tonemap_opencl.c

Comments

Moritz Barsnick June 20, 2018, 11:32 a.m. UTC | #1
On Tue, Jun 19, 2018 at 09:57:31 +0800, Ruiling Song wrote:
> As I didn't receive any other comment on v3, this version only fix the comment from Michael.
> And also include some little change to leverage CL_SET_KERNEL_ARG() macro.

Could you kindly add some documentation to doc/filters.texi, at least
describing the filter's options?

> This filter does HDR(HDR10/HLG) to SDR conversion with tone-mapping.
> 
> An example command to use this filter with vaapi codecs:
> FFMPEG -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device \
> opencl=ocl@va -hwaccel vaapi -hwaccel_device va -hwaccel_output_format \
> vaapi -i INPUT -filter_hw_device ocl -filter_complex \
> '[0:v]hwmap,tonemap_opencl=t=bt2020:tonemap=linear:format=p010[x1]; \
> [x1]hwmap=derive_device=vaapi:reverse=1' -c:v hevc_vaapi -profile 2 OUTPUT

This could also go into the documentation.

Thanks,
Moritz
Mark Thompson June 21, 2018, 12:32 a.m. UTC | #2
On 19/06/18 02:57, Ruiling Song wrote:
> This filter does HDR(HDR10/HLG) to SDR conversion with tone-mapping.
> 
> An example command to use this filter with vaapi codecs:
> FFMPEG -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device \
> opencl=ocl@va -hwaccel vaapi -hwaccel_device va -hwaccel_output_format \
> vaapi -i INPUT -filter_hw_device ocl -filter_complex \
> '[0:v]hwmap,tonemap_opencl=t=bt2020:tonemap=linear:format=p010[x1]; \
> [x1]hwmap=derive_device=vaapi:reverse=1' -c:v hevc_vaapi -profile 2 OUTPUT
> 
> Signed-off-by: Ruiling Song <ruiling.song@intel.com>
> ---
> As I didn't receive any other comment on v3, this version only fix the comment from Michael.
> And also include some little change to leverage CL_SET_KERNEL_ARG() macro.
> 
> Thanks!
> Ruiling
> 
>  configure                               |   1 +
>  libavfilter/Makefile                    |   2 +
>  libavfilter/allfilters.c                |   1 +
>  libavfilter/colorspace.c                |  90 +++++
>  libavfilter/colorspace.h                |  41 +++
>  libavfilter/opencl/colorspace_common.cl | 220 +++++++++++
>  libavfilter/opencl/tonemap.cl           | 272 ++++++++++++++
>  libavfilter/opencl_source.h             |   2 +
>  libavfilter/vf_tonemap_opencl.c         | 624 ++++++++++++++++++++++++++++++++
>  9 files changed, 1253 insertions(+)
>  create mode 100644 libavfilter/colorspace.c
>  create mode 100644 libavfilter/colorspace.h
>  create mode 100644 libavfilter/opencl/colorspace_common.cl
>  create mode 100644 libavfilter/opencl/tonemap.cl
>  create mode 100644 libavfilter/vf_tonemap_opencl.c

I did a bit more testing, LGTM; therefore applied.

Some further thoughts:
* Did you get anywhere with testing on other platforms?  (I'm happy with the current state with it working on at least two different platforms including the actually-useful GPU-with-interop ones, but it would be nice to know what was going wrong in the other cases.)
* The single-frame-delay effect ends up looking a bit weird when I go looking for it - I now keep seeing the flash of a single frame at a different brightness on some transitions, though I'm not sure it's obvious enough that I would notice often if I didn't already know it was there.  How much performance would it actually cost to use the correct frame rather than the previous one?
* +1 to the comment from Moritz about documentation if you wouldn't mind writing some as a separate patch.

Thanks,

- Mark
Ruiling Song June 21, 2018, 12:53 a.m. UTC | #3
> -----Original Message-----

> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of

> Moritz Barsnick

> Sent: Wednesday, June 20, 2018 7:32 PM

> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>

> Subject: Re: [FFmpeg-devel] [PATCH v4 1/2] lavfi: add opencl tonemap filter.

> 

> On Tue, Jun 19, 2018 at 09:57:31 +0800, Ruiling Song wrote:

> > As I didn't receive any other comment on v3, this version only fix the comment

> from Michael.

> > And also include some little change to leverage CL_SET_KERNEL_ARG() macro.

> 

> Could you kindly add some documentation to doc/filters.texi, at least

> describing the filter's options?

> 

> > This filter does HDR(HDR10/HLG) to SDR conversion with tone-mapping.

> >

> > An example command to use this filter with vaapi codecs:

> > FFMPEG -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device \

> > opencl=ocl@va -hwaccel vaapi -hwaccel_device va -hwaccel_output_format \

> > vaapi -i INPUT -filter_hw_device ocl -filter_complex \

> > '[0:v]hwmap,tonemap_opencl=t=bt2020:tonemap=linear:format=p010[x1]; \

> > [x1]hwmap=derive_device=vaapi:reverse=1' -c:v hevc_vaapi -profile 2

> OUTPUT

> 

> This could also go into the documentation.


Yes, thanks. Although I am not quite good at English writing, I will try to add some document and send out for review.

Thanks!
Ruiling
> 

> Thanks,

> Moritz

> _______________________________________________

> ffmpeg-devel mailing list

> ffmpeg-devel@ffmpeg.org

> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Jun Zhao June 21, 2018, 1:06 a.m. UTC | #4
On Thu, Jun 21, 2018 at 8:33 AM Mark Thompson <sw@jkqxz.net> wrote:
>
> On 19/06/18 02:57, Ruiling Song wrote:
> > This filter does HDR(HDR10/HLG) to SDR conversion with tone-mapping.
> >
> > An example command to use this filter with vaapi codecs:
> > FFMPEG -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device \
> > opencl=ocl@va -hwaccel vaapi -hwaccel_device va -hwaccel_output_format \
> > vaapi -i INPUT -filter_hw_device ocl -filter_complex \
> > '[0:v]hwmap,tonemap_opencl=t=bt2020:tonemap=linear:format=p010[x1]; \
> > [x1]hwmap=derive_device=vaapi:reverse=1' -c:v hevc_vaapi -profile 2 OUTPUT
> >
> > Signed-off-by: Ruiling Song <ruiling.song@intel.com>
> > ---
> > As I didn't receive any other comment on v3, this version only fix the comment from Michael.
> > And also include some little change to leverage CL_SET_KERNEL_ARG() macro.
> >
> > Thanks!
> > Ruiling
> >
> >  configure                               |   1 +
> >  libavfilter/Makefile                    |   2 +
> >  libavfilter/allfilters.c                |   1 +
> >  libavfilter/colorspace.c                |  90 +++++
> >  libavfilter/colorspace.h                |  41 +++
> >  libavfilter/opencl/colorspace_common.cl | 220 +++++++++++
> >  libavfilter/opencl/tonemap.cl           | 272 ++++++++++++++
> >  libavfilter/opencl_source.h             |   2 +
> >  libavfilter/vf_tonemap_opencl.c         | 624 ++++++++++++++++++++++++++++++++
> >  9 files changed, 1253 insertions(+)
> >  create mode 100644 libavfilter/colorspace.c
> >  create mode 100644 libavfilter/colorspace.h
> >  create mode 100644 libavfilter/opencl/colorspace_common.cl
> >  create mode 100644 libavfilter/opencl/tonemap.cl
> >  create mode 100644 libavfilter/vf_tonemap_opencl.c
>
> I did a bit more testing, LGTM; therefore applied.
>
> Some further thoughts:
> * Did you get anywhere with testing on other platforms?  (I'm happy with the current state with it working on at least two different platforms including the actually-useful GPU-with-interop ones, but it would be nice to know what was going wrong in the other cases.)
Now I start to run intel NEO OpenCL GPU driver (now available on
github (https://github.com/intel/compute-runtime).) + iHD open source
driver(https://github.com/intel/media-driver), any status will update
to the mail list.
> * The single-frame-delay effect ends up looking a bit weird when I go looking for it - I now keep seeing the flash of a single frame at a different brightness on some transitions, though I'm not sure it's obvious enough that I would often notice if I didn't already know it was there.  How much performance would it actually cost to use the correct frame rather than the previous one?
> * +1 to the comment from Moritz about documentation if you wouldn't mind writing some as a separate patch.
>
+1 for comment, now OpenCL AVFilter use the complex command option,
the other thing is if someone uses the FFmpeg API to enable the OpenCL
AVFilter, I think we need some sample code in doc/examples to demo the
API enable the OpenCL HWaccel.
> Thanks,
>
> - Mark
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Ruiling Song June 21, 2018, 6:03 a.m. UTC | #5
> -----Original Message-----

> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of

> Mark Thompson

> Sent: Thursday, June 21, 2018 8:33 AM

> To: ffmpeg-devel@ffmpeg.org

> Subject: Re: [FFmpeg-devel] [PATCH v4 1/2] lavfi: add opencl tonemap filter.

> 

> On 19/06/18 02:57, Ruiling Song wrote:

> > This filter does HDR(HDR10/HLG) to SDR conversion with tone-mapping.

> >

> > An example command to use this filter with vaapi codecs:

> > FFMPEG -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device \

> > opencl=ocl@va -hwaccel vaapi -hwaccel_device va -hwaccel_output_format \

> > vaapi -i INPUT -filter_hw_device ocl -filter_complex \

> > '[0:v]hwmap,tonemap_opencl=t=bt2020:tonemap=linear:format=p010[x1]; \

> > [x1]hwmap=derive_device=vaapi:reverse=1' -c:v hevc_vaapi -profile 2

> OUTPUT

> >

> > Signed-off-by: Ruiling Song <ruiling.song@intel.com>

> > ---

> > As I didn't receive any other comment on v3, this version only fix the comment

> from Michael.

> > And also include some little change to leverage CL_SET_KERNEL_ARG() macro.

> >

> > Thanks!

> > Ruiling

> >

> >  configure                               |   1 +

> >  libavfilter/Makefile                    |   2 +

> >  libavfilter/allfilters.c                |   1 +

> >  libavfilter/colorspace.c                |  90 +++++

> >  libavfilter/colorspace.h                |  41 +++

> >  libavfilter/opencl/colorspace_common.cl | 220 +++++++++++

> >  libavfilter/opencl/tonemap.cl           | 272 ++++++++++++++

> >  libavfilter/opencl_source.h             |   2 +

> >  libavfilter/vf_tonemap_opencl.c         | 624

> ++++++++++++++++++++++++++++++++

> >  9 files changed, 1253 insertions(+)

> >  create mode 100644 libavfilter/colorspace.c

> >  create mode 100644 libavfilter/colorspace.h

> >  create mode 100644 libavfilter/opencl/colorspace_common.cl

> >  create mode 100644 libavfilter/opencl/tonemap.cl

> >  create mode 100644 libavfilter/vf_tonemap_opencl.c

> 

> I did a bit more testing, LGTM; therefore applied.

> 

> Some further thoughts:

> * Did you get anywhere with testing on other platforms?  (I'm happy with the

> current state with it working on at least two different platforms including the

> actually-useful GPU-with-interop ones, but it would be nice to know what was

> going wrong in the other cases.)

So, this version with segfault fix still does not work on Mali platform, same error as before?
For the other platform, do you mean other hardware? Currently I don't have other hardware to use.
But I will add support for yuv420p so we can make it work with pocl.

> * The single-frame-delay effect ends up looking a bit weird when I go looking for

> it - I now keep seeing the flash of a single frame at a different brightness on

> some transitions, though I'm not sure it's obvious enough that I would notice

> often if I didn't already know it was there.  How much performance would it

> actually cost to use the correct frame rather than the previous one?

I will try to remove this one frame delay. But it need some time.
I think it is not easy to predict how much performance effect it would be. May be not too much.
I will continue improve this filter step by step.

> * +1 to the comment from Moritz about documentation if you wouldn't mind

> writing some as a separate patch.

Will add it.

> 

> Thanks,

> 

> - Mark

> _______________________________________________

> ffmpeg-devel mailing list

> ffmpeg-devel@ffmpeg.org

> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Mark Thompson July 2, 2018, 10:33 p.m. UTC | #6
On 21/06/18 07:03, Song, Ruiling wrote:
>> -----Original Message-----
>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of
>> Mark Thompson
>> Sent: Thursday, June 21, 2018 8:33 AM
>> To: ffmpeg-devel@ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH v4 1/2] lavfi: add opencl tonemap filter.
>>
>> On 19/06/18 02:57, Ruiling Song wrote:
>>> This filter does HDR(HDR10/HLG) to SDR conversion with tone-mapping.
>>>
>>> An example command to use this filter with vaapi codecs:
>>> FFMPEG -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device \
>>> opencl=ocl@va -hwaccel vaapi -hwaccel_device va -hwaccel_output_format \
>>> vaapi -i INPUT -filter_hw_device ocl -filter_complex \
>>> '[0:v]hwmap,tonemap_opencl=t=bt2020:tonemap=linear:format=p010[x1]; \
>>> [x1]hwmap=derive_device=vaapi:reverse=1' -c:v hevc_vaapi -profile 2
>> OUTPUT
>>>
>>> Signed-off-by: Ruiling Song <ruiling.song@intel.com>
>>> ---
>>> As I didn't receive any other comment on v3, this version only fix the comment
>> from Michael.
>>> And also include some little change to leverage CL_SET_KERNEL_ARG() macro.
>>>
>>> Thanks!
>>> Ruiling
>>>
>>>  configure                               |   1 +
>>>  libavfilter/Makefile                    |   2 +
>>>  libavfilter/allfilters.c                |   1 +
>>>  libavfilter/colorspace.c                |  90 +++++
>>>  libavfilter/colorspace.h                |  41 +++
>>>  libavfilter/opencl/colorspace_common.cl | 220 +++++++++++
>>>  libavfilter/opencl/tonemap.cl           | 272 ++++++++++++++
>>>  libavfilter/opencl_source.h             |   2 +
>>>  libavfilter/vf_tonemap_opencl.c         | 624
>> ++++++++++++++++++++++++++++++++
>>>  9 files changed, 1253 insertions(+)
>>>  create mode 100644 libavfilter/colorspace.c
>>>  create mode 100644 libavfilter/colorspace.h
>>>  create mode 100644 libavfilter/opencl/colorspace_common.cl
>>>  create mode 100644 libavfilter/opencl/tonemap.cl
>>>  create mode 100644 libavfilter/vf_tonemap_opencl.c
>>
>> I did a bit more testing, LGTM; therefore applied.
>>
>> Some further thoughts:
>> * Did you get anywhere with testing on other platforms?  (I'm happy with the
>> current state with it working on at least two different platforms including the
>> actually-useful GPU-with-interop ones, but it would be nice to know what was
>> going wrong in the other cases.)
> So, this version with segfault fix still does not work on Mali platform, same error as before?

Complete log below, running on a RK3288 with Mali T760.  (The input file here is from <http://4kmedia.org/lg-new-york-hdr-uhd-4k-demo/>, but it doesn't appear to matter.)

Thanks,

- Mark


ffmpeg started on 2018-07-02 at 22:27:59
Report written to "ffmpeg-20180702-222759.log"
Command line:
./ffmpeg_g -report -v 55 -y -threads 1 -i "/home/mrt/test/LG New York HDR UHD 4K Demo.ts" -init_hw_device opencl -filter_hw_device opencl0 -an -vf "format=p010,hwupload,tonemap_opencl=format=nv12,hwdownload,format=nv12" -c:v libx264 out.mp4
ffmpeg version N-91405-g54b425a7fa Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 6.3.0 (Debian 6.3.0-18+deb9u1) 20170516
  configuration: --enable-debug --enable-opencl --enable-libdrm --enable-rkmpp --enable-gpl --enable-version3 --enable-libx264 --extra-ldflags='-L/usr/local/lib -lmali-midgard-r13p0-fbdev'
  libavutil      56. 18.102 / 56. 18.102
  libavcodec     58. 20.104 / 58. 20.104
  libavformat    58. 17.101 / 58. 17.101
  libavdevice    58.  4.101 / 58.  4.101
  libavfilter     7. 25.100 /  7. 25.100
  libswscale      5.  2.100 /  5.  2.100
  libswresample   3.  2.100 /  3.  2.100
  libpostproc    55.  2.100 / 55.  2.100
Splitting the commandline.
Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'.
Reading option '-v' ... matched as option 'v' (set logging level) with argument '55'.
Reading option '-y' ... matched as option 'y' (overwrite output files) with argument '1'.
Reading option '-threads' ... matched as AVOption 'threads' with argument '1'.
Reading option '-i' ... matched as input url with argument '/home/mrt/test/LG New York HDR UHD 4K Demo.ts'.
Reading option '-init_hw_device' ... matched as option 'init_hw_device' (initialise hardware device) with argument 'opencl'.
Reading option '-filter_hw_device' ... matched as option 'filter_hw_device' (set hardware device used when filtering) with argument 'opencl0'.
Reading option '-an' ... matched as option 'an' (disable audio) with argument '1'.
Reading option '-vf' ... matched as option 'vf' (set video filters) with argument 'format=p010,hwupload,tonemap_opencl=format=nv12,hwdownload,format=nv12'.
Reading option '-c:v' ... matched as option 'c' (codec name) with argument 'libx264'.
Reading option 'out.mp4' ... matched as output url.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option report (generate a report) with argument 1.
Applying option v (set logging level) with argument 55.
Applying option y (overwrite output files) with argument 1.
Applying option init_hw_device (initialise hardware device) with argument opencl.
[AVHWDeviceContext @ 0x828e42b0] 1 OpenCL platforms found.
[AVHWDeviceContext @ 0x828e42b0] 1 OpenCL devices found on platform "ARM Platform".
[AVHWDeviceContext @ 0x828e42b0] 0.0: ARM Platform / Mali-T760
[AVHWDeviceContext @ 0x828e42b0] cl_arm_import_memory found as platform extension.
[AVHWDeviceContext @ 0x828e42b0] cl_khr_image2d_from_buffer found as platform extension.
Applying option filter_hw_device (set hardware device used when filtering) with argument opencl0.
Successfully parsed a group of options.
Parsing a group of options: input url /home/mrt/test/LG New York HDR UHD 4K Demo.ts.
Successfully parsed a group of options.
Opening an input file: /home/mrt/test/LG New York HDR UHD 4K Demo.ts.
[NULL @ 0x82936f70] Opening '/home/mrt/test/LG New York HDR UHD 4K Demo.ts' for reading
[file @ 0x82937760] Setting default whitelist 'file,crypto'
[mpegts @ 0x82936f70] Format mpegts probed with size=2048 and score=50
[mpegts @ 0x82936f70] stream=0 stream_type=24 pid=101 prog_reg_desc=
[mpegts @ 0x82936f70] stream=1 stream_type=f pid=102 prog_reg_desc=
[mpegts @ 0x82936f70] Before avformat_find_stream_info() pos: 0 bytes read:32768 seeks:0 nb_streams:2
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 20(IDR_N_LP), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 38(FD_NUT), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding VPS
[hevc @ 0x8293b490] Main 10 profile bitstream
[hevc @ 0x8293b490] Decoding SPS
[hevc @ 0x8293b490] Main 10 profile bitstream
[hevc @ 0x8293b490] Decoding VUI
[hevc @ 0x8293b490] Decoding PPS
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] Skipped PREFIX SEI 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] Skipped PREFIX SEI 5
[hevc @ 0x8293b490] Decoding SEI
[AVBSFContext @ 0x829c6100] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[AVBSFContext @ 0x829c6100] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[AVBSFContext @ 0x829c6100] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[AVBSFContext @ 0x829c6100] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[AVBSFContext @ 0x829c6100] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[AVBSFContext @ 0x829c6100] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[AVBSFContext @ 0x829c6100] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[AVBSFContext @ 0x829c6100] nal_unit_type: 20(IDR_N_LP), nuh_layer_id: 0, temporal_id: 0
[AVBSFContext @ 0x829c6100] nal_unit_type: 38(FD_NUT), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 20(IDR_N_LP), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 38(FD_NUT), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding VPS
[hevc @ 0x8293b490] Main 10 profile bitstream
[hevc @ 0x8293b490] Decoding SPS
[hevc @ 0x8293b490] Main 10 profile bitstream
[hevc @ 0x8293b490] Decoding VUI
[hevc @ 0x8293b490] Decoding PPS
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] Skipped PREFIX SEI 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] Skipped PREFIX SEI 5
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] Format yuv420p10le chosen by get_format().
[hevc @ 0x8293b490] Mastering Display Metadata:
[hevc @ 0x8293b490] r(0.6800,0.3200) g(0.2650,0.6900) b(0.1500 0.0600) wp(0.3127, 0.3290)
[hevc @ 0x8293b490] min_luminance=0.050000, max_luminance=1200.000000
[hevc @ 0x8293b490] Output frame with POC 0.
[hevc @ 0x8293b490] Decoded frame with POC 0.
[hevc @ 0x8293b490] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding VPS
[hevc @ 0x8293b490] Main 10 profile bitstream
[hevc @ 0x8293b490] Decoding SPS
[hevc @ 0x8293b490] Main 10 profile bitstream
[hevc @ 0x8293b490] Decoding VUI
[hevc @ 0x8293b490] Decoding PPS
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 38(FD_NUT), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 20(IDR_N_LP), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 38(FD_NUT), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding VPS
[hevc @ 0x8293b490] Main 10 profile bitstream
[hevc @ 0x8293b490] Decoding SPS
[hevc @ 0x8293b490] Main 10 profile bitstream
[hevc @ 0x8293b490] Decoding VUI
[hevc @ 0x8293b490] Decoding PPS
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] Skipped PREFIX SEI 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] Skipped PREFIX SEI 6
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] Skipped PREFIX SEI 5
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 38(FD_NUT), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[hevc @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x8293b490] Decoding SEI
[mpegts @ 0x82936f70] Probe buffer size limit of 5000000 bytes reached
[mpegts @ 0x82936f70] start time for stream 1 is not set in estimate_timings_from_pts
[mpegts @ 0x82936f70] probing stream 1 pp:2500
[mpegts @ 0x82936f70] Probe with size=1688, packets=1 detected aac with score=25
[mpegts @ 0x82936f70] probed stream 1
[mpegts @ 0x82936f70] stream 1 : no TS found at start of file, duration not set
[mpegts @ 0x82936f70] Could not find codec parameters for stream 1 (Audio: aac ([15][0][0][0] / 0x000F), 0 channels): unspecified sample format
Consider increasing the value for the 'analyzeduration' and 'probesize' options
[mpegts @ 0x82936f70] After avformat_find_stream_info() pos: 0 bytes read:29676400 seeks:8 frames:18
Input #0, mpegts, from '/home/mrt/test/LG New York HDR UHD 4K Demo.ts':
  Duration: 00:01:12.24, start: 0.999989, bitrate: 52032 kb/s
  Program 1 
    Stream #0:0[0x101], 18, 1/90000: Video: hevc (Main 10), 1 reference frame ([36][0][0][0] / 0x0024), yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 3840x2160 [SAR 1:1 DAR 16:9], 0/1, 25 fps, 25 tbr, 90k tbn, 25 tbc
    Stream #0:1[0x102](und), 0, 1/90000: Audio: aac ([15][0][0][0] / 0x000F), 0 channels
Successfully opened the file.
Parsing a group of options: output url out.mp4.
Applying option an (disable audio) with argument 1.
Applying option vf (set video filters) with argument format=p010,hwupload,tonemap_opencl=format=nv12,hwdownload,format=nv12.
Applying option c:v (codec name) with argument libx264.
Successfully parsed a group of options.
Opening an output file: out.mp4.
[file @ 0x82961e10] Setting default whitelist 'file,crypto'
Successfully opened the file.
[hevc @ 0x82960f30] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x82960f30] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x82960f30] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x82960f30] Decoding VPS
[hevc @ 0x82960f30] Main 10 profile bitstream
[hevc @ 0x82960f30] Decoding SPS
[hevc @ 0x82960f30] Main 10 profile bitstream
[hevc @ 0x82960f30] Decoding VUI
[hevc @ 0x82960f30] Decoding PPS
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> h264 (libx264))
Press [q] to stop, [?] for help
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[NULL @ 0x8293b490] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[NULL @ 0x8293b490] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[NULL @ 0x8293b490] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[NULL @ 0x8293b490] Decoding VPS
[NULL @ 0x8293b490] Main 10 profile bitstream
[NULL @ 0x8293b490] Decoding SPS
[NULL @ 0x8293b490] Main 10 profile bitstream
[NULL @ 0x8293b490] Decoding VUI
[NULL @ 0x8293b490] Decoding PPS
[NULL @ 0x8293b490] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[NULL @ 0x8293b490] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[NULL @ 0x8293b490] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[NULL @ 0x8293b490] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[NULL @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[NULL @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[NULL @ 0x8293b490] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[NULL @ 0x8293b490] nal_unit_type: 20(IDR_N_LP), nuh_layer_id: 0, temporal_id: 0
[NULL @ 0x8293b490] nal_unit_type: 38(FD_NUT), nuh_layer_id: 0, temporal_id: 0
[NULL @ 0x8293b490] Decoding VPS
[NULL @ 0x8293b490] Main 10 profile bitstream
[NULL @ 0x8293b490] Decoding SPS
[NULL @ 0x8293b490] Main 10 profile bitstream
[NULL @ 0x8293b490] Decoding VUI
[NULL @ 0x8293b490] Decoding PPS
[NULL @ 0x8293b490] Decoding SEI
[NULL @ 0x8293b490] Decoding SEI
[NULL @ 0x8293b490] Skipped PREFIX SEI 0
[NULL @ 0x8293b490] Decoding SEI
[NULL @ 0x8293b490] Decoding SEI
[NULL @ 0x8293b490] Skipped PREFIX SEI 5
[NULL @ 0x8293b490] Decoding SEI
[hevc @ 0x82960f30] nal_unit_type: 35(AUD), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x82960f30] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x82960f30] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x82960f30] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x82960f30] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x82960f30] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x82960f30] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x82960f30] nal_unit_type: 20(IDR_N_LP), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x82960f30] nal_unit_type: 38(FD_NUT), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x82960f30] Decoding VPS
[hevc @ 0x82960f30] Main 10 profile bitstream
[hevc @ 0x82960f30] Decoding SPS
[hevc @ 0x82960f30] Main 10 profile bitstream
[hevc @ 0x82960f30] Decoding VUI
[hevc @ 0x82960f30] Decoding PPS
[hevc @ 0x82960f30] Decoding SEI
[hevc @ 0x82960f30] Decoding SEI
[hevc @ 0x82960f30] Skipped PREFIX SEI 0
[hevc @ 0x82960f30] Decoding SEI
[hevc @ 0x82960f30] Decoding SEI
[hevc @ 0x82960f30] Skipped PREFIX SEI 5
[hevc @ 0x82960f30] Decoding SEI
[hevc @ 0x82960f30] Format yuv420p10le chosen by get_format().
[hevc @ 0x82960f30] Mastering Display Metadata:
[hevc @ 0x82960f30] r(0.6800,0.3200) g(0.2650,0.6900) b(0.1500 0.0600) wp(0.3127, 0.3290)
[hevc @ 0x82960f30] min_luminance=0.050000, max_luminance=1200.000000
[hevc @ 0x82960f30] Output frame with POC 0.
[hevc @ 0x82960f30] Decoded frame with POC 0.
detected 4 logical cores
[Parsed_format_0 @ 0x829b7ee0] Setting 'pix_fmts' to value 'p010'
[Parsed_tonemap_opencl_2 @ 0x829b8460] Setting 'format' to value 'nv12'
[Parsed_format_4 @ 0x829b8d00] Setting 'pix_fmts' to value 'nv12'
[graph 0 input from stream 0:0 @ 0x829b92c0] Setting 'video_size' to value '3840x2160'
[graph 0 input from stream 0:0 @ 0x829b92c0] Setting 'pix_fmt' to value '64'
[graph 0 input from stream 0:0 @ 0x829b92c0] Setting 'time_base' to value '1/90000'
[graph 0 input from stream 0:0 @ 0x829b92c0] Setting 'pixel_aspect' to value '1/1'
[graph 0 input from stream 0:0 @ 0x829b92c0] Setting 'sws_param' to value 'flags=2'
[graph 0 input from stream 0:0 @ 0x829b92c0] Setting 'frame_rate' to value '25/1'
[graph 0 input from stream 0:0 @ 0x829b92c0] w:3840 h:2160 pixfmt:yuv420p10le tb:1/90000 fr:25/1 sar:1/1 sws_param:flags=2
[format @ 0x8295b990] Setting 'pix_fmts' to value 'yuv420p|yuvj420p|yuv422p|yuvj422p|yuv444p|yuvj444p|nv12|nv16|nv21'
[AVHWDeviceContext @ 0x828e42b0] Maximum supported image size 65536x65536.
[AVHWDeviceContext @ 0x828e42b0] Format yuv420p supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuv422p supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuv444p supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuv410p supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuv411p supported.
[AVHWDeviceContext @ 0x828e42b0] Format gray supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuvj420p supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuvj422p supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuvj444p supported.
[AVHWDeviceContext @ 0x828e42b0] Format nv12 supported.
[AVHWDeviceContext @ 0x828e42b0] Format nv21 supported.
[AVHWDeviceContext @ 0x828e42b0] Format argb supported.
[AVHWDeviceContext @ 0x828e42b0] Format rgba supported.
[AVHWDeviceContext @ 0x828e42b0] Format abgr supported.
[AVHWDeviceContext @ 0x828e42b0] Format bgra supported.
[AVHWDeviceContext @ 0x828e42b0] Format gray16le supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuv440p supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuvj440p supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuva420p supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuv420p16le supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuv422p16le supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuv444p16le supported.
[AVHWDeviceContext @ 0x828e42b0] Format ya8 supported.
[AVHWDeviceContext @ 0x828e42b0] Format gbrp supported.
[AVHWDeviceContext @ 0x828e42b0] Format gbrp16le supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuva422p supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuva444p supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuva420p16le supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuva422p16le supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuva444p16le supported.
[AVHWDeviceContext @ 0x828e42b0] Format nv16 supported.
[AVHWDeviceContext @ 0x828e42b0] Format rgba64le supported.
[AVHWDeviceContext @ 0x828e42b0] Format bgra64le supported.
[AVHWDeviceContext @ 0x828e42b0] Format ya16le supported.
[AVHWDeviceContext @ 0x828e42b0] Format gbrap supported.
[AVHWDeviceContext @ 0x828e42b0] Format gbrap16le supported.
[AVHWDeviceContext @ 0x828e42b0] Format yuvj411p supported.
[AVHWDeviceContext @ 0x828e42b0] Format ayuv64le supported.
[AVHWDeviceContext @ 0x828e42b0] Format p010le supported.
[AVHWDeviceContext @ 0x828e42b0] Format p016le supported.
[auto_scaler_0 @ 0x829b9d60] Setting 'flags' to value 'bicubic'
[auto_scaler_0 @ 0x829b9d60] w:iw h:ih flags:'bicubic' interl:0
[Parsed_format_0 @ 0x829b7ee0] auto-inserting filter 'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the filter 'Parsed_format_0'
[AVFilterGraph @ 0x829c8440] query_formats: 8 queried, 6 merged, 1 already done, 0 delayed
[auto_scaler_0 @ 0x829b9d60] w:3840 h:2160 fmt:yuv420p10le sar:1/1 -> w:3840 h:2160 fmt:p010le sar:1/1 flags:0x4
[hwupload @ 0x8295ff20] Surface format is p010le.
[tonemap_opencl @ 0x829b84d0] Filter input: opencl, 3840x2160 (0).
[tonemap_opencl @ 0x829b84d0] tone mapping transfer from smpte2084 to bt709
[tonemap_opencl @ 0x829b84d0] mapping colorspace from bt2020nc to bt2020nc
[tonemap_opencl @ 0x829b84d0] mapping primaries from bt2020 to bt2020
[tonemap_opencl @ 0x829b84d0] mapping range from tv to tv
[Parsed_tonemap_opencl_2 @ 0x829b8460] Generated OpenCL header:
__constant const float tone_param = 1.0000f;
__constant const float desat_param = 0.5000f;
__constant const float target_peak = 1.0000f;
__constant const float sdr_avg = 0.2500f;
__constant const float scene_threshold = 0.2000f;
#define TONE_FUNC direct
#define DETECTION_FRAMES 63
#define chroma_loc 0
#define RGB2RGB_PASSTHROUGH
#define rgb_matrix yuv2rgb_bt2020
#define yuv_matrix rgb2yuv_bt2020
constant float3 luma_src = {0.2627f, 0.6780f, 0.0593f};
constant float3 luma_dst = {0.2627f, 0.6780f, 0.0593f};
#define linearize eotf_st2084
#define delinearize inverse_eotf_bt1886

[Parsed_tonemap_opencl_2 @ 0x829b8460] Failed to enqueue kernel: -5.
Error while filtering: Input/output error
Failed to inject frame into filter network: Input/output error
Error while processing the decoded data for stream #0:0
[AVIOContext @ 0x8295f1d0] Statistics: 0 seeks, 0 writeouts
[AVIOContext @ 0x8293f8b0] Statistics: 30167920 bytes read, 8 seeks
Conversion failed!
diff mbox

Patch

diff --git a/configure b/configure
index 333e326..d9c5d63 100755
--- a/configure
+++ b/configure
@@ -3411,6 +3411,7 @@  tinterlace_filter_deps="gpl"
 tinterlace_merge_test_deps="tinterlace_filter"
 tinterlace_pad_test_deps="tinterlace_filter"
 tonemap_filter_deps="const_nan"
+tonemap_opencl_filter_deps="opencl const_nan"
 unsharp_opencl_filter_deps="opencl"
 uspp_filter_deps="gpl avcodec"
 vaguedenoiser_filter_deps="gpl"
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 5b4be49..d2c85cf 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -356,6 +356,8 @@  OBJS-$(CONFIG_TINTERLACE_FILTER)             += vf_tinterlace.o
 OBJS-$(CONFIG_TLUT2_FILTER)                  += vf_lut2.o framesync.o
 OBJS-$(CONFIG_TMIX_FILTER)                   += vf_mix.o framesync.o
 OBJS-$(CONFIG_TONEMAP_FILTER)                += vf_tonemap.o
+OBJS-$(CONFIG_TONEMAP_OPENCL_FILTER)         += vf_tonemap_opencl.o colorspace.o opencl.o \
+                                                opencl/tonemap.o opencl/colorspace_common.o
 OBJS-$(CONFIG_TRANSPOSE_FILTER)              += vf_transpose.o
 OBJS-$(CONFIG_TRIM_FILTER)                   += trim.o
 OBJS-$(CONFIG_UNPREMULTIPLY_FILTER)          += vf_premultiply.o framesync.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index f2d27d2..fa85c29 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -345,6 +345,7 @@  extern AVFilter ff_vf_tinterlace;
 extern AVFilter ff_vf_tlut2;
 extern AVFilter ff_vf_tmix;
 extern AVFilter ff_vf_tonemap;
+extern AVFilter ff_vf_tonemap_opencl;
 extern AVFilter ff_vf_transpose;
 extern AVFilter ff_vf_trim;
 extern AVFilter ff_vf_unpremultiply;
diff --git a/libavfilter/colorspace.c b/libavfilter/colorspace.c
new file mode 100644
index 0000000..7fd7bdf
--- /dev/null
+++ b/libavfilter/colorspace.c
@@ -0,0 +1,90 @@ 
+/*
+ * Copyright (c) 2016 Ronald S. Bultje <rsbultje@gmail.com>
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "colorspace.h"
+
+
+void invert_matrix3x3(const double in[3][3], double out[3][3])
+{
+    double m00 = in[0][0], m01 = in[0][1], m02 = in[0][2],
+           m10 = in[1][0], m11 = in[1][1], m12 = in[1][2],
+           m20 = in[2][0], m21 = in[2][1], m22 = in[2][2];
+    int i, j;
+    double det;
+
+    out[0][0] =  (m11 * m22 - m21 * m12);
+    out[0][1] = -(m01 * m22 - m21 * m02);
+    out[0][2] =  (m01 * m12 - m11 * m02);
+    out[1][0] = -(m10 * m22 - m20 * m12);
+    out[1][1] =  (m00 * m22 - m20 * m02);
+    out[1][2] = -(m00 * m12 - m10 * m02);
+    out[2][0] =  (m10 * m21 - m20 * m11);
+    out[2][1] = -(m00 * m21 - m20 * m01);
+    out[2][2] =  (m00 * m11 - m10 * m01);
+
+    det = m00 * out[0][0] + m10 * out[0][1] + m20 * out[0][2];
+    det = 1.0 / det;
+
+    for (i = 0; i < 3; i++) {
+        for (j = 0; j < 3; j++)
+            out[i][j] *= det;
+    }
+}
+
+void mul3x3(double dst[3][3], const double src1[3][3], const double src2[3][3])
+{
+    int m, n;
+
+    for (m = 0; m < 3; m++)
+        for (n = 0; n < 3; n++)
+            dst[m][n] = src2[m][0] * src1[0][n] +
+                        src2[m][1] * src1[1][n] +
+                        src2[m][2] * src1[2][n];
+}
+/*
+ * see e.g. http://www.brucelindbloom.com/index.html?Eqn_RGB_XYZ_Matrix.html
+ */
+void fill_rgb2xyz_table(const struct PrimaryCoefficients *coeffs,
+                        const struct WhitepointCoefficients *wp,
+                        double rgb2xyz[3][3])
+{
+    double i[3][3], sr, sg, sb, zw;
+
+    rgb2xyz[0][0] = coeffs->xr / coeffs->yr;
+    rgb2xyz[0][1] = coeffs->xg / coeffs->yg;
+    rgb2xyz[0][2] = coeffs->xb / coeffs->yb;
+    rgb2xyz[1][0] = rgb2xyz[1][1] = rgb2xyz[1][2] = 1.0;
+    rgb2xyz[2][0] = (1.0 - coeffs->xr - coeffs->yr) / coeffs->yr;
+    rgb2xyz[2][1] = (1.0 - coeffs->xg - coeffs->yg) / coeffs->yg;
+    rgb2xyz[2][2] = (1.0 - coeffs->xb - coeffs->yb) / coeffs->yb;
+    invert_matrix3x3(rgb2xyz, i);
+    zw = 1.0 - wp->xw - wp->yw;
+    sr = i[0][0] * wp->xw + i[0][1] * wp->yw + i[0][2] * zw;
+    sg = i[1][0] * wp->xw + i[1][1] * wp->yw + i[1][2] * zw;
+    sb = i[2][0] * wp->xw + i[2][1] * wp->yw + i[2][2] * zw;
+    rgb2xyz[0][0] *= sr;
+    rgb2xyz[0][1] *= sg;
+    rgb2xyz[0][2] *= sb;
+    rgb2xyz[1][0] *= sr;
+    rgb2xyz[1][1] *= sg;
+    rgb2xyz[1][2] *= sb;
+    rgb2xyz[2][0] *= sr;
+    rgb2xyz[2][1] *= sg;
+    rgb2xyz[2][2] *= sb;
+}
diff --git a/libavfilter/colorspace.h b/libavfilter/colorspace.h
new file mode 100644
index 0000000..d330917
--- /dev/null
+++ b/libavfilter/colorspace.h
@@ -0,0 +1,41 @@ 
+/*
+ * Copyright (c) 2016 Ronald S. Bultje <rsbultje@gmail.com>
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVFILTER_COLORSPACE_H
+#define AVFILTER_COLORSPACE_H
+
+#include "libavutil/common.h"
+
+struct LumaCoefficients {
+    double cr, cg, cb;
+};
+
+struct PrimaryCoefficients {
+    double xr, yr, xg, yg, xb, yb;
+};
+
+struct WhitepointCoefficients {
+    double xw, yw;
+};
+
+void invert_matrix3x3(const double in[3][3], double out[3][3]);
+void mul3x3(double dst[3][3], const double src1[3][3], const double src2[3][3]);
+void fill_rgb2xyz_table(const struct PrimaryCoefficients *coeffs,
+                        const struct WhitepointCoefficients *wp, double rgb2xyz[3][3]);
+#endif
diff --git a/libavfilter/opencl/colorspace_common.cl b/libavfilter/opencl/colorspace_common.cl
new file mode 100644
index 0000000..94a4dd0
--- /dev/null
+++ b/libavfilter/opencl/colorspace_common.cl
@@ -0,0 +1,220 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#define ST2084_MAX_LUMINANCE 10000.0f
+#define REFERENCE_WHITE 100.0f
+
+#if chroma_loc == 1
+    #define chroma_sample(a,b,c,d) (((a) + (c)) * 0.5f)
+#elif chroma_loc == 3
+    #define chroma_sample(a,b,c,d) (a)
+#elif chroma_loc == 4
+    #define chroma_sample(a,b,c,d) (((a) + (b)) * 0.5f)
+#elif chroma_loc == 5
+    #define chroma_sample(a,b,c,d) (c)
+#elif chroma_loc == 6
+    #define chroma_sample(a,b,c,d) (((c) + (d)) * 0.5f)
+#else
+    #define chroma_sample(a,b,c,d) (((a) + (b) + (c) + (d)) * 0.25f)
+#endif
+
+constant const float ST2084_M1 = 0.1593017578125f;
+constant const float ST2084_M2 = 78.84375f;
+constant const float ST2084_C1 = 0.8359375f;
+constant const float ST2084_C2 = 18.8515625f;
+constant const float ST2084_C3 = 18.6875f;
+
+__constant float yuv2rgb_bt2020[] = {
+    1.0f, 0.0f, 1.4746f,
+    1.0f, -0.16455f, -0.57135f,
+    1.0f, 1.8814f, 0.0f
+};
+
+__constant float yuv2rgb_bt709[] = {
+    1.0f, 0.0f, 1.5748f,
+    1.0f, -0.18732f, -0.46812f,
+    1.0f, 1.8556f, 0.0f
+};
+
+__constant float rgb2yuv_bt709[] = {
+    0.2126f, 0.7152f, 0.0722f,
+    -0.11457f, -0.38543f, 0.5f,
+    0.5f, -0.45415f, -0.04585f
+};
+
+__constant float rgb2yuv_bt2020[] ={
+    0.2627f, 0.678f, 0.0593f,
+    -0.1396f, -0.36037f, 0.5f,
+    0.5f, -0.4598f, -0.0402f,
+};
+
+
+float get_luma_dst(float3 c) {
+    return luma_dst.x * c.x + luma_dst.y * c.y + luma_dst.z * c.z;
+}
+
+float get_luma_src(float3 c) {
+    return luma_src.x * c.x + luma_src.y * c.y + luma_src.z * c.z;
+}
+
+float3 get_chroma_sample(float3 a, float3 b, float3 c, float3 d) {
+    return chroma_sample(a, b, c, d);
+}
+
+float eotf_st2084(float x) {
+    float p = powr(x, 1.0f / ST2084_M2);
+    float a = max(p -ST2084_C1, 0.0f);
+    float b = max(ST2084_C2 - ST2084_C3 * p, 1e-6f);
+    float c  = powr(a / b, 1.0f / ST2084_M1);
+    return x > 0.0f ? c * ST2084_MAX_LUMINANCE / REFERENCE_WHITE : 0.0f;
+}
+
+__constant const float HLG_A = 0.17883277f;
+__constant const float HLG_B = 0.28466892f;
+__constant const float HLG_C = 0.55991073f;
+
+// linearizer for HLG
+float inverse_oetf_hlg(float x) {
+    float a = 4.0f * x * x;
+    float b = exp((x - HLG_C) / HLG_A) + HLG_B;
+    return x < 0.5f ? a : b;
+}
+
+// delinearizer for HLG
+float oetf_hlg(float x) {
+    float a = 0.5f * sqrt(x);
+    float b = HLG_A * log(x - HLG_B) + HLG_C;
+    return x <= 1.0f ? a : b;
+}
+
+float3 ootf_hlg(float3 c, float peak) {
+    float luma = get_luma_src(c);
+    float gamma =  1.2f + 0.42f * log10(peak * REFERENCE_WHITE / 1000.0f);
+    gamma = max(1.0f, gamma);
+    float factor = peak * powr(luma, gamma - 1.0f) / powr(12.0f, gamma);
+    return c * factor;
+}
+
+float3 inverse_ootf_hlg(float3 c, float peak) {
+    float gamma = 1.2f + 0.42f * log10(peak * REFERENCE_WHITE / 1000.0f);
+    c *=  powr(12.0f, gamma) / peak;
+    c /= powr(get_luma_dst(c), (gamma - 1.0f) / gamma);
+    return c;
+}
+
+float inverse_eotf_bt1886(float c) {
+    return c < 0.0f ? 0.0f : powr(c, 1.0f / 2.4f);
+}
+
+float oetf_bt709(float c) {
+    c = c < 0.0f ? 0.0f : c;
+    float r1 = 4.5f * c;
+    float r2 = 1.099f * powr(c, 0.45f) - 0.099f;
+    return c < 0.018f ? r1 : r2;
+}
+float inverse_oetf_bt709(float c) {
+    float r1 = c / 4.5f;
+    float r2 = powr((c + 0.099f) / 1.099f, 1.0f / 0.45f);
+    return c < 0.081f ? r1 : r2;
+}
+
+float3 yuv2rgb(float y, float u, float v) {
+#ifdef FULL_RANGE_IN
+    u -= 0.5f; v -= 0.5f;
+#else
+    y = (y * 255.0f -  16.0f) / 219.0f;
+    u = (u * 255.0f - 128.0f) / 224.0f;
+    v = (v * 255.0f - 128.0f) / 224.0f;
+#endif
+    float r = y * rgb_matrix[0] + u * rgb_matrix[1] + v * rgb_matrix[2];
+    float g = y * rgb_matrix[3] + u * rgb_matrix[4] + v * rgb_matrix[5];
+    float b = y * rgb_matrix[6] + u * rgb_matrix[7] + v * rgb_matrix[8];
+    return (float3)(r, g, b);
+}
+
+float3 yuv2lrgb(float3 yuv) {
+    float3 rgb = yuv2rgb(yuv.x, yuv.y, yuv.z);
+    float r = linearize(rgb.x);
+    float g = linearize(rgb.y);
+    float b = linearize(rgb.z);
+    return (float3)(r, g, b);
+}
+
+float3 rgb2yuv(float r, float g, float b) {
+    float y = r*yuv_matrix[0] + g*yuv_matrix[1] + b*yuv_matrix[2];
+    float u = r*yuv_matrix[3] + g*yuv_matrix[4] + b*yuv_matrix[5];
+    float v = r*yuv_matrix[6] + g*yuv_matrix[7] + b*yuv_matrix[8];
+#ifdef FULL_RANGE_OUT
+    u += 0.5f; v += 0.5f;
+#else
+    y = (219.0f * y + 16.0f) / 255.0f;
+    u = (224.0f * u + 128.0f) / 255.0f;
+    v = (224.0f * v + 128.0f) / 255.0f;
+#endif
+    return (float3)(y, u, v);
+}
+
+float rgb2y(float r, float g, float b) {
+    float y = r*yuv_matrix[0] + g*yuv_matrix[1] + b*yuv_matrix[2];
+    y = (219.0f * y + 16.0f) / 255.0f;
+    return y;
+}
+
+float3 lrgb2yuv(float3 c) {
+    float r = delinearize(c.x);
+    float g = delinearize(c.y);
+    float b = delinearize(c.z);
+
+    return rgb2yuv(r, g, b);
+}
+
+float lrgb2y(float3 c) {
+    float r = delinearize(c.x);
+    float g = delinearize(c.y);
+    float b = delinearize(c.z);
+
+    return rgb2y(r, g, b);
+}
+
+float3 lrgb2lrgb(float3 c) {
+#ifdef RGB2RGB_PASSTHROUGH
+    return c;
+#else
+    float r = c.x, g = c.y, b = c.z;
+    float rr = rgb2rgb[0] * r + rgb2rgb[1] * g + rgb2rgb[2] * b;
+    float gg = rgb2rgb[3] * r + rgb2rgb[4] * g + rgb2rgb[5] * b;
+    float bb = rgb2rgb[6] * r + rgb2rgb[7] * g + rgb2rgb[8] * b;
+    return (float3)(rr, gg, bb);
+#endif
+}
+
+float3 ootf(float3 c, float peak) {
+#ifdef ootf_impl
+    return ootf_impl(c, peak);
+#else
+    return c;
+#endif
+}
+
+float3 inverse_ootf(float3 c, float peak) {
+#ifdef inverse_ootf_impl
+    return inverse_ootf_impl(c, peak);
+#else
+    return c;
+#endif
+}
diff --git a/libavfilter/opencl/tonemap.cl b/libavfilter/opencl/tonemap.cl
new file mode 100644
index 0000000..9448ba4
--- /dev/null
+++ b/libavfilter/opencl/tonemap.cl
@@ -0,0 +1,272 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#define REFERENCE_WHITE 100.0f
+extern float3 lrgb2yuv(float3);
+extern float  lrgb2y(float3);
+extern float3 yuv2lrgb(float3);
+extern float3 lrgb2lrgb(float3);
+extern float  get_luma_src(float3);
+extern float  get_luma_dst(float3);
+extern float3 ootf(float3 c, float peak);
+extern float3 inverse_ootf(float3 c, float peak);
+extern float3 get_chroma_sample(float3, float3, float3, float3);
+
+struct detection_result {
+    float peak;
+    float average;
+};
+
+float hable_f(float in) {
+    float a = 0.15f, b = 0.50f, c = 0.10f, d = 0.20f, e = 0.02f, f = 0.30f;
+    return (in * (in * a + b * c) + d * e) / (in * (in * a + b) + d * f) - e / f;
+}
+
+float direct(float s, float peak) {
+    return s;
+}
+
+float linear(float s, float peak) {
+    return s * tone_param / peak;
+}
+
+float gamma(float s, float peak) {
+    float p = s > 0.05f ? s /peak : 0.05f / peak;
+    float v = powr(p, 1.0f / tone_param);
+    return s > 0.05f ? v : (s * v /0.05f);
+}
+
+float clip(float s, float peak) {
+    return clamp(s * tone_param, 0.0f, 1.0f);
+}
+
+float reinhard(float s, float peak) {
+    return s / (s + tone_param) * (peak + tone_param) / peak;
+}
+
+float hable(float s, float peak) {
+    return hable_f(s)/hable_f(peak);
+}
+
+float mobius(float s, float peak) {
+    float j = tone_param;
+    float a, b;
+
+    if (s <= j)
+        return s;
+
+    a = -j * j * (peak - 1.0f) / (j * j - 2.0f * j + peak);
+    b = (j * j - 2.0f * j * peak + peak) / max(peak - 1.0f, 1e-6f);
+
+    return (b * b + 2.0f * b * j + j * j) / (b - a) * (s + a) / (s + b);
+}
+
+// detect peak/average signal of a frame, the algorithm was ported from:
+// libplacebo (https://github.com/haasn/libplacebo)
+struct detection_result
+detect_peak_avg(global uint *util_buf, __local uint *sum_wg,
+            float signal, float peak) {
+// layout of the util buffer
+//
+// Name:             : Size (units of 4-bytes)
+// average buffer    : detection_frames + 1
+// peak buffer       : detection_frames + 1
+// workgroup counter : 1
+// total of peak     : 1
+// total of average  : 1
+// frame index       : 1
+// frame number      : 1
+    global uint *avg_buf = util_buf;
+    global uint *peak_buf = avg_buf + DETECTION_FRAMES + 1;
+    global uint *counter_wg_p = peak_buf + DETECTION_FRAMES + 1;
+    global uint *max_total_p = counter_wg_p + 1;
+    global uint *avg_total_p = max_total_p + 1;
+    global uint *frame_idx_p = avg_total_p + 1;
+    global uint *scene_frame_num_p = frame_idx_p + 1;
+
+    uint frame_idx = *frame_idx_p;
+    uint scene_frame_num = *scene_frame_num_p;
+
+    size_t lidx = get_local_id(0);
+    size_t lidy = get_local_id(1);
+    size_t lsizex = get_local_size(0);
+    size_t lsizey = get_local_size(1);
+    uint num_wg = get_num_groups(0) * get_num_groups(1);
+    size_t group_idx = get_group_id(0);
+    size_t group_idy = get_group_id(1);
+    struct detection_result r = {peak, sdr_avg};
+    if (lidx == 0 && lidy == 0)
+        *sum_wg = 0;
+    barrier(CLK_LOCAL_MEM_FENCE);
+
+    // update workgroup sum
+    atomic_add(sum_wg, (uint)(signal * REFERENCE_WHITE));
+    barrier(CLK_LOCAL_MEM_FENCE);
+
+    // update frame peak/avg using work-group-average.
+    if (lidx == 0 && lidy == 0) {
+        uint avg_wg = *sum_wg / (lsizex * lsizey);
+        atomic_max(&peak_buf[frame_idx], avg_wg);
+        atomic_add(&avg_buf[frame_idx], avg_wg);
+    }
+
+    if (scene_frame_num > 0) {
+        float peak = (float)*max_total_p / (REFERENCE_WHITE * scene_frame_num);
+        float avg = (float)*avg_total_p / (REFERENCE_WHITE * scene_frame_num);
+        r.peak = max(1.0f, peak);
+        r.average = max(0.25f, avg);
+    }
+
+    if (lidx == 0 && lidy == 0 && atomic_add(counter_wg_p, 1) == num_wg - 1) {
+        *counter_wg_p = 0;
+        avg_buf[frame_idx] /= num_wg;
+
+        if (scene_threshold > 0.0f) {
+            uint cur_max = peak_buf[frame_idx];
+            uint cur_avg = avg_buf[frame_idx];
+            int diff = (int)(scene_frame_num * cur_avg) - (int)*avg_total_p;
+
+            if (abs(diff) > scene_frame_num * scene_threshold * REFERENCE_WHITE) {
+                for (uint i = 0; i < DETECTION_FRAMES + 1; i++)
+                  avg_buf[i] = 0;
+                for (uint i = 0; i < DETECTION_FRAMES + 1; i++)
+                  peak_buf[i] = 0;
+                *avg_total_p = *max_total_p = 0;
+                *scene_frame_num_p = 0;
+                avg_buf[frame_idx] = cur_avg;
+                peak_buf[frame_idx] = cur_max;
+            }
+        }
+        uint next = (frame_idx + 1) % (DETECTION_FRAMES + 1);
+        // add current frame, subtract next frame
+        *max_total_p += peak_buf[frame_idx] - peak_buf[next];
+        *avg_total_p += avg_buf[frame_idx] - avg_buf[next];
+        // reset next frame
+        peak_buf[next] = avg_buf[next] = 0;
+        *frame_idx_p = next;
+        *scene_frame_num_p = min(*scene_frame_num_p + 1,
+                                 (uint)DETECTION_FRAMES);
+    }
+    return r;
+}
+
+float3 map_one_pixel_rgb(float3 rgb, float peak, float average) {
+    float sig = max(max(rgb.x, max(rgb.y, rgb.z)), 1e-6f);
+
+    // Rescale the variables in order to bring it into a representation where
+    // 1.0 represents the dst_peak. This is because all of the tone mapping
+    // algorithms are defined in such a way that they map to the range [0.0, 1.0].
+    if (target_peak > 1.0f) {
+        sig *= 1.0f / target_peak;
+        peak *= 1.0f / target_peak;
+    }
+
+    float sig_old = sig;
+
+    // Scale the signal to compensate for differences in the average brightness
+    float slope = min(1.0f, sdr_avg / average);
+    sig *= slope;
+    peak *= slope;
+
+    // Desaturate the color using a coefficient dependent on the signal level
+    if (desat_param > 0.0f) {
+        float luma = get_luma_dst(rgb);
+        float coeff = max(sig - 0.18f, 1e-6f) / max(sig, 1e-6f);
+        coeff = native_powr(coeff, 10.0f / desat_param);
+        rgb = mix(rgb, (float3)luma, (float3)coeff);
+        sig = mix(sig, luma * slope, coeff);
+    }
+
+    sig = TONE_FUNC(sig, peak);
+
+    sig = min(sig, 1.0f);
+    rgb *= (sig/sig_old);
+    return rgb;
+}
+// map from source space YUV to destination space RGB
+float3 map_to_dst_space_from_yuv(float3 yuv, float peak) {
+    float3 c = yuv2lrgb(yuv);
+    c = ootf(c, peak);
+    c = lrgb2lrgb(c);
+    return c;
+}
+
+__kernel void tonemap(__write_only image2d_t dst1,
+                      __read_only  image2d_t src1,
+                      __write_only image2d_t dst2,
+                      __read_only  image2d_t src2,
+                      global uint *util_buf,
+                      float peak
+                      )
+{
+    __local uint sum_wg;
+    const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
+                               CLK_ADDRESS_CLAMP_TO_EDGE   |
+                               CLK_FILTER_NEAREST);
+    int xi = get_global_id(0);
+    int yi = get_global_id(1);
+    // each work item process four pixels
+    int x = 2 * xi;
+    int y = 2 * yi;
+
+    float y0 = read_imagef(src1, sampler, (int2)(x,     y)).x;
+    float y1 = read_imagef(src1, sampler, (int2)(x + 1, y)).x;
+    float y2 = read_imagef(src1, sampler, (int2)(x,     y + 1)).x;
+    float y3 = read_imagef(src1, sampler, (int2)(x + 1, y + 1)).x;
+    float2 uv = read_imagef(src2, sampler, (int2)(xi,     yi)).xy;
+
+    float3 c0 = map_to_dst_space_from_yuv((float3)(y0, uv.x, uv.y), peak);
+    float3 c1 = map_to_dst_space_from_yuv((float3)(y1, uv.x, uv.y), peak);
+    float3 c2 = map_to_dst_space_from_yuv((float3)(y2, uv.x, uv.y), peak);
+    float3 c3 = map_to_dst_space_from_yuv((float3)(y3, uv.x, uv.y), peak);
+
+    float sig0 = max(c0.x, max(c0.y, c0.z));
+    float sig1 = max(c1.x, max(c1.y, c1.z));
+    float sig2 = max(c2.x, max(c2.y, c2.z));
+    float sig3 = max(c3.x, max(c3.y, c3.z));
+    float sig = max(sig0, max(sig1, max(sig2, sig3)));
+
+    struct detection_result r = detect_peak_avg(util_buf, &sum_wg, sig, peak);
+
+    float3 c0_old = c0, c1_old = c1, c2_old = c2;
+    c0 = map_one_pixel_rgb(c0, r.peak, r.average);
+    c1 = map_one_pixel_rgb(c1, r.peak, r.average);
+    c2 = map_one_pixel_rgb(c2, r.peak, r.average);
+    c3 = map_one_pixel_rgb(c3, r.peak, r.average);
+
+    c0 = inverse_ootf(c0, target_peak);
+    c1 = inverse_ootf(c1, target_peak);
+    c2 = inverse_ootf(c2, target_peak);
+    c3 = inverse_ootf(c3, target_peak);
+
+    y0 = lrgb2y(c0);
+    y1 = lrgb2y(c1);
+    y2 = lrgb2y(c2);
+    y3 = lrgb2y(c3);
+    float3 chroma_c = get_chroma_sample(c0, c1, c2, c3);
+    float3 chroma = lrgb2yuv(chroma_c);
+
+    if (xi < get_image_width(dst2) && yi < get_image_height(dst2)) {
+        write_imagef(dst1, (int2)(x, y), (float4)(y0, 0.0f, 0.0f, 1.0f));
+        write_imagef(dst1, (int2)(x+1, y), (float4)(y1, 0.0f, 0.0f, 1.0f));
+        write_imagef(dst1, (int2)(x, y+1), (float4)(y2, 0.0f, 0.0f, 1.0f));
+        write_imagef(dst1, (int2)(x+1, y+1), (float4)(y3, 0.0f, 0.0f, 1.0f));
+        write_imagef(dst2, (int2)(xi, yi),
+                     (float4)(chroma.y, chroma.z, 0.0f, 1.0f));
+    }
+}
diff --git a/libavfilter/opencl_source.h b/libavfilter/opencl_source.h
index 4bb9969..a241bdb 100644
--- a/libavfilter/opencl_source.h
+++ b/libavfilter/opencl_source.h
@@ -21,7 +21,9 @@ 
 
 extern const char *ff_opencl_source_avgblur;
 extern const char *ff_opencl_source_convolution;
+extern const char *ff_opencl_source_colorspace_common;
 extern const char *ff_opencl_source_overlay;
+extern const char *ff_opencl_source_tonemap;
 extern const char *ff_opencl_source_unsharp;
 
 #endif /* AVFILTER_OPENCL_SOURCE_H */
diff --git a/libavfilter/vf_tonemap_opencl.c b/libavfilter/vf_tonemap_opencl.c
new file mode 100644
index 0000000..6b8bbee
--- /dev/null
+++ b/libavfilter/vf_tonemap_opencl.c
@@ -0,0 +1,624 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include <float.h>
+
+#include "libavutil/avassert.h"
+#include "libavutil/bprint.h"
+#include "libavutil/common.h"
+#include "libavutil/imgutils.h"
+#include "libavutil/mastering_display_metadata.h"
+#include "libavutil/mem.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+
+#include "avfilter.h"
+#include "internal.h"
+#include "opencl.h"
+#include "opencl_source.h"
+#include "video.h"
+#include "colorspace.h"
+
+// TODO:
+// - seperate peak-detection from tone-mapping kernel to solve
+//    one-frame-delay issue.
+// - import colorspace matrix generation from vf_colorspace.c
+// - more format support
+
+#define DETECTION_FRAMES 63
+#define REFERENCE_WHITE 100.0f
+
+enum TonemapAlgorithm {
+    TONEMAP_NONE,
+    TONEMAP_LINEAR,
+    TONEMAP_GAMMA,
+    TONEMAP_CLIP,
+    TONEMAP_REINHARD,
+    TONEMAP_HABLE,
+    TONEMAP_MOBIUS,
+    TONEMAP_MAX,
+};
+
+typedef struct TonemapOpenCLContext {
+    OpenCLFilterContext ocf;
+
+    enum AVColorSpace colorspace, colorspace_in, colorspace_out;
+    enum AVColorTransferCharacteristic trc, trc_in, trc_out;
+    enum AVColorPrimaries primaries, primaries_in, primaries_out;
+    enum AVColorRange range, range_in, range_out;
+    enum AVChromaLocation chroma_loc;
+
+    enum TonemapAlgorithm tonemap;
+    enum AVPixelFormat    format;
+    double                peak;
+    double                param;
+    double                desat_param;
+    double                target_peak;
+    double                scene_threshold;
+    int                   initialised;
+    cl_kernel             kernel;
+    cl_command_queue      command_queue;
+    cl_mem                util_mem;
+} TonemapOpenCLContext;
+
+const char *yuv_coff[AVCOL_SPC_NB] = {
+    [AVCOL_SPC_BT709] = "rgb2yuv_bt709",
+    [AVCOL_SPC_BT2020_NCL] = "rgb2yuv_bt2020",
+};
+
+const char *rgb_coff[AVCOL_SPC_NB] = {
+    [AVCOL_SPC_BT709] = "yuv2rgb_bt709",
+    [AVCOL_SPC_BT2020_NCL] = "yuv2rgb_bt2020",
+};
+
+const char *linearize_funcs[AVCOL_TRC_NB] = {
+    [AVCOL_TRC_SMPTE2084] = "eotf_st2084",
+    [AVCOL_TRC_ARIB_STD_B67] = "inverse_oetf_hlg",
+};
+
+const char *delinearize_funcs[AVCOL_TRC_NB] = {
+    [AVCOL_TRC_BT709]     = "inverse_eotf_bt1886",
+    [AVCOL_TRC_BT2020_10] = "inverse_eotf_bt1886",
+};
+
+static const struct LumaCoefficients luma_coefficients[AVCOL_SPC_NB] = {
+    [AVCOL_SPC_BT709]      = { 0.2126, 0.7152, 0.0722 },
+    [AVCOL_SPC_BT2020_NCL] = { 0.2627, 0.6780, 0.0593 },
+};
+
+struct PrimaryCoefficients primaries_table[AVCOL_PRI_NB] = {
+    [AVCOL_PRI_BT709]  = { 0.640, 0.330, 0.300, 0.600, 0.150, 0.060 },
+    [AVCOL_PRI_BT2020] = { 0.708, 0.292, 0.170, 0.797, 0.131, 0.046 },
+};
+
+struct WhitepointCoefficients whitepoint_table[AVCOL_PRI_NB] = {
+    [AVCOL_PRI_BT709]  = { 0.3127, 0.3290 },
+    [AVCOL_PRI_BT2020] = { 0.3127, 0.3290 },
+};
+
+const char *tonemap_func[TONEMAP_MAX] = {
+    [TONEMAP_NONE]     = "direct",
+    [TONEMAP_LINEAR]   = "linear",
+    [TONEMAP_GAMMA]    = "gamma",
+    [TONEMAP_CLIP]     = "clip",
+    [TONEMAP_REINHARD] = "reinhard",
+    [TONEMAP_HABLE]    = "hable",
+    [TONEMAP_MOBIUS]   = "mobius",
+};
+
+static void get_rgb2rgb_matrix(enum AVColorPrimaries in, enum AVColorPrimaries out,
+                               double rgb2rgb[3][3]) {
+    double rgb2xyz[3][3], xyz2rgb[3][3];
+
+    fill_rgb2xyz_table(&primaries_table[out], &whitepoint_table[out], rgb2xyz);
+    invert_matrix3x3(rgb2xyz, xyz2rgb);
+    fill_rgb2xyz_table(&primaries_table[in], &whitepoint_table[in], rgb2xyz);
+    mul3x3(rgb2rgb, rgb2xyz, xyz2rgb);
+}
+
+#define OPENCL_SOURCE_NB 3
+// Average light level for SDR signals. This is equal to a signal level of 0.5
+// under a typical presentation gamma of about 2.0.
+static const float sdr_avg = 0.25f;
+
+static int tonemap_opencl_init(AVFilterContext *avctx)
+{
+    TonemapOpenCLContext *ctx = avctx->priv;
+    int rgb2rgb_passthrough = 1;
+    double rgb2rgb[3][3];
+    struct LumaCoefficients luma_src, luma_dst;
+    cl_int cle;
+    int err;
+    AVBPrint header;
+    const char *opencl_sources[OPENCL_SOURCE_NB];
+
+    av_bprint_init(&header, 1024, AV_BPRINT_SIZE_AUTOMATIC);
+
+    switch(ctx->tonemap) {
+    case TONEMAP_GAMMA:
+        if (isnan(ctx->param))
+            ctx->param = 1.8f;
+        break;
+    case TONEMAP_REINHARD:
+        if (!isnan(ctx->param))
+            ctx->param = (1.0f - ctx->param) / ctx->param;
+        break;
+    case TONEMAP_MOBIUS:
+        if (isnan(ctx->param))
+            ctx->param = 0.3f;
+        break;
+    }
+
+    if (isnan(ctx->param))
+        ctx->param = 1.0f;
+
+    // SDR peak is 1.0f
+    ctx->target_peak = 1.0f;
+    av_log(ctx, AV_LOG_DEBUG, "tone mapping transfer from %s to %s\n",
+           av_color_transfer_name(ctx->trc_in),
+           av_color_transfer_name(ctx->trc_out));
+    av_log(ctx, AV_LOG_DEBUG, "mapping colorspace from %s to %s\n",
+           av_color_space_name(ctx->colorspace_in),
+           av_color_space_name(ctx->colorspace_out));
+    av_log(ctx, AV_LOG_DEBUG, "mapping primaries from %s to %s\n",
+           av_color_primaries_name(ctx->primaries_in),
+           av_color_primaries_name(ctx->primaries_out));
+    av_log(ctx, AV_LOG_DEBUG, "mapping range from %s to %s\n",
+           av_color_range_name(ctx->range_in),
+           av_color_range_name(ctx->range_out));
+    // checking valid value just because of limited implementaion
+    // please remove when more functionalities are implemented
+    av_assert0(ctx->trc_out == AVCOL_TRC_BT709 ||
+               ctx->trc_out == AVCOL_TRC_BT2020_10);
+    av_assert0(ctx->trc_in == AVCOL_TRC_SMPTE2084||
+               ctx->trc_in == AVCOL_TRC_ARIB_STD_B67);
+    av_assert0(ctx->colorspace_in == AVCOL_SPC_BT2020_NCL ||
+               ctx->colorspace_in == AVCOL_SPC_BT709);
+    av_assert0(ctx->primaries_in == AVCOL_PRI_BT2020 ||
+               ctx->primaries_in == AVCOL_PRI_BT709);
+
+    av_bprintf(&header, "__constant const float tone_param = %.4ff;\n",
+               ctx->param);
+    av_bprintf(&header, "__constant const float desat_param = %.4ff;\n",
+               ctx->desat_param);
+    av_bprintf(&header, "__constant const float target_peak = %.4ff;\n",
+               ctx->target_peak);
+    av_bprintf(&header, "__constant const float sdr_avg = %.4ff;\n", sdr_avg);
+    av_bprintf(&header, "__constant const float scene_threshold = %.4ff;\n",
+               ctx->scene_threshold);
+    av_bprintf(&header, "#define TONE_FUNC %s\n", tonemap_func[ctx->tonemap]);
+    av_bprintf(&header, "#define DETECTION_FRAMES %d\n", DETECTION_FRAMES);
+
+    if (ctx->primaries_out != ctx->primaries_in) {
+        get_rgb2rgb_matrix(ctx->primaries_in, ctx->primaries_out, rgb2rgb);
+        rgb2rgb_passthrough = 0;
+    }
+    if (ctx->range_in == AVCOL_RANGE_JPEG)
+        av_bprintf(&header, "#define FULL_RANGE_IN\n");
+
+    if (ctx->range_out == AVCOL_RANGE_JPEG)
+        av_bprintf(&header, "#define FULL_RANGE_OUT\n");
+
+    av_bprintf(&header, "#define chroma_loc %d\n", (int)ctx->chroma_loc);
+
+    if (rgb2rgb_passthrough)
+        av_bprintf(&header, "#define RGB2RGB_PASSTHROUGH\n");
+    else {
+        av_bprintf(&header, "__constant float rgb2rgb[9] = {\n");
+        av_bprintf(&header, "    %.4ff, %.4ff, %.4ff,\n",
+                   rgb2rgb[0][0], rgb2rgb[0][1], rgb2rgb[0][2]);
+        av_bprintf(&header, "    %.4ff, %.4ff, %.4ff,\n",
+                   rgb2rgb[1][0], rgb2rgb[1][1], rgb2rgb[1][2]);
+        av_bprintf(&header, "    %.4ff, %.4ff, %.4ff};\n",
+                   rgb2rgb[2][0], rgb2rgb[2][1], rgb2rgb[2][2]);
+    }
+
+    av_bprintf(&header, "#define rgb_matrix %s\n",
+               rgb_coff[ctx->colorspace_in]);
+    av_bprintf(&header, "#define yuv_matrix %s\n",
+               yuv_coff[ctx->colorspace_out]);
+
+    luma_src = luma_coefficients[ctx->colorspace_in];
+    luma_dst = luma_coefficients[ctx->colorspace_out];
+    av_bprintf(&header, "constant float3 luma_src = {%.4ff, %.4ff, %.4ff};\n",
+               luma_src.cr, luma_src.cg, luma_src.cb);
+    av_bprintf(&header, "constant float3 luma_dst = {%.4ff, %.4ff, %.4ff};\n",
+               luma_dst.cr, luma_dst.cg, luma_dst.cb);
+
+    av_bprintf(&header, "#define linearize %s\n", linearize_funcs[ctx->trc_in]);
+    av_bprintf(&header, "#define delinearize %s\n",
+               delinearize_funcs[ctx->trc_out]);
+
+    if (ctx->trc_in == AVCOL_TRC_ARIB_STD_B67)
+        av_bprintf(&header, "#define ootf_impl ootf_hlg\n");
+
+    if (ctx->trc_out == AVCOL_TRC_ARIB_STD_B67)
+        av_bprintf(&header, "#define inverse_ootf_impl inverse_ootf_hlg\n");
+
+    av_log(avctx, AV_LOG_DEBUG, "Generated OpenCL header:\n%s\n", header.str);
+    opencl_sources[0] = header.str;
+    opencl_sources[1] = ff_opencl_source_tonemap;
+    opencl_sources[2] = ff_opencl_source_colorspace_common;
+    err = ff_opencl_filter_load_program(avctx, opencl_sources, OPENCL_SOURCE_NB);
+
+    av_bprint_finalize(&header, NULL);
+    if (err < 0)
+        goto fail;
+
+    ctx->command_queue = clCreateCommandQueue(ctx->ocf.hwctx->context,
+                                              ctx->ocf.hwctx->device_id,
+                                              0, &cle);
+    if (!ctx->command_queue) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create OpenCL "
+               "command queue: %d.\n", cle);
+        err = AVERROR(EIO);
+        goto fail;
+    }
+
+    ctx->kernel = clCreateKernel(ctx->ocf.program, "tonemap", &cle);
+    if (!ctx->kernel) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create kernel: %d.\n", cle);
+        err = AVERROR(EIO);
+        goto fail;
+    }
+
+    ctx->util_mem =
+        clCreateBuffer(ctx->ocf.hwctx->context, 0,
+                       (2 * DETECTION_FRAMES + 7) * sizeof(unsigned),
+                       NULL, &cle);
+    if (cle != CL_SUCCESS) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create util buffer: %d.\n", cle);
+        err = AVERROR(EIO);
+        goto fail;
+    }
+
+    ctx->initialised = 1;
+    return 0;
+
+fail:
+    if (ctx->util_mem)
+        clReleaseMemObject(ctx->util_mem);
+    if (ctx->command_queue)
+        clReleaseCommandQueue(ctx->command_queue);
+    if (ctx->kernel)
+        clReleaseKernel(ctx->kernel);
+    return err;
+}
+
+static int tonemap_opencl_config_output(AVFilterLink *outlink)
+{
+    AVFilterContext *avctx = outlink->src;
+    TonemapOpenCLContext *s = avctx->priv;
+    int ret;
+    if (s->format == AV_PIX_FMT_NONE)
+        av_log(avctx, AV_LOG_WARNING, "format not set, use default format NV12\n");
+    else {
+      if (s->format != AV_PIX_FMT_P010 &&
+          s->format != AV_PIX_FMT_NV12) {
+        av_log(avctx, AV_LOG_ERROR, "unsupported output format,"
+               "only p010/nv12 supported now\n");
+        return AVERROR(EINVAL);
+      }
+    }
+
+    s->ocf.output_format = s->format == AV_PIX_FMT_NONE ? AV_PIX_FMT_NV12 : s->format;
+    ret = ff_opencl_filter_config_output(outlink);
+    if (ret < 0)
+        return ret;
+
+    return 0;
+}
+
+static int launch_kernel(AVFilterContext *avctx, cl_kernel kernel,
+                         AVFrame *output, AVFrame *input, float peak) {
+    TonemapOpenCLContext *ctx = avctx->priv;
+    int err = AVERROR(ENOSYS);
+    size_t global_work[2];
+    size_t local_work[2];
+    cl_int cle;
+
+    CL_SET_KERNEL_ARG(kernel, 0, cl_mem, &output->data[0]);
+    CL_SET_KERNEL_ARG(kernel, 1, cl_mem, &input->data[0]);
+    CL_SET_KERNEL_ARG(kernel, 2, cl_mem, &output->data[1]);
+    CL_SET_KERNEL_ARG(kernel, 3, cl_mem, &input->data[1]);
+    CL_SET_KERNEL_ARG(kernel, 4, cl_mem, &ctx->util_mem);
+    CL_SET_KERNEL_ARG(kernel, 5, cl_float, &peak);
+
+    local_work[0]  = 16;
+    local_work[1]  = 16;
+    // Note the work size based on uv plane, as we process a 2x2 quad in one workitem
+    err = ff_opencl_filter_work_size_from_image(avctx, global_work, output,
+                                                1, 16);
+    if (err < 0)
+        return err;
+
+    cle = clEnqueueNDRangeKernel(ctx->command_queue, kernel, 2, NULL,
+                                 global_work, local_work,
+                                 0, NULL, NULL);
+    if (cle != CL_SUCCESS) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to enqueue kernel: %d.\n",
+               cle);
+        return AVERROR(EIO);
+    }
+    return 0;
+fail:
+    return err;
+}
+
+static double determine_signal_peak(AVFrame *in)
+{
+    AVFrameSideData *sd = av_frame_get_side_data(in, AV_FRAME_DATA_CONTENT_LIGHT_LEVEL);
+    double peak = 0;
+
+    if (sd) {
+        AVContentLightMetadata *clm = (AVContentLightMetadata *)sd->data;
+        peak = clm->MaxCLL / REFERENCE_WHITE;
+    }
+
+    sd = av_frame_get_side_data(in, AV_FRAME_DATA_MASTERING_DISPLAY_METADATA);
+    if (!peak && sd) {
+        AVMasteringDisplayMetadata *metadata = (AVMasteringDisplayMetadata *)sd->data;
+        if (metadata->has_luminance)
+            peak = av_q2d(metadata->max_luminance) / REFERENCE_WHITE;
+    }
+
+    // For untagged source, use peak of 10000 if SMPTE ST.2084
+    // otherwise assume HLG with reference display peak 1000.
+    if (!peak)
+        peak = in->color_trc == AVCOL_TRC_SMPTE2084 ? 100.0f : 10.0f;
+
+    return peak;
+}
+
+static void update_metadata(AVFrame *in, double peak) {
+    AVFrameSideData *sd = av_frame_get_side_data(in, AV_FRAME_DATA_CONTENT_LIGHT_LEVEL);
+
+    if (sd) {
+        AVContentLightMetadata *clm = (AVContentLightMetadata *)sd->data;
+        clm->MaxCLL = (unsigned)(peak * REFERENCE_WHITE);
+    }
+
+    sd = av_frame_get_side_data(in, AV_FRAME_DATA_MASTERING_DISPLAY_METADATA);
+    if (sd) {
+        AVMasteringDisplayMetadata *metadata = (AVMasteringDisplayMetadata *)sd->data;
+        if (metadata->has_luminance)
+            metadata->max_luminance =av_d2q(peak * REFERENCE_WHITE, 10000);
+    }
+}
+
+static int tonemap_opencl_filter_frame(AVFilterLink *inlink, AVFrame *input)
+{
+    AVFilterContext    *avctx = inlink->dst;
+    AVFilterLink     *outlink = avctx->outputs[0];
+    TonemapOpenCLContext *ctx = avctx->priv;
+    AVFrame *output = NULL;
+    cl_int cle;
+    int err;
+    double peak = ctx->peak;
+
+    AVHWFramesContext *input_frames_ctx =
+        (AVHWFramesContext*)input->hw_frames_ctx->data;
+
+    av_log(ctx, AV_LOG_DEBUG, "Filter input: %s, %ux%u (%"PRId64").\n",
+           av_get_pix_fmt_name(input->format),
+           input->width, input->height, input->pts);
+
+    if (!input->hw_frames_ctx)
+        return AVERROR(EINVAL);
+
+    output = ff_get_video_buffer(outlink, outlink->w, outlink->h);
+    if (!output) {
+        err = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    err = av_frame_copy_props(output, input);
+    if (err < 0)
+        goto fail;
+
+    if (!peak)
+        peak = determine_signal_peak(input);
+
+    if (ctx->trc != -1)
+        output->color_trc = ctx->trc;
+    if (ctx->primaries != -1)
+        output->color_primaries = ctx->primaries;
+    if (ctx->colorspace != -1)
+        output->colorspace = ctx->colorspace;
+    if (ctx->range != -1)
+        output->color_range = ctx->range;
+
+    ctx->trc_in = input->color_trc;
+    ctx->trc_out = output->color_trc;
+    ctx->colorspace_in = input->colorspace;
+    ctx->colorspace_out = output->colorspace;
+    ctx->primaries_in = input->color_primaries;
+    ctx->primaries_out = output->color_primaries;
+    ctx->range_in = input->color_range;
+    ctx->range_out = output->color_range;
+    ctx->chroma_loc = output->chroma_location;
+
+    if (!ctx->initialised) {
+        if (!(input->color_trc == AVCOL_TRC_SMPTE2084 ||
+            input->color_trc == AVCOL_TRC_ARIB_STD_B67)) {
+            av_log(ctx, AV_LOG_ERROR, "unsupported transfer function characteristic.\n");
+            err = AVERROR(ENOSYS);
+            goto fail;
+        }
+
+        if (input_frames_ctx->sw_format != AV_PIX_FMT_P010) {
+            av_log(ctx, AV_LOG_ERROR, "unsupported format in tonemap_opencl.\n");
+            err = AVERROR(ENOSYS);
+            goto fail;
+        }
+
+        err = tonemap_opencl_init(avctx);
+        if (err < 0)
+            goto fail;
+    }
+
+    switch(input_frames_ctx->sw_format) {
+    case AV_PIX_FMT_P010:
+        err = launch_kernel(avctx, ctx->kernel, output, input, peak);
+        if (err < 0) goto fail;
+        break;
+    default:
+        err = AVERROR(ENOSYS);
+        goto fail;
+    }
+
+    cle = clFinish(ctx->command_queue);
+    if (cle != CL_SUCCESS) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to finish command queue: %d.\n",
+               cle);
+        err = AVERROR(EIO);
+        goto fail;
+    }
+
+    av_frame_free(&input);
+
+    update_metadata(output, ctx->target_peak);
+
+    av_log(ctx, AV_LOG_DEBUG, "Tone-mapping output: %s, %ux%u (%"PRId64").\n",
+           av_get_pix_fmt_name(output->format),
+           output->width, output->height, output->pts);
+#ifndef NDEBUG
+    {
+        uint32_t *ptr, *max_total_p, *avg_total_p, *frame_number_p;
+        float peak_detected, avg_detected;
+        unsigned map_size = (2 * DETECTION_FRAMES  + 7) * sizeof(unsigned);
+        ptr = (void *)clEnqueueMapBuffer(ctx->command_queue, ctx->util_mem,
+                                         CL_TRUE, CL_MAP_READ, 0, map_size,
+                                         0, NULL, NULL, &cle);
+        // For the layout of the util buffer, refer tonemap.cl
+        if (ptr) {
+            max_total_p = ptr + 2 * (DETECTION_FRAMES + 1) + 1;
+            avg_total_p = max_total_p + 1;
+            frame_number_p = avg_total_p + 2;
+            peak_detected = (float)*max_total_p / (REFERENCE_WHITE * (*frame_number_p));
+            avg_detected = (float)*avg_total_p / (REFERENCE_WHITE * (*frame_number_p));
+            av_log(ctx, AV_LOG_DEBUG, "peak %f, avg %f will be used for next frame\n",
+                   peak_detected, avg_detected);
+            clEnqueueUnmapMemObject(ctx->command_queue, ctx->util_mem, ptr, 0,
+                                    NULL, NULL);
+        }
+    }
+#endif
+
+    return ff_filter_frame(outlink, output);
+
+fail:
+    clFinish(ctx->command_queue);
+    av_frame_free(&input);
+    av_frame_free(&output);
+    return err;
+}
+
+static av_cold void tonemap_opencl_uninit(AVFilterContext *avctx)
+{
+    TonemapOpenCLContext *ctx = avctx->priv;
+    cl_int cle;
+
+    if (ctx->util_mem)
+        clReleaseMemObject(ctx->util_mem);
+    if (ctx->kernel) {
+        cle = clReleaseKernel(ctx->kernel);
+        if (cle != CL_SUCCESS)
+            av_log(avctx, AV_LOG_ERROR, "Failed to release "
+                   "kernel: %d.\n", cle);
+    }
+
+    if (ctx->command_queue) {
+        cle = clReleaseCommandQueue(ctx->command_queue);
+        if (cle != CL_SUCCESS)
+            av_log(avctx, AV_LOG_ERROR, "Failed to release "
+                   "command queue: %d.\n", cle);
+    }
+
+    ff_opencl_filter_uninit(avctx);
+}
+
+#define OFFSET(x) offsetof(TonemapOpenCLContext, x)
+#define FLAGS (AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM)
+static const AVOption tonemap_opencl_options[] = {
+    { "tonemap",      "tonemap algorithm selection", OFFSET(tonemap), AV_OPT_TYPE_INT, {.i64 = TONEMAP_NONE}, TONEMAP_NONE, TONEMAP_MAX - 1, FLAGS, "tonemap" },
+    {     "none",     0, 0, AV_OPT_TYPE_CONST, {.i64 = TONEMAP_NONE},              0, 0, FLAGS, "tonemap" },
+    {     "linear",   0, 0, AV_OPT_TYPE_CONST, {.i64 = TONEMAP_LINEAR},            0, 0, FLAGS, "tonemap" },
+    {     "gamma",    0, 0, AV_OPT_TYPE_CONST, {.i64 = TONEMAP_GAMMA},             0, 0, FLAGS, "tonemap" },
+    {     "clip",     0, 0, AV_OPT_TYPE_CONST, {.i64 = TONEMAP_CLIP},              0, 0, FLAGS, "tonemap" },
+    {     "reinhard", 0, 0, AV_OPT_TYPE_CONST, {.i64 = TONEMAP_REINHARD},          0, 0, FLAGS, "tonemap" },
+    {     "hable",    0, 0, AV_OPT_TYPE_CONST, {.i64 = TONEMAP_HABLE},             0, 0, FLAGS, "tonemap" },
+    {     "mobius",   0, 0, AV_OPT_TYPE_CONST, {.i64 = TONEMAP_MOBIUS},            0, 0, FLAGS, "tonemap" },
+    { "transfer", "set transfer characteristic", OFFSET(trc), AV_OPT_TYPE_INT, {.i64 = AVCOL_TRC_BT709}, -1, INT_MAX, FLAGS, "transfer" },
+    { "t",        "set transfer characteristic", OFFSET(trc), AV_OPT_TYPE_INT, {.i64 = AVCOL_TRC_BT709}, -1, INT_MAX, FLAGS, "transfer" },
+    {     "bt709",            0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_TRC_BT709},         0, 0, FLAGS, "transfer" },
+    {     "bt2020",           0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_TRC_BT2020_10},     0, 0, FLAGS, "transfer" },
+    { "matrix", "set colorspace matrix", OFFSET(colorspace), AV_OPT_TYPE_INT, {.i64 = -1}, -1, INT_MAX, FLAGS, "matrix" },
+    { "m",      "set colorspace matrix", OFFSET(colorspace), AV_OPT_TYPE_INT, {.i64 = -1}, -1, INT_MAX, FLAGS, "matrix" },
+    {     "bt709",            0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_SPC_BT709},         0, 0, FLAGS, "matrix" },
+    {     "bt2020",           0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_SPC_BT2020_NCL},    0, 0, FLAGS, "matrix" },
+    { "primaries", "set color primaries", OFFSET(primaries), AV_OPT_TYPE_INT, {.i64 = -1}, -1, INT_MAX, FLAGS, "primaries" },
+    { "p",         "set color primaries", OFFSET(primaries), AV_OPT_TYPE_INT, {.i64 = -1}, -1, INT_MAX, FLAGS, "primaries" },
+    {     "bt709",            0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_PRI_BT709},         0, 0, FLAGS, "primaries" },
+    {     "bt2020",           0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_PRI_BT2020},        0, 0, FLAGS, "primaries" },
+    { "range",         "set color range", OFFSET(range), AV_OPT_TYPE_INT, {.i64 = -1}, -1, INT_MAX, FLAGS, "range" },
+    { "r",             "set color range", OFFSET(range), AV_OPT_TYPE_INT, {.i64 = -1}, -1, INT_MAX, FLAGS, "range" },
+    {     "tv",            0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_RANGE_MPEG},         0, 0, FLAGS, "range" },
+    {     "pc",            0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_RANGE_JPEG},         0, 0, FLAGS, "range" },
+    {     "limited",       0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_RANGE_MPEG},         0, 0, FLAGS, "range" },
+    {     "full",          0,       0,                 AV_OPT_TYPE_CONST, {.i64 = AVCOL_RANGE_JPEG},         0, 0, FLAGS, "range" },
+    { "format",    "output pixel format", OFFSET(format), AV_OPT_TYPE_PIXEL_FMT, {.i64 = AV_PIX_FMT_NONE}, AV_PIX_FMT_NONE, INT_MAX, FLAGS, "fmt" },
+    { "peak",      "signal peak override", OFFSET(peak), AV_OPT_TYPE_DOUBLE, {.dbl = 0}, 0, DBL_MAX, FLAGS },
+    { "param",     "tonemap parameter",   OFFSET(param), AV_OPT_TYPE_DOUBLE, {.dbl = NAN}, DBL_MIN, DBL_MAX, FLAGS },
+    { "desat",     "desaturation parameter",   OFFSET(desat_param), AV_OPT_TYPE_DOUBLE, {.dbl = 0.5}, 0, DBL_MAX, FLAGS },
+    { "threshold", "scene detection threshold",   OFFSET(scene_threshold), AV_OPT_TYPE_DOUBLE, {.dbl = 0.2}, 0, DBL_MAX, FLAGS },
+    { NULL }
+};
+
+AVFILTER_DEFINE_CLASS(tonemap_opencl);
+
+static const AVFilterPad tonemap_opencl_inputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .filter_frame = &tonemap_opencl_filter_frame,
+        .config_props = &ff_opencl_filter_config_input,
+    },
+    { NULL }
+};
+
+static const AVFilterPad tonemap_opencl_outputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .config_props = &tonemap_opencl_config_output,
+    },
+    { NULL }
+};
+
+AVFilter ff_vf_tonemap_opencl = {
+    .name           = "tonemap_opencl",
+    .description    = NULL_IF_CONFIG_SMALL("perform HDR to SDR conversion with tonemapping"),
+    .priv_size      = sizeof(TonemapOpenCLContext),
+    .priv_class     = &tonemap_opencl_class,
+    .init           = &ff_opencl_filter_init,
+    .uninit         = &tonemap_opencl_uninit,
+    .query_formats  = &ff_opencl_filter_query_formats,
+    .inputs         = tonemap_opencl_inputs,
+    .outputs        = tonemap_opencl_outputs,
+    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
+};