Message ID | 20240227163502.28268-1-gnattuoc@me.com |
---|---|
State | New |
Series | [FFmpeg-devel,v4] avfilter: add vf_overlay_videotoolbox |
Context | Check | Description |
---|---|---|
yinshiyou/make_loongarch64 | success | Make finished |
yinshiyou/make_fate_loongarch64 | success | Make fate finished |
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
A ping for this, as it has already been a week.

> On Feb 28, 2024, at 00:35, gnattu via ffmpeg-devel <ffmpeg-devel@ffmpeg.org> wrote:
>
> Overlay filter for VideoToolbox hwframes. Unlike most hardware
> overlay filters, this filter does not require the two inputs to
> have the same pixel format; instead, it will perform format
> conversion automatically with hardware accelerated methods.
>
> Signed-off-by: Gnattu OC <gnattuoc@me.com>
> ---
> Changes from v3:
>
> - Fixed an issue where 8-bit BGRA overlay frames were not correctly converted to 16-bit
> - Added a constraint on input pixel formats, as VideoToolbox cannot convert all of its hardware frames
>
>  Changelog                                    |   1 +
>  configure                                    |   1 +
>  doc/filters.texi                             |  52 ++
>  libavfilter/Makefile                         |   3 +
>  libavfilter/allfilters.c                     |   1 +
>  libavfilter/metal/utils.m                    |   7 +-
>  .../metal/vf_overlay_videotoolbox.metal      |  58 ++
>  libavfilter/vf_overlay_videotoolbox.m        | 609 ++++++++++++++++++
>  8 files changed, 730 insertions(+), 2 deletions(-)
>  create mode 100644 libavfilter/metal/vf_overlay_videotoolbox.metal
>  create mode 100644 libavfilter/vf_overlay_videotoolbox.m
>
> diff --git a/Changelog b/Changelog
> index 610ee61dd6..3ecfdab81b 100644
> --- a/Changelog
> +++ b/Changelog
> @@ -27,6 +27,7 @@ version <next>:
>  - a C11-compliant compiler is now required; note that this requirement
>    will be bumped to C17 in the near future, so consider updating your
>    build environment if it lacks C17 support
> +- VideoToolbox overlay filter
>
>  version 6.1:
>  - libaribcaption decoder
> diff --git a/configure b/configure
> index 23066efa32..a7c349d126 100755
> --- a/configure
> +++ b/configure
> @@ -3807,6 +3807,7 @@ overlay_qsv_filter_deps="libmfx"
>  overlay_qsv_filter_select="qsvvpp"
>  overlay_vaapi_filter_deps="vaapi VAProcPipelineCaps_blend_flags"
>  overlay_vulkan_filter_deps="vulkan spirv_compiler"
> +overlay_videotoolbox_filter_deps="metal corevideo coreimage videotoolbox"
>  owdenoise_filter_deps="gpl"
>  pad_opencl_filter_deps="opencl"
>  pan_filter_deps="swresample"
> diff --git a/doc/filters.texi b/doc/filters.texi
> index e0436a5755..bfb77562cb 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -19033,6 +19033,58 @@ See @ref{framesync}.
>
>  This filter also supports the @ref{framesync} options.
>
> +@section overlay_videotoolbox
> +
> +Overlay one video on top of another.
> +
> +This is the VideoToolbox variant of the @ref{overlay} filter.
> +It takes two inputs and has one output. The first input is the "main" video on which the second input is overlaid.
> +It only accepts VideoToolbox frames. The underlying input pixel formats do not have to match.
> +Different input pixel formats and color spaces will be automatically converted using hardware accelerated methods.
> +The final output will have the same pixel format and color space as the "main" input.
> +
> +The filter accepts the following options:
> +
> +@table @option
> +
> +@item x
> +Set the x coordinate of the overlaid video on the main video.
> +Default value is @code{0}.
> +
> +@item y
> +Set the y coordinate of the overlaid video on the main video.
> +Default value is @code{0}.
> +
> +@item eof_action
> +See @ref{framesync}.
> +
> +@item shortest
> +See @ref{framesync}.
> +
> +@item repeatlast
> +See @ref{framesync}.
> +
> +@end table
> +
> +@subsection Examples
> +
> +@itemize
> +@item
> +Overlay an image LOGO at the top-left corner of the INPUT video.
> +The INPUT video is in nv12 format and the LOGO image is in rgba format.
> +@example
> +-hwaccel videotoolbox -i INPUT -i LOGO -codec:v:0 h264_videotoolbox -filter_complex "[0:v]format=nv12,hwupload[a], [1:v]format=rgba,hwupload[b], [a][b]overlay_videotoolbox" OUTPUT
> +@end example
> +@item
> +Overlay an SDR video OVERLAY at the top-left corner of the HDR video MAIN.
> +The MAIN video is in p010 format and the OVERLAY video is in nv12 format.
> +The OUTPUT video will also be an HDR video with OVERLAY mapped to HDR.
> +@example
> +-hwaccel videotoolbox -i MAIN -i OVERLAY -codec:v:0 hevc_videotoolbox -tag:v hvc1 -filter_complex "[0:v]format=p010,hwupload[a], [1:v]format=nv12,hwupload[b], [a][b]overlay_videotoolbox" OUTPUT
> +@end example
> +
> +@end itemize
> +
>  @section owdenoise
>
>  Apply Overcomplete Wavelet denoiser.
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index f6c1d641d6..ea1389ab57 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -413,6 +413,9 @@ OBJS-$(CONFIG_OVERLAY_OPENCL_FILTER) += vf_overlay_opencl.o opencl.o \
>  OBJS-$(CONFIG_OVERLAY_QSV_FILTER) += vf_overlay_qsv.o framesync.o
>  OBJS-$(CONFIG_OVERLAY_VAAPI_FILTER) += vf_overlay_vaapi.o framesync.o vaapi_vpp.o
>  OBJS-$(CONFIG_OVERLAY_VULKAN_FILTER) += vf_overlay_vulkan.o vulkan.o vulkan_filter.o
> +OBJS-$(CONFIG_OVERLAY_VIDEOTOOLBOX_FILTER) += vf_overlay_videotoolbox.o framesync.o \
> +                                              metal/vf_overlay_videotoolbox.metallib.o \
> +                                              metal/utils.o
>  OBJS-$(CONFIG_OWDENOISE_FILTER) += vf_owdenoise.o
>  OBJS-$(CONFIG_PAD_FILTER) += vf_pad.o
>  OBJS-$(CONFIG_PAD_OPENCL_FILTER) += vf_pad_opencl.o opencl.o opencl/pad.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index 149bf50997..ec9d975ecb 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -389,6 +389,7 @@ extern const AVFilter ff_vf_overlay_qsv;
>  extern const AVFilter ff_vf_overlay_vaapi;
>  extern const AVFilter ff_vf_overlay_vulkan;
>  extern const AVFilter ff_vf_overlay_cuda;
> +extern const AVFilter ff_vf_overlay_videotoolbox;
>  extern const AVFilter ff_vf_owdenoise;
>  extern const AVFilter ff_vf_pad;
>  extern const AVFilter ff_vf_pad_opencl;
> diff --git a/libavfilter/metal/utils.m b/libavfilter/metal/utils.m
> index f365d3ceea..db5c5f6f10 100644
> --- a/libavfilter/metal/utils.m
> +++ b/libavfilter/metal/utils.m
> @@ -55,6 +55,9 @@ CVMetalTextureRef ff_metal_texture_from_pixbuf(void *ctx,
>  {
>      CVMetalTextureRef tex = NULL;
>      CVReturn ret;
> +    bool is_planar = CVPixelBufferIsPlanar(pixbuf);
> +    size_t width = is_planar ? CVPixelBufferGetWidthOfPlane(pixbuf, plane) : CVPixelBufferGetWidth(pixbuf);
> +    size_t height = is_planar ? CVPixelBufferGetHeightOfPlane(pixbuf, plane) : CVPixelBufferGetHeight(pixbuf);
>
>      ret = CVMetalTextureCacheCreateTextureFromImage(
>          NULL,
> @@ -62,8 +65,8 @@ CVMetalTextureRef ff_metal_texture_from_pixbuf(void *ctx,
>          pixbuf,
>          NULL,
>          format,
> -        CVPixelBufferGetWidthOfPlane(pixbuf, plane),
> -        CVPixelBufferGetHeightOfPlane(pixbuf, plane),
> +        width,
> +        height,
>          plane,
>          &tex
>      );
> diff --git a/libavfilter/metal/vf_overlay_videotoolbox.metal b/libavfilter/metal/vf_overlay_videotoolbox.metal
> new file mode 100644
> index 0000000000..936e57e03e
> --- /dev/null
> +++ b/libavfilter/metal/vf_overlay_videotoolbox.metal
> @@ -0,0 +1,58 @@
> +/*
> + * Copyright (C) 2024 Gnattu OC <gnattuoc@me.com>
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include <metal_stdlib>
> +#include <metal_integer>
> +#include <metal_texture>
> +
> +using namespace metal;
> +
> +struct mtlBlendParams {
> +    uint x_position;
> +    uint y_position;
> +};
> +
> +/*
> + * Blend shader for premultiplied alpha textures
> + */
> +kernel void blend_shader(
> +    texture2d<float, access::read> source [[ texture(0) ]],
> +    texture2d<float, access::read> mask [[ texture(1) ]],
> +    texture2d<float, access::write> dest [[ texture(2) ]],
> +    constant mtlBlendParams& params [[ buffer(3) ]],
> +    uint2 gid [[ thread_position_in_grid ]])
> +{
> +    const auto mask_size = uint2(mask.get_width(),
> +                                 mask.get_height());
> +    const auto loc_overlay = uint2(params.x_position, params.y_position);
> +    if (gid.x < loc_overlay.x ||
> +        gid.y < loc_overlay.y ||
> +        gid.x >= mask_size.x + loc_overlay.x ||
> +        gid.y >= mask_size.y + loc_overlay.y)
> +    {
> +        float4 source_color = source.read(gid);
> +        dest.write(source_color, gid);
> +    } else {
> +        float4 source_color = source.read(gid);
> +        float4 mask_color = mask.read((gid - loc_overlay));
> +        float4 result_color = source_color * (1.0f - mask_color.w) + (mask_color * mask_color.w);
> +        dest.write(result_color, gid);
> +    }
> +}
> diff --git a/libavfilter/vf_overlay_videotoolbox.m b/libavfilter/vf_overlay_videotoolbox.m
> new file mode 100644
> index 0000000000..a6614c1f55
> --- /dev/null
> +++ b/libavfilter/vf_overlay_videotoolbox.m
> @@ -0,0 +1,609 @@
> +/*
> + * Copyright (C) 2024 Gnattu OC <gnattuoc@me.com>
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include <CoreImage/CoreImage.h>
> +#include <VideoToolbox/VideoToolbox.h>
> +#include "internal.h"
> +#include "metal/utils.h"
> +#include "framesync.h"
> +#include "libavutil/hwcontext.h"
> +#include "libavutil/pixdesc.h"
> +#include "libavutil/opt.h"
> +#include "libavutil/objc.h"
> +#include "video.h"
> +
> +#include <assert.h>
> +
> +extern char ff_vf_overlay_videotoolbox_metallib_data[];
> +extern unsigned int ff_vf_overlay_videotoolbox_metallib_len;
> +
> +// Although iOS 8.0 introduced basic Metal support, its feature set is not complete and it does not have CoreImage compatibility.
> +// We have to set the minimum iOS version to 9.0.
> +typedef struct API_AVAILABLE(macos(10.11), ios(9.0)) OverlayVideoToolboxContext {
> +    AVBufferRef *device_ref;
> +    FFFrameSync fs;
> +
> +    CVMetalTextureCacheRef texture_cache;
> +    CVPixelBufferRef input_main_pixel_buffer_cache;
> +    CVPixelBufferRef input_overlay_pixel_buffer_cache;
> +    CVPixelBufferRef output_pixel_buffer_cache;
> +    CIContext *ci_ctx;
> +    VTPixelTransferSessionRef vt_session;
> +
> +    id<MTLDevice> mtl_device;
> +    id<MTLLibrary> mtl_library;
> +    id<MTLCommandQueue> mtl_queue;
> +    id<MTLComputePipelineState> mtl_pipeline;
> +    id<MTLFunction> mtl_function;
> +    id<MTLBuffer> mtl_params_buffer;
> +
> +    uint x_position;
> +    uint y_position;
> +    uint hwframe_ctx_allocated;
> +} OverlayVideoToolboxContext API_AVAILABLE(macos(10.11), ios(9.0));
> +
> +typedef struct MtlBlendParams {
> +    uint x_position;
> +    uint y_position;
> +} MtlBlendParams;
> +
> +// Using sizeof(OverlayVideoToolboxContext) without an availability check will error
> +// if we're targeting an older OS version, so we need to calculate the size ourselves
> +// (we'll statically verify it's correct in overlay_videotoolbox_init behind a check)
> +#define OVERLAY_VT_CTX_SIZE (sizeof(FFFrameSync) + sizeof(uint) * 3 + sizeof(void*) * 13 + 4)
> +
> +// Neither VideoToolbox nor CoreImage can convert YUV420P frames into 16-bit depth color formats.
> +// Additionally, the only hardware formats that support an Alpha channel are AYUV64 and BGRA.
> +// However, neither can be directly manipulated with YUV420P frames.
> +// In such cases, the user will have to use NV12 instead.
> +static const enum AVPixelFormat supported_main_formats[] = {
> +    AV_PIX_FMT_NV12,
> +    AV_PIX_FMT_P010,
> +    AV_PIX_FMT_NONE,
> +};
> +
> +static const enum AVPixelFormat supported_overlay_formats[] = {
> +    AV_PIX_FMT_NV12,
> +    AV_PIX_FMT_P010,
> +    AV_PIX_FMT_AYUV64,
> +    AV_PIX_FMT_BGRA,
> +    AV_PIX_FMT_NONE,
> +};
> +
> +/**
> + * Helper to find out if provided format is supported by filter
> + */
> +static int format_is_supported(const enum AVPixelFormat formats[], enum AVPixelFormat fmt)
> +{
> +    for (int i = 0; formats[i] != AV_PIX_FMT_NONE; i++)
> +        if (formats[i] == fmt)
> +            return 1;
> +    return 0;
> +}
> +
> +static void call_kernel(AVFilterContext *avctx,
> +                        id<MTLTexture> dst,
> +                        id<MTLTexture> main,
> +                        id<MTLTexture> overlay,
> +                        uint x_position,
> +                        uint y_position) API_AVAILABLE(macos(10.11), ios(9.0))
> +{
> +    OverlayVideoToolboxContext *ctx = avctx->priv;
> +    // Both the command buffer and encoder are auto-released by objc by default.
> +    // Use CFBridgingRetain to get a more C-like behavior.
> +    id<MTLCommandBuffer> buffer = CFBridgingRetain(ctx->mtl_queue.commandBuffer);
> +    id<MTLComputeCommandEncoder> encoder = CFBridgingRetain((__bridge id<MTLCommandBuffer>)buffer.computeCommandEncoder);
> +
> +    MtlBlendParams *params = (MtlBlendParams *)ctx->mtl_params_buffer.contents;
> +    *params = (MtlBlendParams){
> +        .x_position = x_position,
> +        .y_position = y_position,
> +    };
> +
> +    [(__bridge id<MTLComputeCommandEncoder>)encoder setTexture: main atIndex: 0];
> +    [(__bridge id<MTLComputeCommandEncoder>)encoder setTexture: overlay atIndex: 1];
> +    [(__bridge id<MTLComputeCommandEncoder>)encoder setTexture: dst atIndex: 2];
> +    [(__bridge id<MTLComputeCommandEncoder>)encoder setBuffer: ctx->mtl_params_buffer offset: 0 atIndex: 3];
> +    ff_metal_compute_encoder_dispatch(ctx->mtl_device, ctx->mtl_pipeline, (__bridge id<MTLComputeCommandEncoder>)encoder, dst.width, dst.height);
> +    [(__bridge id<MTLComputeCommandEncoder>)encoder endEncoding];
> +
> +    [(__bridge id<MTLCommandBuffer>)buffer commit];
> +    [(__bridge id<MTLCommandBuffer>)buffer waitUntilCompleted];
> +
> +    ff_objc_release(&encoder);
> +    ff_objc_release(&buffer);
> +}
> +
> +// Copies and/or converts one pixel buffer to another.
> +// This transparently handles pixel format and color spaces, and will do a conversion if needed.
> +static int transfer_pixel_buffer(OverlayVideoToolboxContext *ctx, CVPixelBufferRef source, CVPixelBufferRef destination)
> +{
> +    if (@available(macOS 10.8, iOS 16.0, *)) {
> +        int ret = 0;
> +        ret = VTPixelTransferSessionTransferImage(ctx->vt_session, source, destination);
> +        if (ret < 0)
> +            return ret;
> +    } else {
> +        CIImage *temp_image = NULL;
> +        temp_image = CFBridgingRetain([CIImage imageWithCVPixelBuffer: source]);
> +        [(__bridge CIContext*)ctx->ci_ctx render: (__bridge CIImage*)temp_image toCVPixelBuffer: destination];
> +        CFRelease(temp_image);
> +        CVBufferPropagateAttachments(source, destination);
> +    }
> +    return 0;
> +}
> +
> +static int overlay_vt_blend(FFFrameSync *fs) API_AVAILABLE(macos(10.11), ios(9.0))
> +{
> +    AVFilterContext *avctx = fs->parent;
> +    OverlayVideoToolboxContext *ctx = avctx->priv;
> +    AVFilterLink *outlink = avctx->outputs[0];
> +    AVFilterLink *inlink_main = avctx->inputs[0];
> +    AVFilterLink *inlink_overlay = avctx->inputs[1];
> +    AVFrame *input_main, *input_overlay;
> +    AVFrame *output;
> +    AVHWFramesContext *frames_ctx = (AVHWFramesContext*)inlink_main->hw_frames_ctx->data;
> +    AVHWFramesContext *frames_ctx_overlay = (AVHWFramesContext*)inlink_overlay->hw_frames_ctx->data;
> +    const AVPixFmtDescriptor *in_overlay_desc;
> +
> +    CVMetalTextureRef main, dst, overlay;
> +    id<MTLTexture> tex_main, tex_overlay, tex_dst;
> +
> +    MTLPixelFormat mtl_format;
> +    OSType cv_format;
> +    int ret;
> +    int i, overlay_planes = 0;
> +
> +    in_overlay_desc = av_pix_fmt_desc_get(frames_ctx_overlay->sw_format);
> +    if (@available(macOS 11.3, iOS 14.2, *)) {
> +        mtl_format = MTLPixelFormatRGBA16Unorm;
> +        cv_format = kCVPixelFormatType_64RGBALE;
> +    } else {
> +        // On older OS versions, 64-bit RGBA with 16-bit little-endian full-range samples is not supported.
> +        // To handle inputs with color depth greater than 8, convert colors to float type during filtering on these versions.
> +        mtl_format = MTLPixelFormatRGBA16Float;
> +        cv_format = kCVPixelFormatType_64RGBAHalf;
> +    }
> +
> +    // read main and overlay frames from inputs
> +    ret = ff_framesync_get_frame(fs, 0, &input_main, 0);
> +    if (ret < 0)
> +        return ret;
> +    ret = ff_framesync_get_frame(fs, 1, &input_overlay, 0);
> +    if (ret < 0)
> +        return ret;
> +
> +    if (!input_main)
> +        return AVERROR_BUG;
> +
> +    output = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> +    if (!output)
> +        return AVERROR(ENOMEM);
> +
> +    ret = av_frame_copy_props(output, input_main);
> +    if (ret < 0)
> +        return ret;
> +
> +    if (!input_overlay) {
> +        ret = transfer_pixel_buffer(ctx, (CVPixelBufferRef)input_main->data[3], (CVPixelBufferRef)output->data[3]);
> +        if (ret < 0)
> +            return ret;
> +        return ff_filter_frame(outlink, output);
> +    }
> +
> +    for (i = 0; i < in_overlay_desc->nb_components; i++)
> +        overlay_planes = FFMAX(overlay_planes,
> +                               in_overlay_desc->comp[i].plane + 1);
> +
> +    if (!ctx->input_overlay_pixel_buffer_cache) {
> +        ret = CVPixelBufferCreate(kCFAllocatorDefault,
> +                                  CVPixelBufferGetWidthOfPlane((CVPixelBufferRef)input_overlay->data[3], 0),
> +                                  CVPixelBufferGetHeightOfPlane((CVPixelBufferRef)input_overlay->data[3], 0),
> +                                  cv_format,
> +                                  (__bridge CFDictionaryRef)@{
> +                                      (NSString *)kCVPixelBufferCGImageCompatibilityKey: @(YES),
> +                                      (NSString *)kCVPixelBufferMetalCompatibilityKey: @(YES)
> +                                  },
> +                                  &ctx->input_overlay_pixel_buffer_cache);
> +        if (ret < 0)
> +            return ret;
> +    }
> +
> +    if (!ctx->input_main_pixel_buffer_cache) {
> +        ret = CVPixelBufferCreate(kCFAllocatorDefault,
> +                                  CVPixelBufferGetWidthOfPlane((CVPixelBufferRef)input_main->data[3], 0),
> +                                  CVPixelBufferGetHeightOfPlane((CVPixelBufferRef)input_main->data[3], 0),
> +                                  cv_format,
> +                                  (__bridge CFDictionaryRef)@{
> +                                      (NSString *)kCVPixelBufferCGImageCompatibilityKey: @(YES),
> +                                      (NSString *)kCVPixelBufferMetalCompatibilityKey: @(YES)
> +                                  },
> +                                  &ctx->input_main_pixel_buffer_cache);
> +        if (ret < 0)
> +            return ret;
> +    }
> +
> +    if (!ctx->output_pixel_buffer_cache) {
> +        ret = CVPixelBufferCreate(kCFAllocatorDefault,
> +                                  CVPixelBufferGetWidthOfPlane((CVPixelBufferRef)input_main->data[3], 0),
> +                                  CVPixelBufferGetHeightOfPlane((CVPixelBufferRef)input_main->data[3], 0),
> +                                  cv_format,
> +                                  (__bridge CFDictionaryRef)@{
> +                                      (NSString *)kCVPixelBufferCGImageCompatibilityKey: @(YES),
> +                                      (NSString *)kCVPixelBufferMetalCompatibilityKey: @(YES)
> +                                  },
> +                                  &ctx->output_pixel_buffer_cache);
> +        if (ret < 0)
> +            return ret;
> +    }
> +
> +    ret = transfer_pixel_buffer(ctx, (CVPixelBufferRef)input_main->data[3], ctx->input_main_pixel_buffer_cache);
> +    if (ret < 0)
> +        return ret;
> +
> +    ret = transfer_pixel_buffer(ctx, (CVPixelBufferRef)input_overlay->data[3], ctx->input_overlay_pixel_buffer_cache);
> +    if (ret < 0)
> +        return ret;
> +
> +    overlay = ff_metal_texture_from_pixbuf(avctx, ctx->texture_cache, ctx->input_overlay_pixel_buffer_cache, 0, mtl_format);
> +    main = ff_metal_texture_from_pixbuf(avctx, ctx->texture_cache, ctx->input_main_pixel_buffer_cache, 0, mtl_format);
> +    dst = ff_metal_texture_from_pixbuf(avctx, ctx->texture_cache, ctx->output_pixel_buffer_cache, 0, mtl_format);
> +
> +    if (!overlay || !main || !dst) {
> +        return AVERROR(ENOSYS);
> +    }
> +
> +    tex_main = CVMetalTextureGetTexture(main);
> +    tex_overlay = CVMetalTextureGetTexture(overlay);
> +    tex_dst = CVMetalTextureGetTexture(dst);
> +
> +    call_kernel(avctx, tex_dst, tex_main, tex_overlay, ctx->x_position, ctx->y_position);
> +
> +    ret = transfer_pixel_buffer(ctx, ctx->output_pixel_buffer_cache, (CVPixelBufferRef)output->data[3]);
> +    if (ret < 0) {
> +        CFRelease(main);
> +        CFRelease(overlay);
> +        CFRelease(dst);
> +        return ret;
> +    }
> +
> +    CFRelease(main);
> +    CFRelease(overlay);
> +    CFRelease(dst);
> +
> +    return ff_filter_frame(outlink, output);
> +}
> +
> +static av_cold void do_uninit(AVFilterContext *avctx) API_AVAILABLE(macos(10.11), ios(9.0))
> +{
> +    OverlayVideoToolboxContext *ctx = avctx->priv;
> +
> +    if (ctx->hwframe_ctx_allocated) {
> +        av_buffer_unref(&ctx->device_ref);
> +        ctx->hwframe_ctx_allocated = 0;
> +    }
> +    ff_framesync_uninit(&ctx->fs);
> +
> +    if (ctx->ci_ctx) {
> +        CFRelease(ctx->ci_ctx);
> +        ctx->ci_ctx = NULL;
> +    }
> +
> +    ff_objc_release(&ctx->mtl_params_buffer);
> +    ff_objc_release(&ctx->mtl_function);
> +    ff_objc_release(&ctx->mtl_pipeline);
> +    ff_objc_release(&ctx->mtl_queue);
> +    ff_objc_release(&ctx->mtl_library);
> +    ff_objc_release(&ctx->mtl_device);
> +
> +    if (ctx->texture_cache) {
> +        CFRelease(ctx->texture_cache);
> +        ctx->texture_cache = NULL;
> +    }
> +    if (ctx->input_main_pixel_buffer_cache) {
> +        CFRelease(ctx->input_main_pixel_buffer_cache);
> +        ctx->input_main_pixel_buffer_cache = NULL;
> +    }
> +    if (ctx->input_overlay_pixel_buffer_cache) {
> +        CFRelease(ctx->input_overlay_pixel_buffer_cache);
> +        ctx->input_overlay_pixel_buffer_cache = NULL;
> +    }
> +    if (ctx->output_pixel_buffer_cache) {
> +        CFRelease(ctx->output_pixel_buffer_cache);
> +        ctx->output_pixel_buffer_cache = NULL;
> +    }
> +    if (ctx->vt_session) {
> +        VTPixelTransferSessionInvalidate(ctx->vt_session);
> +        CFRelease(ctx->vt_session);
> +        ctx->vt_session = NULL;
> +    }
> +}
> +
> +static av_cold void overlay_videotoolbox_uninit(AVFilterContext *ctx)
> +{
> +    if (@available(macOS 10.11, iOS 9.0, *)) {
> +        do_uninit(ctx);
> +    }
> +}
> +
> +static av_cold int do_init(AVFilterContext *avctx) API_AVAILABLE(macos(10.11), ios(9.0))
> +{
> +    OverlayVideoToolboxContext *ctx = avctx->priv;
> +    NSError *err = nil;
> +    CVReturn ret;
> +    dispatch_data_t libData;
> +
> +    ctx->mtl_device = MTLCreateSystemDefaultDevice();
> +    if (!ctx->mtl_device) {
> +        av_log(avctx, AV_LOG_ERROR, "Unable to find Metal device\n");
> +        goto fail;
> +    }
> +
> +    av_log(ctx, AV_LOG_INFO, "Using Metal device: %s\n", ctx->mtl_device.name.UTF8String);
> +
> +    libData = dispatch_data_create(
> +        ff_vf_overlay_videotoolbox_metallib_data,
> +        ff_vf_overlay_videotoolbox_metallib_len,
> +        nil,
> +        nil);
> +
> +    ctx->mtl_library = [ctx->mtl_device newLibraryWithData: libData error: &err];
> +    dispatch_release(libData);
> +    libData = nil;
> +    ctx->mtl_function = [ctx->mtl_library newFunctionWithName: @"blend_shader"];
> +    if (!ctx->mtl_function) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to create Metal function!\n");
> +        goto fail;
> +    }
> +
> +    ctx->mtl_queue = ctx->mtl_device.newCommandQueue;
> +    if (!ctx->mtl_queue) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to create Metal command queue!\n");
> +        goto fail;
> +    }
> +
> +    ctx->mtl_pipeline = [ctx->mtl_device
> +        newComputePipelineStateWithFunction: ctx->mtl_function
> +        error: &err];
> +    if (err) {
> +        av_log(ctx, AV_LOG_ERROR, "Failed to create Metal compute pipeline: %s\n", err.description.UTF8String);
> +        goto fail;
> +    }
> +
> +    ctx->mtl_params_buffer = [ctx->mtl_device
> +        newBufferWithLength: sizeof(MtlBlendParams)
> +        options: MTLResourceStorageModeShared];
> +    if (!ctx->mtl_params_buffer) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to create Metal buffer for parameters\n");
> +        goto fail;
> +    }
> +
> +    ret = CVMetalTextureCacheCreate(
> +        NULL,
> +        NULL,
> +        ctx->mtl_device,
> +        NULL,
> +        &ctx->texture_cache
> +    );
> +    if (ret != kCVReturnSuccess) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to create CVMetalTextureCache: %d\n", ret);
> +        goto fail;
> +    }
> +
> +    if (@available(macOS 10.8, iOS 16.0, *)) {
> +        ret = VTPixelTransferSessionCreate(NULL, &ctx->vt_session);
> +        if (ret != kCVReturnSuccess) {
> +            av_log(avctx, AV_LOG_ERROR, "Failed to create VTPixelTransferSession: %d\n", ret);
> +            goto fail;
> +        }
> +    } else {
> +        // Use CoreImage as a fallback on old OS versions.
> +        // CoreImage has comparable performance to VTPixelTransferSession, but it supports fewer pixel formats than VTPixelTransferSession.
> +        // Warn the user about possibly incorrect results.
> +        av_log(avctx, AV_LOG_WARNING, "VTPixelTransferSessionTransferImage is not available on this OS version, falling back to CoreImage\n");
> +        av_log(avctx, AV_LOG_WARNING, "Try an overlay with BGRA format if you see no overlay\n");
> +        if (@available(macOS 10.15, iOS 13.0, *)) {
> +            ctx->ci_ctx = CFBridgingRetain([CIContext contextWithMTLCommandQueue: ctx->mtl_queue]);
> +        } else {
> +            ctx->ci_ctx = CFBridgingRetain([CIContext contextWithMTLDevice: ctx->mtl_device]);
> +        }
> +    }
> +
> +    ctx->fs.on_event = &overlay_vt_blend;
> +
> +    return 0;
> +fail:
> +    overlay_videotoolbox_uninit(avctx);
> +    return AVERROR_EXTERNAL;
> +}
> +
> +static av_cold int overlay_videotoolbox_init(AVFilterContext *ctx)
> +{
> +    if (@available(macOS 10.11, iOS 9.0, *)) {
> +        // Ensure we calculated OVERLAY_VT_CTX_SIZE correctly
> +        static_assert(OVERLAY_VT_CTX_SIZE == sizeof(OverlayVideoToolboxContext), "Incorrect OVERLAY_VT_CTX_SIZE value!");
> +        return do_init(ctx);
> +    } else {
> +        av_log(ctx, AV_LOG_ERROR, "Metal is not available on this OS version\n");
> +        return AVERROR(ENOSYS);
> +    }
> +}
> +
> +static int do_config_output(AVFilterLink *link) API_AVAILABLE(macos(10.11), ios(9.0))
> +{
> +    AVFilterContext *avctx = link->src;
> +    AVFilterLink *inlink_main = avctx->inputs[0];
> +    AVFilterLink *inlink_overlay = avctx->inputs[1];
> +    OverlayVideoToolboxContext *ctx = avctx->priv;
> +    AVHWFramesContext *main_frames, *output_frames, *overlay_frames;
> +    AVBufferRef *input_ref, *overlay_ref;
> +    int ret = 0;
> +
> +    if (!inlink_main->hw_frames_ctx ||
> +        !inlink_overlay->hw_frames_ctx) {
> +        av_log(avctx, AV_LOG_ERROR, "An input HW frames reference is "
> +               "required to associate the processing device.\n");
> +        return AVERROR(EINVAL);
> +    }
> +
> +    input_ref = inlink_main->hw_frames_ctx;
> +    overlay_ref = inlink_overlay->hw_frames_ctx;
> +    main_frames = (AVHWFramesContext*)input_ref->data;
> +    overlay_frames = (AVHWFramesContext*)overlay_ref->data;
> +    av_assert0(main_frames);
> +    av_assert0(overlay_frames);
> +
> +    if (!format_is_supported(supported_main_formats, main_frames->sw_format)) {
> +        av_log(ctx, AV_LOG_ERROR, "Unsupported main input format: %s.\n",
> +               av_get_pix_fmt_name(main_frames->sw_format));
> +        if (main_frames->sw_format == AV_PIX_FMT_YUV420P) {
> +            av_log(ctx, AV_LOG_WARNING, "Hint: Use %s instead of %s.\n",
> +                   av_get_pix_fmt_name(AV_PIX_FMT_NV12),
> +                   av_get_pix_fmt_name(AV_PIX_FMT_YUV420P));
> +        }
> +        return AVERROR(ENOSYS);
> +    }
> +
> +    if (!format_is_supported(supported_overlay_formats, overlay_frames->sw_format)) {
> +        av_log(ctx, AV_LOG_ERROR, "Unsupported overlay input format: %s.\n",
> +               av_get_pix_fmt_name(overlay_frames->sw_format));
> +        if (overlay_frames->sw_format == AV_PIX_FMT_YUV420P) {
> +            av_log(ctx, AV_LOG_WARNING, "Hint: Use %s instead of %s.\n",
> +                   av_get_pix_fmt_name(AV_PIX_FMT_NV12),
> +                   av_get_pix_fmt_name(AV_PIX_FMT_YUV420P));
> +        }
> +        return AVERROR(ENOSYS);
> +    }
> +
> +    ctx->device_ref = av_buffer_ref(main_frames->device_ref);
> +    if (!ctx->device_ref) {
> +        av_log(ctx, AV_LOG_ERROR, "Failed to create a device reference.\n");
> +        return AVERROR(ENOMEM);
> +    }
> +
> +    link->hw_frames_ctx = av_hwframe_ctx_alloc(ctx->device_ref);
> +    if (!link->hw_frames_ctx) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to create HW frame context "
> +               "for output.\n");
> +        ret = AVERROR(ENOMEM);
> +        return ret;
> +    }
> +    ctx->hwframe_ctx_allocated = 1;
> +
> +    output_frames = (AVHWFramesContext*)link->hw_frames_ctx->data;
> +
> +    output_frames->format = AV_PIX_FMT_VIDEOTOOLBOX;
> +    output_frames->sw_format = main_frames->sw_format;
> +    output_frames->width = inlink_main->w;
> +    output_frames->height = inlink_main->h;
> +
> +    ret = ff_filter_init_hw_frames(avctx, link, 1);
> +    if (ret < 0)
> +        return ret;
> +
> +    ret = av_hwframe_ctx_init(link->hw_frames_ctx);
> +    if (ret < 0) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to initialise VideoToolbox frame "
> +               "context for output: %d\n", ret);
> +        return ret;
> +    }
> +
> +    link->time_base = inlink_main->time_base;
> +    ctx->fs.time_base = link->time_base;
> +
> +    ret = ff_framesync_init_dualinput(&ctx->fs, avctx);
> +    if (ret < 0)
> +        return ret;
> +
> +    ret = ff_framesync_configure(&ctx->fs);
> +    return ret;
> +}
> +
> +static int config_output(AVFilterLink *link)
> +{
> +    AVFilterContext *ctx = link->src;
> +    if (@available(macOS 10.11, iOS 9.0, *)) {
> +        return do_config_output(link);
> +    } else {
> +        av_log(ctx, AV_LOG_ERROR, "Metal is not available on this OS version\n");
> +        return AVERROR(ENOSYS);
> +    }
> +}
> +
> +static int overlay_videotoolbox_activate(AVFilterContext *avctx)
> +{
> +    OverlayVideoToolboxContext *ctx = avctx->priv;
> +    return ff_framesync_activate(&ctx->fs);
> +}
> +
> +#define FLAGS (AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM)
> +#define OFFSET(x) offsetof(OverlayVideoToolboxContext, x)
> +
> +static const AVOption overlay_videotoolbox_options[] = {
> +    { "x", "Overlay x position",
> +      OFFSET(x_position), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INT_MAX, .flags = FLAGS },
> +    { "y", "Overlay y position",
> +      OFFSET(y_position), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INT_MAX, .flags = FLAGS },
> +    { "eof_action", "Action to take when encountering EOF from secondary input ",
> +      OFFSET(fs.opt_eof_action), AV_OPT_TYPE_INT, { .i64 = EOF_ACTION_REPEAT },
> +      EOF_ACTION_REPEAT, EOF_ACTION_PASS, .flags = FLAGS, .unit = "eof_action" },
> +    { "repeat", "Repeat the previous frame.", 0, AV_OPT_TYPE_CONST, { .i64 = EOF_ACTION_REPEAT }, .flags = FLAGS, .unit = "eof_action" },
> +    { "endall", "End both streams.", 0, AV_OPT_TYPE_CONST, { .i64 = EOF_ACTION_ENDALL }, .flags = FLAGS, .unit = "eof_action" },
> +    { "pass", "Pass through the main input.", 0, AV_OPT_TYPE_CONST, { .i64 = EOF_ACTION_PASS }, .flags = FLAGS, .unit = "eof_action" },
> +    { "shortest", "force termination when the shortest input terminates", OFFSET(fs.opt_shortest), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, FLAGS },
> +    { "repeatlast", "repeat overlay of the last overlay frame", OFFSET(fs.opt_repeatlast), AV_OPT_TYPE_BOOL, { .i64 = 1 }, 0, 1, FLAGS },
> +    { NULL },
> +};
> +
> +FRAMESYNC_DEFINE_CLASS(overlay_videotoolbox, OverlayVideoToolboxContext, fs);
> +
> +static const AVFilterPad overlay_videotoolbox_inputs[] = {
> +    {
> +        .name = "main",
> +        .type = AVMEDIA_TYPE_VIDEO,
> +    },
> +    {
> +        .name = "overlay",
> +        .type = AVMEDIA_TYPE_VIDEO,
> +    },
> +};
> +
> +static const AVFilterPad overlay_videotoolbox_outputs[] = {
> +    {
> +        .name = "default",
> +        .type = AVMEDIA_TYPE_VIDEO,
> +        .config_props = config_output,
> +    },
> +};
> +
> +const AVFilter ff_vf_overlay_videotoolbox = {
> +    .name = "overlay_videotoolbox",
> +    .description = NULL_IF_CONFIG_SMALL("Overlay filter for VideoToolbox frames using Metal compute"),
> +    .priv_size = OVERLAY_VT_CTX_SIZE,
> +    .priv_class = &overlay_videotoolbox_class,
> +    .init = overlay_videotoolbox_init,
> +    .uninit = overlay_videotoolbox_uninit,
> +    .activate = overlay_videotoolbox_activate,
> +    .preinit = overlay_videotoolbox_framesync_preinit,
> +    FILTER_SINGLE_PIXFMT(AV_PIX_FMT_VIDEOTOOLBOX),
> +    FILTER_INPUTS(overlay_videotoolbox_inputs),
> +    FILTER_OUTPUTS(overlay_videotoolbox_outputs),
> +    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
> +};
> --
> 2.39.3 (Apple Git-145)
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
diff --git a/Changelog b/Changelog
index 610ee61dd6..3ecfdab81b 100644
--- a/Changelog
+++ b/Changelog
@@ -27,6 +27,7 @@ version <next>:
 - a C11-compliant compiler is now required; note that this requirement
   will be bumped to C17 in the near future, so consider updating your
   build environment if it lacks C17 support
+- VideoToolbox overlay filter
 
 version 6.1:
 - libaribcaption decoder
diff --git a/configure b/configure
index 23066efa32..a7c349d126 100755
--- a/configure
+++ b/configure
@@ -3807,6 +3807,7 @@ overlay_qsv_filter_deps="libmfx"
 overlay_qsv_filter_select="qsvvpp"
 overlay_vaapi_filter_deps="vaapi VAProcPipelineCaps_blend_flags"
 overlay_vulkan_filter_deps="vulkan spirv_compiler"
+overlay_videotoolbox_filter_deps="metal corevideo coreimage videotoolbox"
 owdenoise_filter_deps="gpl"
 pad_opencl_filter_deps="opencl"
 pan_filter_deps="swresample"
diff --git a/doc/filters.texi b/doc/filters.texi
index e0436a5755..bfb77562cb 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -19033,6 +19033,58 @@ See @ref{framesync}.
 
 This filter also supports the @ref{framesync} options.
 
+@section overlay_videotoolbox
+
+Overlay one video on top of another.
+
+This is the VideoToolbox variant of the @ref{overlay} filter.
+It takes two inputs and has one output. The first input is the "main" video on which the second input is overlaid.
+It only accepts VideoToolbox frames. The underlying input pixel formats do not have to match.
+Different input pixel formats and color spaces will be automatically converted using hardware accelerated methods.
+The final output will have the same pixel format and color space as the "main" input.
+
+The filter accepts the following options:
+
+@table @option
+
+@item x
+Set the x coordinate of the overlaid video on the main video.
+Default value is @code{0}.
+
+@item y
+Set the y coordinate of the overlaid video on the main video.
+Default value is @code{0}.
+
+@item eof_action
+See @ref{framesync}.
+
+@item shortest
+See @ref{framesync}.
+
+@item repeatlast
+See @ref{framesync}.
+
+@end table
+
+@subsection Examples
+
+@itemize
+@item
+Overlay an image LOGO at the top-left corner of the INPUT video.
+The INPUT video is in nv12 format and the LOGO image is in rgba format.
+@example
+-hwaccel videotoolbox -i INPUT -i LOGO -codec:v:0 h264_videotoolbox -filter_complex "[0:v]format=nv12,hwupload[a], [1:v]format=rgba,hwupload[b], [a][b]overlay_videotoolbox" OUTPUT
+@end example
+@item
+Overlay an SDR video OVERLAY at the top-left corner of the HDR video MAIN.
+The MAIN video is in p010 format and the OVERLAY video is in nv12 format.
+The OUTPUT video will also be an HDR video with OVERLAY mapped to HDR.
+@example
+-hwaccel videotoolbox -i MAIN -i OVERLAY -codec:v:0 hevc_videotoolbox -tag:v hvc1 -filter_complex "[0:v]format=p010,hwupload[a], [1:v]format=nv12,hwupload[b], [a][b]overlay_videotoolbox" OUTPUT
+@end example
+
+@end itemize
+
 @section owdenoise
 
 Apply Overcomplete Wavelet denoiser.
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index f6c1d641d6..ea1389ab57 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -413,6 +413,9 @@ OBJS-$(CONFIG_OVERLAY_OPENCL_FILTER)         += vf_overlay_opencl.o opencl.o \
 OBJS-$(CONFIG_OVERLAY_QSV_FILTER)            += vf_overlay_qsv.o framesync.o
 OBJS-$(CONFIG_OVERLAY_VAAPI_FILTER)          += vf_overlay_vaapi.o framesync.o vaapi_vpp.o
 OBJS-$(CONFIG_OVERLAY_VULKAN_FILTER)         += vf_overlay_vulkan.o vulkan.o vulkan_filter.o
+OBJS-$(CONFIG_OVERLAY_VIDEOTOOLBOX_FILTER)   += vf_overlay_videotoolbox.o framesync.o \
+                                                metal/vf_overlay_videotoolbox.metallib.o \
+                                                metal/utils.o
 OBJS-$(CONFIG_OWDENOISE_FILTER)              += vf_owdenoise.o
 OBJS-$(CONFIG_PAD_FILTER)                    += vf_pad.o
 OBJS-$(CONFIG_PAD_OPENCL_FILTER)             += vf_pad_opencl.o opencl.o opencl/pad.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 149bf50997..ec9d975ecb 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -389,6 +389,7 @@ extern const AVFilter ff_vf_overlay_qsv;
 extern const AVFilter ff_vf_overlay_vaapi;
 extern const AVFilter ff_vf_overlay_vulkan;
 extern const AVFilter ff_vf_overlay_cuda;
+extern const AVFilter ff_vf_overlay_videotoolbox;
 extern const AVFilter ff_vf_owdenoise;
 extern const AVFilter ff_vf_pad;
 extern const AVFilter ff_vf_pad_opencl;
diff --git a/libavfilter/metal/utils.m b/libavfilter/metal/utils.m
index f365d3ceea..db5c5f6f10 100644
--- a/libavfilter/metal/utils.m
+++ b/libavfilter/metal/utils.m
@@ -55,6 +55,9 @@ CVMetalTextureRef ff_metal_texture_from_pixbuf(void *ctx,
 {
     CVMetalTextureRef tex = NULL;
     CVReturn ret;
+    bool is_planar = CVPixelBufferIsPlanar(pixbuf);
+    size_t width  = is_planar ? CVPixelBufferGetWidthOfPlane(pixbuf, plane)  : CVPixelBufferGetWidth(pixbuf);
+    size_t height = is_planar ? CVPixelBufferGetHeightOfPlane(pixbuf, plane) : CVPixelBufferGetHeight(pixbuf);
 
     ret = CVMetalTextureCacheCreateTextureFromImage(
         NULL,
@@ -62,8 +65,8 @@ CVMetalTextureRef ff_metal_texture_from_pixbuf(void *ctx,
         pixbuf,
         NULL,
         format,
-        CVPixelBufferGetWidthOfPlane(pixbuf, plane),
-        CVPixelBufferGetHeightOfPlane(pixbuf, plane),
+        width,
+        height,
         plane,
         &tex
     );
diff --git a/libavfilter/metal/vf_overlay_videotoolbox.metal b/libavfilter/metal/vf_overlay_videotoolbox.metal
new file mode 100644
index 0000000000..936e57e03e
--- /dev/null
+++ b/libavfilter/metal/vf_overlay_videotoolbox.metal
@@ -0,0 +1,58 @@
+/*
+ * Copyright (C) 2024 Gnattu OC <gnattuoc@me.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <metal_stdlib>
+#include <metal_integer>
+#include <metal_texture>
+
+using namespace metal;
+
+struct mtlBlendParams {
+    uint x_position;
+    uint y_position;
+};
+
+/*
+ * Blend shader for premultiplied alpha textures
+ */
+kernel void blend_shader(
+    texture2d<float, access::read>  source [[ texture(0) ]],
+    texture2d<float, access::read>  mask   [[ texture(1) ]],
+    texture2d<float, access::write> dest   [[ texture(2) ]],
+    constant mtlBlendParams& params        [[ buffer(3) ]],
+    uint2 gid [[ thread_position_in_grid ]])
+{
+    const auto mask_size = uint2(mask.get_width(),
+                                 mask.get_height());
+    const auto loc_overlay = uint2(params.x_position, params.y_position);
+    if (gid.x < loc_overlay.x ||
+        gid.y < loc_overlay.y ||
+        gid.x >= mask_size.x + loc_overlay.x ||
+        gid.y >= mask_size.y + loc_overlay.y)
+    {
+        float4 source_color = source.read(gid);
+        dest.write(source_color, gid);
+    } else {
+        float4 source_color = source.read(gid);
+        float4 mask_color = mask.read((gid - loc_overlay));
+        float4 result_color = source_color * (1.0f - mask_color.w) + (mask_color * mask_color.w);
+        dest.write(result_color, gid);
+    }
+}
diff --git a/libavfilter/vf_overlay_videotoolbox.m b/libavfilter/vf_overlay_videotoolbox.m
new file mode 100644
index 0000000000..a6614c1f55
--- /dev/null
+++ b/libavfilter/vf_overlay_videotoolbox.m
@@ -0,0 +1,609 @@
+/*
+ * Copyright (C) 2024 Gnattu OC <gnattuoc@me.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <CoreImage/CoreImage.h>
+#include <VideoToolbox/VideoToolbox.h>
+#include "internal.h"
+#include "metal/utils.h"
+#include "framesync.h"
+#include "libavutil/hwcontext.h"
+#include "libavutil/pixdesc.h"
+#include "libavutil/opt.h"
+#include "libavutil/objc.h"
+#include "video.h"
+
+#include <assert.h>
+
+extern char ff_vf_overlay_videotoolbox_metallib_data[];
+extern unsigned int ff_vf_overlay_videotoolbox_metallib_len;
+
+// Although iOS 8.0 introduced basic Metal support, its feature set is not complete and does not have CoreImage compatibility.
+// We have to set the minimum iOS version to 9.0.
+typedef struct API_AVAILABLE(macos(10.11), ios(9.0)) OverlayVideoToolboxContext {
+    AVBufferRef *device_ref;
+    FFFrameSync fs;
+
+    CVMetalTextureCacheRef texture_cache;
+    CVPixelBufferRef input_main_pixel_buffer_cache;
+    CVPixelBufferRef input_overlay_pixel_buffer_cache;
+    CVPixelBufferRef output_pixel_buffer_cache;
+    CIContext *ci_ctx;
+    VTPixelTransferSessionRef vt_session;
+
+    id<MTLDevice> mtl_device;
+    id<MTLLibrary> mtl_library;
+    id<MTLCommandQueue> mtl_queue;
+    id<MTLComputePipelineState> mtl_pipeline;
+    id<MTLFunction> mtl_function;
+    id<MTLBuffer> mtl_params_buffer;
+
+    uint x_position;
+    uint y_position;
+    uint hwframe_ctx_allocated;
+} OverlayVideoToolboxContext API_AVAILABLE(macos(10.11), ios(9.0));
+
+typedef struct MtlBlendParams {
+    uint x_position;
+    uint y_position;
+} MtlBlendParams;
+
+// Using sizeof(OverlayVideoToolboxContext) without an availability check will error
+// if we're targeting an older OS version, so we need to calculate the size ourselves
+// (we'll statically verify it's correct in overlay_videotoolbox_init behind a check)
+#define OVERLAY_VT_CTX_SIZE (sizeof(FFFrameSync) + sizeof(uint) * 3 + sizeof(void*) * 13 + 4)
+
+// Neither VideoToolbox nor CoreImage can convert YUV420P frames into 16-bit depth color formats.
+// Additionally, the only hardware formats that support an Alpha channel are AYUV64 and BGRA.
+// However, neither can be directly manipulated with YUV420P frames.
+// In such cases, the user will have to use NV12 instead.
+static const enum AVPixelFormat supported_main_formats[] = {
+    AV_PIX_FMT_NV12,
+    AV_PIX_FMT_P010,
+    AV_PIX_FMT_NONE,
+};
+
+static const enum AVPixelFormat supported_overlay_formats[] = {
+    AV_PIX_FMT_NV12,
+    AV_PIX_FMT_P010,
+    AV_PIX_FMT_AYUV64,
+    AV_PIX_FMT_BGRA,
+    AV_PIX_FMT_NONE,
+};
+
+/**
+ * Helper to find out if the provided format is supported by the filter
+ */
+static int format_is_supported(const enum AVPixelFormat formats[], enum AVPixelFormat fmt)
+{
+    for (int i = 0; formats[i] != AV_PIX_FMT_NONE; i++)
+        if (formats[i] == fmt)
+            return 1;
+    return 0;
+}
+
+static void call_kernel(AVFilterContext *avctx,
+                        id<MTLTexture> dst,
+                        id<MTLTexture> main,
+                        id<MTLTexture> overlay,
+                        uint x_position,
+                        uint y_position) API_AVAILABLE(macos(10.11), ios(9.0))
+{
+    OverlayVideoToolboxContext *ctx = avctx->priv;
+    // Both the command buffer and encoder are autoreleased by objc by default.
+    // Use CFBridgingRetain to get a more C-like behavior.
+    id<MTLCommandBuffer> buffer = CFBridgingRetain(ctx->mtl_queue.commandBuffer);
+    id<MTLComputeCommandEncoder> encoder = CFBridgingRetain((__bridge id<MTLCommandBuffer>)buffer.computeCommandEncoder);
+
+    MtlBlendParams *params = (MtlBlendParams *)ctx->mtl_params_buffer.contents;
+    *params = (MtlBlendParams){
+        .x_position = x_position,
+        .y_position = y_position,
+    };
+
+    [(__bridge id<MTLComputeCommandEncoder>)encoder setTexture: main atIndex: 0];
+    [(__bridge id<MTLComputeCommandEncoder>)encoder setTexture: overlay atIndex: 1];
+    [(__bridge id<MTLComputeCommandEncoder>)encoder setTexture: dst atIndex: 2];
+    [(__bridge id<MTLComputeCommandEncoder>)encoder setBuffer: ctx->mtl_params_buffer offset: 0 atIndex: 3];
+    ff_metal_compute_encoder_dispatch(ctx->mtl_device, ctx->mtl_pipeline, (__bridge id<MTLComputeCommandEncoder>)encoder, dst.width, dst.height);
+    [(__bridge id<MTLComputeCommandEncoder>)encoder endEncoding];
+
+    [(__bridge id<MTLCommandBuffer>)buffer commit];
+    [(__bridge id<MTLCommandBuffer>)buffer waitUntilCompleted];
+
+    ff_objc_release(&encoder);
+    ff_objc_release(&buffer);
+}
+
+// Copies and/or converts one pixel buffer to another.
+// This transparently handles pixel formats and color spaces, and will do a conversion if needed.
+static int transfer_pixel_buffer(OverlayVideoToolboxContext *ctx, CVPixelBufferRef source, CVPixelBufferRef destination)
+{
+    if (@available(macOS 10.8, iOS 16.0, *)) {
+        int ret = 0;
+        ret = VTPixelTransferSessionTransferImage(ctx->vt_session, source, destination);
+        if (ret < 0)
+            return ret;
+    } else {
+        CIImage *temp_image = NULL;
+        temp_image = CFBridgingRetain([CIImage imageWithCVPixelBuffer: source]);
+        [(__bridge CIContext*)ctx->ci_ctx render: (__bridge CIImage*)temp_image toCVPixelBuffer: destination];
+        CFRelease(temp_image);
+        CVBufferPropagateAttachments(source, destination);
+    }
+    return 0;
+}
+
+static int overlay_vt_blend(FFFrameSync *fs) API_AVAILABLE(macos(10.11), ios(9.0))
+{
+    AVFilterContext *avctx = fs->parent;
+    OverlayVideoToolboxContext *ctx = avctx->priv;
+    AVFilterLink *outlink = avctx->outputs[0];
+    AVFilterLink *inlink_main = avctx->inputs[0];
+    AVFilterLink *inlink_overlay = avctx->inputs[1];
+    AVFrame *input_main, *input_overlay;
+    AVFrame *output;
+    AVHWFramesContext *frames_ctx = (AVHWFramesContext*)inlink_main->hw_frames_ctx->data;
+    AVHWFramesContext *frames_ctx_overlay = (AVHWFramesContext*)inlink_overlay->hw_frames_ctx->data;
+    const AVPixFmtDescriptor *in_overlay_desc;
+
+    CVMetalTextureRef main, dst, overlay;
+    id<MTLTexture> tex_main, tex_overlay, tex_dst;
+
+    MTLPixelFormat mtl_format;
+    OSType cv_format;
+    int ret;
+    int i, overlay_planes = 0;
+
+    in_overlay_desc = av_pix_fmt_desc_get(frames_ctx_overlay->sw_format);
+    if (@available(macOS 11.3, iOS 14.2, *)) {
+        mtl_format = MTLPixelFormatRGBA16Unorm;
+        cv_format = kCVPixelFormatType_64RGBALE;
+    } else {
+        // On older OS versions, 64-bit RGBA with 16-bit little-endian full-range samples is not supported.
+        // To handle inputs with color depth greater than 8, convert colors to float type during filtering on these versions.
+        mtl_format = MTLPixelFormatRGBA16Float;
+        cv_format = kCVPixelFormatType_64RGBAHalf;
+    }
+
+    // read main and overlay frames from inputs
+    ret = ff_framesync_get_frame(fs, 0, &input_main, 0);
+    if (ret < 0)
+        return ret;
+    ret = ff_framesync_get_frame(fs, 1, &input_overlay, 0);
+    if (ret < 0)
+        return ret;
+
+    if (!input_main)
+        return AVERROR_BUG;
+
+    output = ff_get_video_buffer(outlink, outlink->w, outlink->h);
+    if (!output)
+        return AVERROR(ENOMEM);
+
+    ret = av_frame_copy_props(output, input_main);
+    if (ret < 0)
+        return ret;
+
+    if (!input_overlay) {
+        ret = transfer_pixel_buffer(ctx, (CVPixelBufferRef)input_main->data[3], (CVPixelBufferRef)output->data[3]);
+        if (ret < 0)
+            return ret;
+        return ff_filter_frame(outlink, output);
+    }
+
+    for (i = 0; i < in_overlay_desc->nb_components; i++)
+        overlay_planes = FFMAX(overlay_planes,
+                               in_overlay_desc->comp[i].plane + 1);
+
+    if (!ctx->input_overlay_pixel_buffer_cache) {
+        ret = CVPixelBufferCreate(kCFAllocatorDefault,
+                                  CVPixelBufferGetWidthOfPlane((CVPixelBufferRef)input_overlay->data[3], 0),
+                                  CVPixelBufferGetHeightOfPlane((CVPixelBufferRef)input_overlay->data[3], 0),
+                                  cv_format,
+                                  (__bridge CFDictionaryRef)@{
+                                      (NSString *)kCVPixelBufferCGImageCompatibilityKey: @(YES),
+                                      (NSString *)kCVPixelBufferMetalCompatibilityKey: @(YES)
+                                  },
+                                  &ctx->input_overlay_pixel_buffer_cache);
+        if (ret < 0)
+            return ret;
+    }
+
+    if (!ctx->input_main_pixel_buffer_cache) {
+        ret = CVPixelBufferCreate(kCFAllocatorDefault,
+                                  CVPixelBufferGetWidthOfPlane((CVPixelBufferRef)input_main->data[3], 0),
+                                  CVPixelBufferGetHeightOfPlane((CVPixelBufferRef)input_main->data[3], 0),
+                                  cv_format,
+                                  (__bridge CFDictionaryRef)@{
+                                      (NSString *)kCVPixelBufferCGImageCompatibilityKey: @(YES),
+                                      (NSString *)kCVPixelBufferMetalCompatibilityKey: @(YES)
+                                  },
+                                  &ctx->input_main_pixel_buffer_cache);
+        if (ret < 0)
+            return ret;
+    }
+
+    if (!ctx->output_pixel_buffer_cache) {
+        ret = CVPixelBufferCreate(kCFAllocatorDefault,
+                                  CVPixelBufferGetWidthOfPlane((CVPixelBufferRef)input_main->data[3], 0),
+                                  CVPixelBufferGetHeightOfPlane((CVPixelBufferRef)input_main->data[3], 0),
+                                  cv_format,
+                                  (__bridge CFDictionaryRef)@{
+                                      (NSString *)kCVPixelBufferCGImageCompatibilityKey: @(YES),
+                                      (NSString *)kCVPixelBufferMetalCompatibilityKey: @(YES)
+                                  },
+                                  &ctx->output_pixel_buffer_cache);
+        if (ret < 0)
+            return ret;
+    }
+
+    ret = transfer_pixel_buffer(ctx, (CVPixelBufferRef)input_main->data[3], ctx->input_main_pixel_buffer_cache);
+    if (ret < 0)
+        return ret;
+
+    ret = transfer_pixel_buffer(ctx, (CVPixelBufferRef)input_overlay->data[3], ctx->input_overlay_pixel_buffer_cache);
+    if (ret < 0)
+        return ret;
+
+    overlay = ff_metal_texture_from_pixbuf(avctx, ctx->texture_cache, ctx->input_overlay_pixel_buffer_cache, 0, mtl_format);
+    main = ff_metal_texture_from_pixbuf(avctx, ctx->texture_cache, ctx->input_main_pixel_buffer_cache, 0, mtl_format);
+    dst = ff_metal_texture_from_pixbuf(avctx, ctx->texture_cache, ctx->output_pixel_buffer_cache, 0, mtl_format);
+
+    if (!overlay || !main || !dst) {
+        return AVERROR(ENOSYS);
+    }
+
+    tex_main = CVMetalTextureGetTexture(main);
+    tex_overlay = CVMetalTextureGetTexture(overlay);
+    tex_dst = CVMetalTextureGetTexture(dst);
+
+    call_kernel(avctx, tex_dst, tex_main, tex_overlay, ctx->x_position, ctx->y_position);
+
+    ret = transfer_pixel_buffer(ctx, ctx->output_pixel_buffer_cache, (CVPixelBufferRef)output->data[3]);
+    if (ret < 0) {
+        CFRelease(main);
+        CFRelease(overlay);
+        CFRelease(dst);
+        return ret;
+    }
+
+    CFRelease(main);
+    CFRelease(overlay);
+    CFRelease(dst);
+
+    return ff_filter_frame(outlink, output);
+}
+
+static av_cold void do_uninit(AVFilterContext *avctx) API_AVAILABLE(macos(10.11), ios(9.0))
+{
+    OverlayVideoToolboxContext *ctx = avctx->priv;
+
+    if (ctx->hwframe_ctx_allocated) {
+        av_buffer_unref(&ctx->device_ref);
+        ctx->hwframe_ctx_allocated = 0;
+    }
+
+    ff_framesync_uninit(&ctx->fs);
+
+    if (ctx->ci_ctx) {
+        CFRelease(ctx->ci_ctx);
+        ctx->ci_ctx = NULL;
+    }
+
+    ff_objc_release(&ctx->mtl_params_buffer);
+    ff_objc_release(&ctx->mtl_function);
+    ff_objc_release(&ctx->mtl_pipeline);
+    ff_objc_release(&ctx->mtl_queue);
+    ff_objc_release(&ctx->mtl_library);
+    ff_objc_release(&ctx->mtl_device);
+
+    if (ctx->texture_cache) {
+        CFRelease(ctx->texture_cache);
+        ctx->texture_cache = NULL;
+    }
+    if (ctx->input_main_pixel_buffer_cache) {
+        CFRelease(ctx->input_main_pixel_buffer_cache);
+        ctx->input_main_pixel_buffer_cache = NULL;
+    }
+    if (ctx->input_overlay_pixel_buffer_cache) {
+        CFRelease(ctx->input_overlay_pixel_buffer_cache);
+        ctx->input_overlay_pixel_buffer_cache = NULL;
+    }
+    if (ctx->output_pixel_buffer_cache) {
+        CFRelease(ctx->output_pixel_buffer_cache);
+        ctx->output_pixel_buffer_cache = NULL;
+    }
+    if (ctx->vt_session) {
+        VTPixelTransferSessionInvalidate(ctx->vt_session);
+        CFRelease(ctx->vt_session);
+        ctx->vt_session = NULL;
+    }
+}
+
+static av_cold void overlay_videotoolbox_uninit(AVFilterContext *ctx)
+{
+    if (@available(macOS 10.11, iOS 9.0, *)) {
+        do_uninit(ctx);
+    }
+}
+
+static av_cold int do_init(AVFilterContext *avctx) API_AVAILABLE(macos(10.11), ios(9.0))
+{
+    OverlayVideoToolboxContext *ctx = avctx->priv;
+    NSError *err = nil;
+    CVReturn ret;
+    dispatch_data_t libData;
+
+    ctx->mtl_device = MTLCreateSystemDefaultDevice();
+    if (!ctx->mtl_device) {
+        av_log(avctx, AV_LOG_ERROR, "Unable to find Metal device\n");
+        goto fail;
+    }
+
+    av_log(ctx, AV_LOG_INFO, "Using Metal device: %s\n", ctx->mtl_device.name.UTF8String);
+
+    libData = dispatch_data_create(
+        ff_vf_overlay_videotoolbox_metallib_data,
+        ff_vf_overlay_videotoolbox_metallib_len,
+        nil,
+        nil);
+
+    ctx->mtl_library = [ctx->mtl_device newLibraryWithData: libData error: &err];
+    dispatch_release(libData);
+    libData = nil;
+    ctx->mtl_function = [ctx->mtl_library newFunctionWithName: @"blend_shader"];
+    if (!ctx->mtl_function) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create Metal function!\n");
+        goto fail;
+    }
+
+    ctx->mtl_queue = ctx->mtl_device.newCommandQueue;
+    if (!ctx->mtl_queue) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create Metal command queue!\n");
+        goto fail;
+    }
+
+    ctx->mtl_pipeline = [ctx->mtl_device
+                         newComputePipelineStateWithFunction: ctx->mtl_function
+                         error: &err];
+    if (err) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to create Metal compute pipeline: %s\n", err.description.UTF8String);
+        goto fail;
+    }
+
+    ctx->mtl_params_buffer = [ctx->mtl_device
+                              newBufferWithLength: sizeof(MtlBlendParams)
+                              options: MTLResourceStorageModeShared];
+    if (!ctx->mtl_params_buffer) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create Metal buffer for parameters\n");
+        goto fail;
+    }
+
+    ret = CVMetalTextureCacheCreate(
+        NULL,
+        NULL,
+        ctx->mtl_device,
+        NULL,
+        &ctx->texture_cache
+    );
+    if (ret != kCVReturnSuccess) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create CVMetalTextureCache: %d\n", ret);
+        goto fail;
+    }
+
+    if (@available(macOS 10.8, iOS 16.0, *)) {
+        ret = VTPixelTransferSessionCreate(NULL, &ctx->vt_session);
+        if (ret != kCVReturnSuccess) {
+            av_log(avctx, AV_LOG_ERROR, "Failed to create VTPixelTransferSession: %d\n", ret);
+            goto fail;
+        }
+    } else {
+        // Use CoreImage as a fallback on older OS versions.
+        // CoreImage has performance comparable to VTPixelTransferSession, but supports fewer pixel formats.
+        // Warn the user about possibly incorrect results.
+        av_log(avctx, AV_LOG_WARNING, "VTPixelTransferSessionTransferImage is not available on this OS version, falling back to CoreImage\n");
+        av_log(avctx, AV_LOG_WARNING, "Try an overlay with BGRA format if you see no overlay\n");
+        if (@available(macOS 10.15, iOS 13.0, *)) {
+            ctx->ci_ctx = CFBridgingRetain([CIContext contextWithMTLCommandQueue: ctx->mtl_queue]);
+        } else {
+            ctx->ci_ctx = CFBridgingRetain([CIContext contextWithMTLDevice: ctx->mtl_device]);
+        }
+    }
+
+    ctx->fs.on_event = &overlay_vt_blend;
+
+    return 0;
+fail:
+    overlay_videotoolbox_uninit(avctx);
+    return AVERROR_EXTERNAL;
+}
+
+static av_cold int overlay_videotoolbox_init(AVFilterContext *ctx)
+{
+    if (@available(macOS 10.11, iOS 9.0, *)) {
+        // Ensure we calculated OVERLAY_VT_CTX_SIZE correctly
+        static_assert(OVERLAY_VT_CTX_SIZE == sizeof(OverlayVideoToolboxContext), "Incorrect OVERLAY_VT_CTX_SIZE value!");
+        return do_init(ctx);
+    } else {
+        av_log(ctx, AV_LOG_ERROR, "Metal is not available on this OS version\n");
+        return AVERROR(ENOSYS);
+    }
+}
+
+static int do_config_output(AVFilterLink *link) API_AVAILABLE(macos(10.11), ios(9.0))
+{
+    AVFilterContext *avctx = link->src;
+    AVFilterLink *inlink_main = avctx->inputs[0];
+    AVFilterLink *inlink_overlay = avctx->inputs[1];
+    OverlayVideoToolboxContext *ctx = avctx->priv;
+    AVHWFramesContext *main_frames, *output_frames, *overlay_frames;
+    AVBufferRef *input_ref, *overlay_ref;
+    int ret = 0;
+
+    if (!inlink_main->hw_frames_ctx ||
+        !inlink_overlay->hw_frames_ctx) {
+        av_log(avctx, AV_LOG_ERROR, "An input HW frames reference is "
+               "required to associate the processing device.\n");
+        return AVERROR(EINVAL);
+    }
+
+    input_ref = inlink_main->hw_frames_ctx;
+    overlay_ref = inlink_overlay->hw_frames_ctx;
+    main_frames = (AVHWFramesContext*)input_ref->data;
+    overlay_frames = (AVHWFramesContext*)overlay_ref->data;
+    av_assert0(main_frames);
+    av_assert0(overlay_frames);
+
+    if (!format_is_supported(supported_main_formats, main_frames->sw_format)) {
+        av_log(ctx, AV_LOG_ERROR, "Unsupported main input format: %s.\n",
+               av_get_pix_fmt_name(main_frames->sw_format));
+        if (main_frames->sw_format == AV_PIX_FMT_YUV420P) {
+            av_log(ctx, AV_LOG_WARNING, "Hint: Use %s instead of %s.\n",
+                   av_get_pix_fmt_name(AV_PIX_FMT_NV12),
+                   av_get_pix_fmt_name(AV_PIX_FMT_YUV420P));
+        }
+        return AVERROR(ENOSYS);
+    }
+
+    if (!format_is_supported(supported_overlay_formats, overlay_frames->sw_format)) {
+        av_log(ctx, AV_LOG_ERROR, "Unsupported overlay input format: %s.\n",
+               av_get_pix_fmt_name(overlay_frames->sw_format));
+        if (overlay_frames->sw_format == AV_PIX_FMT_YUV420P) {
+            av_log(ctx, AV_LOG_WARNING, "Hint: Use %s instead of %s.\n",
+                   av_get_pix_fmt_name(AV_PIX_FMT_NV12),
+                   av_get_pix_fmt_name(AV_PIX_FMT_YUV420P));
+        }
+        return AVERROR(ENOSYS);
+    }
+
+    ctx->device_ref = av_buffer_ref(main_frames->device_ref);
+    if (!ctx->device_ref) {
+        av_log(ctx, AV_LOG_ERROR, "A device reference create failed.\n");
+        return AVERROR(ENOMEM);
+    }
+
+    link->hw_frames_ctx = av_hwframe_ctx_alloc(ctx->device_ref);
+    if (!link->hw_frames_ctx) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to create HW frame context "
+               "for output.\n");
+        ret = AVERROR(ENOMEM);
+        return ret;
+    }
+    ctx->hwframe_ctx_allocated = 1;
+
+    output_frames = (AVHWFramesContext*)link->hw_frames_ctx->data;
+
+    output_frames->format    = AV_PIX_FMT_VIDEOTOOLBOX;
+    output_frames->sw_format = main_frames->sw_format;
+    output_frames->width     = inlink_main->w;
+    output_frames->height    = inlink_main->h;
+
+    ret = ff_filter_init_hw_frames(avctx, link, 1);
+    if (ret < 0)
+        return ret;
+
+    ret = av_hwframe_ctx_init(link->hw_frames_ctx);
+    if (ret < 0) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to initialise VideoToolbox frame "
+               "context for output: %d\n", ret);
+        return ret;
+    }
+
+    link->time_base = inlink_main->time_base;
+    ctx->fs.time_base = link->time_base;
+
+    ret = ff_framesync_init_dualinput(&ctx->fs, avctx);
+    if (ret < 0)
+        return ret;
+
+    ret = ff_framesync_configure(&ctx->fs);
+    return ret;
+}
+
+static int config_output(AVFilterLink *link)
+{
+    AVFilterContext *ctx = link->src;
+    if (@available(macOS 10.11, iOS 9.0, *)) {
+        return do_config_output(link);
+    } else {
+        av_log(ctx, AV_LOG_ERROR, "Metal is not available on this OS version\n");
+        return AVERROR(ENOSYS);
+    }
+}
+
+static int overlay_videotoolbox_activate(AVFilterContext *avctx)
+{
+    OverlayVideoToolboxContext *ctx = avctx->priv;
+    return ff_framesync_activate(&ctx->fs);
+}
+
+#define FLAGS (AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM)
+#define OFFSET(x) offsetof(OverlayVideoToolboxContext, x)
+
+static const AVOption overlay_videotoolbox_options[] = {
+    { "x", "Overlay x position",
+      OFFSET(x_position), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INT_MAX, .flags = FLAGS },
+    { "y", "Overlay y position",
+      OFFSET(y_position), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INT_MAX, .flags = FLAGS },
+    { "eof_action", "Action to take when encountering EOF from secondary input",
+      OFFSET(fs.opt_eof_action), AV_OPT_TYPE_INT, { .i64 = EOF_ACTION_REPEAT },
+      EOF_ACTION_REPEAT, EOF_ACTION_PASS, .flags = FLAGS, .unit = "eof_action" },
+    { "repeat", "Repeat the previous frame.", 0, AV_OPT_TYPE_CONST, { .i64 = EOF_ACTION_REPEAT }, .flags = FLAGS, .unit = "eof_action" },
+    { "endall", "End both streams.", 0, AV_OPT_TYPE_CONST, { .i64 = EOF_ACTION_ENDALL }, .flags = FLAGS, .unit = "eof_action" },
+    { "pass", "Pass through the main input.", 0, AV_OPT_TYPE_CONST, { .i64 = EOF_ACTION_PASS }, .flags = FLAGS, .unit = "eof_action" },
+    { "shortest", "force termination when the shortest input terminates", OFFSET(fs.opt_shortest), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, FLAGS },
+    { "repeatlast", "repeat overlay of the last overlay frame", OFFSET(fs.opt_repeatlast), AV_OPT_TYPE_BOOL, { .i64 = 1 }, 0, 1, FLAGS },
+    { NULL },
+};
+
+FRAMESYNC_DEFINE_CLASS(overlay_videotoolbox, OverlayVideoToolboxContext, fs);
+
+static const AVFilterPad overlay_videotoolbox_inputs[] = {
+    {
+        .name = "main",
+        .type = AVMEDIA_TYPE_VIDEO,
+    },
+    {
+        .name = "overlay",
+        .type = AVMEDIA_TYPE_VIDEO,
+    },
+};
+
+static const AVFilterPad overlay_videotoolbox_outputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_VIDEO,
+        .config_props = config_output,
+    },
+};
+
+const AVFilter ff_vf_overlay_videotoolbox = {
+    .name           = "overlay_videotoolbox",
+    .description    = NULL_IF_CONFIG_SMALL("Overlay filter for VideoToolbox frames using Metal compute"),
+    .priv_size      = OVERLAY_VT_CTX_SIZE,
+    .priv_class     = &overlay_videotoolbox_class,
+    .init           = overlay_videotoolbox_init,
+    .uninit         = overlay_videotoolbox_uninit,
+    .activate       = overlay_videotoolbox_activate,
+    .preinit        = overlay_videotoolbox_framesync_preinit,
+    FILTER_SINGLE_PIXFMT(AV_PIX_FMT_VIDEOTOOLBOX),
+    FILTER_INPUTS(overlay_videotoolbox_inputs),
+    FILTER_OUTPUTS(overlay_videotoolbox_outputs),
+    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
+};
Overlay filter for VideoToolbox hwframes. Unlike most hardware
overlay filters, this filter does not require the two inputs to
have the same pixel format; instead, it will perform format
conversion automatically with hardware accelerated methods.

Signed-off-by: Gnattu OC <gnattuoc@me.com>
---
Changes from v3:

- Fixed an issue where 8-bit depth BGRA overlay frames were not correctly converted to 16-bit
- Added a constraint on input pixel formats, as VideoToolbox cannot convert all of its hardware frames

 Changelog                                     |   1 +
 configure                                     |   1 +
 doc/filters.texi                              |  52 ++
 libavfilter/Makefile                          |   3 +
 libavfilter/allfilters.c                      |   1 +
 libavfilter/metal/utils.m                     |   7 +-
 .../metal/vf_overlay_videotoolbox.metal       |  58 ++
 libavfilter/vf_overlay_videotoolbox.m         | 609 ++++++++++++++++++
 8 files changed, 730 insertions(+), 2 deletions(-)
 create mode 100644 libavfilter/metal/vf_overlay_videotoolbox.metal
 create mode 100644 libavfilter/vf_overlay_videotoolbox.m
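For reviewers who want to sanity-check the per-pixel math: inside the overlaid region, the Metal kernel in this patch computes `result = source * (1.0 - mask.a) + mask * mask.a` on float RGBA values. The following Python sketch mirrors that arithmetic on plain (r, g, b, a) tuples in [0, 1]; the `blend_pixel` helper is illustrative only and not part of the patch:

```python
def blend_pixel(src, mask):
    """Mirror of blend_shader's per-pixel formula:
    result = source * (1 - mask.a) + mask * mask.a,
    applied componentwise to (r, g, b, a) tuples in [0, 1]."""
    a = mask[3]
    # The kernel multiplies the whole float4 by the scalar weights,
    # so the alpha channel is blended with the same expression.
    return tuple(s * (1.0 - a) + m * a for s, m in zip(src, mask))

# A fully opaque overlay pixel (a = 1) replaces the source pixel.
print(blend_pixel((1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 1.0)))  # (0.0, 1.0, 0.0, 1.0)
# A fully transparent overlay pixel (a = 0) leaves the source untouched.
print(blend_pixel((1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 0.0)))  # (1.0, 0.0, 0.0, 1.0)
```

Pixels outside the overlay rectangle are copied through unchanged, which corresponds to the `dest.write(source_color, gid)` branch of the kernel.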