
[FFmpeg-devel,4/9] avfilter/overlay_subs: Add overlay_subs filter

Message ID MN2PR04MB5981A828D76860101EB425DBBAC09@MN2PR04MB5981.namprd04.prod.outlook.com
State New
Series [FFmpeg-devel,1/9] lavu/frame: avframe add type property

Checks

Context Check Description
andriy/x86_make_warn warning New warnings during build
andriy/x86_make success Make finished
andriy/x86_make_fate success Make fate finished
andriy/PPC64_make success Make finished
andriy/PPC64_make_fate success Make fate finished

Commit Message

Soft Works Aug. 19, 2021, 7:43 a.m. UTC
Signed-off-by: softworkz <softworkz@hotmail.com>
---
 libavfilter/Makefile          |    1 +
 libavfilter/allfilters.c      |    1 +
 libavfilter/vf_overlay_subs.c | 1173 +++++++++++++++++++++++++++++++++
 libavfilter/vf_overlay_subs.h |   88 +++
 4 files changed, 1263 insertions(+)
 create mode 100644 libavfilter/vf_overlay_subs.c
 create mode 100644 libavfilter/vf_overlay_subs.h

Comments

Paul B Mahol Aug. 19, 2021, 7:52 a.m. UTC | #1
Copy pasted code, so big NACK.
Soft Works Aug. 19, 2021, 7:55 a.m. UTC | #2
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Paul B
> Mahol
> Sent: Thursday, 19 August 2021 09:52
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH 4/9] avfilter/overlay_subs: Add
> overlay_subs filter
> 
> Copy pasted code, so big NACK.

I know. I'd need some advice at this point how to do it.

Can you help me?

softworkz
Paul B Mahol Aug. 19, 2021, 7:57 a.m. UTC | #3
Reuse old code and just put new/changed stuff into vf_overlay.c
Soft Works Aug. 19, 2021, 8:11 a.m. UTC | #4
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Paul B
> Mahol
> Sent: Thursday, 19 August 2021 09:58
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH 4/9] avfilter/overlay_subs: Add
> overlay_subs filter
> 
> Reuse old code and just put new/changed stuff into vf_overlay.c


The primary problem I have is that the regular overlay filter just 
enforces an overlay format that matches the primary input video format:

e.g.: when the primary video is yuv420p, it requests yuva420p for
the overlay input which causes a format filter to be inserted automatically.

That's not possible in this case because we have subtitle frames as
input. Ideally, no format conversion would happen up front, neither
for the main video nor for the overlay rects.
What I mean is that only the pixels of the overlay area would be touched,
for example: convert yuv to rgb, blend with the rgba overlay, convert
back from rgb to yuv.

Does that make sense?
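
As an illustration only (this is not code from the patch), the idea of
touching just the overlay area could be sketched as below. The helper
names blend_rgba_on_yuv() and blend_rect_yuv444() are hypothetical, and
the sketch assumes full-range BT.601 coefficients, straight (non
premultiplied) alpha, and a planar 4:4:4 frame so that chroma
subsampling can be ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an integer to the 0..255 range of an 8-bit sample. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical helper: blend one straight-alpha RGBA pixel onto one
 * full-range BT.601 YUV pixel, in place. */
static void blend_rgba_on_yuv(uint8_t yuv[3], const uint8_t rgba[4])
{
    double y, u, v, r, g, b, a;

    if (rgba[3] == 0)
        return; /* fully transparent: leave the video pixel untouched */

    /* 1. convert the video pixel from YUV to RGB */
    y = yuv[0];
    u = yuv[1] - 128.0;
    v = yuv[2] - 128.0;
    r = y + 1.402 * v;
    g = y - 0.344136 * u - 0.714136 * v;
    b = y + 1.772 * u;

    /* 2. blend with the overlay pixel */
    a = rgba[3] / 255.0;
    r = rgba[0] * a + r * (1.0 - a);
    g = rgba[1] * a + g * (1.0 - a);
    b = rgba[2] * a + b * (1.0 - a);

    /* 3. convert the result back to YUV, rounding to nearest */
    yuv[0] = clip_u8((int)( 0.299    * r + 0.587    * g + 0.114    * b + 0.5));
    yuv[1] = clip_u8((int)(-0.168736 * r - 0.331264 * g + 0.5      * b + 128.5));
    yuv[2] = clip_u8((int)( 0.5      * r - 0.418688 * g - 0.081312 * b + 128.5));
}

/* Hypothetical helper: blend a w x h RGBA rect at (x0, y0) onto a planar
 * 4:4:4 frame. Pixels outside the rect (clipped to the frame) are never
 * accessed, so no up-front conversion of the whole frame is needed. */
static void blend_rect_yuv444(uint8_t *const planes[3], const int linesize[3],
                              int frame_w, int frame_h,
                              const uint8_t *rgba, int rgba_linesize,
                              int x0, int y0, int w, int h)
{
    assert(planes && linesize && rgba);

    for (int j = 0; j < h; j++) {
        int fy = y0 + j;
        if (fy < 0 || fy >= frame_h)
            continue;
        for (int i = 0; i < w; i++) {
            int fx = x0 + i;
            uint8_t px[3];
            if (fx < 0 || fx >= frame_w)
                continue;
            for (int p = 0; p < 3; p++)
                px[p] = planes[p][fy * linesize[p] + fx];
            blend_rgba_on_yuv(px, rgba + j * rgba_linesize + 4 * i);
            for (int p = 0; p < 3; p++)
                planes[p][fy * linesize[p] + fx] = px[p];
        }
    }
}
```

A real filter would additionally have to handle subsampled chroma,
limited-range video and other colour matrices; the sketch only shows
that the cost scales with the rect size rather than the frame size.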

softworkz
Nicolas George Aug. 19, 2021, 8:28 a.m. UTC | #5
Soft Works (12021-08-19):
> The primary problem I have is that the regular overlay filter just 
> enforces an overlay format that matches the primary input video format:
> 
> e.g.: when the primary video is yuv420p, it requests yuva420p for
> the overlay input which causes a format filter to be inserted automatically.
> 
> That's not possible in this case because we have subtitle frames as 
> input. Ideally, there wouldn't happen any format conversion up-front.
> Neither for the main video, nor for the overlay rects.
> What I mean is, that only the pixels of the overlay area would be touched,
> like for example: convert yuv to rgb, blend with rgba overlay, convert
> back rgb to yuv.
> 
> Does that make sense?

Yes and no. Yes it makes sense in itself.

But no, it does not make sense as an excuse to duplicate hundreds of
lines of code. If you want this special filter, you have to find a way
of sharing the code.

But I can say immediately that, after a quick glance, I do not like your
patch series at all, for a very simple reason: it is all over the place.

One of the key features of the sub2video hack was that it was entirely
contained: a few dozen clearly identified lines of code in ffmpeg.c.
Once proper support for subtitles in libavfilter is added, removing the
sub2video hack will be very easy.

As far as I can see, your proposal does not have that feature. Quite the
opposite: it takes the place in libavfilter that proper subtitles
support would take. As such, it makes it harder to implement subtitles
properly.

If you want to help on this, consider helping to implement real
subtitles support in libavfilter.

The first step towards this would be to continue cleaning up the
negotiation, so that a new type can be added without it becoming even
more spaghetti. This is the work I have started in
85a6404d7e6c759ddf71d6374812d7ff719728ec. If you want to help, review
reduce_formats(), swap_*() and pick_formats() to see if some are
redundant and to include them in the negotiation description structure.

Note that this refactoring of negotiation is not only for subtitles: it
also helps for data packets and for partial graph reconfiguration.

Regards,
Soft Works Aug. 19, 2021, 8:41 a.m. UTC | #6
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Nicolas
> George
> Sent: Thursday, 19 August 2021 10:28
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH 4/9] avfilter/overlay_subs: Add
> overlay_subs filter
> 
> Soft Works (12021-08-19):
> > The primary problem I have is that the regular overlay filter just
> > enforces an overlay format that matches the primary input video format:
> >
> > e.g.: when the primary video is yuv420p, it requests yuva420p for
> > the overlay input which causes a format filter to be inserted
> automatically.
> >
> > That's not possible in this case because we have subtitle frames as
> > input. Ideally, there wouldn't happen any format conversion up-front.
> > Neither for the main video, nor for the overlay rects.
> > What I mean is, that only the pixels of the overlay area would be touched,
> > like for example: convert yuv to rgb, blend with rgba overlay, convert
> > back rgb to yuv.
> >
> > Does that make sense?
> 
> Yes and no. Yes it makes sense in itself.
> 
> But no, it does not make sense as an excuse to duplicate hundreds of
> lines of code. If you want this special filter, you have to find a way
> of sharing the code.

I forgot to add [RFC PATCH]. I'm clearly not intending to have those
duplicate (and even unused) lines to be committed.

It's just at a point where I'm wondering whether the idea that I have 
described above (touching only relevant regions) could be implemented
easily or whether there might already exist something similar.

> But I can say immediately that, after a quick glance, I do not like your
> patch series at all, for a very simple reason: it is all over the place.

Remove the new filters and it's quickly getting more compact.


> One of the key features of the sub2video hack was that it was entirely
> contained: a few dozens lines of code in ffmpeg.c clearly identified.
> Once proper support for subtitles in libavfilter is added, removing the
> sub2video hack will be very easy.
> 
> As far as I can see, your proposal does not have that feature. Quite the
> opposite: it takes the place in libavfilter that proper subtitles
> support would take. As such, it makes it harder to implement subtitles
> properly.
> If you want to help on this, consider helping implementing real
> subtitles support in libavfilter.

Where is my proposed code for libavfilter different from "proper"
subtitles support?

I think on the libavfilter side, this is pretty much how it will have
to be for a "proper" implementation.
(anyway, it can't be far wrong, as it is done analogously to audio and
video)

I think that the area that is remaining in a somewhat "dirty" state 
is the heartbeat mechanism, which I have kept from sub2video.
As mentioned, I didn't want to make a step too big and rather keep 
things working and compatible.

softworkz
Nicolas George Aug. 19, 2021, 9:03 a.m. UTC | #7
Soft Works (12021-08-19):
> I forgot to add [RFC PATCH]. I'm clearly not intending to have those
> duplicate (and even unused) lines to be committed.
> 
> It's just at a point where I'm wondering whether the idea that I have 
> described above (touching only relevant regions) could be implemented
> easily or whether there might already exist something similar.

Ok.

> > But I can say immediately that, after a quick glance, I do not like your
> > patch series at all, for a very simple reason: it is all over the place.
> Remove the new filters and it's quickly getting more compact.

The number of lines of code is not the issue, the issue is the amount of
existing code getting changed and that would need changing back to
revert and implement something cleaner.

I will only accept two options, and I expect most other developers, if
made to care about libavfilter, would agree:

- Quick-and-dirty code very closely constrained that can be removed
  easily. (This is what I did with sub2video, I doubt it is possible in
  libavfilter).

- Carefully designed code with always a clear path towards a clean
  definite solution.

It seemed your proposal was a quick-and-dirty solution that touched a
lot of code. With a second look, it is cleaner than that, but it does
not have a clear path towards a really clean solution.

> Where is my proposed code for libavfilter different from "proper"
> subtitles support?

Where are the sshowinfo, ssplit, ssetpts, ssettb, etc. utility filters?

They are needed. But if your reaction is "this is ridiculous", you are
100% right. The short of it is that we cannot add a third media type
without finding a way to have the neutral utility filters work for all
media types at once without boilerplate duplication.

That means negotiating the media type.

That requires refactoring the negotiation.

> I think at the libavfilter side, this is pretty how it will have
> to be for "proper" implementation.
> (anyway, it can't be much wrong as it is done just analog to audio and 
> video)

To begin with, we should properly discuss, *before implementing*:

- how a subtitle should be encoded into a frame, both bitmap and text;

- what kind of properties of the frame need to be negotiated.

> I think that the area that is remaining in a somewhat "dirty" state 
> is the heartbeat mechanism, which I have kept from sub2video.
> As mentioned, I didn't want to make a step too big and rather keep 
> things working and compatible.

Sure, that is ok. But for a proper implementation, even if you do not
implement it for years, you need to have a clear idea of how it will
work in the final version. Again, that means discussing before
implementing.

Regards,
Soft Works Aug. 19, 2021, 9:26 a.m. UTC | #8
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Nicolas
> George
> Sent: Thursday, 19 August 2021 11:03
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH 4/9] avfilter/overlay_subs: Add
> overlay_subs filter
> 
> Soft Works (12021-08-19):
> > I forgot to add [RFC PATCH]. I'm clearly not intending to have those
> > duplicate (and even unused) lines to be committed.
> >
> > It's just at a point where I'm wondering whether the idea that I have
> > described above (touching only relevant regions) could be implemented
> > easily or whether there might already exist something similar.
> 
> Ok.
> 
> > > But I can say immediately that, after a quick glance, I do not like your
> > > patch series at all, for a very simple reason: it is all over the place.
> > Remove the new filters and it's quickly getting more compact.
> 
> The number of lines of code is not the issue, the issue is the amount of
> existing code getting changed and that would need changing back to
> revert and implement something cleaner.
> 
> I will only accept two options, and I expect most other developers, if
> made to care about libavfilter, would agree:
> 
> - Quick-and-dirty code very closely constrained that can be removed
>   easily. (This is what I did with sub2video, I doubt it is possible in
>   libavfilter).
> 
> - Carefully designed code with always a clear path towards a clean
>   definite solution.
> 
> It seemed your proposal was a quick-and-dirty solution that touched a
> lot of code. With a second look, it is cleaner than that, but it does
> not have a clear path towards a really clean solution.

Yes. To be honest, it was focused on getting a quick result, but while
doing so, I realized that it is less involved than expected.
I had deliberately avoided to re-read the earlier conversation and
look at Clement's code to keep a free mind. But looking at Clement's
code afterwards revealed that what I ended up with is quite similar
to his earlier work.

> 
> > Where is my proposed code for libavfilter different from "proper"
> > subtitles support?
> 
> Where are the sshowinfo, ssplit, ssetpts, ssettb, etc. utility filters?
> 
> They are needed. But if your reaction is "this is ridiculous", you are
> 100% right. The short of it is that we cannot add a third media type
> without finding a way to have the neutral utility filters work for all
> media types at once without boilerplate duplication.

No, I don’t think this would be ridiculous. I just wasn't sure how far
I should go with the initial patchset. I also had an scopy filter
that I had removed before submitting because it wasn't needed from 
a proof-of-concept perspective.

> 
> That means negotiating the media type.
> 
> That requires refactoring the negotiation.

I'm not sure what you mean exactly, probably I should take a look 
at your code that you mentioned.

> 
> > I think at the libavfilter side, this is pretty how it will have
> > to be for "proper" implementation.
> > (anyway, it can't be much wrong as it is done just analog to audio and
> > video)
> 
> To begin with, we should properly discuss, *before implementing*:
> 
> - how a subtitle should be encoded into a frame, both bitmap and text;
> 
> - what kind of properties of the frame need to be negotiated.

That's one of the differences between mine and Clement's proposals:

We already have AVSubtitle which is understood by encoders and decoders,
so I see no reason to convert this back and forth to something different
just for carrying around by AVFrame.


> To begin with, we should properly discuss, *before implementing*:

I need a solution and the discussion last year ended up nowhere after
discussing some freaky details about which I (and probably most others)
don't care at all.
From that experience I wanted to avoid this to happen again.


> 
> > I think that the area that is remaining in a somewhat "dirty" state
> > is the heartbeat mechanism, which I have kept from sub2video.
> > As mentioned, I didn't want to make a step too big and rather keep
> > things working and compatible.
> 
> Sure, that is ok. But for a proper implementation, even if you do not
> implement it for years, you need to have a clear idea of how it will
> work in the final version. Again, that means discussing before
> implementing.

I don't think that my patchset is going a totally wrong way, even though
some things might need adjustment, but it is clearly just a step and 
not arrival at a finish line.

The primary motivation for submitting this right away as is, is to
avoid this being discussed to death once again and make a step that

- doesn't break anything
- allows things to work that haven't been possible before

..even when it's not yet the perfect solution in all aspects.


softworkz
Nicolas George Aug. 19, 2021, 1:24 p.m. UTC | #9
Soft Works (12021-08-19):
> No, I don’t think this would be ridiculous. I just wasn't sure how far
> I should go with the initial patchset. I also had an scopy filter
> that I had removed before submitting because it wasn't needed from 
> a proof-of-concept perspective.

Well, it is already ridiculous, and more importantly annoying for both
users and developers to have all the utility filters duplicated for both
media types, it would be even worse to have them triplicated.

So I will state it plainly and clearly:

I consider that negotiating the media type is an absolute prerequisite
for any project of adding support for more media types in libavfilter.

This is not that hard to do. As I pointed out, I have already started
working on it. It would have gone faster if there were other people
interested in it, reviewing the code trustfully and offering useful
suggestions.

> I'm not sure what you mean exactly, probably I should take a look 
> at your code that you mentioned.

That would be a good idea for the sake of the discussion.

> We already have AVSubtitle which is understood by encoders and decoders,
> so I see no reason to convert this back and forth to something different
> just for carrying around by AVFrame.

There have been extensibility issues raised against AVSubtitle, and in
particular AVSubtitleRect. Reworking this API has been considered
necessary for a long time.

> I need a solution and the discussion last year ended up nowhere after
> discussing some freaky details about which I (and probably most others)
> don't care at all.
> From that experience I wanted to avoid this to happen again.

Not taking into consideration what was already said on a subject is
really not a good way of making the discussion progress.

Every once in a while, we have somebody arrive with a series of patches
and throw a tantrum "but my code works and I want MY feature NOW!"
without consideration for users who may want to use the feature a little
differently or the developers who will have to keep that feature alive
and continue developing FFmpeg around it.

I hope you do not intend to be that guy.

Extending FFmpeg, especially for a feature that central, requires
looking at the big picture, and looking at the big picture requires
having spent time maintaining code, developing different features and
interacting with users.

> I don't think that my patchset is going a totally wrong way, even though
> some things might need adjustment, but it is clearly just a step and 
> not arrival at a finish line.

Even if it is not going in "a totally wrong way", since what you are doing
is part of the public API, any mistake made now will bug us for a very
long time.

Therefore I insist we are extra careful not to make mistakes.

> The primary motivation for submitting this right away as is, is to 
> avoid this being discussed to death once again and make a step that
> 
> - doesn't break anything
> - allows things to work that haven't been possible before

- is part of the public API
- makes it harder to maintain libavfilter

"But my code works and I want MY feature in NOW. (And I won't be around
long enough to have to maintain it anyway.)"

Please do not be that guy.

I am very happy if somebody wants to seriously work on subtitles in
libavfilter, but seriously is an important word here. If it was as easy
as slapping a few pieces of code around like you did, I would already
have done it years ago.

The first step in libavfilter is to refactor the negotiation to make the
media type negotiated. It is already started.
Nicolas George Aug. 19, 2021, 1:37 p.m. UTC | #10
Nicolas George (12021-08-19):
> This is not that hard to do. As I pointed out, I have already started
> working on it. It would have gone faster if there were other people
> interested in it, reviewing the code trustfully and offering useful
> suggestions.

These are the notes I have taken about format negotiation.

avfilter_graph_config()

  graph_config_formats()

    query_formats()
      filter->query_formats()
      ff_merge_*()

    reduce_formats()
      = if input has only one format, try to keep only it on output

    swap_sample_fmts()
      = if input has only one format, put the most similar first on outputs

    swap_samplerates()
      = if input has only one sample rate, put the closest first on outputs

    swap_channel_layouts()

    pick_formats()

  graph_config_links()
  graph_check_links()
  graph_config_pointers()

The goal now is to describe every step of the negotiation in the
AVFilterNegotiation structure. Right now, the merging step is done. I
will be looking at the reduce and swap code now.

If you want to help, you can start getting familiar with how it works.
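
To make two of the steps in the notes above concrete, here is a
simplified sketch of what "reduce" and "swap" mean. It is not
libavfilter code: the FormatList type and the functions
reduce_to_input_format() and put_closest_first() are hypothetical
stand-ins working on plain integer lists, whereas the real
reduce_formats() and swap_samplerates() operate on the filter graph's
link format lists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_FORMATS 8

/* Hypothetical stand-in for a list of candidate formats on a link. */
typedef struct FormatList {
    int    fmts[MAX_FORMATS];
    size_t nb;
} FormatList;

/* "reduce": if the input has only one format and the output supports it,
 * keep only that format on the output.
 * Returns 1 if the output list was reduced, 0 if it was left unchanged. */
static int reduce_to_input_format(const FormatList *in, FormatList *out)
{
    assert(in && out);

    if (in->nb != 1)
        return 0;
    for (size_t i = 0; i < out->nb; i++) {
        if (out->fmts[i] == in->fmts[0]) {
            out->fmts[0] = in->fmts[0];
            out->nb      = 1;
            return 1;
        }
    }
    return 0;
}

/* "swap": move the output entry closest to the wanted value (for example
 * the input's single sample rate) to the front, so that a later pick
 * step that takes the first entry prefers it. */
static void put_closest_first(int wanted, FormatList *out)
{
    size_t best = 0;

    assert(out);
    for (size_t i = 1; i < out->nb; i++)
        if (abs(out->fmts[i] - wanted) < abs(out->fmts[best] - wanted))
            best = i;
    if (out->nb > 0) {
        int tmp         = out->fmts[0];
        out->fmts[0]    = out->fmts[best];
        out->fmts[best] = tmp;
    }
}
```

In this simplified picture, reduce removes choices outright, while swap
only reorders them and leaves the final decision to the pick step.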
Soft Works Aug. 19, 2021, 10:36 p.m. UTC | #11
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Nicolas
> George
> Sent: Thursday, 19 August 2021 15:24
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH 4/9] avfilter/overlay_subs: Add
> overlay_subs filter
> 
> Soft Works (12021-08-19):
> > No, I don’t think this would be ridiculous. I just wasn't sure how far
> > I should go with the initial patchset. I also had an scopy filter
> > that I had removed before submitting because it wasn't needed from
> > a proof-of-concept perspective.
> 
> Well, it is already ridiculous, and more importantly annoying for both
> users and developers to have all the utility filters duplicated for both
> media types, it would be even worse to have them triplicated.
> 
> So I will state it plainly and clearly:
> 
> I consider that negotiating the media type is an absolute prerequisite
> for any project of adding support for more media types in libavfilter.
> 
> This is not that hard to do. As I pointed out, I have already started
> working on it. It would have gone faster if there were other people
> interested in it, reviewing the code trustfully and offering useful
> suggestions.
> 
> > I'm not sure what you mean exactly, probably I should take a look
> > at your code that you mentioned.
> 
> That would be a good idea for the sake of the discussion.

Will do.

> > We already have AVSubtitle which is understood by encoders and decoders,
> > so I see no reason to convert this back and forth to something different
> > just for carrying around by AVFrame.
> 
> There have been extensibility issues raised against AVSubtitle, and in
> particular AVSubtitleRect. Reworking this API has been considered
> necessary for a long time.

Yes, but I think that a step-by-step approach makes more sense
than raising the bar so high that it becomes hardly possible to
surpass.

> 
> > I need a solution and the discussion last year ended up nowhere after
> > discussing some freaky details about which I (and probably most others)
> > don't care at all.
> > From that experience I wanted to avoid this to happen again.
> 
> Not taking into consideration what was already said on a subject is
> really not a good way of making the discussion progress.

There are substantial and practical concerns and there are concerns
that are of rather academic nature that might be valid in some way
but don't have much practical relevance. 
That's what I meant.

> Every once in a while, we have somebody arrive with a series of patches
> and throw a tantrum "but my code works and I want MY feature NOW!"
> without consideration for users who may want to use the feature a little
> differently or the developers who will have to keep that feature alive
> and continue developing FFmpeg around it.
> 
> I hope you do not intend to be that guy.
> 
> Extending FFmpeg, especially for a feature that central, requires
> looking at the big picture, and looking at the big picture requires
> having spent time maintaining code, developing different features and
> interacting with users.
> 
> > I don't think that my patchset is going a totally wrong way, even though
> > some things might need adjustment, but it is clearly just a step and
> > not arrival at a finish line.
> 
> Even if it is not going in "a totally wrong way", since what you are doing
> is part of the public API, any mistake made now will bug us for a very
> long time.
> 
> Therefore I insist we are extra careful not to make mistakes.
> 
> > The primary motivation for submitting this right away as is, is to
> > avoid this being discussed to death once again and make a step that
> >
> > - doesn't break anything
> > - allows things to work that haven't been possible before
> 
> - is part of the public API
> - makes it harder to maintain libavfilter
> 
> "But my code works and I want MY feature in NOW. (And I won't be around
> long enough to have to maintain it anyway.)"
> 
> Please do not be that guy.

I'm both. I am the guy that wants those features NOW.
But I'm also the guy who acknowledges your concerns and who
wants to collaborate with you (and anybody else who might chime in)
to come to a solution that is acceptable for all sides.

> I am very happy if somebody wants to seriously work on subtitles in
> libavfilter, but seriously is an important word here. If it was as easy
> as slapping a few pieces of code around like you did, I would already
> have done it years ago.

Why haven't you? I would have welcomed and supported that!


The intention of my submission was to demonstrate that it doesn't
take that much to come to a working solution.

Like I said above: In the past, the bar has been raised so high that
it has caused any progress to stall or die.

I'm not considering my submission as the one and only eat-it-or-leave-it
solution and I'm ready to work on it and get it into an even better 
shape, as long as it leads to a solution and not into another dead 
end.

> The first step in libavfilter is to refactor the negotiation to make the
> media type negotiated. It is already started.

I'll look into your work and get back shortly.

Thanks,
softworkz

Patch

diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index db32cf1265..68a7f5cb88 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -357,6 +357,7 @@  OBJS-$(CONFIG_OVERLAY_CUDA_FILTER)           += vf_overlay_cuda.o framesync.o vf
 OBJS-$(CONFIG_OVERLAY_OPENCL_FILTER)         += vf_overlay_opencl.o opencl.o \
                                                 opencl/overlay.o framesync.o
 OBJS-$(CONFIG_OVERLAY_QSV_FILTER)            += vf_overlay_qsv.o framesync.o
+OBJS-$(CONFIG_OVERLAY_SUBS_FILTER)           += vf_overlay_subs.o framesync.o
 OBJS-$(CONFIG_OVERLAY_VULKAN_FILTER)         += vf_overlay_vulkan.o vulkan.o
 OBJS-$(CONFIG_OWDENOISE_FILTER)              += vf_owdenoise.o
 OBJS-$(CONFIG_PAD_FILTER)                    += vf_pad.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 73040d2824..abd0a47750 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -339,6 +339,7 @@  extern const AVFilter ff_vf_oscilloscope;
 extern const AVFilter ff_vf_overlay;
 extern const AVFilter ff_vf_overlay_opencl;
 extern const AVFilter ff_vf_overlay_qsv;
+extern const AVFilter ff_vf_overlay_subs;
 extern const AVFilter ff_vf_overlay_vulkan;
 extern const AVFilter ff_vf_overlay_cuda;
 extern const AVFilter ff_vf_owdenoise;
diff --git a/libavfilter/vf_overlay_subs.c b/libavfilter/vf_overlay_subs.c
new file mode 100644
index 0000000000..02659300a9
--- /dev/null
+++ b/libavfilter/vf_overlay_subs.c
@@ -0,0 +1,1173 @@ 
+/*
+ * Copyright (c) 2010 Stefano Sabatini
+ * Copyright (c) 2010 Baptiste Coudurier
+ * Copyright (c) 2007 Bobby Bingham
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * overlay one video on top of another
+ */
+
+#include "avfilter.h"
+#include "formats.h"
+#include "libavutil/common.h"
+#include "libavutil/eval.h"
+#include "libavutil/avstring.h"
+#include "libavutil/pixdesc.h"
+#include "libavutil/imgutils.h"
+#include "libavutil/mathematics.h"
+#include "libavutil/opt.h"
+#include "libavutil/timestamp.h"
+#include "internal.h"
+#include "drawutils.h"
+#include "framesync.h"
+#include "video.h"
+#include "vf_overlay_subs.h"
+
+#include "libavcodec/avcodec.h"
+
+typedef struct ThreadData {
+    AVFrame *dst;
+    AVSubtitleRect *src;
+} ThreadData;
+
+static const char *const var_names[] = {
+    "main_w",    "W", ///< width  of the main    video
+    "main_h",    "H", ///< height of the main    video
+    "overlay_w", "w", ///< width  of the overlay video
+    "overlay_h", "h", ///< height of the overlay video
+    "hsub",
+    "vsub",
+    "x",
+    "y",
+    "n",            ///< number of frame
+    "pos",          ///< position in the file
+    "t",            ///< timestamp expressed in seconds
+    NULL
+};
+
+#define MAIN    0
+#define OVERLAY 1
+
+#define R 0
+#define G 1
+#define B 2
+#define A 3
+
+#define Y 0
+#define U 1
+#define V 2
+
+enum EvalMode {
+    EVAL_MODE_INIT,
+    EVAL_MODE_FRAME,
+    EVAL_MODE_NB
+};
+
+static av_cold void uninit(AVFilterContext *ctx)
+{
+    OverlaySubsContext *s = ctx->priv;
+
+    ff_framesync_uninit(&s->fs);
+    av_expr_free(s->x_pexpr); s->x_pexpr = NULL;
+    av_expr_free(s->y_pexpr); s->y_pexpr = NULL;
+}
+
+static inline int normalize_xy(double d, int chroma_sub)
+{
+    if (isnan(d))
+        return INT_MAX;
+    return (int)d & ~((1 << chroma_sub) - 1);
+}
+
+static void eval_expr(AVFilterContext *ctx)
+{
+    OverlaySubsContext *s = ctx->priv;
+
+    s->var_values[VAR_X] = av_expr_eval(s->x_pexpr, s->var_values, NULL);
+    s->var_values[VAR_Y] = av_expr_eval(s->y_pexpr, s->var_values, NULL);
+    /* It is necessary if x is expressed from y  */
+    s->var_values[VAR_X] = av_expr_eval(s->x_pexpr, s->var_values, NULL);
+    s->x = normalize_xy(s->var_values[VAR_X], s->hsub);
+    s->y = normalize_xy(s->var_values[VAR_Y], s->vsub);
+}
+
+static int set_expr(AVExpr **pexpr, const char *expr, const char *option, void *log_ctx)
+{
+    int ret;
+    AVExpr *old = NULL;
+
+    if (*pexpr)
+        old = *pexpr;
+    ret = av_expr_parse(pexpr, expr, var_names,
+                        NULL, NULL, NULL, NULL, 0, log_ctx);
+    if (ret < 0) {
+        av_log(log_ctx, AV_LOG_ERROR,
+               "Error when evaluating the expression '%s' for %s\n",
+               expr, option);
+        *pexpr = old;
+        return ret;
+    }
+
+    av_expr_free(old);
+    return 0;
+}
+
+static int process_command(AVFilterContext *ctx, const char *cmd, const char *args,
+                           char *res, int res_len, int flags)
+{
+    OverlaySubsContext *s = ctx->priv;
+    int ret;
+
+    if      (!strcmp(cmd, "x"))
+        ret = set_expr(&s->x_pexpr, args, cmd, ctx);
+    else if (!strcmp(cmd, "y"))
+        ret = set_expr(&s->y_pexpr, args, cmd, ctx);
+    else
+        ret = AVERROR(ENOSYS);
+
+    if (ret < 0)
+        return ret;
+
+    if (s->eval_mode == EVAL_MODE_INIT) {
+        eval_expr(ctx);
+        av_log(ctx, AV_LOG_VERBOSE, "x:%f xi:%d y:%f yi:%d\n",
+               s->var_values[VAR_X], s->x,
+               s->var_values[VAR_Y], s->y);
+    }
+    return ret;
+}
+
+static const enum AVPixelFormat alpha_pix_fmts[] = {
+    AV_PIX_FMT_YUVA420P, AV_PIX_FMT_YUVA422P, AV_PIX_FMT_YUVA444P,
+    AV_PIX_FMT_YUVA420P10, AV_PIX_FMT_YUVA422P10,
+    AV_PIX_FMT_ARGB, AV_PIX_FMT_ABGR, AV_PIX_FMT_RGBA,
+    AV_PIX_FMT_BGRA, AV_PIX_FMT_GBRAP, AV_PIX_FMT_NONE
+};
+
+static int query_formats(AVFilterContext *ctx)
+{
+    OverlaySubsContext *s = ctx->priv;
+
+    /* overlay formats contains alpha, for avoiding conversion with alpha information loss */
+    static const enum AVPixelFormat main_pix_fmts_yuv420[] = {
+        AV_PIX_FMT_YUV420P, AV_PIX_FMT_YUVJ420P, AV_PIX_FMT_YUVA420P,
+        AV_PIX_FMT_NV12, AV_PIX_FMT_NV21,
+        AV_PIX_FMT_NONE
+    };
+
+    static const enum AVPixelFormat main_pix_fmts_yuv420p10[] = {
+        AV_PIX_FMT_YUV420P10, AV_PIX_FMT_YUVA420P10,
+        AV_PIX_FMT_NONE
+    };
+
+    static const enum AVPixelFormat main_pix_fmts_yuv422[] = {
+        AV_PIX_FMT_YUV422P, AV_PIX_FMT_YUVJ422P, AV_PIX_FMT_YUVA422P, AV_PIX_FMT_NONE
+    };
+
+    static const enum AVPixelFormat main_pix_fmts_yuv422p10[] = {
+        AV_PIX_FMT_YUV422P10, AV_PIX_FMT_YUVA422P10, AV_PIX_FMT_NONE
+    };
+
+    static const enum AVPixelFormat main_pix_fmts_yuv444[] = {
+        AV_PIX_FMT_YUV444P, AV_PIX_FMT_YUVJ444P, AV_PIX_FMT_YUVA444P, AV_PIX_FMT_NONE
+    };
+
+    static const enum AVPixelFormat main_pix_fmts_gbrp[] = {
+        AV_PIX_FMT_GBRP, AV_PIX_FMT_GBRAP, AV_PIX_FMT_NONE
+    };
+
+    static const enum AVPixelFormat main_pix_fmts_rgb[] = {
+        AV_PIX_FMT_ARGB,  AV_PIX_FMT_RGBA,
+        AV_PIX_FMT_ABGR,  AV_PIX_FMT_BGRA,
+        AV_PIX_FMT_RGB24, AV_PIX_FMT_BGR24,
+        AV_PIX_FMT_NONE
+    };
+
+    const enum AVPixelFormat *main_formats, *overlay_formats; /* FIXME: overlay_formats is never assigned before its use below */
+    AVFilterFormats *formats;
+    int ret;
+
+    switch (s->format) {
+    case OVERLAY_FORMAT_YUV420:
+        main_formats    = main_pix_fmts_yuv420;
+        break;
+    case OVERLAY_FORMAT_YUV420P10:
+        main_formats    = main_pix_fmts_yuv420p10;
+        break;
+    case OVERLAY_FORMAT_YUV422:
+        main_formats    = main_pix_fmts_yuv422;
+        break;
+    case OVERLAY_FORMAT_YUV422P10:
+        main_formats    = main_pix_fmts_yuv422p10;
+        break;
+    case OVERLAY_FORMAT_YUV444:
+        main_formats    = main_pix_fmts_yuv444;
+        break;
+    case OVERLAY_FORMAT_RGB:
+        main_formats    = main_pix_fmts_rgb;
+        break;
+    case OVERLAY_FORMAT_GBRP:
+        main_formats    = main_pix_fmts_gbrp;
+        break;
+    case OVERLAY_FORMAT_AUTO:
+        return ff_set_common_formats(ctx, ff_make_format_list(alpha_pix_fmts));
+    default:
+        av_assert0(0);
+    }
+
+    formats = ff_make_format_list(main_formats);
+    if ((ret = ff_formats_ref(formats, &ctx->inputs[MAIN]->outcfg.formats)) < 0 ||
+        (ret = ff_formats_ref(formats, &ctx->outputs[MAIN]->incfg.formats)) < 0)
+        return ret;
+
+    return ff_formats_ref(ff_make_format_list(overlay_formats),
+                          &ctx->inputs[OVERLAY]->outcfg.formats);
+}
+
+static int config_input_overlay(AVFilterLink *inlink)
+{
+    AVFilterContext *ctx  = inlink->dst;
+    OverlaySubsContext  *s = inlink->dst->priv;
+    int ret;
+    const AVPixFmtDescriptor *pix_desc = av_pix_fmt_desc_get(AV_PIX_FMT_BGRA);
+
+    av_image_fill_max_pixsteps(s->overlay_pix_step, NULL, pix_desc);
+
+    /* Finish the configuration by evaluating the expressions
+       now when both inputs are configured. */
+    s->var_values[VAR_MAIN_W   ] = s->var_values[VAR_MW] = ctx->inputs[MAIN   ]->w;
+    s->var_values[VAR_MAIN_H   ] = s->var_values[VAR_MH] = ctx->inputs[MAIN   ]->h;
+    s->var_values[VAR_OVERLAY_W] = s->var_values[VAR_OW] = ctx->inputs[OVERLAY]->w;
+    s->var_values[VAR_OVERLAY_H] = s->var_values[VAR_OH] = ctx->inputs[OVERLAY]->h;
+    s->var_values[VAR_HSUB]  = 1<<pix_desc->log2_chroma_w;
+    s->var_values[VAR_VSUB]  = 1<<pix_desc->log2_chroma_h;
+    s->var_values[VAR_X]     = NAN;
+    s->var_values[VAR_Y]     = NAN;
+    s->var_values[VAR_N]     = 0;
+    s->var_values[VAR_T]     = NAN;
+    s->var_values[VAR_POS]   = NAN;
+
+    if ((ret = set_expr(&s->x_pexpr,      s->x_expr,      "x",      ctx)) < 0 ||
+        (ret = set_expr(&s->y_pexpr,      s->y_expr,      "y",      ctx)) < 0)
+        return ret;
+
+    s->overlay_is_packed_rgb =
+        ff_fill_rgba_map(s->overlay_rgba_map, AV_PIX_FMT_BGRA) >= 0;
+    s->overlay_has_alpha = ff_fmt_is_in(AV_PIX_FMT_BGRA, alpha_pix_fmts);
+
+    if (s->eval_mode == EVAL_MODE_INIT) {
+        eval_expr(ctx);
+        av_log(ctx, AV_LOG_VERBOSE, "x:%f xi:%d y:%f yi:%d\n",
+               s->var_values[VAR_X], s->x,
+               s->var_values[VAR_Y], s->y);
+    }
+
+    av_log(ctx, AV_LOG_VERBOSE,
+           "main w:%d h:%d fmt:%s overlay w:%d h:%d fmt:%s\n",
+           ctx->inputs[MAIN]->w, ctx->inputs[MAIN]->h,
+           av_get_pix_fmt_name(ctx->inputs[MAIN]->format),
+           ctx->inputs[OVERLAY]->w, ctx->inputs[OVERLAY]->h,
+           av_get_pix_fmt_name(ctx->inputs[OVERLAY]->format));
+    return 0;
+}
+
+static int config_output(AVFilterLink *outlink)
+{
+    AVFilterContext *ctx = outlink->src;
+    OverlaySubsContext *s = ctx->priv;
+    int ret;
+
+    if ((ret = ff_framesync_init_dualinput(&s->fs, ctx)) < 0)
+        return ret;
+
+    outlink->w = ctx->inputs[MAIN]->w;
+    outlink->h = ctx->inputs[MAIN]->h;
+    outlink->time_base = ctx->inputs[MAIN]->time_base;
+
+    return ff_framesync_configure(&s->fs);
+}
+
+// divide by 255 and round to nearest
+// apply a fast variant: (X+127)/255 = ((X+127)*257+257)>>16 = ((X+128)*257)>>16
+#define FAST_DIV255(x) ((((x) + 128) * 257) >> 16)
+
+// calculate the unpremultiplied alpha, applying the general equation:
+// alpha = alpha_overlay / ( (alpha_main + alpha_overlay) - (alpha_main * alpha_overlay) )
+// (((x) << 16) - ((x) << 9) + (x)) is a faster version of: 255 * 255 * x
+// ((((x) + (y)) << 8) - ((x) + (y)) - (y) * (x)) is a faster version of: 255 * (x + y) - x * y
+#define UNPREMULTIPLY_ALPHA(x, y) ((((x) << 16) - ((x) << 9) + (x)) / ((((x) + (y)) << 8) - ((x) + (y)) - (y) * (x)))
+
+/**
+ * Blend image in src to destination buffer dst at position (x, y).
+ */
+
+static av_always_inline void blend_slice_packed_rgb(AVFilterContext *ctx,
+                                   AVFrame *dst, const AVSubtitleRect *src,
+                                   int main_has_alpha, int x, int y,
+                                   int is_straight, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    int i, imax, j, jmax;
+    const int src_w = src->w;
+    const int src_h = src->h;
+    const int dst_w = dst->width;
+    const int dst_h = dst->height;
+    uint8_t alpha;          ///< the amount of overlay to blend on to main
+    const int dr = s->main_rgba_map[R];
+    const int dg = s->main_rgba_map[G];
+    const int db = s->main_rgba_map[B];
+    const int da = s->main_rgba_map[A];
+    const int dstep = s->main_pix_step[0];
+    const int sr = s->overlay_rgba_map[R];
+    const int sg = s->overlay_rgba_map[G];
+    const int sb = s->overlay_rgba_map[B];
+    const int sa = s->overlay_rgba_map[A];
+    const int sstep = s->overlay_pix_step[0];
+    int slice_start, slice_end;
+    uint8_t *S, *sp, *d, *dp;
+
+    i = FFMAX(-y, 0);
+    imax = FFMIN3(-y + dst_h, FFMIN(src_h, dst_h), y + src_h);
+
+    slice_start = i + (imax * jobnr) / nb_jobs;
+    slice_end = i + (imax * (jobnr+1)) / nb_jobs;
+
+    sp = src->data[0] + (slice_start)     * src->linesize[0];
+    dp = dst->data[0] + (y + slice_start) * dst->linesize[0];
+
+    for (i = slice_start; i < slice_end; i++) {
+        j = FFMAX(-x, 0);
+        S = sp + j     * sstep;
+        d = dp + (x+j) * dstep;
+
+        for (jmax = FFMIN(-x + dst_w, src_w); j < jmax; j++) {
+            alpha = S[sa];
+
+            // if the main channel has an alpha channel, alpha has to be calculated
+            // to create an un-premultiplied (straight) alpha value
+            if (main_has_alpha && alpha != 0 && alpha != 255) {
+                uint8_t alpha_d = d[da];
+                alpha = UNPREMULTIPLY_ALPHA(alpha, alpha_d);
+            }
+
+            switch (alpha) {
+            case 0:
+                break;
+            case 255:
+                d[dr] = S[sr];
+                d[dg] = S[sg];
+                d[db] = S[sb];
+                break;
+            default:
+                // main_value = main_value * (1 - alpha) + overlay_value * alpha
+                // since alpha is in the range 0-255, the result must be divided by 255
+                d[dr] = is_straight ? FAST_DIV255(d[dr] * (255 - alpha) + S[sr] * alpha) :
+                        FFMIN(FAST_DIV255(d[dr] * (255 - alpha)) + S[sr], 255);
+                d[dg] = is_straight ? FAST_DIV255(d[dg] * (255 - alpha) + S[sg] * alpha) :
+                        FFMIN(FAST_DIV255(d[dg] * (255 - alpha)) + S[sg], 255);
+                d[db] = is_straight ? FAST_DIV255(d[db] * (255 - alpha) + S[sb] * alpha) :
+                        FFMIN(FAST_DIV255(d[db] * (255 - alpha)) + S[sb], 255);
+            }
+            if (main_has_alpha) {
+                switch (alpha) {
+                case 0:
+                    break;
+                case 255:
+                    d[da] = S[sa];
+                    break;
+                default:
+                    // apply alpha compositing: main_alpha += (1-main_alpha) * overlay_alpha
+                    d[da] += FAST_DIV255((255 - d[da]) * S[sa]);
+                }
+            }
+            d += dstep;
+            S += sstep;
+        }
+        dp += dst->linesize[0];
+        sp += src->linesize[0];
+    }
+}
+
+#define DEFINE_BLEND_PLANE(depth, nbits)                                                                   \
+static av_always_inline void blend_plane_##depth##_##nbits##bits(AVFilterContext *ctx,                     \
+                                         AVFrame *dst, const AVFrame *src,                                 \
+                                         int src_w, int src_h,                                             \
+                                         int dst_w, int dst_h,                                             \
+                                         int i, int hsub, int vsub,                                        \
+                                         int x, int y,                                                     \
+                                         int main_has_alpha,                                               \
+                                         int dst_plane,                                                    \
+                                         int dst_offset,                                                   \
+                                         int dst_step,                                                     \
+                                         int straight,                                                     \
+                                         int yuv,                                                          \
+                                         int jobnr,                                                        \
+                                         int nb_jobs)                                                      \
+{                                                                                                          \
+    OverlaySubsContext *octx = ctx->priv;                                                                  \
+    int src_wp = AV_CEIL_RSHIFT(src_w, hsub);                                                              \
+    int src_hp = AV_CEIL_RSHIFT(src_h, vsub);                                                              \
+    int dst_wp = AV_CEIL_RSHIFT(dst_w, hsub);                                                              \
+    int dst_hp = AV_CEIL_RSHIFT(dst_h, vsub);                                                              \
+    int yp = y>>vsub;                                                                                      \
+    int xp = x>>hsub;                                                                                      \
+    uint##depth##_t *s, *sp, *d, *dp, *dap, *a, *da, *ap;                                                  \
+    int jmax, j, k, kmax;                                                                                  \
+    int slice_start, slice_end;                                                                            \
+    const uint##depth##_t max = (1 << nbits) - 1;                                                          \
+    const uint##depth##_t mid = (1 << (nbits -1)) ;                                                        \
+    int bytes = depth / 8;                                                                                 \
+                                                                                                           \
+    dst_step /= bytes;                                                                                     \
+    j = FFMAX(-yp, 0);                                                                                     \
+    jmax = FFMIN3(-yp + dst_hp, FFMIN(src_hp, dst_hp), yp + src_hp);                                       \
+                                                                                                           \
+    slice_start = j + (jmax * jobnr) / nb_jobs;                                                            \
+    slice_end = j + (jmax * (jobnr+1)) / nb_jobs;                                                          \
+                                                                                                           \
+    sp = (uint##depth##_t *)(src->data[i] + (slice_start) * src->linesize[i]);                             \
+    dp = (uint##depth##_t *)(dst->data[dst_plane]                                                          \
+                      + (yp + slice_start) * dst->linesize[dst_plane]                                      \
+                      + dst_offset);                                                                       \
+    ap = (uint##depth##_t *)(src->data[3] + (slice_start << vsub) * src->linesize[3]);                     \
+    dap = (uint##depth##_t *)(dst->data[3] + ((yp + slice_start) << vsub) * dst->linesize[3]);             \
+                                                                                                           \
+    for (j = slice_start; j < slice_end; j++) {                                                            \
+        k = FFMAX(-xp, 0);                                                                                 \
+        d = dp + (xp+k) * dst_step;                                                                        \
+        s = sp + k;                                                                                        \
+        a = ap + (k<<hsub);                                                                                \
+        da = dap + ((xp+k) << hsub);                                                                       \
+        kmax = FFMIN(-xp + dst_wp, src_wp);                                                                \
+                                                                                                           \
+        if (nbits == 8 && ((vsub && j+1 < src_hp) || !vsub) && octx->blend_row[i]) {                       \
+            int c = octx->blend_row[i]((uint8_t*)d, (uint8_t*)da, (uint8_t*)s,                             \
+                    (uint8_t*)a, kmax - k, src->linesize[3]);                                              \
+                                                                                                           \
+            s += c;                                                                                        \
+            d += dst_step * c;                                                                             \
+            da += (1 << hsub) * c;                                                                         \
+            a += (1 << hsub) * c;                                                                          \
+            k += c;                                                                                        \
+        }                                                                                                  \
+        for (; k < kmax; k++) {                                                                            \
+            int alpha_v, alpha_h, alpha;                                                                   \
+                                                                                                           \
+            /* average alpha for color components, improve quality */                                      \
+            if (hsub && vsub && j+1 < src_hp && k+1 < src_wp) {                                            \
+                alpha = (a[0] + a[src->linesize[3]] +                                                      \
+                         a[1] + a[src->linesize[3]+1]) >> 2;                                               \
+            } else if (hsub || vsub) {                                                                     \
+                alpha_h = hsub && k+1 < src_wp ?                                                           \
+                    (a[0] + a[1]) >> 1 : a[0];                                                             \
+                alpha_v = vsub && j+1 < src_hp ?                                                           \
+                    (a[0] + a[src->linesize[3]]) >> 1 : a[0];                                              \
+                alpha = (alpha_v + alpha_h) >> 1;                                                          \
+            } else                                                                                         \
+                alpha = a[0];                                                                              \
+            /* if the main channel has an alpha channel, alpha has to be calculated */                     \
+            /* to create an un-premultiplied (straight) alpha value */                                     \
+            if (main_has_alpha && alpha != 0 && alpha != max) {                                            \
+                /* average alpha for color components, improve quality */                                  \
+                uint8_t alpha_d;                                                                           \
+                if (hsub && vsub && j+1 < src_hp && k+1 < src_wp) {                                        \
+                    alpha_d = (da[0] + da[dst->linesize[3]] +                                              \
+                               da[1] + da[dst->linesize[3]+1]) >> 2;                                       \
+                } else if (hsub || vsub) {                                                                 \
+                    alpha_h = hsub && k+1 < src_wp ?                                                       \
+                        (da[0] + da[1]) >> 1 : da[0];                                                      \
+                    alpha_v = vsub && j+1 < src_hp ?                                                       \
+                        (da[0] + da[dst->linesize[3]]) >> 1 : da[0];                                       \
+                    alpha_d = (alpha_v + alpha_h) >> 1;                                                    \
+                } else                                                                                     \
+                    alpha_d = da[0];                                                                       \
+                alpha = UNPREMULTIPLY_ALPHA(alpha, alpha_d);                                               \
+            }                                                                                              \
+            if (straight) {                                                                                \
+                if (nbits > 8)                                                                             \
+                    *d = (*d * (max - alpha) + *s * alpha) / max;                                          \
+                else                                                                                       \
+                    *d = FAST_DIV255(*d * (255 - alpha) + *s * alpha);                                     \
+            } else {                                                                                       \
+                if (nbits > 8) {                                                                           \
+                    if (i && yuv)                                                                          \
+                        *d = av_clip((*d * (max - alpha) + *s * alpha) / max + *s - mid, -mid, mid) + mid; \
+                    else                                                                                   \
+                        *d = FFMIN((*d * (max - alpha) + *s * alpha) / max + *s, max);                     \
+                } else {                                                                                   \
+                    if (i && yuv)                                                                          \
+                        *d = av_clip(FAST_DIV255((*d - mid) * (max - alpha)) + *s - mid, -mid, mid) + mid; \
+                    else                                                                                   \
+                        *d = FFMIN(FAST_DIV255(*d * (max - alpha)) + *s, max);                             \
+                }                                                                                          \
+            }                                                                                              \
+            s++;                                                                                           \
+            d += dst_step;                                                                                 \
+            da += 1 << hsub;                                                                               \
+            a += 1 << hsub;                                                                                \
+        }                                                                                                  \
+        dp += dst->linesize[dst_plane] / bytes;                                                            \
+        sp += src->linesize[i] / bytes;                                                                    \
+        ap += (1 << vsub) * src->linesize[3] / bytes;                                                      \
+        dap += (1 << vsub) * dst->linesize[3] / bytes;                                                     \
+    }                                                                                                      \
+}
+DEFINE_BLEND_PLANE(8, 8)
+DEFINE_BLEND_PLANE(16, 10)
+
+#define DEFINE_ALPHA_COMPOSITE(depth, nbits)                                                               \
+static inline void alpha_composite_##depth##_##nbits##bits(const AVFrame *src, const AVFrame *dst,         \
+                                   int src_w, int src_h,                                                   \
+                                   int dst_w, int dst_h,                                                   \
+                                   int x, int y,                                                           \
+                                   int jobnr, int nb_jobs)                                                 \
+{                                                                                                          \
+    uint##depth##_t alpha;          /* the amount of overlay to blend on to main */                        \
+    uint##depth##_t *s, *sa, *d, *da;                                                                      \
+    int i, imax, j, jmax;                                                                                  \
+    int slice_start, slice_end;                                                                            \
+    const uint##depth##_t max = (1 << nbits) - 1;                                                          \
+    int bytes = depth / 8;                                                                                 \
+                                                                                                           \
+    imax = FFMIN(-y + dst_h, src_h);                                                                       \
+    slice_start = (imax * jobnr) / nb_jobs;                                                                \
+    slice_end = ((imax * (jobnr+1)) / nb_jobs);                                                            \
+                                                                                                           \
+    i = FFMAX(-y, 0);                                                                                      \
+    sa = (uint##depth##_t *)(src->data[3] + (i + slice_start) * src->linesize[3]);                         \
+    da = (uint##depth##_t *)(dst->data[3] + (y + i + slice_start) * dst->linesize[3]);                     \
+                                                                                                           \
+    for (i = i + slice_start; i < slice_end; i++) {                                                        \
+        j = FFMAX(-x, 0);                                                                                  \
+        s = sa + j;                                                                                        \
+        d = da + x+j;                                                                                      \
+                                                                                                           \
+        for (jmax = FFMIN(-x + dst_w, src_w); j < jmax; j++) {                                             \
+            alpha = *s;                                                                                    \
+            if (alpha != 0 && alpha != max) {                                                              \
+                uint8_t alpha_d = *d;                                                                      \
+                alpha = UNPREMULTIPLY_ALPHA(alpha, alpha_d);                                               \
+            }                                                                                              \
+            if (alpha == max)                                                                              \
+                *d = *s;                                                                                   \
+            else if (alpha > 0) {                                                                          \
+                /* apply alpha compositing: main_alpha += (1-main_alpha) * overlay_alpha */                \
+                if (nbits > 8)                                                                             \
+                    *d += (max - *d) * *s / max;                                                           \
+                else                                                                                       \
+                    *d += FAST_DIV255((max - *d) * *s);                                                    \
+            }                                                                                              \
+            d += 1;                                                                                        \
+            s += 1;                                                                                        \
+        }                                                                                                  \
+        da += dst->linesize[3] / bytes;                                                                    \
+        sa += src->linesize[3] / bytes;                                                                    \
+    }                                                                                                      \
+}
+DEFINE_ALPHA_COMPOSITE(8, 8)
+DEFINE_ALPHA_COMPOSITE(16, 10)
+
+#define DEFINE_BLEND_SLICE_YUV(depth, nbits)                                                               \
+static av_always_inline void blend_slice_yuv_##depth##_##nbits##bits(AVFilterContext *ctx,                 \
+                                             AVFrame *dst, const AVSubtitleRect *src,                      \
+                                             int hsub, int vsub,                                           \
+                                             int main_has_alpha,                                           \
+                                             int x, int y,                                                 \
+                                             int is_straight,                                              \
+                                             int jobnr, int nb_jobs)                                       \
+{                                                                                                          \
+    OverlaySubsContext *s = ctx->priv;                                                                     \
+    const int src_w = src->w;                                                                              \
+    const int src_h = src->h;                                                                              \
+    const int dst_w = dst->width;                                                                          \
+    const int dst_h = dst->height;                                                                         \
+                                                                                                           \
+    blend_plane_##depth##_##nbits##bits(ctx, dst, src, src_w, src_h, dst_w, dst_h, 0, 0,       0,          \
+                x, y, main_has_alpha, s->main_desc->comp[0].plane, s->main_desc->comp[0].offset,           \
+                s->main_desc->comp[0].step, is_straight, 1, jobnr, nb_jobs);                               \
+    blend_plane_##depth##_##nbits##bits(ctx, dst, src, src_w, src_h, dst_w, dst_h, 1, hsub, vsub,          \
+                x, y, main_has_alpha, s->main_desc->comp[1].plane, s->main_desc->comp[1].offset,           \
+                s->main_desc->comp[1].step, is_straight, 1, jobnr, nb_jobs);                               \
+    blend_plane_##depth##_##nbits##bits(ctx, dst, src, src_w, src_h, dst_w, dst_h, 2, hsub, vsub,          \
+                x, y, main_has_alpha, s->main_desc->comp[2].plane, s->main_desc->comp[2].offset,           \
+                s->main_desc->comp[2].step, is_straight, 1, jobnr, nb_jobs);                               \
+                                                                                                           \
+    if (main_has_alpha)                                                                                    \
+        alpha_composite_##depth##_##nbits##bits(src, dst, src_w, src_h, dst_w, dst_h, x, y,                \
+                                                jobnr, nb_jobs);                                           \
+}
+DEFINE_BLEND_SLICE_YUV(8, 8)
+DEFINE_BLEND_SLICE_YUV(16, 10)
+
+static av_always_inline void blend_slice_planar_rgb(AVFilterContext *ctx,
+                                                    AVFrame *dst, const AVFrame *src,
+                                                    int hsub, int vsub,
+                                                    int main_has_alpha,
+                                                    int x, int y,
+                                                    int is_straight,
+                                                    int jobnr,
+                                                    int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    const int src_w = src->width;
+    const int src_h = src->height;
+    const int dst_w = dst->width;
+    const int dst_h = dst->height;
+
+    blend_plane_8_8bits(ctx, dst, src, src_w, src_h, dst_w, dst_h, 0, 0,   0, x, y, main_has_alpha,
+                s->main_desc->comp[1].plane, s->main_desc->comp[1].offset, s->main_desc->comp[1].step, is_straight, 0,
+                jobnr, nb_jobs);
+    blend_plane_8_8bits(ctx, dst, src, src_w, src_h, dst_w, dst_h, 1, hsub, vsub, x, y, main_has_alpha,
+                s->main_desc->comp[2].plane, s->main_desc->comp[2].offset, s->main_desc->comp[2].step, is_straight, 0,
+                jobnr, nb_jobs);
+    blend_plane_8_8bits(ctx, dst, src, src_w, src_h, dst_w, dst_h, 2, hsub, vsub, x, y, main_has_alpha,
+                s->main_desc->comp[0].plane, s->main_desc->comp[0].offset, s->main_desc->comp[0].step, is_straight, 0,
+                jobnr, nb_jobs);
+
+    if (main_has_alpha)
+        alpha_composite_8_8bits(src, dst, src_w, src_h, dst_w, dst_h, x, y, jobnr, nb_jobs);
+}
+
+static int blend_slice_yuv420(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_8_8bits(ctx, td->dst, td->src, 1, 1, 0, s->x, s->y, 1, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuva420(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_8_8bits(ctx, td->dst, td->src, 1, 1, 1, s->x, s->y, 1, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuv420p10(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_16_10bits(ctx, td->dst, td->src, 1, 1, 0, s->x, s->y, 1, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuva420p10(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_16_10bits(ctx, td->dst, td->src, 1, 1, 1, s->x, s->y, 1, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuv422p10(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_16_10bits(ctx, td->dst, td->src, 1, 0, 0, s->x, s->y, 1, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuva422p10(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_16_10bits(ctx, td->dst, td->src, 1, 0, 1, s->x, s->y, 1, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuv422(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_8_8bits(ctx, td->dst, td->src, 1, 0, 0, s->x, s->y, 1, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuva422(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_8_8bits(ctx, td->dst, td->src, 1, 0, 1, s->x, s->y, 1, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuv444(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_8_8bits(ctx, td->dst, td->src, 0, 0, 0, s->x, s->y, 1, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuva444(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_8_8bits(ctx, td->dst, td->src, 0, 0, 1, s->x, s->y, 1, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_gbrp(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_planar_rgb(ctx, td->dst, td->src, 0, 0, 0, s->x, s->y, 1, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_gbrap(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_planar_rgb(ctx, td->dst, td->src, 0, 0, 1, s->x, s->y, 1, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuv420_pm(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_8_8bits(ctx, td->dst, td->src, 1, 1, 0, s->x, s->y, 0, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuva420_pm(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_8_8bits(ctx, td->dst, td->src, 1, 1, 1, s->x, s->y, 0, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuv422_pm(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_8_8bits(ctx, td->dst, td->src, 1, 0, 0, s->x, s->y, 0, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuva422_pm(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_8_8bits(ctx, td->dst, td->src, 1, 0, 1, s->x, s->y, 0, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuv444_pm(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_8_8bits(ctx, td->dst, td->src, 0, 0, 0, s->x, s->y, 0, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_yuva444_pm(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_yuv_8_8bits(ctx, td->dst, td->src, 0, 0, 1, s->x, s->y, 0, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_gbrp_pm(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_planar_rgb(ctx, td->dst, td->src, 0, 0, 0, s->x, s->y, 0, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_gbrap_pm(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_planar_rgb(ctx, td->dst, td->src, 0, 0, 1, s->x, s->y, 0, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_rgb(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_packed_rgb(ctx, td->dst, td->src, 0, s->x, s->y, 1, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_rgba(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_packed_rgb(ctx, td->dst, td->src, 1, s->x, s->y, 1, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_rgb_pm(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_packed_rgb(ctx, td->dst, td->src, 0, s->x, s->y, 0, jobnr, nb_jobs);
+    return 0;
+}
+
+static int blend_slice_rgba_pm(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    OverlaySubsContext *s = ctx->priv;
+    ThreadData *td = arg;
+    blend_slice_packed_rgb(ctx, td->dst, td->src, 1, s->x, s->y, 0, jobnr, nb_jobs);
+    return 0;
+}
+
+static int config_input_main(AVFilterLink *inlink)
+{
+    OverlaySubsContext *s = inlink->dst->priv;
+    const AVPixFmtDescriptor *pix_desc = av_pix_fmt_desc_get(inlink->format);
+
+    av_image_fill_max_pixsteps(s->main_pix_step,    NULL, pix_desc);
+
+    s->hsub = pix_desc->log2_chroma_w;
+    s->vsub = pix_desc->log2_chroma_h;
+
+    s->main_desc = pix_desc;
+
+    s->main_is_packed_rgb =
+        ff_fill_rgba_map(s->main_rgba_map, inlink->format) >= 0;
+    s->main_has_alpha = ff_fmt_is_in(inlink->format, alpha_pix_fmts);
+
+    switch (s->format) {
+    case OVERLAY_FORMAT_YUV420:
+        s->blend_slice = s->main_has_alpha ? blend_slice_yuva420 : blend_slice_yuv420;
+        break;
+    case OVERLAY_FORMAT_YUV420P10:
+        s->blend_slice = s->main_has_alpha ? blend_slice_yuva420p10 : blend_slice_yuv420p10;
+        break;
+    case OVERLAY_FORMAT_YUV422:
+        s->blend_slice = s->main_has_alpha ? blend_slice_yuva422 : blend_slice_yuv422;
+        break;
+    case OVERLAY_FORMAT_YUV422P10:
+        s->blend_slice = s->main_has_alpha ? blend_slice_yuva422p10 : blend_slice_yuv422p10;
+        break;
+    case OVERLAY_FORMAT_YUV444:
+        s->blend_slice = s->main_has_alpha ? blend_slice_yuva444 : blend_slice_yuv444;
+        break;
+    case OVERLAY_FORMAT_RGB:
+        s->blend_slice = s->main_has_alpha ? blend_slice_rgba : blend_slice_rgb;
+        break;
+    case OVERLAY_FORMAT_GBRP:
+        s->blend_slice = s->main_has_alpha ? blend_slice_gbrap : blend_slice_gbrp;
+        break;
+    case OVERLAY_FORMAT_AUTO:
+        switch (inlink->format) {
+        case AV_PIX_FMT_YUVA420P:
+            s->blend_slice = blend_slice_yuva420;
+            break;
+        case AV_PIX_FMT_YUVA420P10:
+            s->blend_slice = blend_slice_yuva420p10;
+            break;
+        case AV_PIX_FMT_YUVA422P:
+            s->blend_slice = blend_slice_yuva422;
+            break;
+        case AV_PIX_FMT_YUVA422P10:
+            s->blend_slice = blend_slice_yuva422p10;
+            break;
+        case AV_PIX_FMT_YUVA444P:
+            s->blend_slice = blend_slice_yuva444;
+            break;
+        case AV_PIX_FMT_ARGB:
+        case AV_PIX_FMT_RGBA:
+        case AV_PIX_FMT_BGRA:
+        case AV_PIX_FMT_ABGR:
+            s->blend_slice = blend_slice_rgba;
+            break;
+        case AV_PIX_FMT_GBRAP:
+            s->blend_slice = blend_slice_gbrap;
+            break;
+        default:
+            av_assert0(0);
+            break;
+        }
+        break;
+    }
+
+    if (!s->alpha_format)
+        goto end;
+
+    switch (s->format) {
+    case OVERLAY_FORMAT_YUV420:
+        s->blend_slice = s->main_has_alpha ? blend_slice_yuva420_pm : blend_slice_yuv420_pm;
+        break;
+    case OVERLAY_FORMAT_YUV422:
+        s->blend_slice = s->main_has_alpha ? blend_slice_yuva422_pm : blend_slice_yuv422_pm;
+        break;
+    case OVERLAY_FORMAT_YUV444:
+        s->blend_slice = s->main_has_alpha ? blend_slice_yuva444_pm : blend_slice_yuv444_pm;
+        break;
+    case OVERLAY_FORMAT_RGB:
+        s->blend_slice = s->main_has_alpha ? blend_slice_rgba_pm : blend_slice_rgb_pm;
+        break;
+    case OVERLAY_FORMAT_GBRP:
+        s->blend_slice = s->main_has_alpha ? blend_slice_gbrap_pm : blend_slice_gbrp_pm;
+        break;
+    case OVERLAY_FORMAT_AUTO:
+        switch (inlink->format) {
+        case AV_PIX_FMT_YUVA420P:
+            s->blend_slice = blend_slice_yuva420_pm;
+            break;
+        case AV_PIX_FMT_YUVA422P:
+            s->blend_slice = blend_slice_yuva422_pm;
+            break;
+        case AV_PIX_FMT_YUVA444P:
+            s->blend_slice = blend_slice_yuva444_pm;
+            break;
+        case AV_PIX_FMT_ARGB:
+        case AV_PIX_FMT_RGBA:
+        case AV_PIX_FMT_BGRA:
+        case AV_PIX_FMT_ABGR:
+            s->blend_slice = blend_slice_rgba_pm;
+            break;
+        case AV_PIX_FMT_GBRAP:
+            s->blend_slice = blend_slice_gbrap_pm;
+            break;
+        default:
+            av_assert0(0);
+            break;
+        }
+        break;
+    }
+
+end:
+    if (ARCH_X86)
+        ff_overlay_init_x86(s, s->format, inlink->format,
+                            s->alpha_format, s->main_has_alpha);
+
+    return 0;
+}
+
+static void sub2video_copy_rect(uint8_t *dst, int dst_linesize, int w, int h,
+                                AVSubtitleRect *r)
+{
+    uint32_t *pal, *dst2;
+    uint8_t *src, *src2;
+    int x, y;
+
+    /* Disabled hack, kept for reference:
+     *     if (r->y > 600)
+     *         r->y = r->y - 300; */
+
+    if (r->type != SUBTITLE_BITMAP) {
+        av_log(NULL, AV_LOG_WARNING, "sub2video: non-bitmap subtitle\n");
+        return;
+    }
+    if (r->x < 0 || r->x + r->w > w || r->y < 0 || r->y + r->h > h) {
+        av_log(NULL, AV_LOG_WARNING, "sub2video: rectangle (%d %d %d %d) overflowing %d %d\n",
+            r->x, r->y, r->w, r->h, w, h
+        );
+        return;
+    }
+
+    dst += r->y * dst_linesize + r->x * 4;
+    src = r->data[0];
+    pal = (uint32_t *)r->data[1];
+    for (y = 0; y < r->h; y++) {
+        dst2 = (uint32_t *)dst;
+        src2 = src;
+        for (x = 0; x < r->w; x++)
+            *(dst2++) = pal[*(src2++)];
+        dst += dst_linesize;
+        src += r->linesize[0];
+    }
+}
+
+static int do_blend(FFFrameSync *fs)
+{
+    AVFilterContext *ctx = fs->parent;
+    AVFrame *mainpic, *second;
+    OverlaySubsContext *s = ctx->priv;
+    AVFilterLink *inlink = ctx->inputs[0];
+    int ret;
+
+    ret = ff_framesync_dualinput_get_writable(fs, &mainpic, &second);
+    if (ret < 0)
+        return ret;
+    if (!second)
+        return ff_filter_frame(ctx->outputs[0], mainpic);
+
+    if (s->eval_mode == EVAL_MODE_FRAME) {
+        int64_t pos = mainpic->pkt_pos;
+
+        s->var_values[VAR_N] = inlink->frame_count_out;
+        s->var_values[VAR_T] = mainpic->pts == AV_NOPTS_VALUE ?
+            NAN : mainpic->pts * av_q2d(inlink->time_base);
+        s->var_values[VAR_POS] = pos == -1 ? NAN : pos;
+
+        s->var_values[VAR_OVERLAY_W] = s->var_values[VAR_OW] = second->width;
+        s->var_values[VAR_OVERLAY_H] = s->var_values[VAR_OH] = second->height;
+        s->var_values[VAR_MAIN_W   ] = s->var_values[VAR_MW] = mainpic->width;
+        s->var_values[VAR_MAIN_H   ] = s->var_values[VAR_MH] = mainpic->height;
+
+        eval_expr(ctx);
+        av_log(ctx, AV_LOG_DEBUG, "n:%f t:%f pos:%f x:%f xi:%d y:%f yi:%d\n",
+               s->var_values[VAR_N], s->var_values[VAR_T], s->var_values[VAR_POS],
+               s->var_values[VAR_X], s->x,
+               s->var_values[VAR_Y], s->y);
+    }
+
+    if (second->buf[0] && second->buf[0]->data) {
+
+        AVSubtitle *sub = (AVSubtitle *)second->buf[0]->data;
+        uint8_t *dst;
+        int dst_linesize;
+        int num_rects, i;
+
+        num_rects = sub->num_rects;
+        dst = mainpic->data[0];
+        dst_linesize = mainpic->linesize[0];
+
+        for (i = 0; i < num_rects; i++) {
+            AVSubtitleRect  *sub_rect = sub->rects[i];
+
+            sub_rect->y += s->y;
+            sub_rect->x += s->x;
+
+            sub2video_copy_rect(dst, dst_linesize, mainpic->width, mainpic->height, sub_rect);
+
+            sub_rect->y -= s->y;
+            sub_rect->x -= s->x;
+            /* FIXME: the slice-threaded blend path below is unreachable. */
+            continue;
+            if (sub_rect->x < 0 || sub_rect->x + sub_rect->w > mainpic->width || sub_rect->y < 0 || sub_rect->y + sub_rect->h > mainpic->height) {
+                av_log(NULL, AV_LOG_WARNING, "sub2video: rectangle (%d %d %d %d) overflowing %d %d\n",
+                    sub_rect->x, sub_rect->y, sub_rect->w, sub_rect->h, mainpic->width, mainpic->height
+                );
+
+                sub_rect->y -= s->y;
+                sub_rect->x -= s->x;
+
+                continue;
+            }
+
+            if (sub_rect->x < mainpic->width  && sub_rect->x + sub_rect->w  >= 0 &&
+                sub_rect->y < mainpic->height && sub_rect->y + sub_rect->h >= 0) {
+                ThreadData td;
+
+                td.dst = mainpic;
+                td.src = sub_rect;
+                ctx->internal->execute(ctx, s->blend_slice, &td, NULL, FFMIN(FFMAX(1, FFMIN3(sub_rect->y + sub_rect->h, FFMIN(sub_rect->h, mainpic->height), mainpic->height - sub_rect->y)),
+                                                                             ff_filter_get_nb_threads(ctx)));
+            }
+
+            sub_rect->y -= s->y;
+            sub_rect->x -= s->x;
+        }
+    }
+
+    return ff_filter_frame(ctx->outputs[0], mainpic);
+}
+
+static av_cold int init(AVFilterContext *ctx)
+{
+    OverlaySubsContext *s = ctx->priv;
+
+    s->fs.on_event = do_blend;
+    return 0;
+}
+
+static int activate(AVFilterContext *ctx)
+{
+    OverlaySubsContext *s = ctx->priv;
+    return ff_framesync_activate(&s->fs);
+}
+
+#define OFFSET(x) offsetof(OverlaySubsContext, x)
+#define FLAGS (AV_OPT_FLAG_VIDEO_PARAM|AV_OPT_FLAG_FILTERING_PARAM)
+
+static const AVOption overlay_subs_options[] = {
+    { "x", "set the x expression", OFFSET(x_expr), AV_OPT_TYPE_STRING, {.str = "0"}, 0, 0, FLAGS },
+    { "y", "set the y expression", OFFSET(y_expr), AV_OPT_TYPE_STRING, {.str = "0"}, 0, 0, FLAGS },
+    { "eof_action", "Action to take when encountering EOF from secondary input",
+        OFFSET(fs.opt_eof_action), AV_OPT_TYPE_INT, { .i64 = EOF_ACTION_REPEAT },
+        EOF_ACTION_REPEAT, EOF_ACTION_PASS, .flags = FLAGS, "eof_action" },
+        { "repeat", "Repeat the previous frame.",   0, AV_OPT_TYPE_CONST, { .i64 = EOF_ACTION_REPEAT }, .flags = FLAGS, "eof_action" },
+        { "endall", "End both streams.",            0, AV_OPT_TYPE_CONST, { .i64 = EOF_ACTION_ENDALL }, .flags = FLAGS, "eof_action" },
+        { "pass",   "Pass through the main input.", 0, AV_OPT_TYPE_CONST, { .i64 = EOF_ACTION_PASS },   .flags = FLAGS, "eof_action" },
+    { "eval", "specify when to evaluate expressions", OFFSET(eval_mode), AV_OPT_TYPE_INT, {.i64 = EVAL_MODE_FRAME}, 0, EVAL_MODE_NB-1, FLAGS, "eval" },
+         { "init",  "eval expressions once during initialization", 0, AV_OPT_TYPE_CONST, {.i64=EVAL_MODE_INIT},  .flags = FLAGS, .unit = "eval" },
+         { "frame", "eval expressions per-frame",                  0, AV_OPT_TYPE_CONST, {.i64=EVAL_MODE_FRAME}, .flags = FLAGS, .unit = "eval" },
+    { "shortest", "force termination when the shortest input terminates", OFFSET(fs.opt_shortest), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, FLAGS },
+    { "format", "set output format", OFFSET(format), AV_OPT_TYPE_INT, {.i64=OVERLAY_FORMAT_YUV420}, 0, OVERLAY_FORMAT_NB-1, FLAGS, "format" },
+        { "yuv420", "", 0, AV_OPT_TYPE_CONST, {.i64=OVERLAY_FORMAT_YUV420}, .flags = FLAGS, .unit = "format" },
+        { "yuv420p10", "", 0, AV_OPT_TYPE_CONST, {.i64=OVERLAY_FORMAT_YUV420P10}, .flags = FLAGS, .unit = "format" },
+        { "yuv422", "", 0, AV_OPT_TYPE_CONST, {.i64=OVERLAY_FORMAT_YUV422}, .flags = FLAGS, .unit = "format" },
+        { "yuv422p10", "", 0, AV_OPT_TYPE_CONST, {.i64=OVERLAY_FORMAT_YUV422P10}, .flags = FLAGS, .unit = "format" },
+        { "yuv444", "", 0, AV_OPT_TYPE_CONST, {.i64=OVERLAY_FORMAT_YUV444}, .flags = FLAGS, .unit = "format" },
+        { "rgb",    "", 0, AV_OPT_TYPE_CONST, {.i64=OVERLAY_FORMAT_RGB},    .flags = FLAGS, .unit = "format" },
+        { "gbrp",   "", 0, AV_OPT_TYPE_CONST, {.i64=OVERLAY_FORMAT_GBRP},   .flags = FLAGS, .unit = "format" },
+        { "auto",   "", 0, AV_OPT_TYPE_CONST, {.i64=OVERLAY_FORMAT_AUTO},   .flags = FLAGS, .unit = "format" },
+    { "repeatlast", "repeat overlay of the last overlay frame", OFFSET(fs.opt_repeatlast), AV_OPT_TYPE_BOOL, {.i64=1}, 0, 1, FLAGS },
+    { "alpha", "alpha format", OFFSET(alpha_format), AV_OPT_TYPE_INT, {.i64=0}, 0, 1, FLAGS, "alpha_format" },
+        { "straight",      "", 0, AV_OPT_TYPE_CONST, {.i64=0}, .flags = FLAGS, .unit = "alpha_format" },
+        { "premultiplied", "", 0, AV_OPT_TYPE_CONST, {.i64=1}, .flags = FLAGS, .unit = "alpha_format" },
+    { NULL }
+};
+
+FRAMESYNC_DEFINE_CLASS(overlay_subs, OverlaySubsContext, fs);
+
+static const AVFilterPad avfilter_vf_overlay_inputs[] = {
+    {
+        .name         = "main",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .config_props = config_input_main,
+    },
+    {
+        .name         = "overlay",
+        .type         = AVMEDIA_TYPE_SUBTITLE,
+        .config_props = config_input_overlay,
+    },
+    { NULL }
+};
+
+static const AVFilterPad avfilter_vf_overlay_outputs[] = {
+    {
+        .name          = "default",
+        .type          = AVMEDIA_TYPE_VIDEO,
+        .config_props  = config_output,
+    },
+    { NULL }
+};
+
+const AVFilter ff_vf_overlay_subs = {
+    .name          = "overlay_subs",
+    .description   = NULL_IF_CONFIG_SMALL("Overlay graphical subtitles on top of the input."),
+    .preinit       = overlay_subs_framesync_preinit,
+    .init          = init,
+    .uninit        = uninit,
+    .priv_size     = sizeof(OverlaySubsContext),
+    .priv_class    = &overlay_subs_class,
+    .query_formats = query_formats,
+    .activate      = activate,
+    .process_command = process_command,
+    .inputs        = avfilter_vf_overlay_inputs,
+    .outputs       = avfilter_vf_overlay_outputs,
+    .flags         = AVFILTER_FLAG_SUPPORT_TIMELINE_INTERNAL |
+                     AVFILTER_FLAG_SLICE_THREADS,
+};
diff --git a/libavfilter/vf_overlay_subs.h b/libavfilter/vf_overlay_subs.h
new file mode 100644
index 0000000000..f3d0e0353b
--- /dev/null
+++ b/libavfilter/vf_overlay_subs.h
@@ -0,0 +1,88 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVFILTER_OVERLAY_SUBS_H
+#define AVFILTER_OVERLAY_SUBS_H
+
+#include "libavutil/eval.h"
+#include "libavutil/pixdesc.h"
+#include "framesync.h"
+#include "avfilter.h"
+
+enum var_name {
+    VAR_MAIN_W,    VAR_MW,
+    VAR_MAIN_H,    VAR_MH,
+    VAR_OVERLAY_W, VAR_OW,
+    VAR_OVERLAY_H, VAR_OH,
+    VAR_HSUB,
+    VAR_VSUB,
+    VAR_X,
+    VAR_Y,
+    VAR_N,
+    VAR_POS,
+    VAR_T,
+    VAR_VARS_NB
+};
+
+enum OverlayFormat {
+    OVERLAY_FORMAT_YUV420,
+    OVERLAY_FORMAT_YUV420P10,
+    OVERLAY_FORMAT_YUV422,
+    OVERLAY_FORMAT_YUV422P10,
+    OVERLAY_FORMAT_YUV444,
+    OVERLAY_FORMAT_RGB,
+    OVERLAY_FORMAT_GBRP,
+    OVERLAY_FORMAT_AUTO,
+    OVERLAY_FORMAT_NB
+};
+
+typedef struct OverlaySubsContext {
+    const AVClass *class;
+    int x, y;                   ///< position of overlaid picture
+
+    uint8_t main_is_packed_rgb;
+    uint8_t main_rgba_map[4];
+    uint8_t main_has_alpha;
+    uint8_t overlay_is_packed_rgb;
+    uint8_t overlay_rgba_map[4];
+    uint8_t overlay_has_alpha;
+    int format;                 ///< OverlayFormat
+    int alpha_format;
+    int eval_mode;              ///< EvalMode
+
+    FFFrameSync fs;
+
+    int main_pix_step[4];       ///< steps per pixel for each plane of the main output
+    int overlay_pix_step[4];    ///< steps per pixel for each plane of the overlay
+    int hsub, vsub;             ///< chroma subsampling values
+    const AVPixFmtDescriptor *main_desc; ///< format descriptor for main input
+
+    double var_values[VAR_VARS_NB];
+    char *x_expr, *y_expr;
+
+    AVExpr *x_pexpr, *y_pexpr;
+
+    int (*blend_row[4])(uint8_t *d, uint8_t *da, uint8_t *s, uint8_t *a, int w,
+                        ptrdiff_t alinesize);
+    int (*blend_slice)(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs);
+} OverlaySubsContext;
+
+void ff_overlay_init_x86(OverlaySubsContext *s, int format, int pix_format,
+                         int alpha_format, int main_has_alpha);
+
+#endif /* AVFILTER_OVERLAY_SUBS_H */