
[FFmpeg-devel] WIP: subtitles in AVFrame

Message ID 20161102220934.26010-1-u@pkh.me
State New

Commit Message

Clément Bœsch Nov. 2, 2016, 10:09 p.m. UTC
---
So this is just a prototype that is starting to work. It needs a split
(and completion of the long TODO list) before review, but I wanted to
share it for early feedback.

So long story short: AVSubtitle → AVFrame, you have to use the new M:N
API, and it's integrated into lavfi.

Now a bit longer: the AVFrame->extended_data entries are
AVFrameSubtitleRectangle structures (declared in lavu/frame). Frames are
refcounted. So far this is not really meaningful, as a frame ref will
alloc and copy every rectangle; that's a limitation we will have to
deal with for now, but it allows proper integration into the framework.
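
For illustration, a minimal decode loop under this proposal could look
like this (sub_nb_rects and AVFrameSubtitleRectangle are the names used
by this patch; handle_text()/handle_bitmap() are placeholders):

    AVFrame *frame = av_frame_alloc();

    if (avcodec_send_packet(dec_ctx, pkt) >= 0) {            /* new M:N API */
        while (avcodec_receive_frame(dec_ctx, frame) >= 0) {
            for (int i = 0; i < frame->sub_nb_rects; i++) {
                AVFrameSubtitleRectangle *r =
                    (AVFrameSubtitleRectangle *)frame->extended_data[i];
                if (frame->format == AV_PIX_FMT_NONE)        /* text subtitle */
                    handle_text(r->text);                    /* ASS markup */
                else                                         /* bitmap subtitle */
                    handle_bitmap(r->data, r->linesize,
                                  r->x, r->y, r->w, r->h);
            }
            av_frame_unref(frame);
        }
    }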

It actually simplifies the workflow a lot, so I'm optimistic about the
end result. Unfortunately, there is a lot of work remaining.

End note: despite my obsession with subtitles, I'd like to remind
everyone that I actually hate working with them; I just believe it's
really important to get that historical burden out of sight. That means,
if anyone wants to help with that, they are extremely welcome.

Thanks,

TODO:
- what to do about sub2video?
- properly deal with -ss and -t (need strim filter?)
- sub_start_display/sub_end_display needs to be honored
- find a test case for dvbsub as it's likely broken (the ffmpeg.c hack is
  removed and should be replaced by EAGAIN logic in lavc/utils.c)
- make it pass FATE:
  * fix cc/subcc
  * fix various other things that broke
- split commit
- Changelog/APIchanges
- proper API doxy
- update lavfi/subtitles?
---
 ffmpeg.c                   | 231 ++++++++++++++++-----------------------------
 ffmpeg_filter.c            |  94 +++++++++++++++++-
 ffmpeg_opt.c               |  33 ++++++-
 libavcodec/avcodec.h       |   9 ++
 libavcodec/utils.c         | 214 +++++++++++++++++++++++++++++++++++++++++
 libavfilter/Makefile       |   3 +
 libavfilter/allfilters.c   |   8 ++
 libavfilter/avfilter.c     |   8 ++
 libavfilter/buffersink.c   |  28 ++++++
 libavfilter/buffersrc.c    |  59 ++++++++++++
 libavfilter/fifo.c         |  31 ++++++
 libavfilter/formats.c      |   4 +
 libavfilter/internal.h     |   8 ++
 libavfilter/sf_snull.c     |  48 ++++++++++
 libavfilter/subtitle.c     |  60 ++++++++++++
 libavfilter/subtitle.h     |  31 ++++++
 libavfilter/vf_subtitles.c |  42 +++++----
 libavformat/assenc.c       |   3 +
 libavutil/frame.c          | 131 +++++++++++++++++++------
 libavutil/frame.h          |  59 ++++++++++--
 tests/ref/fate/sub-charenc |   6 +-
 21 files changed, 895 insertions(+), 215 deletions(-)
 create mode 100644 libavfilter/sf_snull.c
 create mode 100644 libavfilter/subtitle.c
 create mode 100644 libavfilter/subtitle.h

Comments

Carl Eugen Hoyos Nov. 2, 2016, 10:11 p.m. UTC | #1
2016-11-02 23:09 GMT+01:00 Clément Bœsch <u@pkh.me>:

> - find a test case for dvbsub

Please elaborate;-)

Thanks for working on this, Carl Eugen
Clément Bœsch Nov. 2, 2016, 10:23 p.m. UTC | #2
On Wed, Nov 02, 2016 at 11:11:06PM +0100, Carl Eugen Hoyos wrote:
> 2016-11-02 23:09 GMT+01:00 Clément Bœsch <u@pkh.me>:
> 
> > - find a test case for dvbsub
> 
> Please elaborate;-)
> 

I need a FATE test involving dvb transcoding.
Marton Balint Nov. 3, 2016, 12:05 a.m. UTC | #3
On Wed, 2 Nov 2016, Clément Bœsch wrote:

> ---
> So this is just a prototype that is starting to work. It needs a split
> (and completion of the long TODO list) before review, but I wanted to
> share it for early feedback.

Great, thanks!

> So long story short: AVSubtitle → AVFrame, you have to use the new M:N
> API, and it's integrated into lavfi.
>
> Now a bit longer: the AVFrame->extended_data entries are
> AVFrameSubtitleRectangle structures (declared in lavu/frame).

Wouldn't it make sense to simply use video AVFrames here instead of a 
separate struct?

Thanks,
Marton
Nicolas George Nov. 3, 2016, 6:36 p.m. UTC | #4
On duodi 12 brumaire, an CCXXV, Clement Boesch wrote:
> ---
> So this is just a prototype that is starting to work. It needs a split
> (and completion of the long TODO list) before review, but I wanted to
> share it for early feedback.
> 
> So long story short: AVSubtitle → AVFrame, you have to use the new M:N
> API, and it's integrated into lavfi.

Yay!

> Now a bit longer: the AVFrame->extended_data entries are
> AVFrameSubtitleRectangle structures (declared in lavu/frame). Frames are
> refcounted. So far this is not really meaningful, as a frame ref will
> alloc and copy every rectangle; that's a limitation we will have to
> deal with for now, but it allows proper integration into the framework.

Can you briefly explain how you handled the duration and sparseness?

Also, I see you kept the text field a simple string with ASS syntax. Is
it final?

> It actually simplifies the workflow a lot, so I'm optimistic about the
> end result. Unfortunately, there is a lot of work remaining.
> 
> End note: despite my obsession with subtitles, I'd like to remind
> everyone that I actually hate working with them; I just believe it's
> really important to get that historical burden out of sight. That means,
> if anyone wants to help with that, they are extremely welcome.
> 
> Thanks,
> 
> TODO:
> - what to do about sub2video?

The plan was always to remove it as soon as a proper solution in
libavfilter is implemented and does the same job. Preferably with a
short transitional period for the users, but not if it is too
inconvenient.

> - properly deal with -ss and -t (need strim filter?)
> - sub_start_display/sub_end_display needs to be honored
> - find a test case for dvbsub as it's likely broken (the ffmpeg.c hack is
>   removed and should be replaced by EAGAIN logic in lavc/utils.c)
> - make it pass FATE:
>   * fix cc/subcc
>   * fix various other things that broke
> - split commit
> - Changelog/APIchanges
> - proper API doxy
> - update lavfi/subtitles?

Regards,
Clément Bœsch Nov. 7, 2016, 1:38 p.m. UTC | #5
On Thu, Nov 03, 2016 at 01:05:01AM +0100, Marton Balint wrote:
[...]
> > So long story short: AVSubtitle → AVFrame, you have to use the new M:N
> > API, and it's integrated into lavfi.
> > 
> > Now a bit longer: the AVFrame->extended_data are
> > AVFrameSubtitleRectangle (declared in lavu/frame).
> 
> Wouldn't it make sense to simply use video AVFrames here instead of a
> separate struct?
> 

Mmh. An AVFrame for each rectangle? So we would have an AVFrame in an
AVFrame, and the need to add fields such as x and y to that AVFrame? I'm
not too fond of this, but it could be considered, yes.
Clément Bœsch Nov. 11, 2016, 11:40 a.m. UTC | #6
(sorry for initially replying in private by mistake)

On Thu, Nov 03, 2016 at 07:36:29PM +0100, Nicolas George wrote:
> On duodi 12 brumaire, an CCXXV, Clement Boesch wrote:
> > ---
> > So this is just a prototype that is starting to work. It needs a split
> > (and completion of the long TODO list) before review, but I wanted to
> > share it for early feedback.
> >
> > So long story short: AVSubtitle → AVFrame, you have to use the new M:N
> > API, and it's integrated into lavfi.
>
> Yay!
>
> > Now a bit longer: the AVFrame->extended_data entries are
> > AVFrameSubtitleRectangle structures (declared in lavu/frame). Frames are
> > refcounted. So far this is not really meaningful, as a frame ref will
> > alloc and copy every rectangle; that's a limitation we will have to
> > deal with for now, but it allows proper integration into the framework.
>
> Can you briefly explain how you handled the duration and sparseness?
>

I didn't. Duration is currently broken for some reason. I did nothing
for sparseness: the reason I added basic support in lavfi is that it
made things much simpler to handle at the ffmpeg.c level, so it's
currently just a passthrough mechanism.

> Also, I see you kept the text field a simple string with ASS syntax. Is
> it final?
>

I didn't like having multiple fields for text-based data. If we want to
decode to another form, we can still add an option to print out verbatim
text instead of ASS markup.

> > It actually simplifies the workflow a lot, so I'm optimistic about the
> > end result. Unfortunately, there is a lot of work remaining.
> >
> > End note: despite my obsession with subtitles, I'd like to remind
> > everyone that I actually hate working with them; I just believe it's
> > really important to get that historical burden out of sight. That
> > means, if anyone wants to help with that, they are extremely welcome.
> >
> > Thanks,
> >
> > TODO:
> > - what to do about sub2video?
>
> The plan was always to remove it as soon as a proper solution in
> libavfilter is implemented and does the same job. Preferably with a
> short transitional period for the users, but not if it is too
> inconvenient.
>

Yeah, I guess I'll need to write a filter to do the blending for the
final patchset submission.

> > - properly deal with -ss and -t (need strim filter?)
> > - sub_start_display/sub_end_display needs to be honored
> > - find a test case for dvbsub as it's likely broken (the ffmpeg.c hack is
> >   removed and should be replaced by EAGAIN logic in lavc/utils.c)
> > - make it pass FATE:
> >   * fix cc/subcc
> >   * fix various other things that broke
> > - split commit
> > - Changelog/APIchanges
> > - proper API doxy
> > - update lavfi/subtitles?
>
> Regards,
>
> --
>   Nicolas George
Nicolas George Nov. 11, 2016, 11:45 a.m. UTC | #7
On septidi 17 brumaire, an CCXXV, Clement Boesch wrote:
> I didn't. Duration is currently broken for some reason. I did nothing
> for sparseness: the reason I added basic support in lavfi is that it
> made things much simpler to handle at the ffmpeg.c level, so it's
> currently just a passthrough mechanism.

A decision will need to be made pretty soon, though, or it will not help
anything much. Sparseness will make things break badly OOM-killer-style
if someone tries to use subtitles coming from the same multiplexed file
as the corresponding video. Duration affects the visible API, so a
decision on it is more or less final. Here are the results of my current
thoughts:

For sparseness, the solution is to use heartbeat frames: if you have a
subtitle event at 10s and the next one at 50s, emit a frame with no
payload and just a timestamp at 10.1, 10.2, ... 49.1.

Whoever is supposed to emit the frame can be decided later. The simplest
idea is to connect sbuffersrc to a master vbuffersrc: when a frame is
added on the master, consider generating heartbeat frames on the slaves.

The code needs to be ready immediately to handle the heartbeat frames.
At least have a way of expressing them: maybe data[0] == NULL? And the
code needs to not segfault on them.
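
For illustration only, emitting such a heartbeat might look like this
(purely hypothetical, since the representation is not settled; here the
frame carries only a timestamp and no payload, data[0] staying NULL):

    /* Hypothetical: push a payload-less subtitle frame that only carries a
     * timestamp; data[0] == NULL would mean "nothing new to display". */
    static int push_heartbeat(AVFilterLink *outlink, int64_t pts)
    {
        AVFrame *hb = av_frame_alloc();
        if (!hb)
            return AVERROR(ENOMEM);
        hb->pts = pts;    /* the only meaningful field */
        return ff_filter_frame(outlink, hb);
    }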

The duration issue is more tricky, because there are so many cases. Here
is a scheme I think should work:

Each subtitle screen decodes into two subtitle frames: one to show it,
one to clear it. The clear frame needs to reference the corresponding
start frame, to allow for overlap.

  For example, the following ASS dialogue:

  Dialogue: 0,0:00:10.00,0:00:15.00,,,,,,,Long dialogue line.
  Dialogue: 0,0:00:12.00,0:00:13.00,,,,,,,Short.

  would decode into:

  pts=10 id=42 text="Long dialogue line."
  pts=12 id=43 text="Short."
  pts=13 id=43 clear
  pts=15 id=42 clear

When the duration is entirely reliable (text files read all at once and
correctly processed), the decoder generates both frames immediately and
keeps the clear frame in a reorder buffer.
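
As a sketch of that reorder buffer (names invented, fixed-size queue for
brevity):

    /* Hypothetical pts-sorted queue of pending "clear" frames: with reliable
     * durations, show and clear frames are produced together and the clear
     * frames are held back here until their time comes. */
    typedef struct ClearQueue {
        AVFrame *frames[64];
        int      nb;
    } ClearQueue;

    /* Return the next pending clear frame if it is due at current_pts. */
    static AVFrame *clear_queue_pop_due(ClearQueue *q, int64_t current_pts)
    {
        AVFrame *due;
        if (!q->nb || q->frames[0]->pts > current_pts)
            return NULL;
        due = q->frames[0];
        q->nb--;
        memmove(q->frames, q->frames + 1, q->nb * sizeof(*q->frames));
        return due;
    }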

When the duration is not entirely reliable, the decoder should generate
the clear frame when it gets the corresponding packet (either the clear
packet or the next start packet). If the duration is known but not
reliable (dvdsub, dvbsub), the decoder should use it as a cap when
waiting for the actual end.

The decoder needs some kind of heartbeat flush API to get the pending
clear frames. We may want an option to disable internal reordering and
get clear frames immediately.

When the duration is not known at all, we may set some kind of
"expiration timestamp" on the start frame, but I am not sure it is
necessary.

Whether the duration is reliable or not is a property of both the codec
and the format. For example, mkvmerge does not de-overlap events when
muxing SRT into Matroska, therefore the duration is not known. On the
other hand, when lavf reads directly from an SRT file, it can de-overlap
easily. I suppose it means AVCodecParameters needs an extra field.
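
Purely as a strawman for that extra field (all names invented here):

    enum AVSubtitleDurationReliability {
        AV_SUB_DURATION_UNKNOWN,    /* e.g. SRT muxed into Matroska by mkvmerge */
        AV_SUB_DURATION_CAPPED,     /* known but only a cap, e.g. dvdsub/dvbsub */
        AV_SUB_DURATION_RELIABLE,   /* e.g. lavf reading an SRT file directly */
    };

    /* ... which would live in AVCodecParameters, e.g.:
     *     int sub_duration_reliability; */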

> I did't like having multiple fields for text based data. If we want to
> decode in another form, we can still add an option to print out verbatim
> text instead of ASS markup.

I think we are not talking about the same thing. A long time ago, we
considered replacing the ASS markup with a simple text field with
styling in a separate, non-text, structure. Did you discard that idea?

For CSS-based subtitles, a richer data structure would make it slightly
less hard to preserve the structure of the styling information.

> Yeah I guess I'll need to write a filter to blend in the final patchset
> submission.

Or just a sub->video filter.

Regards,
wm4 Nov. 11, 2016, 11:48 a.m. UTC | #8
On Fri, 11 Nov 2016 12:45:03 +0100
Nicolas George <george@nsup.org> wrote:

> On septidi 17 brumaire, an CCXXV, Clement Boesch wrote:
> > I didn't. Duration is currently broken for some reason. I did nothing
> > for sparseness: the reason I added basic support in lavfi is that it
> > made things much simpler to handle at the ffmpeg.c level, so it's
> > currently just a passthrough mechanism.
> 
> A decision will need to be made pretty soon, though, or it will not help
> anything much. Sparseness will make things break badly OOM-killer-style
> if someone tries to use subtitles coming from the same multiplexed file
> as the corresponding video. Duration affects the visible API, so a
> decision on it is more or less final. Here are the results of my current
> thoughts:
> 
> For sparseness, the solution is to use heartbeat frames: if you have a
> subtitle event at 10s and the next one at 50s, emit a frame with no
> payload and just a timestamp at 10.1, 10.2, ... 49.1.

Sounds like a bad idea.

> 
> Whoever is supposed to emit the frame can be decided later. The simplest
> idea is to connect sbuffersrc to a master vbuffersrc: when a frame is
> added on the master, consider generating heartbeat frames on the slaves.
> 
> The code needs to be ready immediately to handle the heartbeat frames.
> At least have a way of expressing them: maybe data[0] == NULL? And the
> code needs to not segfault on them.
> 
> The duration issue is more tricky, because there are so many cases. Here
> is a scheme I think should work:
> 
> Each subtitle screen decodes into two subtitle frames: one to show it,
> one to clear it. The clear frame needs to reference the corresponding
> start frame, to allow for overlap.
> 
>   For example, the following ASS dialogue:
> 
>   Dialogue: 0,0:00:10.00,0:00:15.00,,,,,,,Long dialogue line.
>   Dialogue: 0,0:00:12.00,0:00:13.00,,,,,,,Short.
> 
>   would decode into:
> 
>   pts=10 id=42 text="Long dialogue line."
>   pts=12 id=43 text="Short."
>   pts=13 id=43 clear
>   pts=15 id=42 clear
> 
> When the duration is entirely reliable (text files read all at once and
> correctly processed), the decoder generates both frames immediately and
> keeps the clear frame in a reorder buffer.
> 
> When the duration is not entirely reliable, the decoder should generate
> the clear frame when it gets the corresponding packet (either the clear
> packet or the next start packet). If the duration is known but not
> reliable (dvdsub, dvbsub), the decoder should use it as a cap when
> waiting for the actual end.
> 
> The decoder needs some kind of heartbeat flush API to get the pending
> clear frames. We may want an option to disable internal reordering and
> get clear frames immediately.
> 
> When the duration is not known at all, we may set some kind of
> "expiration timestamp" on the start frame, but I am not sure it is
> necessary.
> 
> Whether the duration is reliable or not is a property of both the codec
> and the format. For example, mkvmerge does not de-overlap events when
> muxing SRT into Matroska, therefore the duration is not known. On the
> other hand, when lavf reads directly from an SRT file, it can de-overlap
> easily. I suppose it means AVCodecParameters needs an extra field.
> 
> > I didn't like having multiple fields for text-based data. If we want to
> > decode to another form, we can still add an option to print out verbatim
> > text instead of ASS markup.  
> 
> I think we are not talking about the same thing. A long time ago, we
> considered replacing the ASS markup with a simple text field with
> styling in a separate, non-text, structure. Did you discard that idea?
> 
> For CSS-based subtitles, a richer data structure would make it slightly
> less hard to preserve the structure of the styling information.
> 
> > Yeah, I guess I'll need to write a filter to do the blending for the
> > final patchset submission.
> 
> Or just a sub->video filter.
> 
> Regards,
>
Nicolas George Nov. 11, 2016, 11:49 a.m. UTC | #9
On primidi 21 brumaire, an CCXXV, wm4 wrote:
> Sounds like a bad idea.

Do you have a better one?
wm4 Nov. 11, 2016, 11:59 a.m. UTC | #10
On Fri, 11 Nov 2016 12:49:31 +0100
Nicolas George <george@nsup.org> wrote:

> On primidi 21 brumaire, an CCXXV, wm4 wrote:
> > Sounds like a bad idea.  
> 
> Do you have a better one?

No, but I'm confident that you can find one after having thought about
the problem for so long.
Nicolas George Nov. 11, 2016, 12:45 p.m. UTC | #11
On primidi 21 brumaire, an CCXXV, wm4 wrote:
> No, but I'm confident that you can find one after having thought about
> the problem for so long.

Well, I have thought about it for a long time, and not only is this the
solution I have found, but I do not think it is a "bad idea" at all.

And I would really appreciate it if in the future you refrained from that
kind of useless empty remark. You can raise practical concerns, ask for
explanations or rationales, of course. But a purely negative reply that
took you all of three minutes in answer to the result of years of design
is just incredibly rude.

Regards,
wm4 Nov. 11, 2016, 2:06 p.m. UTC | #12
On Fri, 11 Nov 2016 13:45:54 +0100
Nicolas George <george@nsup.org> wrote:

> On primidi 21 brumaire, an CCXXV, wm4 wrote:
> > No, but I'm confident that you can find one after having thought about
> > the problem for so long.  
> 
> Well, I have thought about it for a long time, and not only is this the
> solution I have found, but I do not think it is a "bad idea" at all.

OK, let's think about alternative approaches. How about not trying to
have libavfilter synchronize subtitles and other streams in all cases?
Why not just send nothing to the subtitle buffer sources if the
subtitles are sparse and there is no new packet? If it's sparse, make
it sparse.

I assume the whole point of this exercise is to prevent excessive
buffering (e.g. not trying to read the next subtitle packet, which
might read most of the file, and necessitate buffering it in memory).
E.g. if you overlay video on subtitles, you'd normally require new
frames for both subtitles and video. If you'd treat subtitles like
video and audio, you'd have to try to read the next subtitle packet to
know, well, whether there's a new subtitle or whether to use the
previous one.

If I understood this correctly, you want to send empty frames every
100ms if the duration of a subtitle is unknown. Why is it 100ms? Does
it make a difference if it's 100ms or a few seconds (or until EOF)
until the subtitle is naturally terminated? Why not 1ms? This seems
like a very "approximate" solution - sure, it works, but it's akin to
using polling and sleep calls in I/O or multithreaded code.

Maybe a heartbeat on every video frame? What if there's no video
stream? Video can change timestamps and frame count, and video could go
sparse or un-sparse at certain points.

There's this related idea of slaving sbuffersrc to a master vbuffersrc.
This approach works on the video output, while sparseness is really a
problem at the source (i.e. the demuxer). It's questionable how this
would work with subtitles demuxed from separate files (which also might
have A/V streams). It also works on the video output, while the issue of
subtitles with unknown duration is mostly a demuxing issue. What
happens if there's a video->sub filter? How would it send heartbeats?
Would it require a new libavfilter graph syntax for filters that
generate subtitles within the graph, and would it require users to
explicitly specify a "companion" video source?

The whole problem is that it's hard to determine whether a new
subtitle frame should be available at a certain point inside
libavfilter, which in turn is hard because of how generic libavfilter
wants to be.

It seems to me that it's not libavfilter that should handle this, but
whoever feeds libavfilter. If it feeds a new video frame to it, but no
subtitle frame, it means there is no new subtitle yet due to
sparseness. There is actually no need for weird heartbeat frames. The
libavfilter API user is in the position to know whether the subtitle
demuxer/decoder can produce a new packet/frame. It would be crazy if
the API user had to send heartbeat frames in these situations, and
had to care about how many heartbeats are sent when.

In complex cases (where audio/video/subs are connected in a
non-trivial way, possibly converting to each other at certain points),
the user would have to be careful which buffersinks to read in order
not to trigger excessive readahead. Also the user would possibly have to
disable "automagic" synchronization mechanisms in other parts of
libavfilter.

Even then, you would need to "filter" sparse frames (update their
timestamps, produce new ones, etc). This sounds very complex.

What about filtering subtitles alone? This should be possible?

Why would libavfilter in general be responsible for syncing subtitles
and video anyway? It should do that only on filters which have both
subtitle and video inputs or so.

Why does this need decoder enhancements anyway? How about just using
the API in its current extent, which applications have handled for
years? Again, special snowflake libavfilter/ffmpeg.c.

Btw. video frames can also be sparse (think of mp4s that contain slide
shows). Are we going to get video heartbeat frames? How are all the
filters going to handle it?

Even for not-sparse video, there seem to be cases (possibly fixed now)
where libavfilter just excessively buffers when using ffmpeg.c. I'm
still fighting such cases with my own libavfilter API use.
(Interestingly, this often involves sparse video.)

(Oh, and I don't claim to have understood the problem in its whole
extent. But I do have a lot of experience with subtitles and
unfortunately also with using libavfilter in a "generic" way.)

> And I would really appreciate it if in the future you refrained from that
> kind of useless empty remark. You can raise practical concerns, ask for
> explanations or rationales, of course. But a purely negative reply that
> took you all of three minutes in answer to the result of years of design
> is just incredibly rude.

I find your conduct incredibly rude as well. It's not nice to take
every reply as an offense, instead of e.g. starting a discussion.
It's also not nice to call my remarks "useless".

No, pointing out that a solution is sub-optimal is not rudeness.

Why are you asking me for a better solution? You're the one who wants
subtitles in libavfilter, not me. Thus it's your responsibility to come
up with a good design. If there's no good design, then it's a sure sign
that it's not a good idea to have subtitles in libavfilter. Indeed,
subtitles are incredibly complex, and there are many cases that
somehow need to be handled in a generic way, and it's not necessarily a
good idea to add this complexity to libavfilter, which is designed to
handle audio and video data (and even after years of work, isn't that
good at handling audio and video at the same time). It's like trying to
push a square peg through a round hole. I haven't come to the
conclusion yet that this is the case, so hold that thought.

Please try not to reply to every paragraph separately or so. This makes
it hard to follow discussions. In fact, I won't take part in a
discussion of that kind. It wastes so much time because you can get lost
in meaningless details.
Nicolas George Nov. 11, 2016, 2:36 p.m. UTC | #13
On primidi 21 brumaire, an CCXXV, wm4 wrote:
> OK, let's think about alternative approaches. How about not trying to
> have libavfilter synchronize subtitles and other streams in all cases?
> Why not just send nothing to the subtitle buffer sources if the
> subtitles are sparse and there is no new packet? If it's sparse, make
> it sparse.

Sometimes a filter needs sync. Therefore the framework must be able to
provide it. Your suggestion does not allow that.

> I assume the whole point of this exercise is to prevent excessive
> buffering (e.g. not trying to read the next subtitle packet, which
> might read most of the file, and necessitate buffering it in memory).
> E.g. if you overlay video on subtitles, you'd normally require new
> frames for both subtitles and video. If you'd treat subtitles like
> video and audio, you'd have to try to read the next subtitle packet to
> know, well, whether there's a new subtitle or whether to use the
> previous one.

You understand the issue.

> If I understood this correctly, you want to send empty frames every
> 100ms if the duration of a subtitle is unknown. Why is it 100ms? Does
> it make a difference if it's 100ms or a few seconds (or until EOF)
> until the subtitle is naturally terminated? Why not 1ms? This seems
> like a very "approximate" solution - sure, it works, but it's akin to
> using polling and sleep calls in I/O or multithreaded code.

That was just an example to help readers understand.

> Maybe a heartbeat on every video frame?

That is exactly the idea; you did not read the mail carefully enough
before judging it a "bad idea".

>					  What if there's no video
> stream?

Then there is probably no need to sync.

> There's this related idea of slaving sbuffersrc to a master vbuffersrc.
> This approach works on the video output,

I have no idea why you start talking about output.

>				  while sparseness is really a problem at
> the source (i.e. the demuxer). It's questionable how this would work with
> subtitles demuxed from separate files

When the subtitles are demuxed from a file separate from the stream they
need to sync with, the next frame is available on demand, and therefore
the heartbeat frames are not needed, nor generated.

>					(which also might have A/V
> streams).

If that happens, it seems like a very strange and pathological use case.
FFmpeg can not perform miracles.

>	    It also works on the video output, while the issue of
> subtitles with unknown duration is mostly a demuxing issue. What

Again, I do not know why you speak about output.

> happens if there's a video->sub filter? How would it send heartbeats?

I do not know, please tell me what it does exactly and I will be able to
answer you.

> Would it require a new libavfilter graph syntax for filters that
> generate subtitles within the graph, and would it require users to
> explicitly specify a "companion" video source?

Probably not.

> The whole problem is that it's hard to determine whether a new
> subtitle frame should be available at a certain point inside
> libavfilter, which in turn is hard because of how generic libavfilter
> wants to be.

Exactly.

> It seems to me that it's not libavfilter that should handle this, but
> whoever feeds libavfilter.

Its collaboration is mandatory. But the bulk of the work should be done
by the library, not duplicated in all applications.

>			   If it feeds a new video frame to it, but no
> subtitle frame, it means there is no new subtitle yet due to
> sparseness.

Libavfilter can not know when there is "no subtitle frame", only when
there is one. Hence the need to tell it explicitly:

>	      There is actually no need for weird heartbeat frames.

... with a heartbeat frame.

>								    The
> libavfilter API user is in the position to know whether the subtitle
> demuxer/decoder can produce a new packet/frame. It would be crazy if
                                                              ^^^^^
Disparaging judgement without grounds.

> the API user had to send heartbeat frames in these situations, and
> had to care about how many heartbeats are sent when.

Good thing it does not, then.

> In complex cases (where audio/video/subs are connected in a
> non-trivial way, possibly converting to each other at certain points),
> the user would have to be careful which buffersinks to read in order
> not to trigger excessive readahead. Also the user would possibly have to
> disable "automagic" synchronization mechanisms in other parts of
> libavfilter.
> 
> Even then, you would need to "filter" sparse frames (update their
> timestamps, produce new ones, etc). This sounds very complex.

Yes, it is complex. The problem is complex, the solution has little
chance of being simple.

> What about filtering subtitles alone? This should be possible?

Yes.

> Why would libavfilter in general be responsible for syncing subtitles
> and video anyway? It should do that only on filters which have both
> subtitle and video inputs or so.

I do not understand why you are trying to make a distinction between
lavfi and individual filters. The sparseness issue is not about where
the code resides but about what information is available or not.

> Why does this need decoder enhancements anyway? How about just using
> the API in its current extent, which applications have handled for
> years? Again, special snowflake libavfilter/ffmpeg.c.

Mu.

> Btw. video frames can also be sparse (think of mp4s that contain slide
> shows). Are we going to get video heartbeat frames? How are all the
> filters going to handle it?

Currently they do not, but this is a separate issue.

> Even for not-sparse video, there seem to be cases (possibly fixed now)
> where libavfilter just excessively buffers when using ffmpeg.c. I'm
> still fighting such cases with my own libavfilter API use.
> (Interestingly, this often involves sparse video.)

The case with not-sparse video involves sparse video?

> (Oh, and I don't claim to have understood the problem in its whole
> extent. But I do have a lot of experience with subtitles and
> unfortunately also with using libavfilter in a "generic" way.)
> 
> > And I would really appreciate it if in the future you refrained from that
> > kind of useless empty remark. You can raise practical concerns, ask for
> > explanations or rationales, of course. But a purely negative reply that
> > took you all of three minutes in answer to the result of years of design
> > is just incredibly rude.
> 
> I find your conduct incredibly rude as well. It's not nice to take
> every reply as an offense, instead of e.g. starting a discussion.
> It's also not nice to call my remarks "useless".
> 
> No, pointing out that a solution is sub-optimal is not rudeness.

Indeed. Next time, do that. Give arguments, ask questions. Refrain from
a disparaging blanket judgement.

> Why are you asking me for a better solution?

Because you are the one who claimed the solution I propose is bad.
Unless the w in your nickname actually means Winston, claiming that a
solution is bad when there is no better one is just a waste of time.

>						You're the one who wants
> subtitles in libavfilter, not me.

You could have noticed that I am far from being the only one.

>				    Thus it's your responsibility to come
> up with a good design.

I did.

>			If there's no good design, then it's a sure sign
> that it's not a good idea to have subtitles in libavfilter. Indeed,

Or it is a sure sign that your aesthetic considerations are skewed.

> subtitles are incredibly complex, and there are many cases that
> somehow need to be handled in a generic way, and it's not necessarily a
> good idea to add this complexity to libavfilter, which is designed to
> handle audio and video data (and even after years of work, isn't that
> good at handling audio and video at the same time). It's like trying to
> push a square peg through a round hole. I haven't come to the
> conclusion yet that this is the case, so hold that thought.
> 
> Please try not to reply to every paragraph separately or so. This makes
> it hard to follow discussions. In fact, I won't take part in a
> discussion of that kind. It wastes so much time because you can get lost
> in meaningless details.

This is exactly what I did because this is exactly what needed to be
done for that kind of discussion.
wm4 Nov. 11, 2016, 3:04 p.m. UTC | #14
On Fri, 11 Nov 2016 15:36:04 +0100
Nicolas George <george@nsup.org> wrote:

> > It seems to me that it's not libavfilter that should handle this, but
> > whoever feeds libavfilter.
> 
> Its collaboration is mandatory. But the bulk of the work should be done
> by the library, not duplicated in all applications.

I'm saying that a good solution is application-specific. It depends on
the exact data flow, which depends on what kind of filter graph is used.

> >			   If it feeds a new video frame to it, but no
> > subtitle frame, it means there is no new subtitle yet due to
> > sparseness.  
> 
> Libavfilter can not know when there is "no subtitle frame", only when
> there is one. Hence the need to tell it explicitly:
> 
> >	      There is actually no need for weird heartbeat frames.  
> 
> ... with a heartbeat frame.

libavfilter still doesn't know whether there are subtitle frames at
future timestamps if you send heartbeat frames.

What I'm suggesting is that the API user has to be aware of the
requirements his filter chain might have on subtitle input. Thus, if
the user requests output for which a subtitle frame would have been
required, but didn't send one, libavfilter assumes that there is
none.

If you want your heartbeat frames to work, they would have to go into
the reverse direction: send the timestamp up to the source filters, let
the user check the timestamp, and then send a heartbeat frame if for
this _specific_ timestamp no subtitle frame will be available. But that
is also complex.

> > Please try not to reply to every paragraph separately or so. This makes
> > it hard to follow discussions. In fact, I won't take part in a
> > discussion of that kind. It wastes so much time because you can get lost
> > in meaningless details.  
> 
> This is exactly what I did because this is exactly what needed to be
> done for that kind of discussion.

I consider this reply very disrespectful. Maybe read the CoC?
Nicolas George Nov. 11, 2016, 3:25 p.m. UTC | #15
On primidi 21 brumaire, an CCXXV, wm4 wrote:
> I'm saying that a good solution is application-specific. It depends on
> the exact data flow, which depends on what kind of filter graph is used.

The library should still try to make things as simple as possible.

> libavfilter still doesn't know whether there are subtitle frames at
> future timestamps if you send heartbeat frames.

Care to explain that statement?

Apparently you still have not understood how it is supposed to work.
Here is the problem, once and for all.

Consider a subtitle_hardburn filter with a video input and a subtitle
input. The last subtitle segment ended at t=500, the next one starts at
t=1500. A video frame arrives at t=1000. In order to process that
frame, the filter needs to know that there is no subtitle to hardburn at
that time, and for that, it needs the next subtitle frame; thus, it
buffers the video frame and requests a subtitle one.

If the subtitle comes from a separate file, the frame at t=1500 is read,
injected to the filter, and it works.

If the subtitle comes from the same file, trying to read that frame
causes the video at t=1000.1 to be read, then at t=1000.2... until
t=1499.9, thus buffering 5000 video frames before being able to process
it.

With heartbeat frames, the video frame at t=1000.1 generates one. When
it arrives to the filter, the filter can do its work.

Hence, the heartbeat frames reduce the latency from huge and unbounded
to just one frame.

> What I'm suggesting is that the API user has to be aware of the
> requirements his filter chain might have on subtitle input. Thus, if
> the user requests output for which a subtitle frame would have been
> required, but didn't send one, libavfilter assumes that there is
> none.

Therefore, you are trying to turn the request_frame() call into the
equivalent of the heartbeat frames. Sorry, it can not work that way.
Data in libavfilter has to flow forwards, not backwards.

> If you want your heartbeat frames to work,

They work the way I described.

>					     they would have to go into
> the reverse direction: send the timestamp up to the source filters, let
> the user check the timestamp, and then send a heartbeat frame if for
> this _specific_ timestamp no subtitle frame will be available. But that
> is also complex.

Maybe that could work, but I am very dubious because of the big picture.
But even if it does, it would mean a huge change across the whole API.
I like my idea better, and at least I know it works.

> I consider this reply very disrespectful.

It is no more disrespectful than trying to dictate how I should answer
in the first place.

>					    Maybe read the CoC?

Please do.
wm4 Nov. 11, 2016, 3:46 p.m. UTC | #16
On Wed,  2 Nov 2016 23:09:34 +0100
Clément Bœsch <u@pkh.me> wrote:

> ---
> So this is just a prototype that is starting to work. It needs a split
> (and completion of the long TODO list) before review, but I wanted to
> share it for early feedback.
> 
> So long story short: AVSubtitle → AVFrame, you have to use the new M:N
> API, and it's integrated into lavfi.
> 
> Now a bit longer: the AVFrame->extended_data entries are
> AVFrameSubtitleRectangle structures (declared in lavu/frame). Frames are
> refcounted. So far this is not really meaningful, as a frame ref will
> alloc and copy every rectangle; that's a limitation we will have to
> deal with for now, but it allows proper integration into the framework.
> 
> It actually simplifies the workflow a lot, so I'm optimistic about the
> end result. Unfortunately, there is a lot of work remaining.
> 
> End note: despite my obsession with subtitles, I'd like to remind
> everyone that I actually hate working with them; I just believe it's
> really important to get that historical burden out of sight. That means,
> if anyone wants to help with that, they are extremely welcome.
> 
> Thanks,
> 
> TODO:
> - what to do about sub2video?
> - properly deal with -ss and -t (need strim filter?)
> - sub_start_display/sub_end_display needs to be honored
> - find a test case for dvbsub as it's likely broken (the ffmpeg.c hack is
>   removed and should be replaced by EAGAIN logic in lavc/utils.c)
> - make it pass FATE:
>   * fix cc/subcc
>   * fix various other things that broke
> - split commit
> - Changelog/APIchanges
> - proper API doxy
> - update lavfi/subtitles?
> ---
>  ffmpeg.c                   | 231 ++++++++++++++++-----------------------------
>  ffmpeg_filter.c            |  94 +++++++++++++++++-
>  ffmpeg_opt.c               |  33 ++++++-
>  libavcodec/avcodec.h       |   9 ++
>  libavcodec/utils.c         | 214 +++++++++++++++++++++++++++++++++++++++++
>  libavfilter/Makefile       |   3 +
>  libavfilter/allfilters.c   |   8 ++
>  libavfilter/avfilter.c     |   8 ++
>  libavfilter/buffersink.c   |  28 ++++++
>  libavfilter/buffersrc.c    |  59 ++++++++++++
>  libavfilter/fifo.c         |  31 ++++++
>  libavfilter/formats.c      |   4 +
>  libavfilter/internal.h     |   8 ++
>  libavfilter/sf_snull.c     |  48 ++++++++++
>  libavfilter/subtitle.c     |  60 ++++++++++++
>  libavfilter/subtitle.h     |  31 ++++++
>  libavfilter/vf_subtitles.c |  42 +++++----
>  libavformat/assenc.c       |   3 +
>  libavutil/frame.c          | 131 +++++++++++++++++++------
>  libavutil/frame.h          |  59 ++++++++++--
>  tests/ref/fate/sub-charenc |   6 +-
>  21 files changed, 895 insertions(+), 215 deletions(-)
>  create mode 100644 libavfilter/sf_snull.c
>  create mode 100644 libavfilter/subtitle.c
>  create mode 100644 libavfilter/subtitle.h
> 
> diff --git a/ffmpeg.c b/ffmpeg.c
> index 36a921a..8222b8b 100644
> --- a/ffmpeg.c
> +++ b/ffmpeg.c

Can't comment on those.

> diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
> index 211112f..0912bff 100644
> --- a/libavcodec/avcodec.h
> +++ b/libavcodec/avcodec.h
> @@ -4838,7 +4838,10 @@ int avcodec_decode_video2(AVCodecContext *avctx, AVFrame *picture,
>   *                 must be freed with avsubtitle_free if *got_sub_ptr is set.
>   * @param[in,out] got_sub_ptr Zero if no subtitle could be decompressed, otherwise, it is nonzero.
>   * @param[in] avpkt The input AVPacket containing the input buffer.
> + *
> + * @deprecated Use avcodec_send_packet() and avcodec_receive_frame().
>   */
> +attribute_deprecated
>  int avcodec_decode_subtitle2(AVCodecContext *avctx, AVSubtitle *sub,
>                              int *got_sub_ptr,
>                              AVPacket *avpkt);
> @@ -5322,6 +5325,12 @@ attribute_deprecated
>  int avcodec_encode_video2(AVCodecContext *avctx, AVPacket *avpkt,
>                            const AVFrame *frame, int *got_packet_ptr);
>  
> +/**
> + * Encode a subtitle.
> + *
> + * @deprecated use avcodec_send_frame()/avcodec_receive_packet() instead
> + */
> +attribute_deprecated
>  int avcodec_encode_subtitle(AVCodecContext *avctx, uint8_t *buf, int buf_size,
>                              const AVSubtitle *sub);
>  
> diff --git a/libavcodec/utils.c b/libavcodec/utils.c
> index 87de15f..add7edf 100644
> --- a/libavcodec/utils.c
> +++ b/libavcodec/utils.c
> @@ -2050,6 +2050,86 @@ int avcodec_encode_subtitle(AVCodecContext *avctx, uint8_t *buf, int buf_size,
>      return ret;
>  }
>  
> +// TODO: delete this compatibility code when all subtitle encoders have
> +// moved to send_frame
> +static int encode_subtitle_frame(AVCodecContext *avctx,
> +                                 AVPacket *avpkt,
> +                                 const AVFrame *frame,
> +                                 int *got_packet_ptr)
> +{
> +    int i, ret;
> +
> +FF_DISABLE_DEPRECATION_WARNINGS
> +    AVPacket tmp_pkt;
> +    AVSubtitle subtitle;
> +
> +    get_subtitle_defaults(&subtitle);
> +
> +    /* Allocate (once) a temporary large output subtitle buffer */
> +    if (!avctx->internal->byte_buffer) {
> +        const int subtitle_out_max_size = 1024 * 1024;
> +        uint8_t *subtitle_out = av_malloc(subtitle_out_max_size);
> +        if (!subtitle_out)
> +            return AVERROR(ENOMEM);
> +
> +        avctx->internal->byte_buffer      = subtitle_out;
> +        avctx->internal->byte_buffer_size = subtitle_out_max_size;
> +    }

This is used for encoding below... does the old API really not have a
better way for this?

I would have thought 1MB is a bit small, but then again I looked at
ffmpeg.c, and it's using the same fixed size buffer.

> +
> +    /* Craft an AVSubtitle from the AVFrame */
> +    subtitle.format = frame->format == AV_PIX_FMT_NONE;
> +    subtitle.rects  = av_mallocz_array(frame->sub_nb_rects, sizeof(*subtitle.rects));
> +    if (!subtitle.rects)
> +        return AVERROR(ENOMEM);
> +    subtitle.num_rects = frame->sub_nb_rects;
> +    subtitle.pts = frame->pts;
> +
> +    for (i = 0; i < frame->sub_nb_rects; i++) {
> +        const AVFrameSubtitleRectangle *src_rect = (AVFrameSubtitleRectangle *)frame->extended_data[i];
> +        AVSubtitleRect *dst_rect;
> +
> +        dst_rect = av_mallocz(sizeof(*subtitle.rects[i]));
> +        if (!dst_rect)
> +            return AVERROR(ENOMEM);

Leaks at least subtitle.rects (if you care...).

> +        subtitle.rects[i] = dst_rect;
> +
> +        if (subtitle.format) {
> +            dst_rect->type = SUBTITLE_ASS;
> +            dst_rect->ass  = src_rect->text;
> +        } else {
> +            dst_rect->type = SUBTITLE_BITMAP;
> +            dst_rect->x = src_rect->x;
> +            dst_rect->y = src_rect->y;
> +            dst_rect->w = src_rect->w;
> +            dst_rect->h = src_rect->h;
> +            memcpy(dst_rect->data,     src_rect->data,     sizeof(dst_rect->data));
> +            memcpy(dst_rect->linesize, src_rect->linesize, sizeof(dst_rect->linesize));
> +        }
> +        dst_rect->flags = src_rect->flags;
> +    }
> +
> +    /* Send it to the old API */
> +    ret = avcodec_encode_subtitle(avctx,
> +                                  avctx->internal->byte_buffer,
> +                                  avctx->internal->byte_buffer_size,
> +                                  &subtitle);
> +    if (ret < 0)
> +        return ret;
> +
> +    /* Wrap the encoded buffer into a ref-counted AVPacket */
> +    av_init_packet(&tmp_pkt);
> +    tmp_pkt.data = avctx->internal->byte_buffer;
> +    tmp_pkt.size = ret;

> +    av_packet_ref(avpkt, &tmp_pkt);

Can fail.
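
i.e., something like (sketch):

    ret = av_packet_ref(avpkt, &tmp_pkt);
    if (ret < 0)
        return ret;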

> +    avpkt->pts = frame->pts;
> +    avpkt->dts = frame->pkt_dts;
> +    avpkt->duration = frame->pkt_duration;
> +FF_ENABLE_DEPRECATION_WARNINGS
> +
> +    *got_packet_ptr = 1;
> +    return ret;
> +}
> +
>  /**
>   * Attempt to guess proper monotonic timestamps for decoded video frames
>   * which might have incorrect times. Input timestamps may wrap around, in
> @@ -2764,6 +2844,134 @@ void avsubtitle_free(AVSubtitle *sub)
>      memset(sub, 0, sizeof(AVSubtitle));
>  }
>  
> +static int decode_subtitle_frame(AVCodecContext *avctx,
> +                                 AVFrame *frame,
> +                                 int *got_frame_ptr,
> +                                 const AVPacket *avpkt)
> +{
> +    int i, ret = 0;
> +
> +    *got_frame_ptr = 0;
> +
> +    if (!avctx->codec)
> +        return AVERROR(EINVAL);
> +
> +    if (!avpkt->data && avpkt->size) {
> +        av_log(avctx, AV_LOG_ERROR, "invalid packet: NULL data, size != 0\n");
> +        return AVERROR(EINVAL);
> +    }

Didn't vobsubs have 0-sized packets at some point? Also, I think the
caller should check this (avcodec_send_packet).

> +
> +    av_assert0(avctx->codec->type == AVMEDIA_TYPE_SUBTITLE);
> +
> +    av_frame_unref(frame);

It's already guaranteed to be unreffed.

> +
> +    if ((avctx->codec->capabilities & AV_CODEC_CAP_DELAY) || avpkt->size) {
> +        AVPacket pkt_recoded;
> +        AVPacket tmp = *avpkt;
> +        int did_split = av_packet_split_side_data(&tmp);
> +
> +        if (did_split) {
> +            /* FFMIN() prevents overflow in case the packet wasn't allocated with
> +             * proper padding.
> +             * If the side data is smaller than the buffer padding size, the
> +             * remaining bytes should have already been filled with zeros by the
> +             * original packet allocation anyway. */
> +            memset(tmp.data + tmp.size, 0,
> +                   FFMIN(avpkt->size - tmp.size, AV_INPUT_BUFFER_PADDING_SIZE));
> +        }
> +        pkt_recoded = tmp;
> +
> +        ret = recode_subtitle(avctx, &pkt_recoded, &tmp);
> +        if (ret >= 0) {

IMO early exit would be nicer...

This is mostly the old code copy&pasted, I assume.

> +            int err_ret;
> +            AVSubtitle sub = {0};
> +
> +            avctx->internal->pkt = &pkt_recoded;
> +
> +            ret = avctx->codec->decode(avctx, &sub, got_frame_ptr, &pkt_recoded);
> +            av_assert1((ret >= 0) >= !!*got_frame_ptr &&
> +                       !!*got_frame_ptr >= !!sub.num_rects);
> +
> +            for (i = 0; i < sub.num_rects; i++) {
> +                if (sub.rects[i]->ass && !utf8_check(sub.rects[i]->ass)) {
> +                    av_log(avctx, AV_LOG_ERROR,
> +                           "Invalid UTF-8 in decoded subtitles text; "
> +                           "maybe missing -sub_charenc option\n");
> +                    avsubtitle_free(&sub);
> +                    return AVERROR_INVALIDDATA;
> +                }
> +            }

Hate.

> +
> +            frame->pts                   =
> +            frame->best_effort_timestamp = avpkt->pts;
> +            frame->pkt_duration          = avpkt->duration;
> +            frame->pkt_pos               = avpkt->pos;
> +            frame->sub_nb_rects          = sub.num_rects;
> +
> +            if (avctx->codec_descriptor->props & AV_CODEC_PROP_BITMAP_SUB)
> +                frame->format = AV_PIX_FMT_PAL8;
> +            else if (avctx->codec_descriptor->props & AV_CODEC_PROP_TEXT_SUB)
> +                frame->format = AV_PIX_FMT_NONE;

AFAIK these are just hints, not the definitive format. Why not use
sub.format?

> +
> +            frame->sub_start_display = sub.start_display_time ? av_rescale_q(sub.start_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q)
> +                                                              : AV_NOPTS_VALUE;
> +            frame->sub_end_display   = sub.end_display_time   ? av_rescale_q(sub.end_display_time,   av_make_q(1, 1000), AV_TIME_BASE_Q)
> +                                                              : AV_NOPTS_VALUE;

end_display_time can be UINT32_MAX for "infinite". I think we should
clean this up here, and differentiate between "0", "unknown", and
"until next event".

> +            /* Allocate sub_nb_rects AVFrameSubtitleRectangle */
> +            err_ret = av_frame_get_buffer(frame, 0);
> +            if (err_ret < 0) {
> +                avsubtitle_free(&sub);
> +                return err_ret;
> +            }
> +
> +            /* Transfer data from AVSubtitleRect to AVFrameSubtitleRectangle */
> +            for (i = 0; i < sub.num_rects; i++) {
> +                AVSubtitleRect *src_rect = sub.rects[i];
> +                AVFrameSubtitleRectangle *dst_rect = (AVFrameSubtitleRectangle *)frame->extended_data[i];
> +
> +                if (frame->format == AV_PIX_FMT_NONE) {
> +                    dst_rect->text = src_rect->ass;
> +                } else {
> +                    dst_rect->x = src_rect->x;
> +                    dst_rect->y = src_rect->y;
> +                    dst_rect->w = src_rect->w;
> +                    dst_rect->h = src_rect->h;
> +                    memcpy(dst_rect->data,     src_rect->data,     sizeof(src_rect->data));
> +                    memcpy(dst_rect->linesize, src_rect->linesize, sizeof(src_rect->linesize));
> +                }
> +                dst_rect->flags = src_rect->flags;
> +
> +                // we free and make sure it is set to NULL so the data inside
> +                // is not freed after destructing the AVSubtitle
> +                av_freep(&sub.rects[i]);
> +            }
> +            sub.num_rects = 0;
> +            avsubtitle_free(&sub);
> +
> +            if (tmp.data != pkt_recoded.data) { // did we recode?
> +                /* prevent from destroying side data from original packet */
> +                pkt_recoded.side_data = NULL;
> +                pkt_recoded.side_data_elems = 0;
> +
> +                av_packet_unref(&pkt_recoded);
> +            }
> +            avctx->internal->pkt = NULL;
> +        }
> +
> +        if (did_split) {
> +            av_packet_free_side_data(&tmp);

This leaks if the code above returns due to an error, I think?

> +            if(ret == tmp.size)
> +                ret = avpkt->size;
> +        }
> +
> +        if (*got_frame_ptr)
> +            avctx->frame_number++;
> +    }
> +
> +    return ret;
> +}
> +
>  static int do_decode(AVCodecContext *avctx, AVPacket *pkt)
>  {
>      int got_frame;
> @@ -2792,6 +3000,9 @@ static int do_decode(AVCodecContext *avctx, AVPacket *pkt)
>      } else if (avctx->codec_type == AVMEDIA_TYPE_AUDIO) {
>          ret = avcodec_decode_audio4(avctx, avctx->internal->buffer_frame,
>                                      &got_frame, pkt);
> +    } else if (avctx->codec_type == AVMEDIA_TYPE_SUBTITLE) {
> +        ret = decode_subtitle_frame(avctx, avctx->internal->buffer_frame,
> +                                    &got_frame, pkt);
>      } else {
>          ret = AVERROR(EINVAL);
>      }
> @@ -2941,6 +3152,9 @@ static int do_encode(AVCodecContext *avctx, const AVFrame *frame, int *got_packe
>      } else if (avctx->codec_type == AVMEDIA_TYPE_AUDIO) {
>          ret = avcodec_encode_audio2(avctx, avctx->internal->buffer_pkt,
>                                      frame, got_packet);
> +    } else if (avctx->codec_type == AVMEDIA_TYPE_SUBTITLE) {
> +        ret = encode_subtitle_frame(avctx, avctx->internal->buffer_pkt,
> +                                    frame, got_packet);
>      } else {
>          ret = AVERROR(EINVAL);
>      }
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index 7ed4696..5972b27 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -329,6 +329,9 @@ OBJS-$(CONFIG_YUVTESTSRC_FILTER)             += vsrc_testsrc.o
>  
>  OBJS-$(CONFIG_NULLSINK_FILTER)               += vsink_nullsink.o
>  
> +# subtitle filters
> +OBJS-$(CONFIG_SNULL_FILTER)                  += sf_snull.o
> +
>  # multimedia filters
>  OBJS-$(CONFIG_ADRAWGRAPH_FILTER)             += f_drawgraph.o
>  OBJS-$(CONFIG_AHISTOGRAM_FILTER)             += avf_ahistogram.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index 82a65ee..e5d20cd 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -45,6 +45,7 @@ void avfilter_register_all(void)
>          return;
>      initialized = 1;
>  
> +    /* audio filters */
>      REGISTER_FILTER(ABENCH,         abench,         af);
>      REGISTER_FILTER(ACOMPRESSOR,    acompressor,    af);
>      REGISTER_FILTER(ACROSSFADE,     acrossfade,     af);
> @@ -138,6 +139,7 @@ void avfilter_register_all(void)
>  
>      REGISTER_FILTER(ANULLSINK,      anullsink,      asink);
>  
> +    /* video filters */
>      REGISTER_FILTER(ALPHAEXTRACT,   alphaextract,   vf);
>      REGISTER_FILTER(ALPHAMERGE,     alphamerge,     vf);
>      REGISTER_FILTER(ASS,            ass,            vf);
> @@ -345,6 +347,9 @@ void avfilter_register_all(void)
>  
>      REGISTER_FILTER(NULLSINK,       nullsink,       vsink);
>  
> +    /* subtitle filters */
> +    REGISTER_FILTER(SNULL,          snull,          sf);
> +
>      /* multimedia filters */
>      REGISTER_FILTER(ADRAWGRAPH,     adrawgraph,     avf);
>      REGISTER_FILTER(AHISTOGRAM,     ahistogram,     avf);
> @@ -367,10 +372,13 @@ void avfilter_register_all(void)
>      /* those filters are part of public or internal API => registered
>       * unconditionally */
>      REGISTER_FILTER_UNCONDITIONAL(asrc_abuffer);
> +    REGISTER_FILTER_UNCONDITIONAL(ssrc_sbuffer);
>      REGISTER_FILTER_UNCONDITIONAL(vsrc_buffer);
>      REGISTER_FILTER_UNCONDITIONAL(asink_abuffer);
> +    REGISTER_FILTER_UNCONDITIONAL(ssink_sbuffer);
>      REGISTER_FILTER_UNCONDITIONAL(vsink_buffer);
>      REGISTER_FILTER_UNCONDITIONAL(af_afifo);
> +    REGISTER_FILTER_UNCONDITIONAL(sf_sfifo);
>      REGISTER_FILTER_UNCONDITIONAL(vf_fifo);
>      ff_opencl_register_filter_kernel_code_all();
>  }
> diff --git a/libavfilter/avfilter.c b/libavfilter/avfilter.c
> index 1d469c3..908ab2c 100644
> --- a/libavfilter/avfilter.c
> +++ b/libavfilter/avfilter.c
> @@ -314,6 +314,12 @@ int avfilter_config_links(AVFilterContext *filter)
>  
>                  if (!link->time_base.num && !link->time_base.den)
>                      link->time_base = (AVRational) {1, link->sample_rate};
> +                break;
> +
> +            case AVMEDIA_TYPE_SUBTITLE:
> +                if (!link->time_base.num && !link->time_base.den)
> +                    link->time_base = inlink ? inlink->time_base : AV_TIME_BASE_Q;
> +                break;
>              }
>  
>              if (link->src->nb_inputs && link->src->inputs[0]->hw_frames_ctx &&
> @@ -1201,6 +1207,8 @@ int ff_filter_frame(AVFilterLink *link, AVFrame *frame)
>              av_assert1(frame->width               == link->w);
>              av_assert1(frame->height               == link->h);
>          }
> +    } else if (link->type == AVMEDIA_TYPE_SUBTITLE) {
> +        // TODO?
>      } else {
>          if (frame->format != link->format) {
>              av_log(link->dst, AV_LOG_ERROR, "Format change is not supported\n");
> diff --git a/libavfilter/buffersink.c b/libavfilter/buffersink.c
> index 2feb56d..f38fb1a 100644
> --- a/libavfilter/buffersink.c
> +++ b/libavfilter/buffersink.c
> @@ -388,6 +388,11 @@ static int asink_query_formats(AVFilterContext *ctx)
>      return 0;
>  }
>  
> +static av_cold int ssink_init(AVFilterContext *ctx, void *opaque)
> +{
> +    return common_init(ctx);
> +}
> +
>  #define OFFSET(x) offsetof(BufferSinkContext, x)
>  #define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
>  static const AVOption buffersink_options[] = {
> @@ -405,9 +410,11 @@ static const AVOption abuffersink_options[] = {
>      { NULL },
>  };
>  #undef FLAGS
> +static const AVOption sbuffersink_options[] = {NULL};
>  
>  AVFILTER_DEFINE_CLASS(buffersink);
>  AVFILTER_DEFINE_CLASS(abuffersink);
> +AVFILTER_DEFINE_CLASS(sbuffersink);
>  
>  static const AVFilterPad avfilter_vsink_buffer_inputs[] = {
>      {
> @@ -452,3 +459,24 @@ AVFilter ff_asink_abuffer = {
>      .inputs      = avfilter_asink_abuffer_inputs,
>      .outputs     = NULL,
>  };
> +
> +static const AVFilterPad avfilter_ssink_sbuffer_inputs[] = {
> +    {
> +        .name         = "default",
> +        .type         = AVMEDIA_TYPE_SUBTITLE,
> +        .filter_frame = filter_frame,
> +    },
> +    { NULL }
> +};
> +
> +AVFilter ff_ssink_sbuffer = {
> +    .name          = "sbuffersink",
> +    .description   = NULL_IF_CONFIG_SMALL("Buffer subtitle frames, and make them available to the end of the filter graph."),
> +    .priv_class    = &sbuffersink_class,
> +    .priv_size     = sizeof(BufferSinkContext),
> +    .init_opaque   = ssink_init,
> +    .uninit        = uninit,
> +    //.query_formats = ssink_query_formats,
> +    .inputs        = avfilter_ssink_sbuffer_inputs,
> +    .outputs       = NULL,
> +};
> diff --git a/libavfilter/buffersrc.c b/libavfilter/buffersrc.c
> index 9294811..c20b7b2 100644
> --- a/libavfilter/buffersrc.c
> +++ b/libavfilter/buffersrc.c
> @@ -63,6 +63,9 @@ typedef struct BufferSourceContext {
>      uint64_t channel_layout;
>      char    *channel_layout_str;
>  
> +    /* subtitle only */
> +    int sub_format;
> +
>      int got_format_from_params;
>      int eof;
>  } BufferSourceContext;
> @@ -128,6 +131,8 @@ int av_buffersrc_parameters_set(AVFilterContext *ctx, AVBufferSrcParameters *par
>          if (param->channel_layout)
>              s->channel_layout = param->channel_layout;
>          break;
> +    case AVMEDIA_TYPE_SUBTITLE:
> +        break;
>      default:
>          return AVERROR_BUG;
>      }
> @@ -204,6 +209,8 @@ static int av_buffersrc_add_frame_internal(AVFilterContext *ctx,
>          CHECK_AUDIO_PARAM_CHANGE(ctx, s, frame->sample_rate, frame->channel_layout,
>                                   av_frame_get_channels(frame), frame->format);
>          break;
> +    case AVMEDIA_TYPE_SUBTITLE:
> +        break;
>      default:
>          return AVERROR(EINVAL);
>      }
> @@ -271,6 +278,7 @@ unsigned av_buffersrc_get_nb_failed_requests(AVFilterContext *buffer_src)
>  #define OFFSET(x) offsetof(BufferSourceContext, x)
>  #define A AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_AUDIO_PARAM
>  #define V AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
> +#define S AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_SUBTITLE_PARAM
>  
>  static const AVOption buffer_options[] = {
>      { "width",         NULL,                     OFFSET(w),                AV_OPT_TYPE_INT,      { .i64 = 0 }, 0, INT_MAX, V },
> @@ -306,6 +314,17 @@ static const AVOption abuffer_options[] = {
>  
>  AVFILTER_DEFINE_CLASS(abuffer);
>  
> +static const AVOption sbuffer_options[] = {
> +    { "time_base",      NULL, OFFSET(time_base),  AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, 0, INT_MAX, S },
> +    { "format",         NULL, OFFSET(sub_format), AV_OPT_TYPE_INT,      { .i64 = -1 }, -1, 1, S, "format" },
> +        { "unspecified", NULL, 0, AV_OPT_TYPE_CONST, {.i64=-1}, INT_MIN, INT_MAX, S, "format" },
> +        { "bitmap",      NULL, 0, AV_OPT_TYPE_CONST, {.i64=0},  INT_MIN, INT_MAX, S, "format" },
> +        { "text",        NULL, 0, AV_OPT_TYPE_CONST, {.i64=1},  INT_MIN, INT_MAX, S, "format" },

Implies no mixed subs. I'm fine with that, but might be restrictive.

> +    { NULL }
> +};
> +
> +AVFILTER_DEFINE_CLASS(sbuffer);
> +
>  static av_cold int init_audio(AVFilterContext *ctx)
>  {
>      BufferSourceContext *s = ctx->priv;
> @@ -397,6 +416,10 @@ static int query_formats(AVFilterContext *ctx)
>          if ((ret = ff_set_common_channel_layouts(ctx, channel_layouts)) < 0)
>              return ret;
>          break;
> +    case AVMEDIA_TYPE_SUBTITLE:
> +        if ((ret = ff_add_format(&formats, c->sub_format)) < 0)
> +            return ret;
> +        break;
>      default:
>          return AVERROR(EINVAL);
>      }
> @@ -424,6 +447,8 @@ static int config_props(AVFilterLink *link)
>          if (!c->channel_layout)
>              c->channel_layout = link->channel_layout;
>          break;
> +    case AVMEDIA_TYPE_SUBTITLE:
> +        break;
>      default:
>          return AVERROR(EINVAL);
>      }
> @@ -461,6 +486,15 @@ static int poll_frame(AVFilterLink *link)
>      return size/sizeof(AVFrame*);
>  }
>  
> +static av_cold int init_subtitle(AVFilterContext *ctx)
> +{
> +    BufferSourceContext *c = ctx->priv;
> +
> +    if (!(c->fifo = av_fifo_alloc(sizeof(AVFrame*))))
> +        return AVERROR(ENOMEM);
> +    return 0;
> +}
> +
>  static const AVFilterPad avfilter_vsrc_buffer_outputs[] = {
>      {
>          .name          = "default",
> @@ -510,3 +544,28 @@ AVFilter ff_asrc_abuffer = {
>      .outputs   = avfilter_asrc_abuffer_outputs,
>      .priv_class = &abuffer_class,
>  };
> +
> +static const AVFilterPad avfilter_ssrc_sbuffer_outputs[] = {
> +    {
> +        .name          = "default",
> +        .type          = AVMEDIA_TYPE_SUBTITLE,
> +        .request_frame = request_frame,
> +        .poll_frame    = poll_frame,
> +        .config_props  = config_props,
> +    },
> +    { NULL }
> +};
> +
> +AVFilter ff_ssrc_sbuffer = {
> +    .name          = "sbuffer",
> +    .description   = NULL_IF_CONFIG_SMALL("Buffer subtitle frames, and make them accessible to the filterchain."),
> +    .priv_size     = sizeof(BufferSourceContext),
> +    .query_formats = query_formats,
> +
> +    .init          = init_subtitle,
> +    .uninit        = uninit,
> +
> +    .inputs        = NULL,
> +    .outputs       = avfilter_ssrc_sbuffer_outputs,
> +    .priv_class    = &sbuffer_class,
> +};
> diff --git a/libavfilter/fifo.c b/libavfilter/fifo.c
> index abfbba1..feda5bd 100644
> --- a/libavfilter/fifo.c
> +++ b/libavfilter/fifo.c
> @@ -313,3 +313,34 @@ AVFilter ff_af_afifo = {
>      .inputs    = avfilter_af_afifo_inputs,
>      .outputs   = avfilter_af_afifo_outputs,
>  };
> +
> +static const AVFilterPad avfilter_sf_sfifo_inputs[] = {
> +    {
> +        .name             = "default",
> +        .type             = AVMEDIA_TYPE_SUBTITLE,
> +        .filter_frame     = add_to_queue,
> +    },
> +    { NULL }
> +};
> +
> +static const AVFilterPad avfilter_sf_sfifo_outputs[] = {
> +    {
> +        .name          = "default",
> +        .type          = AVMEDIA_TYPE_SUBTITLE,
> +        .request_frame = request_frame,
> +    },
> +    { NULL }
> +};
> +
> +AVFilter ff_sf_sfifo = {
> +    .name        = "sfifo",
> +    .description = NULL_IF_CONFIG_SMALL("Buffer input subtitles and send them when they are requested."),
> +
> +    .init      = init,
> +    .uninit    = uninit,
> +
> +    .priv_size = sizeof(FifoContext),
> +
> +    .inputs    = avfilter_sf_sfifo_inputs,
> +    .outputs   = avfilter_sf_sfifo_outputs,
> +};
> diff --git a/libavfilter/formats.c b/libavfilter/formats.c
> index 20f45e3..66bf138 100644
> --- a/libavfilter/formats.c
> +++ b/libavfilter/formats.c
> @@ -364,6 +364,10 @@ AVFilterFormats *ff_all_formats(enum AVMediaType type)
>                  return NULL;
>              fmt++;
>          }
> +    } else if (type == AVMEDIA_TYPE_SUBTITLE) {
> +        if (ff_add_format(&ret, 0 /* bitmap */) < 0 ||
> +            ff_add_format(&ret, 1 /* text */) < 0)
> +            return NULL;

Questionable, is probably preliminary code.

>      }
>  
>      return ret;
> diff --git a/libavfilter/internal.h b/libavfilter/internal.h
> index 3856012..cd8db73 100644
> --- a/libavfilter/internal.h
> +++ b/libavfilter/internal.h
> @@ -142,6 +142,14 @@ struct AVFilterPad {
>       * input pads only.
>       */
>      int needs_writable;
> +
> +    /**
> +     * Callback function to get a subtitle frame. If NULL, the filter system
> +     * will use ff_default_get_subtitle_buffer().
> +     *
> +     * Input subtitle pads only.
> +     */
> +    AVFrame *(*get_subtitle_buffer)(AVFilterLink *link, int nb_rects);
>  };
>  
>  struct AVFilterGraphInternal {
> diff --git a/libavfilter/sf_snull.c b/libavfilter/sf_snull.c
> new file mode 100644
> index 0000000..c20afdd
> --- /dev/null
> +++ b/libavfilter/sf_snull.c
> @@ -0,0 +1,48 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * null subtitle filter
> + */
> +
> +#include "avfilter.h"
> +#include "internal.h"
> +
> +static const AVFilterPad sf_snull_inputs[] = {
> +    {
> +        .name = "default",
> +        .type = AVMEDIA_TYPE_SUBTITLE,
> +    },
> +    { NULL }
> +};
> +
> +static const AVFilterPad sf_snull_outputs[] = {
> +    {
> +        .name = "default",
> +        .type = AVMEDIA_TYPE_SUBTITLE,
> +    },
> +    { NULL }
> +};
> +
> +AVFilter ff_sf_snull = {
> +    .name          = "snull",
> +    .description   = NULL_IF_CONFIG_SMALL("Pass the source unchanged to the output."),
> +    .inputs        = sf_snull_inputs,
> +    .outputs       = sf_snull_outputs,
> +};
> diff --git a/libavfilter/subtitle.c b/libavfilter/subtitle.c
> new file mode 100644
> index 0000000..88e43d5
> --- /dev/null
> +++ b/libavfilter/subtitle.c
> @@ -0,0 +1,60 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "libavutil/common.h"
> +
> +#include "subtitle.h"
> +#include "avfilter.h"
> +#include "internal.h"
> +
> +AVFrame *ff_null_get_subtitle_buffer(AVFilterLink *link, int nb_rects)
> +{
> +    return ff_get_subtitle_buffer(link->dst->outputs[0], nb_rects);
> +}
> +
> +AVFrame *ff_default_get_subtitle_buffer(AVFilterLink *link, int nb_rects)
> +{
> +    AVFrame *frame = av_frame_alloc();
> +    int ret;
> +
> +    if (!frame)
> +        return NULL;
> +
> +    frame->sub_nb_rects = nb_rects;
> +    frame->format       = link->format;
> +    ret = av_frame_get_buffer(frame, 0);
> +    if (ret < 0) {
> +        av_frame_free(&frame);
> +        return NULL;
> +    }
> +
> +    return frame;
> +}
> +
> +AVFrame *ff_get_subtitle_buffer(AVFilterLink *link, int nb_rects)
> +{
> +    AVFrame *ret = NULL;
> +
> +    if (link->dstpad->get_subtitle_buffer)
> +        ret = link->dstpad->get_subtitle_buffer(link, nb_rects);
> +
> +    if (!ret)
> +        ret = ff_default_get_subtitle_buffer(link, nb_rects);
> +
> +    return ret;
> +}
> diff --git a/libavfilter/subtitle.h b/libavfilter/subtitle.h
> new file mode 100644
> index 0000000..1db9ebe
> --- /dev/null
> +++ b/libavfilter/subtitle.h
> @@ -0,0 +1,31 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef AVFILTER_SUBTITLE_H
> +#define AVFILTER_SUBTITLE_H
> +
> +#include "avfilter.h"
> +#include "internal.h"
> +
> +AVFrame *ff_default_get_subtitle_buffer(AVFilterLink *link, int nb_rects);
> +
> +AVFrame *ff_null_get_subtitle_buffer(AVFilterLink *link, int nb_rects);
> +
> +AVFrame *ff_get_subtitle_buffer(AVFilterLink *link, int nb_rects);
> +
> +#endif /* AVFILTER_SUBTITLE_H */
> diff --git a/libavfilter/vf_subtitles.c b/libavfilter/vf_subtitles.c
> index 0f22644..20b3c77 100644
> --- a/libavfilter/vf_subtitles.c
> +++ b/libavfilter/vf_subtitles.c
> @@ -305,6 +305,10 @@ static av_cold int init_subtitles(AVFilterContext *ctx)
>      AVStream *st;
>      AVPacket pkt;
>      AssContext *ass = ctx->priv;
> +    AVFrame *frame = av_frame_alloc();
> +
> +    if (!frame)
> +        return AVERROR(ENOMEM);
>  
>      /* Init libass */
>      ret = init(ctx);
> @@ -447,32 +451,32 @@ static av_cold int init_subtitles(AVFilterContext *ctx)
>      pkt.data = NULL;
>      pkt.size = 0;
>      while (av_read_frame(fmt, &pkt) >= 0) {
> -        int i, got_subtitle;
> -        AVSubtitle sub = {0};
> -
>          if (pkt.stream_index == sid) {
> -            ret = avcodec_decode_subtitle2(dec_ctx, &sub, &got_subtitle, &pkt);
> -            if (ret < 0) {
> -                av_log(ctx, AV_LOG_WARNING, "Error decoding: %s (ignored)\n",
> -                       av_err2str(ret));
> -            } else if (got_subtitle) {
> -                const int64_t start_time = av_rescale_q(sub.pts, AV_TIME_BASE_Q, av_make_q(1, 1000));
> -                const int64_t duration   = sub.end_display_time;
> -                for (i = 0; i < sub.num_rects; i++) {
> -                    char *ass_line = sub.rects[i]->ass;
> +            int i;
> +
> +            ret = avcodec_send_packet(dec_ctx, &pkt);
> +            if (ret < 0)
> +                break;
> +            // XXX probably not correct api usage
> +            ret = avcodec_receive_frame(dec_ctx, frame);
> +            if (ret >= 0) {

I've found that converting code which used the old API to the new API is
very messy. I think that's mostly because the data flow is rather
different.

Correct API usage would be something like:

  while (1) {
    if (av_codec_send_packet() == AVERROR_EAGAIN)
      add_packet_back_to_demuxer_queue()
    while (1) {
      ret = avcodec_receive_frame
      if (ret == AVERROR_EAGAIN)
         break; // new packet needed
      /* process frame */
    }
  }

Plus EOF and error handling.

avcodec_send_packet() returning EAGAIN in this specific situation can
probably be reasonably treated as error instead. Unless the decoder
violates the declared API rules.
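
For concreteness, a C version of that sketch could look like this
(fragment; EOF draining elided; process_frame() is a placeholder; per the
above, EAGAIN from avcodec_send_packet() is treated as a plain error):

  while (av_read_frame(fmt, &pkt) >= 0) {
      int ret = avcodec_send_packet(dec_ctx, &pkt);
      av_packet_unref(&pkt);
      if (ret < 0)                          /* includes EAGAIN, treated as error */
          break;
      while ((ret = avcodec_receive_frame(dec_ctx, frame)) >= 0)
          process_frame(frame);             /* placeholder */
      if (ret != AVERROR(EAGAIN))           /* EAGAIN just means: next packet */
          break;
  }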

> +                // XXX: honor start/end display time
> +                const int64_t start_time = av_rescale_q(frame->pts,          st->time_base, av_make_q(1, 1000));
> +                const int64_t duration   = av_rescale_q(frame->pkt_duration, st->time_base, av_make_q(1, 1000));
> +
> +                for (i = 0; i < frame->sub_nb_rects; i++) {
> +                    AVSubtitleRect *rect = (AVSubtitleRect *)frame->extended_data[i]; // XXX move to lavu
> +                    char *ass_line = rect->ass;
>                      if (!ass_line)
>                          break;
> -                    if (LIBAVCODEC_VERSION_INT < AV_VERSION_INT(57,25,100))
> -                        ass_process_data(ass->track, ass_line, strlen(ass_line));
> -                    else
> -                        ass_process_chunk(ass->track, ass_line, strlen(ass_line),
> -                                          start_time, duration);
> +                    ass_process_chunk(ass->track, ass_line, strlen(ass_line),
> +                                      start_time, duration);
>                  }
>              }
> +            av_packet_unref(&pkt);
>          }
> -        av_packet_unref(&pkt);
> -        avsubtitle_free(&sub);
>      }
> +    av_frame_free(&frame);
>  
>  end:
>      av_dict_free(&codec_opts);
> diff --git a/libavformat/assenc.c b/libavformat/assenc.c
> index d50f18f..97a8de8 100644
> --- a/libavformat/assenc.c
> +++ b/libavformat/assenc.c
> @@ -164,6 +164,9 @@ static int write_packet(AVFormatContext *s, AVPacket *pkt)
>      int hh2, mm2, ss2, ms2;
>      DialogueLine *dialogue = av_mallocz(sizeof(*dialogue));
>  
> +    if (pkt->duration < 0)
> +        end = INT64_MAX;

Wait what.

> +
>      if (!dialogue)
>          return AVERROR(ENOMEM);
>  
> diff --git a/libavutil/frame.c b/libavutil/frame.c
> index 53e6174..8bb3ed2 100644
> --- a/libavutil/frame.c
> +++ b/libavutil/frame.c
> @@ -216,52 +216,33 @@ fail:
>      return AVERROR(ENOMEM);
>  }
>  
> -static int get_audio_buffer(AVFrame *frame, int align)
> +static int get_data_buffer(AVFrame *frame, int n, size_t size)
>  {
> -    int channels;
> -    int planar   = av_sample_fmt_is_planar(frame->format);
> -    int planes;
> -    int ret, i;
> -
> -    if (!frame->channels)
> -        frame->channels = av_get_channel_layout_nb_channels(frame->channel_layout);
> -
> -    channels = frame->channels;
> -    planes = planar ? channels : 1;
> -
> -    CHECK_CHANNELS_CONSISTENCY(frame);
> -    if (!frame->linesize[0]) {
> -        ret = av_samples_get_buffer_size(&frame->linesize[0], channels,
> -                                         frame->nb_samples, frame->format,
> -                                         align);
> -        if (ret < 0)
> -            return ret;
> -    }
> +    int i;
>  
> -    if (planes > AV_NUM_DATA_POINTERS) {
> -        frame->extended_data = av_mallocz_array(planes,
> -                                          sizeof(*frame->extended_data));
> -        frame->extended_buf  = av_mallocz_array((planes - AV_NUM_DATA_POINTERS),
> -                                          sizeof(*frame->extended_buf));
> +    if (n > AV_NUM_DATA_POINTERS) {
> +        frame->extended_data = av_mallocz_array(n, sizeof(*frame->extended_data));
> +        frame->extended_buf  = av_mallocz_array(n - AV_NUM_DATA_POINTERS,
> +                                                sizeof(*frame->extended_buf));
>          if (!frame->extended_data || !frame->extended_buf) {
>              av_freep(&frame->extended_data);
>              av_freep(&frame->extended_buf);
>              return AVERROR(ENOMEM);
>          }
> -        frame->nb_extended_buf = planes - AV_NUM_DATA_POINTERS;
> +        frame->nb_extended_buf = n - AV_NUM_DATA_POINTERS;
>      } else
>          frame->extended_data = frame->data;
>  
> -    for (i = 0; i < FFMIN(planes, AV_NUM_DATA_POINTERS); i++) {
> -        frame->buf[i] = av_buffer_alloc(frame->linesize[0]);
> +    for (i = 0; i < FFMIN(n, AV_NUM_DATA_POINTERS); i++) {
> +        frame->buf[i] = av_buffer_alloc(size);
>          if (!frame->buf[i]) {
>              av_frame_unref(frame);
>              return AVERROR(ENOMEM);
>          }
>          frame->extended_data[i] = frame->data[i] = frame->buf[i]->data;
>      }
> -    for (i = 0; i < planes - AV_NUM_DATA_POINTERS; i++) {
> -        frame->extended_buf[i] = av_buffer_alloc(frame->linesize[0]);
> +    for (i = 0; i < n - AV_NUM_DATA_POINTERS; i++) {
> +        frame->extended_buf[i] = av_buffer_alloc(size);
>          if (!frame->extended_buf[i]) {
>              av_frame_unref(frame);
>              return AVERROR(ENOMEM);
> @@ -269,11 +250,50 @@ static int get_audio_buffer(AVFrame *frame, int align)
>          frame->extended_data[i + AV_NUM_DATA_POINTERS] = frame->extended_buf[i]->data;
>      }
>      return 0;
> +}
> +
> +static int get_audio_buffer(AVFrame *frame, int align)
> +{
> +    int channels;
> +    int planar   = av_sample_fmt_is_planar(frame->format);
> +    int planes;
> +    int ret;
> +
> +    if (!frame->channels)
> +        frame->channels = av_get_channel_layout_nb_channels(frame->channel_layout);
>  
> +    channels = frame->channels;
> +    planes = planar ? channels : 1;
> +
> +    CHECK_CHANNELS_CONSISTENCY(frame);
> +    if (!frame->linesize[0]) {
> +        ret = av_samples_get_buffer_size(&frame->linesize[0], channels,
> +                                         frame->nb_samples, frame->format,
> +                                         align);
> +        if (ret < 0)
> +            return ret;
> +    }
> +
> +    return get_data_buffer(frame, planes, frame->linesize[0]);
> +}
> +
> +static int get_subtitle_buffer(AVFrame *frame)
> +{
> +    int i, ret;
> +
> +    ret = get_data_buffer(frame, frame->sub_nb_rects, sizeof(AVFrameSubtitleRectangle));
> +    if (ret < 0)
> +        return ret;
> +    for (i = 0; i < frame->sub_nb_rects; i++)
> +        memset(frame->extended_data[i], 0, sizeof(AVFrameSubtitleRectangle));
> +    return 0;

I'm noticing it doesn't allocate the actual subtitle bitmap or text
data, which is a difference from audio/video.

>  }
>  
>  int av_frame_get_buffer(AVFrame *frame, int align)
>  {
> +    if (frame->sub_nb_rects)
> +        return get_subtitle_buffer(frame);

No format set is valid for subtitles?

> +
>      if (frame->format < 0)
>          return AVERROR(EINVAL);
>  
> @@ -320,6 +340,8 @@ FF_ENABLE_DEPRECATION_WARNINGS
>      dst->colorspace             = src->colorspace;
>      dst->color_range            = src->color_range;
>      dst->chroma_location        = src->chroma_location;
> +    dst->sub_start_display      = src->sub_start_display;
> +    dst->sub_end_display        = src->sub_end_display;
>  
>      av_dict_copy(&dst->metadata, src->metadata, 0);
>  
> @@ -561,6 +583,7 @@ int av_frame_make_writable(AVFrame *frame)
>      tmp.channels       = frame->channels;
>      tmp.channel_layout = frame->channel_layout;
>      tmp.nb_samples     = frame->nb_samples;
> +    tmp.sub_nb_rects   = frame->sub_nb_rects;
>      ret = av_frame_get_buffer(&tmp, 32);
>      if (ret < 0)
>          return ret;
> @@ -716,11 +739,57 @@ static int frame_copy_audio(AVFrame *dst, const AVFrame *src)
>      return 0;
>  }
>  
> +static int frame_copy_subtitle(AVFrame *dst, const AVFrame *src)
> +{
> +    int i, ret;
> +
> +    for (i = 0; i < dst->sub_nb_rects; i++) {
> +        const AVFrameSubtitleRectangle *src_rect = (const AVFrameSubtitleRectangle *)src->extended_data[i];
> +        AVFrameSubtitleRectangle       *dst_rect = (AVFrameSubtitleRectangle *)dst->extended_data[i];
> +
> +        memset(dst_rect, 0, sizeof(*dst_rect));
> +
> +        dst_rect->flags = src_rect->flags;
> +
> +        if (src->format != AV_PIX_FMT_NONE) {
> +            /* (Alloc and) copy bitmap subtitle */
> +
> +            av_assert1(!src_rect->text);
> +
> +            dst_rect->x = src_rect->x;
> +            dst_rect->y = src_rect->y;
> +            dst_rect->w = src_rect->w;
> +            dst_rect->h = src_rect->h;
> +
> +            ret = av_image_alloc(dst_rect->data, dst_rect->linesize,
> +                                 dst_rect->w, dst_rect->h,
> +                                 dst->format, 0);
> +            if (ret < 0)
> +                return ret;
> +            av_image_copy(dst_rect->data, dst_rect->linesize,
> +                          src_rect->data, src_rect->linesize,
> +                          dst->format, dst_rect->w, dst_rect->h);
> +        } else {
> +            /* Copy text subtitle */
> +
> +            av_assert1(!src_rect->x && !src_rect->y);
> +            av_assert1(!src_rect->w && !src_rect->h);
> +
> +            dst_rect->text = av_strdup(src_rect->text);
> +            if (!dst_rect->text)
> +                return AVERROR(ENOMEM);
> +        }
> +    }
> +    return 0;
> +}
> +
>  int av_frame_copy(AVFrame *dst, const AVFrame *src)
>  {
> -    if (dst->format != src->format || dst->format < 0)
> +    if (dst->format != src->format || (dst->format < 0 && !dst->sub_nb_rects))
>          return AVERROR(EINVAL);
>  
> +    if (dst->sub_nb_rects)
> +        return frame_copy_subtitle(dst, src);
>      if (dst->width > 0 && dst->height > 0)
>          return frame_copy_video(dst, src);
>      else if (dst->nb_samples > 0 && dst->channel_layout)

Maybe I glanced over it, but it doesn't explicitly reset the subtitle
fields. I think this should happen for unknown durations/start times
at least.

> diff --git a/libavutil/frame.h b/libavutil/frame.h
> index 8e51361..d531cfa 100644
> --- a/libavutil/frame.h
> +++ b/libavutil/frame.h
> @@ -148,6 +148,25 @@ typedef struct AVFrameSideData {
>      AVBufferRef *buf;
>  } AVFrameSideData;
>  
> +#define AV_NUM_DATA_POINTERS 8
> +
> +/**
> + * This structure describes decoded subtitle rectangle
> + */
> +typedef struct AVFrameSubtitleRectangle {
> +    int x, y;
> +    int w, h;

Might require clarification that this is strictly for bitmap subtitles
only.

> +
> +    /* image data for bitmap subtitles, in AVFrame.format (AVPixelFormat) */
> +    uint8_t *data[AV_NUM_DATA_POINTERS];
> +    int linesize[AV_NUM_DATA_POINTERS];

Does it define the format somewhere?

> +
> +    /* decoded text for text subtitles, in ASS */
> +    char *text;

I'm fine with this - others would probably argue this should be in
data[0] or so.

> +
> +    int flags;

Which flags (missing doc.)?

> +} AVFrameSubtitleRectangle;

The struct should probably be extendable (so its size is not part of
the ABI).

> +
>  /**
>   * This structure describes decoded (raw) audio or video data.
>   *
> @@ -182,7 +201,6 @@ typedef struct AVFrameSideData {
>   * for AVFrame can be obtained from avcodec_get_frame_class()
>   */
>  typedef struct AVFrame {
> -#define AV_NUM_DATA_POINTERS 8
>      /**
>       * pointer to the picture/channel planes.
>       * This might be different from the first allocated byte
> @@ -224,9 +242,12 @@ typedef struct AVFrame {
>       * For packed audio, there is just one data pointer, and linesize[0]
>       * contains the total size of the buffer for all channels.
>       *
> +     * For subtitles, each pointer corresponds to an AVFrameSubtitleRectangle.
> +     *

It should also mention how many data pointers there are.

Btw. I'm not much of a fan of having to deal with extended_data. And
the duplication with data. (Does it even document what data[] contains
for subtitles?)

>       * Note: Both data and extended_data should always be set in a valid frame,
> -     * but for planar audio with more channels that can fit in data,
> -     * extended_data must be used in order to access all channels.
> +     * but for subtitles and planar audio with more channels that can fit in
> +     * data, extended_data must be used in order to access all audio channels
> +     * or subtitles rectangles.
>       */
>      uint8_t **extended_data;
>  
> @@ -359,10 +380,10 @@ typedef struct AVFrame {
>       * also be non-NULL for all j < i.
>       *
>       * There may be at most one AVBuffer per data plane, so for video this array
> -     * always contains all the references. For planar audio with more than
> -     * AV_NUM_DATA_POINTERS channels, there may be more buffers than can fit in
> -     * this array. Then the extra AVBufferRef pointers are stored in the
> -     * extended_buf array.
> +     * always contains all the references. For planar audio or subtitles with
> +     * more than AV_NUM_DATA_POINTERS channels or rectangles, there may be more
> +     * buffers than can fit in this array. Then the extra AVBufferRef pointers
> +     * are stored in the extended_buf array.
>       */
>      AVBufferRef *buf[AV_NUM_DATA_POINTERS];

Also to note: this refcounts the AVFrameSubtitleRectangle structs, but
not the contained data. This is not sane.

In particular, I think AVBufferRef refcounted data can't sanely contain
pointers (as they describe byte data). However the hwframescontext stuff
already uses pointers in AVBufferRef'ed structs, so we should at least
make behavior consistent. This means: if the AVBuffer is unreferenced
and destroyed, its destructor needs to free the referenced subtitle
data. I think this is not done yet, but I could be wrong.
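
For example, something like this could work (just a sketch, assuming the
rectangle array itself is the AVBufferRef payload; needs libavutil/buffer.h
and mem.h, plus the AVFrameSubtitleRectangle struct from this patch):

  static void free_sub_rects(void *opaque, uint8_t *data)
  {
      AVFrameSubtitleRectangle *rects = (AVFrameSubtitleRectangle *)data;
      int i, nb_rects = (intptr_t)opaque;

      for (i = 0; i < nb_rects; i++) {
          av_freep(&rects[i].text);
          av_freep(&rects[i].data[0]);   /* av_image_alloc()ed bitmap, if any */
      }
      av_free(data);
  }

  static AVBufferRef *alloc_sub_rects(int nb_rects)
  {
      AVBufferRef *buf;
      AVFrameSubtitleRectangle *rects = av_mallocz_array(nb_rects, sizeof(*rects));

      if (!rects)
          return NULL;
      buf = av_buffer_create((uint8_t *)rects, nb_rects * sizeof(*rects),
                             free_sub_rects, (void *)(intptr_t)nb_rects, 0);
      if (!buf)
          av_free(rects);
      return buf;
  }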

>  
> @@ -532,6 +553,29 @@ typedef struct AVFrame {
>       * AVHWFramesContext describing the frame.
>       */
>      AVBufferRef *hw_frames_ctx;
> +
> +    /**
> +     * Number of rectangles available in extended_data
> +     * - encoding: set by user (XXX: no encoding yet)
> +     * - decoding: set by libavcodec, read by user
> +     */
> +    int sub_nb_rects;
> +
> +    /**
> +     * Subtitle start display time in AV_TIME_BASE, relative to pts
> +     * XXX: semantic for <=0?
> +     * - encoding: unused
> +     * - decoding: set by libavcodec, read by user.
> +     */
> +    int64_t sub_start_display;
> +
> +    /**
> +     * Subtitle end display time in AV_TIME_BASE, relative to pts
> +     * XXX: semantic for <=0?
> +     * - encoding: unused
> +     * - decoding: set by libavcodec, read by user.
> +     */
> +    int64_t sub_end_display;

I suggest "0" means literally 0 (i.e. essentially not displayed). -1
could mean "unknown".

What I don't like is that this doesn't define semantics for whether a
subtitle frame replaces a previous frame. Normally you would do the
latter if the duration is unknown - but some subtitle formats or
containers can mix them and signal an incorrectly long duration. Then
the next frame shouldn't overlap with the previous frame.

>  } AVFrame;
>  
>  /**
> @@ -641,6 +685,7 @@ void av_frame_move_ref(AVFrame *dst, AVFrame *src);
>   * - format (pixel format for video, sample format for audio)
>   * - width and height for video
>   * - nb_samples and channel_layout for audio
> + * - sub_nb_rects for subtitle
>   *
>   * This function will fill AVFrame.data and AVFrame.buf arrays and, if
>   * necessary, allocate and fill AVFrame.extended_data and AVFrame.extended_buf.
> diff --git a/tests/ref/fate/sub-charenc b/tests/ref/fate/sub-charenc
> index a056cd1..050807b 100644
> --- a/tests/ref/fate/sub-charenc
> +++ b/tests/ref/fate/sub-charenc
> @@ -23,7 +23,7 @@ Dialogue: 0,0:02:31.61,0:02:33.90,Default,,0,0,0,,- А ти как си?\N- До
>  Dialogue: 0,0:02:40.16,0:02:42.79,Default,,0,0,0,,Монахът Дзенг каза,\Nче си в планината Удан.
>  Dialogue: 0,0:02:43.04,0:02:46.54,Default,,0,0,0,,Каза, че практикуваш\Nдълбока медитация.
>  Dialogue: 0,0:02:48.84,0:02:50.68,Default,,0,0,0,,Сигурно в планината\Nе много спокойно.
> -Dialogue: 0,0:02:51.25,0:02:53.46,Default,,0,0,0,,Завиждам ти.
> +Dialogue: 0,0:02:51.26,0:02:53.47,Default,,0,0,0,,Завиждам ти.
>  Dialogue: 0,0:02:53.67,0:02:58.34,Default,,0,0,0,,Имам толкова много работа,\Nпочти не ми остава\Nвреме за почивка.
>  Dialogue: 0,0:03:00.26,0:03:03.89,Default,,0,0,0,,Оставих обучението рано.
>  Dialogue: 0,0:03:05.69,0:03:11.28,Default,,0,0,0,,Защо? Ти си боец на Удан.\NОбучението е всичко.
> @@ -40,7 +40,7 @@ Dialogue: 0,0:03:53.40,0:03:56.57,Default,,0,0,0,,Не можах да издъ
>  Dialogue: 0,0:03:57.49,0:03:59.74,Default,,0,0,0,,Прекъснах медитацията си.
>  Dialogue: 0,0:03:59.95,0:04:02.24,Default,,0,0,0,,Не можах да продължа.
>  Dialogue: 0,0:04:03.20,0:04:07.79,Default,,0,0,0,,Нещо...\Nме дърпаше назад.
> -Dialogue: 0,0:04:09.62,0:04:10.91,Default,,0,0,0,,Какво беше?
> +Dialogue: 0,0:04:09.63,0:04:10.92,Default,,0,0,0,,Какво беше?
>  Dialogue: 0,0:04:15.46,0:04:18.00,Default,,0,0,0,,Нещо, от което не\Nмога да се освободя.
>  Dialogue: 0,0:04:23.39,0:04:24.68,Default,,0,0,0,,Скоро ли ще тръгваш?
>  Dialogue: 0,0:04:26.77,0:04:30.27,Default,,0,0,0,,Подготвяме охрана\Nза една доставка...
> @@ -52,7 +52,7 @@ Dialogue: 0,0:04:48.37,0:04:52.67,Default,,0,0,0,,Да. Той винаги е 
>  Dialogue: 0,0:04:52.88,0:04:56.55,Default,,0,0,0,,Не разбирам.\NКак можеш да се разделиш с него?
>  Dialogue: 0,0:04:56.76,0:04:59.93,Default,,0,0,0,,Той винаги е бил с теб.
>  Dialogue: 0,0:05:01.18,0:05:05.52,Default,,0,0,0,,Твърде много хора са\Nзагинали от това острие.
> -Dialogue: 0,0:05:09.68,0:05:14.52,Default,,0,0,0,,Чисто е единствено защото\Nкръвта се отмива лесно.
> +Dialogue: 0,0:05:09.69,0:05:14.53,Default,,0,0,0,,Чисто е единствено защото\Nкръвта се отмива лесно.
>  Dialogue: 0,0:05:15.40,0:05:20.61,Default,,0,0,0,,Ти го използваш справедливо.\NДостоен си за него.
>  Dialogue: 0,0:05:23.66,0:05:27.37,Default,,0,0,0,,Дойде време\Nда го оставя.
>  Dialogue: 0,0:05:27.58,0:05:31.21,Default,,0,0,0,,Е, какво ще правиш\Nот сега нататък?
Clément Bœsch Nov. 20, 2016, 12:52 p.m. UTC | #17
On Fri, Nov 11, 2016 at 04:46:51PM +0100, wm4 wrote:
[...]
> > +// TODO: delete this compatibility code when all subtitles encoders moved
> > +// to send_frame
> > +static int encode_subtitle_frame(AVCodecContext *avctx,
> > +                                 AVPacket *avpkt,
> > +                                 const AVFrame *frame,
> > +                                 int *got_packet_ptr)
> > +{
> > +    int i, ret;
> > +
> > +FF_DISABLE_DEPRECATION_WARNINGS
> > +    AVPacket tmp_pkt;
> > +    AVSubtitle subtitle;
> > +
> > +    get_subtitle_defaults(&subtitle);
> > +
> > +    /* Allocate (once) a temporary large output subtitle buffer */
> > +    if (!avctx->internal->byte_buffer) {
> > +        const int subtitle_out_max_size = 1024 * 1024;
> > +        uint8_t *subtitle_out = av_malloc(subtitle_out_max_size);
> > +        if (!subtitle_out)
> > +            return AVERROR(ENOMEM);
> > +
> > +        avctx->internal->byte_buffer      = subtitle_out;
> > +        avctx->internal->byte_buffer_size = subtitle_out_max_size;
> > +    }
> 
> This is used for encoding below... does the old API really not have a
> better way for this?
> 

Yeah, apparently. The subtitle encoding API never got updated; there is no
number at the end of "avcodec_encode_subtitle", which gives you an idea
about its era.

> I would have thought 1MB is a bit small, but then again I looked at
> ffmpeg.c, and it's using the same fixed size buffer.
> 

That's indeed where I took the value. Note that this function exists for
compatibility. Later on, we will likely make something smarter, but for
now I'm just making a wrapper: the priority is to make it clean and
transparent for the user (no AVSubtitle visible), then we can take the
time to clean up our mess internally (making codecs handle AVFrame directly
instead of AVSubtitle).
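
For reference, the compatibility path boils down to something like this
(fragment, sketch only; it assumes the AVFrame -> AVSubtitle conversion
has already been done into sub):

  AVSubtitle sub;                      /* filled from the input AVFrame beforehand */
  uint8_t *buf  = avctx->internal->byte_buffer;
  int      size = avctx->internal->byte_buffer_size;
  int ret;

  ret = avcodec_encode_subtitle(avctx, buf, size, &sub);
  if (ret >= 0) {                      /* ret = number of bytes written */
      ret = av_new_packet(avpkt, ret);
      if (ret >= 0)
          memcpy(avpkt->data, buf, avpkt->size);
  }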

[...]

(allocation issues raised have been fixed locally)

[...]
> > +static int decode_subtitle_frame(AVCodecContext *avctx,
> > +                                 AVFrame *frame,
> > +                                 int *got_frame_ptr,
> > +                                 const AVPacket *avpkt)
> > +{
> > +    int i, ret = 0;
> > +
> > +    *got_frame_ptr = 0;
> > +
> > +    if (!avctx->codec)
> > +        return AVERROR(EINVAL);
> > +
> > +    if (!avpkt->data && avpkt->size) {
> > +        av_log(avctx, AV_LOG_ERROR, "invalid packet: NULL data, size != 0\n");
> > +        return AVERROR(EINVAL);
> > +    }
> 
> didn't vobsubs have 0-sized packets at some point? Also, I think the
> caller should check this (avcodec_send_packet).
> 

That's the first check in avcodec_decode_subtitle2() so I suppose that's
valid. It's also present in avcodec_decode_audio4().

As you guessed later, that function is mostly a copy/paste from
avcodec_decode_subtitle2(). The difference is that there is
additionally a conversion from AVSubtitle to AVFrame inside.

> > +
> > +    av_assert0(avctx->codec->type == AVMEDIA_TYPE_SUBTITLE);
> > +
> > +    av_frame_unref(frame);
> 
> It's already guaranteed to be unreffed.
> 

OK, dropped

> > +
> > +    if ((avctx->codec->capabilities & AV_CODEC_CAP_DELAY) || avpkt->size) {
> > +        AVPacket pkt_recoded;
> > +        AVPacket tmp = *avpkt;
> > +        int did_split = av_packet_split_side_data(&tmp);
> > +
> > +        if (did_split) {
> > +            /* FFMIN() prevents overflow in case the packet wasn't allocated with
> > +             * proper padding.
> > +             * If the side data is smaller than the buffer padding size, the
> > +             * remaining bytes should have already been filled with zeros by the
> > +             * original packet allocation anyway. */
> > +            memset(tmp.data + tmp.size, 0,
> > +                   FFMIN(avpkt->size - tmp.size, AV_INPUT_BUFFER_PADDING_SIZE));
> > +        }
> > +        pkt_recoded = tmp;
> > +
> > +        ret = recode_subtitle(avctx, &pkt_recoded, &tmp);
> > +        if (ret >= 0) {
> 
> IMO early exit would be nicer...
> 
> This is mostly the old code copy&pasted, I assume.
> 

Yes, I'd like to limit the differences with the original function since
this is kind of a large piece of work with many risks of getting things wrong.

But after reading your review below, I realized I could just call
avcodec_decode_subtitle2() in that function and do the conversion afterwards,
which is what I just did locally, addressing as a result a bunch of your
comments later on.

[...]
> > +
> > +            frame->sub_start_display = sub.start_display_time ? av_rescale_q(sub.start_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q)
> > +                                                              : AV_NOPTS_VALUE;
> > +            frame->sub_end_display   = sub.end_display_time   ? av_rescale_q(sub.end_display_time,   av_make_q(1, 1000), AV_TIME_BASE_Q)
> > +                                                              : AV_NOPTS_VALUE;
> 
> end_display_time can be UINT32_MAX for "infinite". I think we should
> clean this up here, and differentiate between "0", "unknown", and
> "until next event".
> 

I don't even know what the convention is today, so this part is currently
mostly ignored in this first prototype.

What would be the difference between "infinite" and "until next
(potentially clear) event"?

[...]
> > +static const AVOption sbuffer_options[] = {
> > +    { "time_base",      NULL, OFFSET(time_base),  AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, 0, INT_MAX, S },
> > +    { "format",         NULL, OFFSET(sub_format), AV_OPT_TYPE_INT,      { .i64 = -1 }, -1, 1, S, "format" },
> > +        { "unspecified", NULL, 0, AV_OPT_TYPE_CONST, {.i64=-1}, INT_MIN, INT_MAX, S, "format" },
> > +        { "bitmap",      NULL, 0, AV_OPT_TYPE_CONST, {.i64=0},  INT_MIN, INT_MAX, S, "format" },
> > +        { "text",        NULL, 0, AV_OPT_TYPE_CONST, {.i64=1},  INT_MIN, INT_MAX, S, "format" },
> 
> Implies no mixed subs. I'm fine with that, but might be restrictive.
> 

Yeah, so far we don't have such things AFAIK. But we could replace this
with a mask later on. It was simpler for the initial format negotiation
that way.

[...]
> > +    } else if (type == AVMEDIA_TYPE_SUBTITLE) {
> > +        if (ff_add_format(&ret, 0 /* bitmap */) < 0 ||
> > +            ff_add_format(&ret, 1 /* text */) < 0)
> > +            return NULL;
> 
> Questionable, is probably preliminary code.
> 

Yes, preliminary code. It involves making a decision about what you just
raised earlier, so I made a quick experiment with something as basic as
what we have today.

[...]
> >      while (av_read_frame(fmt, &pkt) >= 0) {
> > -        int i, got_subtitle;
> > -        AVSubtitle sub = {0};
> > -
> >          if (pkt.stream_index == sid) {
> > -            ret = avcodec_decode_subtitle2(dec_ctx, &sub, &got_subtitle, &pkt);
> > -            if (ret < 0) {
> > -                av_log(ctx, AV_LOG_WARNING, "Error decoding: %s (ignored)\n",
> > -                       av_err2str(ret));
> > -            } else if (got_subtitle) {
> > -                const int64_t start_time = av_rescale_q(sub.pts, AV_TIME_BASE_Q, av_make_q(1, 1000));
> > -                const int64_t duration   = sub.end_display_time;
> > -                for (i = 0; i < sub.num_rects; i++) {
> > -                    char *ass_line = sub.rects[i]->ass;
> > +            int i;
> > +
> > +            ret = avcodec_send_packet(dec_ctx, &pkt);
> > +            if (ret < 0)
> > +                break;
> > +            // XXX probably not correct api usage
> > +            ret = avcodec_receive_frame(dec_ctx, frame);
> > +            if (ret >= 0) {
> 
> I've found that converting code which used the old API to the new API is
> very messy. I think that's mostly because the data flow is rather
> different.
> 
> Correct API usage would be something like:
> 
>   while (1) {
>     if (av_codec_send_packet() == AVERROR_EAGAIN)
>       add_packet_back_to_demuxer_queue()
>     while (1) {
>       ret = avcodec_receive_frame
>       if (ret == AVERROR_EAGAIN)
>          break; // new packet needed
>       /* process frame */
>     }
>   }
> 
> Plus EOF and error handling.
> 
> avcodec_send_packet() returning EAGAIN in this specific situation can
> probably be reasonably treated as error instead. Unless the decoder
> violates the declared API rules.
> 

I might actually not clean that code but mark it as deprecated instead,
and make the filter take a stream directly (instead of a filename). It was
an initial attempt to make some tests without updating ffmpeg.c,
integrating into lavfi, etc. Now I can do it properly, so that code will
go away.

[...]
> > diff --git a/libavformat/assenc.c b/libavformat/assenc.c
> > index d50f18f..97a8de8 100644
> > --- a/libavformat/assenc.c
> > +++ b/libavformat/assenc.c
> > @@ -164,6 +164,9 @@ static int write_packet(AVFormatContext *s, AVPacket *pkt)
> >      int hh2, mm2, ss2, ms2;
> >      DialogueLine *dialogue = av_mallocz(sizeof(*dialogue));
> >  
> > +    if (pkt->duration < 0)
> > +        end = INT64_MAX;
> 
> Wait what.
> 

This needs to be in a separate patch with a longer explanation. IIRC I
did that because the last SAMI subtitle has no duration, so it needs to
be muxed as "til the end".

[...]
> > +static int get_subtitle_buffer(AVFrame *frame)
> > +{
> > +    int i, ret;
> > +
> > +    ret = get_data_buffer(frame, frame->sub_nb_rects, sizeof(AVFrameSubtitleRectangle));
> > +    if (ret < 0)
> > +        return ret;
> > +    for (i = 0; i < frame->sub_nb_rects; i++)
> > +        memset(frame->extended_data[i], 0, sizeof(AVFrameSubtitleRectangle));
> > +    return 0;
> 
> I'm noticing it doesn't allocate the actual subtitle bitmap or text
> data, which is a difference from audio/video.
> 

Yes, exactly. This is tricky. It's really tricky because contrary to audio
and video, we cannot know the size required for every (not yet allocated)
rectangle from the raw format information alone.

For video, we have the pixel format, and dimensions. For audio we have the
sample format and number of samples. But for subtitles, we can at most
have the number of rectangles.

To work around this problem, I just made the get_buffer callback allocate
the rectangles holder. The decoder will then allocate every rectangle data
(of different sizes) manually, be it text or actual bitmap rectangle.

But indeed this means that every ref will trigger a new allocation
and copy of the rectangles. As subtitles are meant to be small and sparse,
I'm assuming it's not exactly a problem.

Do you have a better suggestion?

Maybe we could have the user (decoders) allocate the rectangles holder and
fill all the sizes instead, and then call the get buffer, but this might
be annoying as it will require most of them to do it in two passes
instead of one as of today.
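
In practice a decoder would do something like this with the current design
(fragment, text case only; ass_lines is decoder-local data):

  frame->sub_nb_rects = nb_rects;
  frame->format       = AV_PIX_FMT_NONE;     /* no format set: text subtitles */
  ret = av_frame_get_buffer(frame, 0);       /* allocates the rect holders only */
  if (ret < 0)
      return ret;
  for (i = 0; i < frame->sub_nb_rects; i++) {
      AVFrameSubtitleRectangle *rect = (AVFrameSubtitleRectangle *)frame->extended_data[i];

      /* the per-rectangle payload is allocated by the decoder itself */
      rect->text = av_strdup(ass_lines[i]);
      if (!rect->text)
          return AVERROR(ENOMEM);
  }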

> >  }
> >  
> >  int av_frame_get_buffer(AVFrame *frame, int align)
> >  {
> > +    if (frame->sub_nb_rects)
> > +        return get_subtitle_buffer(frame);
> 
> No format set is valid for subtitles?
> 

Yes, no format set means text. If the format is set, it defines the format
used by the rectangles.

[...]
> > +    /* image data for bitmap subtitles, in AVFrame.format (AVPixelFormat) */
> > +    uint8_t *data[AV_NUM_DATA_POINTERS];
> > +    int linesize[AV_NUM_DATA_POINTERS];
> 
> Does it define the format somewhere?
> 

in AVFrame.format

> > +
> > +    /* decoded text for text subtitles, in ASS */
> > +    char *text;
> 
> I'm fine with this - others would probably argue this should be in
> data[0] or so.
> 
> > +
> > +    int flags;
> 
> Which flags (missing doc.)?
> 

That's the current flags, I need to move/copy them to lavu.

> > +} AVFrameSubtitleRectangle;
> 
> The struct should probably be extendable (so its size is not part of
> the ABI).
> 

In the current design, there is no need to ever allocate them: you set the
number of rectangles in your AVFrame, then call av_frame_get_buffer()
which will allocate them.

> > +
> >  /**
> >   * This structure describes decoded (raw) audio or video data.
> >   *
> > @@ -182,7 +201,6 @@ typedef struct AVFrameSideData {
> >   * for AVFrame can be obtained from avcodec_get_frame_class()
> >   */
> >  typedef struct AVFrame {
> > -#define AV_NUM_DATA_POINTERS 8
> >      /**
> >       * pointer to the picture/channel planes.
> >       * This might be different from the first allocated byte
> > @@ -224,9 +242,12 @@ typedef struct AVFrame {
> >       * For packed audio, there is just one data pointer, and linesize[0]
> >       * contains the total size of the buffer for all channels.
> >       *
> > +     * For subtitles, each pointer corresponds to an AVFrameSubtitleRectangle.
> > +     *
> 
> It should also mention how many data pointers there are.
> 
> Btw. I'm not much of a fan of having to deal with extended_data. And
> the duplication with data. (Does it even document what data[] contains
> for subtitles?)
> 

It's exactly like audio: data[] points to the first entries of extended_data[]
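
So on the consumer side it reads like this (fragment;
handle_text()/handle_bitmap() are placeholders):

  for (i = 0; i < frame->sub_nb_rects; i++) {
      const AVFrameSubtitleRectangle *r =
          (const AVFrameSubtitleRectangle *)frame->extended_data[i];

      if (frame->format == AV_PIX_FMT_NONE)  /* no format set: text */
          handle_text(r->text);              /* placeholder */
      else                                   /* bitmap in frame->format */
          handle_bitmap(r->data, r->linesize, r->x, r->y, r->w, r->h);
  }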

[...]

I'll try to address every other comment and send a new and more complete
version later.

Thanks a lot for the review.
Clément Bœsch Nov. 20, 2016, 12:58 p.m. UTC | #18
On Fri, Nov 11, 2016 at 12:45:03PM +0100, Nicolas George wrote:
> On septidi 17 brumaire, an CCXXV, Clement Boesch wrote:
> > I didn't. Duration is somehow broken currently for some reason. I did
> > nothing for sparseness: the reason I added basic support in lavfi is
> > because it was much simpler to handle at ffmpeg.c level, so it's currently
> > just a passthrough mechanism.
> 
> A decision will need to be made pretty soon, though, or it will not help
> anything much. Sparseness will make things break badly OOM-killer-style
> if someone tries to use subtitles coming from the same multiplexed file
> than the corresponding video. Duration affects the visible API, a
> decision on it is more or less final. Here are the results of my current
> thoughts:
> 
> For sparseness, the solution is to use heartbeat frames: if you have a
> subtitle event at 10s and the next one at 50s, emit a frame with no
> payload and just a timestamp at 10.1, 10.2, ... 49.1.
> 
[...]

Right, I'll probably need that for the sub2video filter. I'll think about
it unless you want to implement it yourself.
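
Something along these lines, maybe (sketch; assuming a payload-less frame
is simply one with sub_nb_rects == 0 and just a pts set; needs
libavfilter/internal.h for ff_filter_frame()):

  static int emit_heartbeats(AVFilterLink *outlink, int64_t t0, int64_t t1,
                             int64_t interval /* in link time_base units */)
  {
      int64_t ts;

      for (ts = t0 + interval; ts < t1; ts += interval) {
          AVFrame *hb = av_frame_alloc();
          int ret;

          if (!hb)
              return AVERROR(ENOMEM);
          hb->pts = ts;                    /* no rectangles: pure heartbeat */
          ret = ff_filter_frame(outlink, hb);
          if (ret < 0)
              return ret;
      }
      return 0;
  }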

> > I didn't like having multiple fields for text-based data. If we want to
> > decode in another form, we can still add an option to print out verbatim
> > text instead of ASS markup.
> 
> I think we are not talking about the same thing. A long time ago, we
> considered replacing the ASS markup with a simple text field with
> styling in a separate, non-text, structure. Did you discard that idea?
> 

Ah, yeah, I discarded that idea. First, I think it's too large a change
for now, and I just won't ever be able to finish this patchset if I start
going that way. Second, apparently it's considered over-engineering by the
API users I talked with. They're fine using libass for markup or a simple
freetype-like custom engine for when there is no markup.

If we later decide to export as an AST, we will add a field to that
rectangle and that will do the trick. For now, this is already such a
huge amount of work that I'm not willing to stick that in (and probably won't ever
TBH). It's already raising a ton of questions wrt design.

[...]
Carl Eugen Hoyos Nov. 20, 2016, 1:24 p.m. UTC | #19
2016-11-20 13:52 GMT+01:00 Clément Bœsch <u@pkh.me>:

>> > +static const AVOption sbuffer_options[] = {
>> > +    { "time_base",      NULL, OFFSET(time_base),  AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, 0, INT_MAX, S },
>> > +    { "format",         NULL, OFFSET(sub_format), AV_OPT_TYPE_INT,      { .i64 = -1 }, -1, 1, S, "format" },
>> > +        { "unspecified", NULL, 0, AV_OPT_TYPE_CONST, {.i64=-1}, INT_MIN, INT_MAX, S, "format" },
>> > +        { "bitmap",      NULL, 0, AV_OPT_TYPE_CONST, {.i64=0},  INT_MIN, INT_MAX, S, "format" },
>> > +        { "text",        NULL, 0, AV_OPT_TYPE_CONST, {.i64=1},  INT_MIN, INT_MAX, S, "format" },
>>
>> Implies no mixed subs. I'm fine with that, but might be restrictive.
>>
>
> Yeah, so far we don't have such things AFAIK. But we could replace this
> with a mask later on. It was simpler for the initial format negotiation
> that way.

Will this work with our current external teletext decoder?
(Will it work with a hypothetical future teletext decoder that provides both
bitmap and text when decoding?)

Thank you, Carl Eugen
wm4 Nov. 21, 2016, 11:30 a.m. UTC | #20
On Sun, 20 Nov 2016 14:24:00 +0100
Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:

> 2016-11-20 13:52 GMT+01:00 Clément Bœsch <u@pkh.me>:
> 
> >> > +static const AVOption sbuffer_options[] = {
> >> > +    { "time_base",      NULL, OFFSET(time_base),  AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, 0, INT_MAX, S },
> >> > +    { "format",         NULL, OFFSET(sub_format), AV_OPT_TYPE_INT,      { .i64 = -1 }, -1, 1, S, "format" },
> >> > +        { "unspecified", NULL, 0, AV_OPT_TYPE_CONST, {.i64=-1}, INT_MIN, INT_MAX, S, "format" },
> >> > +        { "bitmap",      NULL, 0, AV_OPT_TYPE_CONST, {.i64=0},  INT_MIN, INT_MAX, S, "format" },
> >> > +        { "text",        NULL, 0, AV_OPT_TYPE_CONST, {.i64=1},  INT_MIN, INT_MAX, S, "format" },  
> >>
> >> Implies no mixed subs. I'm fine with that, but might be restrictive.
> >>  
> >
> > Yeah, so far we don't have such things AFAIK. But we could replace this
> > with a mask later on. It was simpler for the initial format negotiation
> > that way.  
> 
> Will this work with our current external teletext decoder?
> (Will it work with a hypothetical future teletext decoder that provides both
> bitmap and text when decoding?)

That's a different case. I was talking about subtitles which need both
text and images for correct display. Different output formats can be
handled with get_format or a private option or so.
Carl Eugen Hoyos Nov. 21, 2016, 11:46 a.m. UTC | #21
2016-11-21 12:30 GMT+01:00 wm4 <nfxjfg@googlemail.com>:
> On Sun, 20 Nov 2016 14:24:00 +0100
> Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>
>> 2016-11-20 13:52 GMT+01:00 Clément Bœsch <u@pkh.me>:
>>
>> >> > +static const AVOption sbuffer_options[] = {
>> >> > +    { "time_base",      NULL, OFFSET(time_base),  AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, 0, INT_MAX, S },
>> >> > +    { "format",         NULL, OFFSET(sub_format), AV_OPT_TYPE_INT,      { .i64 = -1 }, -1, 1, S, "format" },
>> >> > +        { "unspecified", NULL, 0, AV_OPT_TYPE_CONST, {.i64=-1}, INT_MIN, INT_MAX, S, "format" },
>> >> > +        { "bitmap",      NULL, 0, AV_OPT_TYPE_CONST, {.i64=0},  INT_MIN, INT_MAX, S, "format" },
>> >> > +        { "text",        NULL, 0, AV_OPT_TYPE_CONST, {.i64=1},  INT_MIN, INT_MAX, S, "format" },
>> >>
>> >> Implies no mixed subs. I'm fine with that, but might be restrictive.
>> >>
>> >
>> > Yeah, so far we don't have such things AFAIK. But we could replace this
>> > with a mask later on. It was simpler for the initial format negotiation
>> > that way.
>>
>> Will this work with our current external teletext decoder?
>> (Will it work with a hypothetical future teletext decoder that provides both
>> bitmap and text when decoding?)
>
> That's a different case. I was talking about subtitles which need both
> text and images for correct display. Different output formats can be
> handled with get_format or a private option or so.

So it does already work with our current external teletext decoder and
a hypothetical future teletext decoder that provides both bitmap and
text when decoding?

Thank you, Carl Eugen
wm4 Nov. 21, 2016, 11:48 a.m. UTC | #22
On Sun, 20 Nov 2016 13:52:02 +0100
Clément Bœsch <u@pkh.me> wrote:

> On Fri, Nov 11, 2016 at 04:46:51PM +0100, wm4 wrote:
> [...]
> > > +
> > > +            frame->sub_start_display = sub.start_display_time ? av_rescale_q(sub.start_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q)
> > > +                                                              : AV_NOPTS_VALUE;
> > > +            frame->sub_end_display   = sub.end_display_time   ? av_rescale_q(sub.end_display_time,   av_make_q(1, 1000), AV_TIME_BASE_Q)
> > > +                                                              : AV_NOPTS_VALUE;  
> > 
> > end_display_time can be UINT32_MAX for "infinite". I think we should
> > clean this up here, and differentiate between "0", "unknown", and
> > "until next event".
> >   
> 
> I don't even know what the convention is today, so this part is currently
> mostly ignored in this first prototype.

dvdsubdec.c leaves end_display_time simply at 0 in some cases (which
means show until next event), while pgssubdec.c always sets it to
UINT32_MAX (which apparently means the same). I don't know why it
was done this way (see 0c911c8fbc).

> What would be the difference between "infinite" and "until next
> (potentially clear) event"?

Probably none.

The important part for non-infinite end time values is whether the
subtitles get strictly replaced by other (later) subtitles, or whether
they overlap.

With the existing API, you have to guess by the codec ID.
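
One possible mapping when converting from the old field (sketch; the
sentinel values are made up, and a valid rescaled value is always >= 0 so
negative sentinels cannot collide):

  #define SUB_DISPLAY_UNKNOWN    AV_NOPTS_VALUE  /* made-up sentinel */
  #define SUB_DISPLAY_UNTIL_NEXT (-1)            /* made-up sentinel */

  static int64_t map_end_display(uint32_t end_display_time /* in ms */)
  {
      /* dvdsubdec leaves 0, pgssubdec sets UINT32_MAX: both appear to mean
       * "show until replaced by the next event" */
      if (end_display_time == 0 || end_display_time == UINT32_MAX)
          return SUB_DISPLAY_UNTIL_NEXT;
      return av_rescale_q(end_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q);
  }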

> 
> [...]
> > > +static int get_subtitle_buffer(AVFrame *frame)
> > > +{
> > > +    int i, ret;
> > > +
> > > +    ret = get_data_buffer(frame, frame->sub_nb_rects, sizeof(AVFrameSubtitleRectangle));
> > > +    if (ret < 0)
> > > +        return ret;
> > > +    for (i = 0; i < frame->sub_nb_rects; i++)
> > > +        memset(frame->extended_data[i], 0, sizeof(AVFrameSubtitleRectangle));
> > > +    return 0;  
> > 
> > I'm noticing it doesn't allocate the actual subtitle bitmap or text
> > data, which is a difference from audio/video.
> >   
> 
> Yes, exactly. This is tricky. It's really tricky because contrary to audio
> and video, we cannot know the size required for every (not yet allocated)
> rectangle from the raw format information alone.
> 
> For video, we have the pixel format, and dimensions. For audio we have the
> sample format and number of samples. But for subtitles, we can at most
> have the number of rectangles.
> 
> To work around this problem, I just made the get_buffer callback allocate
> the rectangles holder. The decoder will then allocate every rectangle data
> (of different sizes) manually, be it text or actual bitmap rectangle.
> 
> But indeed this means that every ref will trigger a new allocation
> and copy of the rectangles. As subtitles are meant to be small and sparse,
> I'm assuming it's not exactly a problem.
> 
> Do you have a better suggestion?
> 
> Maybe we could have the user (decoders) allocate the rectangles holder and
> fill all the sizes instead, and then call the get buffer, but this might
> be annoying as it will require most of them to do it in two passes
> instead of one as of today.
> 

Seems to me that av_frame_get_buffer() would just have confusing
semantics here, and it'd be better to have dedicated allocation
functions.

But the important part is that the subtitle data is actually properly
refcounted/freed, and that things like copy-on-write actually work.
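
For instance, the shape of such an API could be (made-up names, just to
illustrate the idea of dedicated allocators replacing av_frame_get_buffer()
for subtitles):

  int av_frame_alloc_sub_rects(AVFrame *frame, int nb_rects);
  int av_frame_sub_rect_alloc_text(AVFrameSubtitleRectangle *rect, size_t len);
  int av_frame_sub_rect_alloc_bitmap(AVFrameSubtitleRectangle *rect,
                                     int w, int h, enum AVPixelFormat fmt);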

> > >  }
> > >  
> > >  int av_frame_get_buffer(AVFrame *frame, int align)
> > >  {
> > > +    if (frame->sub_nb_rects)
> > > +        return get_subtitle_buffer(frame);  
> > 
> > No format set is valid for subtitles?
> >   
> 
> Yes, no format set means text. If the format is set, it defines the format
> used by the rectangles.

Seems slightly questionable.

> [...]
> > > +    /* image data for bitmap subtitles, in AVFrame.format (AVPixelFormat) */
> > > +    uint8_t *data[AV_NUM_DATA_POINTERS];
> > > +    int linesize[AV_NUM_DATA_POINTERS];  
> > 
> > Does it define the format somewhere?
> >   
> 
> in AVFrame.format

Right. 

> > > +} AVFrameSubtitleRectangle;  
> > 
> > The struct should probably be extendable (so its size is not part of
> > the ABI).
> >   
> 
> In the current design, there is no need to ever allocate them: you set the
> number of rectangles in your AVFrame, then call av_frame_get_buffer()
> which will allocate them.

You still need to make this explicit. Also, allocation isn't the only
operation you can do on a struct.

> > > @@ -224,9 +242,12 @@ typedef struct AVFrame {
> > >       * For packed audio, there is just one data pointer, and linesize[0]
> > >       * contains the total size of the buffer for all channels.
> > >       *
> > > +     * For subtitles, each pointer corresponds to an AVFrameSubtitleRectangle.
> > > +     *  
> > 
> > It should also mention how many data pointers there are.
> > 
> > Btw. I'm very not much of a fan to have to deal with extended_data. And
> > the duplication with data. (Does it even document what data[] contains
> > for subtitles?)
> >   
> 
> It's exactly like audio: data[] points to the first extended_data[] entries

Well, for audio and video, I have my own structs instead of using
AVFrame (because AVFrame is very restrictive, not extensible, and hard
to use). And wiring up these extended* things is terrible. Actually,
the worst offenders are buf/extended_buf because of their weird
semantics.
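
For reference, a consumer following the patch's layout would walk the
rectangles roughly like this (a sketch assuming extended_data holds
AVFrameSubtitleRectangle pointers, as the patch does):

#include <stdio.h>
#include <libavutil/frame.h>

static void dump_sub_frame(const AVFrame *frame)
{
    int i;

    for (i = 0; i < frame->sub_nb_rects; i++) {
        const AVFrameSubtitleRectangle *r =
            (const AVFrameSubtitleRectangle *)frame->extended_data[i];
        if (frame->format == AV_PIX_FMT_NONE)   /* text subtitle */
            printf("rect %d: %s\n", i, r->text);
        else                                    /* bitmap subtitle */
            printf("rect %d: %dx%d at (%d,%d)\n",
                   i, r->w, r->h, r->x, r->y);
    }
}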
Nicolas George Nov. 27, 2016, 10:18 a.m. UTC | #23
On decadi 30 Brumaire, year CCXXV, Clement Boesch wrote:
> Right, I'll probably need that for the sub2video filter. I'll think about
> it unless you want to implement it yourself.

As always, for me, time is the limiting factor. For now, I believe the
only urgent thing to do is to decide what these heartbeat frames look
like: do they just have data[0] = NULL? format = -1?

> Ah, yeah, I discarded that idea. First, I think it's too large a change
> for now, and I'll just never be able to finish this patchset if I start
> going that way. Second, apparently it's considered over-engineering by the
> API users I talked with. They're fine using libass for markup, or a simple
> freetype-like custom engine when there is no markup.
> 
> If we later decide to export as an AST, we will add a field to that
> rectangle and that will do the trick. For now, this is already such a
> huge work that I'm not willing to stick that in (and probably never will,
> TBH). It's already raising a ton of questions wrt design.

Ok.

Regards,
diff mbox

Patch

diff --git a/ffmpeg.c b/ffmpeg.c
index 36a921a..8222b8b 100644
--- a/ffmpeg.c
+++ b/ffmpeg.c
@@ -856,6 +856,50 @@  static int check_recording_time(OutputStream *ost)
     return 1;
 }
 
+static void do_subtitle_out(OutputFile *of,
+                            OutputStream *ost,
+                            AVFrame *frame)
+{
+    AVCodecContext *enc = ost->enc_ctx;
+    AVPacket pkt;
+    int ret;
+
+    av_init_packet(&pkt);
+    pkt.data = NULL;
+    pkt.size = 0;
+
+    if (!check_recording_time(ost))
+        return;
+
+    ost->frames_encoded++;
+
+    av_assert0(pkt.size || !pkt.data);
+    update_benchmark(NULL);
+
+    ret = avcodec_send_frame(enc, frame);
+    if (ret < 0)
+        goto error;
+
+    while (1) {
+        ret = avcodec_receive_packet(enc, &pkt);
+        if (ret == AVERROR(EAGAIN))
+            break;
+        if (ret < 0)
+            goto error;
+
+        update_benchmark("encode_subtitle %d.%d", ost->file_index, ost->index);
+
+        av_packet_rescale_ts(&pkt, enc->time_base, ost->st->time_base);
+
+        output_packet(of, &pkt, ost);
+    }
+
+    return;
+error:
+    av_log(NULL, AV_LOG_FATAL, "Subtitle encoding failed\n");
+    exit_program(1);
+}
+
 static void do_audio_out(OutputFile *of, OutputStream *ost,
                          AVFrame *frame)
 {
@@ -916,89 +960,6 @@  error:
     exit_program(1);
 }
 
-static void do_subtitle_out(OutputFile *of,
-                            OutputStream *ost,
-                            AVSubtitle *sub)
-{
-    int subtitle_out_max_size = 1024 * 1024;
-    int subtitle_out_size, nb, i;
-    AVCodecContext *enc;
-    AVPacket pkt;
-    int64_t pts;
-
-    if (sub->pts == AV_NOPTS_VALUE) {
-        av_log(NULL, AV_LOG_ERROR, "Subtitle packets must have a pts\n");
-        if (exit_on_error)
-            exit_program(1);
-        return;
-    }
-
-    enc = ost->enc_ctx;
-
-    if (!subtitle_out) {
-        subtitle_out = av_malloc(subtitle_out_max_size);
-        if (!subtitle_out) {
-            av_log(NULL, AV_LOG_FATAL, "Failed to allocate subtitle_out\n");
-            exit_program(1);
-        }
-    }
-
-    /* Note: DVB subtitle need one packet to draw them and one other
-       packet to clear them */
-    /* XXX: signal it in the codec context ? */
-    if (enc->codec_id == AV_CODEC_ID_DVB_SUBTITLE)
-        nb = 2;
-    else
-        nb = 1;
-
-    /* shift timestamp to honor -ss and make check_recording_time() work with -t */
-    pts = sub->pts;
-    if (output_files[ost->file_index]->start_time != AV_NOPTS_VALUE)
-        pts -= output_files[ost->file_index]->start_time;
-    for (i = 0; i < nb; i++) {
-        unsigned save_num_rects = sub->num_rects;
-
-        ost->sync_opts = av_rescale_q(pts, AV_TIME_BASE_Q, enc->time_base);
-        if (!check_recording_time(ost))
-            return;
-
-        sub->pts = pts;
-        // start_display_time is required to be 0
-        sub->pts               += av_rescale_q(sub->start_display_time, (AVRational){ 1, 1000 }, AV_TIME_BASE_Q);
-        sub->end_display_time  -= sub->start_display_time;
-        sub->start_display_time = 0;
-        if (i == 1)
-            sub->num_rects = 0;
-
-        ost->frames_encoded++;
-
-        subtitle_out_size = avcodec_encode_subtitle(enc, subtitle_out,
-                                                    subtitle_out_max_size, sub);
-        if (i == 1)
-            sub->num_rects = save_num_rects;
-        if (subtitle_out_size < 0) {
-            av_log(NULL, AV_LOG_FATAL, "Subtitle encoding failed\n");
-            exit_program(1);
-        }
-
-        av_init_packet(&pkt);
-        pkt.data = subtitle_out;
-        pkt.size = subtitle_out_size;
-        pkt.pts  = av_rescale_q(sub->pts, AV_TIME_BASE_Q, ost->st->time_base);
-        pkt.duration = av_rescale_q(sub->end_display_time, (AVRational){ 1, 1000 }, ost->st->time_base);
-        if (enc->codec_id == AV_CODEC_ID_DVB_SUBTITLE) {
-            /* XXX: the pts correction is handled here. Maybe handling
-               it in the codec would be better */
-            if (i == 0)
-                pkt.pts += 90 * sub->start_display_time;
-            else
-                pkt.pts += 90 * sub->end_display_time;
-        }
-        pkt.dts = pkt.pts;
-        output_packet(of, &pkt, ost);
-    }
-}
-
 static void do_video_out(OutputFile *of,
                          OutputStream *ost,
                          AVFrame *next_picture,
@@ -1431,6 +1392,8 @@  static int reap_filters(int flush)
                 filtered_frame->pts =
                     av_rescale_q(filtered_frame->pts, filter->inputs[0]->time_base, enc->time_base) -
                     av_rescale_q(start_time, AV_TIME_BASE_Q, enc->time_base);
+                filtered_frame->pkt_duration =
+                    av_rescale_q(filtered_frame->pkt_duration, filter->inputs[0]->time_base, enc->time_base);
             }
             //if (ost->source_index >= 0)
             //    *filtered_frame= *input_streams[ost->source_index]->decoded_frame; //for me_threshold
@@ -1458,8 +1421,10 @@  static int reap_filters(int flush)
                 }
                 do_audio_out(of, ost, filtered_frame);
                 break;
+            case AVMEDIA_TYPE_SUBTITLE:
+                do_subtitle_out(of, ost, filtered_frame);
+                break;
             default:
-                // TODO support subtitle filters
                 av_assert0(0);
             }
 
@@ -2081,6 +2046,36 @@  static int send_frame_to_filters(InputStream *ist, AVFrame *decoded_frame)
     return ret;
 }
 
+static int decode_subtitle(InputStream *ist, AVPacket *pkt, int *got_output)
+{
+    int i, ret, err;
+    AVFrame *decoded_frame;
+
+    if (!ist->decoded_frame && !(ist->decoded_frame = av_frame_alloc()))
+        return AVERROR(ENOMEM);
+    if (!ist->filter_frame && !(ist->filter_frame = av_frame_alloc()))
+        return AVERROR(ENOMEM);
+    decoded_frame = ist->decoded_frame;
+
+    update_benchmark(NULL);
+    ret = decode(ist->dec_ctx, decoded_frame, got_output, pkt);
+    update_benchmark("decode_subtitle %d.%d", ist->file_index, ist->st->index);
+
+    if (ret != AVERROR_EOF)
+        check_decode_result(ist, got_output, ret);
+
+    if (!*got_output || ret < 0)
+        return ret;
+
+    ist->frames_decoded++;
+
+    err = send_frame_to_filters(ist, decoded_frame);
+
+    av_frame_unref(ist->filter_frame);
+    av_frame_unref(decoded_frame);
+    return err < 0 ? err : ret;
+}
+
 static int decode_audio(InputStream *ist, AVPacket *pkt, int *got_output)
 {
     AVFrame *decoded_frame;
@@ -2335,65 +2330,6 @@  fail:
     return err < 0 ? err : ret;
 }
 
-static int transcode_subtitles(InputStream *ist, AVPacket *pkt, int *got_output)
-{
-    AVSubtitle subtitle;
-    int i, ret = avcodec_decode_subtitle2(ist->dec_ctx,
-                                          &subtitle, got_output, pkt);
-
-    check_decode_result(NULL, got_output, ret);
-
-    if (ret < 0 || !*got_output) {
-        if (!pkt->size)
-            sub2video_flush(ist);
-        return ret;
-    }
-
-    if (ist->fix_sub_duration) {
-        int end = 1;
-        if (ist->prev_sub.got_output) {
-            end = av_rescale(subtitle.pts - ist->prev_sub.subtitle.pts,
-                             1000, AV_TIME_BASE);
-            if (end < ist->prev_sub.subtitle.end_display_time) {
-                av_log(ist->dec_ctx, AV_LOG_DEBUG,
-                       "Subtitle duration reduced from %d to %d%s\n",
-                       ist->prev_sub.subtitle.end_display_time, end,
-                       end <= 0 ? ", dropping it" : "");
-                ist->prev_sub.subtitle.end_display_time = end;
-            }
-        }
-        FFSWAP(int,        *got_output, ist->prev_sub.got_output);
-        FFSWAP(int,        ret,         ist->prev_sub.ret);
-        FFSWAP(AVSubtitle, subtitle,    ist->prev_sub.subtitle);
-        if (end <= 0)
-            goto out;
-    }
-
-    if (!*got_output)
-        return ret;
-
-    sub2video_update(ist, &subtitle);
-
-    if (!subtitle.num_rects)
-        goto out;
-
-    ist->frames_decoded++;
-
-    for (i = 0; i < nb_output_streams; i++) {
-        OutputStream *ost = output_streams[i];
-
-        if (!check_output_constraints(ist, ost) || !ost->encoding_needed
-            || ost->enc->type != AVMEDIA_TYPE_SUBTITLE)
-            continue;
-
-        do_subtitle_out(output_files[ost->file_index], ost, &subtitle);
-    }
-
-out:
-    avsubtitle_free(&subtitle);
-    return ret;
-}
-
 static int send_filter_eof(InputStream *ist)
 {
     int i, ret;
@@ -2477,11 +2413,7 @@  static int process_input_packet(InputStream *ist, const AVPacket *pkt, int no_eo
                 ist->next_pts += duration; //FIXME the duration is not correct in some cases
             break;
         case AVMEDIA_TYPE_SUBTITLE:
-            if (repeating)
-                break;
-            ret = transcode_subtitles(ist, &avpkt, &got_output);
-            if (!pkt && ret >= 0)
-                ret = AVERROR_EOF;
+            ret = decode_subtitle(ist, repeating ? NULL : &avpkt, &got_output);
             break;
         default:
             return -1;
@@ -3293,7 +3225,8 @@  static int transcode_init(void)
 #endif
 
             if ((enc_ctx->codec_type == AVMEDIA_TYPE_VIDEO ||
-                 enc_ctx->codec_type == AVMEDIA_TYPE_AUDIO) &&
+                 enc_ctx->codec_type == AVMEDIA_TYPE_AUDIO ||
+                 enc_ctx->codec_type == AVMEDIA_TYPE_SUBTITLE) &&
                  filtergraph_is_simple(ost->filter->graph)) {
                     FilterGraph *fg = ost->filter->graph;
                     if (configure_filtergraph(fg)) {
diff --git a/ffmpeg_filter.c b/ffmpeg_filter.c
index 27aeca0..743c96b 100644
--- a/ffmpeg_filter.c
+++ b/ffmpeg_filter.c
@@ -231,8 +231,10 @@  static void init_input_filter(FilterGraph *fg, AVFilterInOut *in)
     int i;
 
     // TODO: support other filter types
-    if (type != AVMEDIA_TYPE_VIDEO && type != AVMEDIA_TYPE_AUDIO) {
-        av_log(NULL, AV_LOG_FATAL, "Only video and audio filters supported "
+    if (type != AVMEDIA_TYPE_VIDEO &&
+        type != AVMEDIA_TYPE_AUDIO &&
+        type != AVMEDIA_TYPE_SUBTITLE) {
+        av_log(NULL, AV_LOG_FATAL, "Only video, audio and subtitle filters supported "
                "currently.\n");
         exit_program(1);
     }
@@ -633,6 +635,29 @@  static int configure_output_audio_filter(FilterGraph *fg, OutputFilter *ofilter,
     return 0;
 }
 
+static int configure_output_subtitle_filter(FilterGraph *fg, OutputFilter *ofilter, AVFilterInOut *out)
+{
+    OutputStream *ost = ofilter->ost;
+    //OutputFile    *of = output_files[ost->file_index];
+    //AVCodecContext *codec  = ost->enc_ctx;
+    AVFilterContext *last_filter = out->filter_ctx;
+    int pad_idx = out->pad_idx;
+    char name[255];
+    int ret;
+
+    snprintf(name, sizeof(name), "output stream %d:%d", ost->file_index, ost->index);
+    ret = avfilter_graph_create_filter(&ofilter->filter,
+                                       avfilter_get_by_name("sbuffersink"),
+                                       name, NULL, NULL, fg->graph);
+    if (ret < 0)
+        return ret;
+
+    if ((ret = avfilter_link(last_filter, pad_idx, ofilter->filter, 0)) < 0)
+        return ret;
+
+    return 0;
+}
+
 #define DESCRIBE_FILTER_LINK(f, inout, in)                         \
 {                                                                  \
     AVFilterContext *ctx = inout->filter_ctx;                      \
@@ -663,6 +688,7 @@  int configure_output_filter(FilterGraph *fg, OutputFilter *ofilter, AVFilterInOu
     switch (avfilter_pad_get_type(out->filter_ctx->output_pads, out->pad_idx)) {
     case AVMEDIA_TYPE_VIDEO: return configure_output_video_filter(fg, ofilter, out);
     case AVMEDIA_TYPE_AUDIO: return configure_output_audio_filter(fg, ofilter, out);
+    case AVMEDIA_TYPE_SUBTITLE: return configure_output_subtitle_filter(fg, ofilter, out);
     default: av_assert0(0);
     }
 }
@@ -957,6 +983,69 @@  static int configure_input_audio_filter(FilterGraph *fg, InputFilter *ifilter,
     return 0;
 }
 
+static int configure_input_subtitle_filter(FilterGraph *fg, InputFilter *ifilter,
+                                           AVFilterInOut *in)
+{
+    AVFilterContext *last_filter;
+    const AVFilter *sbuffer_filt = avfilter_get_by_name("sbuffer");
+    InputStream *ist = ifilter->ist;
+    const AVCodecContext *avctx = ist->dec_ctx;
+    //InputFile     *f = input_files[ist->file_index];
+    AVBPrint args;
+    char name[255];
+    int ret; //, pad_idx = 0;
+    //int64_t tsoffset = 0;
+    const AVRational tb = ist->st->time_base;
+
+    // XXX: codec should be opened here so we could access avctx->codec_descriptor
+    const AVCodecDescriptor *codec_desc = avcodec_descriptor_get(avctx->codec_id);
+
+    if (avctx->codec_type != AVMEDIA_TYPE_SUBTITLE) {
+        av_log(NULL, AV_LOG_ERROR, "Cannot connect subtitle filter to %s input\n",
+               av_get_media_type_string(avctx->codec_type));
+        return AVERROR(EINVAL);
+    }
+
+    av_bprint_init(&args, 0, AV_BPRINT_SIZE_AUTOMATIC);
+    av_bprintf(&args, "time_base=%d/%d:format=", tb.num, tb.den);
+
+    if (codec_desc->props & AV_CODEC_PROP_TEXT_SUB)
+        av_bprintf(&args, "text");
+    else if (codec_desc->props & AV_CODEC_PROP_BITMAP_SUB)
+        av_bprintf(&args, "bitmap");
+    else
+        av_bprintf(&args, "unspecified");
+
+    snprintf(name, sizeof(name), "graph %d input from stream %d:%d", fg->index,
+             ist->file_index, ist->st->index);
+
+    if ((ret = avfilter_graph_create_filter(&ifilter->filter, sbuffer_filt,
+                                            name, args.str, NULL,
+                                            fg->graph)) < 0)
+        return ret;
+    last_filter = ifilter->filter;
+
+#if 0
+    snprintf(name, sizeof(name), "trim for input stream %d:%d",
+             ist->file_index, ist->st->index);
+    if (copy_ts) {
+        tsoffset = f->start_time == AV_NOPTS_VALUE ? 0 : f->start_time;
+        if (!start_at_zero && f->ctx->start_time != AV_NOPTS_VALUE)
+            tsoffset += f->ctx->start_time;
+    }
+    ret = insert_trim(((f->start_time == AV_NOPTS_VALUE) || !f->accurate_seek) ?
+                      AV_NOPTS_VALUE : tsoffset, f->recording_time,
+                      &last_filter, &pad_idx, name);
+    if (ret < 0)
+        return ret;
+#endif
+
+    if ((ret = avfilter_link(last_filter, 0, in->filter_ctx, in->pad_idx)) < 0)
+        return ret;
+
+    return 0;
+}
+
 static int configure_input_filter(FilterGraph *fg, InputFilter *ifilter,
                                   AVFilterInOut *in)
 {
@@ -972,6 +1061,7 @@  static int configure_input_filter(FilterGraph *fg, InputFilter *ifilter,
     switch (avfilter_pad_get_type(in->filter_ctx->input_pads, in->pad_idx)) {
     case AVMEDIA_TYPE_VIDEO: return configure_input_video_filter(fg, ifilter, in);
     case AVMEDIA_TYPE_AUDIO: return configure_input_audio_filter(fg, ifilter, in);
+    case AVMEDIA_TYPE_SUBTITLE: return configure_input_subtitle_filter(fg, ifilter, in);
     default: av_assert0(0);
     }
 }
diff --git a/ffmpeg_opt.c b/ffmpeg_opt.c
index 4d25fff..a8d9a9f 100644
--- a/ffmpeg_opt.c
+++ b/ffmpeg_opt.c
@@ -1487,8 +1487,13 @@  static char *get_ost_filters(OptionsContext *o, AVFormatContext *oc,
     else if (ost->filters)
         return av_strdup(ost->filters);
 
-    return av_strdup(st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO ?
-                     "null" : "anull");
+    switch (st->codecpar->codec_type) {
+    case AVMEDIA_TYPE_VIDEO:    return av_strdup("null");
+    case AVMEDIA_TYPE_AUDIO:    return av_strdup("anull");
+    case AVMEDIA_TYPE_SUBTITLE: return av_strdup("snull");
+    default:
+        av_assert0(0);
+    }
 }
 
 static void check_streamcopy_filters(OptionsContext *o, AVFormatContext *oc,
@@ -1824,6 +1829,9 @@  static OutputStream *new_subtitle_stream(OptionsContext *o, AVFormatContext *oc,
 
     MATCH_PER_STREAM_OPT(copy_initial_nonkeyframes, i, ost->copy_initial_nonkeyframes, oc, st);
 
+    MATCH_PER_STREAM_OPT(filter_scripts, str, ost->filters_script, oc, st);
+    MATCH_PER_STREAM_OPT(filters,        str, ost->filters,        oc, st);
+
     if (!ost->stream_copy) {
         char *frame_size = NULL;
 
@@ -1832,8 +1840,15 @@  static OutputStream *new_subtitle_stream(OptionsContext *o, AVFormatContext *oc,
             av_log(NULL, AV_LOG_FATAL, "Invalid frame size: %s.\n", frame_size);
             exit_program(1);
         }
+
+        ost->avfilter = get_ost_filters(o, oc, ost);
+        if (!ost->avfilter)
+            exit_program(1);
     }
 
+    if (ost->stream_copy)
+        check_streamcopy_filters(o, oc, ost, AVMEDIA_TYPE_SUBTITLE);
+
     return ost;
 }
 
@@ -1961,6 +1976,7 @@  static void init_output_filter(OutputFilter *ofilter, OptionsContext *o,
     switch (ofilter->type) {
     case AVMEDIA_TYPE_VIDEO: ost = new_video_stream(o, oc, -1); break;
     case AVMEDIA_TYPE_AUDIO: ost = new_audio_stream(o, oc, -1); break;
+    case AVMEDIA_TYPE_SUBTITLE: ost = new_subtitle_stream(o, oc, -1); break;
     default:
         av_log(NULL, AV_LOG_FATAL, "Only video and audio filters are supported "
                "currently.\n");
@@ -1980,7 +1996,7 @@  static void init_output_filter(OutputFilter *ofilter, OptionsContext *o,
     }
 
     if (ost->avfilter && (ost->filters || ost->filters_script)) {
-        const char *opt = ost->filters ? "-vf/-af/-filter" : "-filter_script";
+        const char *opt = ost->filters ? "-vf/-af/-sf/-filter" : "-filter_script";
         av_log(NULL, AV_LOG_ERROR,
                "%s '%s' was specified through the %s option "
                "for output stream %d:%d, which is fed from a complex filtergraph.\n"
@@ -2388,7 +2404,8 @@  loop_end:
             ist->decoding_needed |= DECODING_FOR_OST;
 
             if (ost->st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO ||
-                ost->st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
+                ost->st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO ||
+                ost->st->codecpar->codec_type == AVMEDIA_TYPE_SUBTITLE) {
                 err = init_simple_filtergraph(ist, ost);
                 if (err < 0) {
                     av_log(NULL, AV_LOG_ERROR,
@@ -2929,6 +2946,12 @@  static int opt_audio_filters(void *optctx, const char *opt, const char *arg)
     return parse_option(o, "filter:a", arg, options);
 }
 
+static int opt_subtitle_filters(void *optctx, const char *opt, const char *arg)
+{
+    OptionsContext *o = optctx;
+    return parse_option(o, "filter:s", arg, options);
+}
+
 static int opt_vsync(void *optctx, const char *opt, const char *arg)
 {
     if      (!av_strcasecmp(arg, "cfr"))         video_sync_method = VSYNC_CFR;
@@ -3540,6 +3563,8 @@  const OptionDef options[] = {
         "fix subtitles duration" },
     { "canvas_size", OPT_SUBTITLE | HAS_ARG | OPT_STRING | OPT_SPEC | OPT_INPUT, { .off = OFFSET(canvas_sizes) },
         "set canvas size (WxH or abbreviation)", "size" },
+    { "sf",     OPT_SUBTITLE | HAS_ARG | OPT_PERFILE | OPT_OUTPUT, { .func_arg = opt_subtitle_filters },
+        "set subtitle filters", "filter_graph" },
 
     /* grab options */
     { "vc", HAS_ARG | OPT_EXPERT | OPT_VIDEO, { .func_arg = opt_video_channel },
diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index 211112f..0912bff 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -4838,7 +4838,10 @@  int avcodec_decode_video2(AVCodecContext *avctx, AVFrame *picture,
  *                 must be freed with avsubtitle_free if *got_sub_ptr is set.
  * @param[in,out] got_sub_ptr Zero if no subtitle could be decompressed, otherwise, it is nonzero.
  * @param[in] avpkt The input AVPacket containing the input buffer.
+ *
+ * @deprecated Use avcodec_send_packet() and avcodec_receive_frame().
  */
+attribute_deprecated
 int avcodec_decode_subtitle2(AVCodecContext *avctx, AVSubtitle *sub,
                             int *got_sub_ptr,
                             AVPacket *avpkt);
@@ -5322,6 +5325,12 @@  attribute_deprecated
 int avcodec_encode_video2(AVCodecContext *avctx, AVPacket *avpkt,
                           const AVFrame *frame, int *got_packet_ptr);
 
+/**
+ * Encode a subtitle.
+ *
+ * @deprecated use avcodec_send_frame()/avcodec_receive_packet() instead
+ */
+attribute_deprecated
 int avcodec_encode_subtitle(AVCodecContext *avctx, uint8_t *buf, int buf_size,
                             const AVSubtitle *sub);
 
diff --git a/libavcodec/utils.c b/libavcodec/utils.c
index 87de15f..add7edf 100644
--- a/libavcodec/utils.c
+++ b/libavcodec/utils.c
@@ -2050,6 +2050,86 @@  int avcodec_encode_subtitle(AVCodecContext *avctx, uint8_t *buf, int buf_size,
     return ret;
 }
 
+// TODO: delete this compatibility code when all subtitles encoders moved
+// to send_frame
+static int encode_subtitle_frame(AVCodecContext *avctx,
+                                 AVPacket *avpkt,
+                                 const AVFrame *frame,
+                                 int *got_packet_ptr)
+{
+    int i, ret;
+
+FF_DISABLE_DEPRECATION_WARNINGS
+    AVPacket tmp_pkt;
+    AVSubtitle subtitle;
+
+    get_subtitle_defaults(&subtitle);
+
+    /* Allocate (once) a temporary large output subtitle buffer */
+    if (!avctx->internal->byte_buffer) {
+        const int subtitle_out_max_size = 1024 * 1024;
+        uint8_t *subtitle_out = av_malloc(subtitle_out_max_size);
+        if (!subtitle_out)
+            return AVERROR(ENOMEM);
+
+        avctx->internal->byte_buffer      = subtitle_out;
+        avctx->internal->byte_buffer_size = subtitle_out_max_size;
+    }
+
+    /* Craft an AVSubtitle from the AVFrame */
+    subtitle.format = frame->format == AV_PIX_FMT_NONE;
+    subtitle.rects  = av_mallocz_array(frame->sub_nb_rects, sizeof(*subtitle.rects));
+    if (!subtitle.rects)
+        return AVERROR(ENOMEM);
+    subtitle.num_rects = frame->sub_nb_rects;
+    subtitle.pts = frame->pts;
+
+    for (i = 0; i < frame->sub_nb_rects; i++) {
+        const AVFrameSubtitleRectangle *src_rect = (AVFrameSubtitleRectangle *)frame->extended_data[i];
+        AVSubtitleRect *dst_rect;
+
+        dst_rect = av_mallocz(sizeof(*subtitle.rects[i]));
+        if (!dst_rect)
+            return AVERROR(ENOMEM);
+        subtitle.rects[i] = dst_rect;
+
+        if (subtitle.format) {
+            dst_rect->type = SUBTITLE_ASS;
+            dst_rect->ass  = src_rect->text;
+        } else {
+            dst_rect->type = SUBTITLE_BITMAP;
+            dst_rect->x = src_rect->x;
+            dst_rect->y = src_rect->y;
+            dst_rect->w = src_rect->w;
+            dst_rect->h = src_rect->h;
+            memcpy(dst_rect->data,     src_rect->data,     sizeof(dst_rect->data));
+            memcpy(dst_rect->linesize, src_rect->linesize, sizeof(dst_rect->linesize));
+        }
+        dst_rect->flags = src_rect->flags;
+    }
+
+    /* Send it to the old API */
+    ret = avcodec_encode_subtitle(avctx,
+                                  avctx->internal->byte_buffer,
+                                  avctx->internal->byte_buffer_size,
+                                  &subtitle);
+    if (ret < 0)
+        return ret;
+
+    /* Wrap the encoded buffer into a ref-counted AVPacket */
+    av_init_packet(&tmp_pkt);
+    tmp_pkt.data = avctx->internal->byte_buffer;
+    tmp_pkt.size = ret;
+    av_packet_ref(avpkt, &tmp_pkt);
+    avpkt->pts = frame->pts;
+    avpkt->dts = frame->pkt_dts;
+    avpkt->duration = frame->pkt_duration;
+FF_ENABLE_DEPRECATION_WARNINGS
+
+    *got_packet_ptr = 1;
+    return ret;
+}
+
 /**
  * Attempt to guess proper monotonic timestamps for decoded video frames
  * which might have incorrect times. Input timestamps may wrap around, in
@@ -2764,6 +2844,134 @@  void avsubtitle_free(AVSubtitle *sub)
     memset(sub, 0, sizeof(AVSubtitle));
 }
 
+static int decode_subtitle_frame(AVCodecContext *avctx,
+                                 AVFrame *frame,
+                                 int *got_frame_ptr,
+                                 const AVPacket *avpkt)
+{
+    int i, ret = 0;
+
+    *got_frame_ptr = 0;
+
+    if (!avctx->codec)
+        return AVERROR(EINVAL);
+
+    if (!avpkt->data && avpkt->size) {
+        av_log(avctx, AV_LOG_ERROR, "invalid packet: NULL data, size != 0\n");
+        return AVERROR(EINVAL);
+    }
+
+    av_assert0(avctx->codec->type == AVMEDIA_TYPE_SUBTITLE);
+
+    av_frame_unref(frame);
+
+    if ((avctx->codec->capabilities & AV_CODEC_CAP_DELAY) || avpkt->size) {
+        AVPacket pkt_recoded;
+        AVPacket tmp = *avpkt;
+        int did_split = av_packet_split_side_data(&tmp);
+
+        if (did_split) {
+            /* FFMIN() prevents overflow in case the packet wasn't allocated with
+             * proper padding.
+             * If the side data is smaller than the buffer padding size, the
+             * remaining bytes should have already been filled with zeros by the
+             * original packet allocation anyway. */
+            memset(tmp.data + tmp.size, 0,
+                   FFMIN(avpkt->size - tmp.size, AV_INPUT_BUFFER_PADDING_SIZE));
+        }
+        pkt_recoded = tmp;
+
+        ret = recode_subtitle(avctx, &pkt_recoded, &tmp);
+        if (ret >= 0) {
+            int err_ret;
+            AVSubtitle sub = {0};
+
+            avctx->internal->pkt = &pkt_recoded;
+
+            ret = avctx->codec->decode(avctx, &sub, got_frame_ptr, &pkt_recoded);
+            av_assert1((ret >= 0) >= !!*got_frame_ptr &&
+                       !!*got_frame_ptr >= !!sub.num_rects);
+
+            for (i = 0; i < sub.num_rects; i++) {
+                if (sub.rects[i]->ass && !utf8_check(sub.rects[i]->ass)) {
+                    av_log(avctx, AV_LOG_ERROR,
+                           "Invalid UTF-8 in decoded subtitles text; "
+                           "maybe missing -sub_charenc option\n");
+                    avsubtitle_free(&sub);
+                    return AVERROR_INVALIDDATA;
+                }
+            }
+
+            frame->pts                   =
+            frame->best_effort_timestamp = avpkt->pts;
+            frame->pkt_duration          = avpkt->duration;
+            frame->pkt_pos               = avpkt->pos;
+            frame->sub_nb_rects          = sub.num_rects;
+
+            if (avctx->codec_descriptor->props & AV_CODEC_PROP_BITMAP_SUB)
+                frame->format = AV_PIX_FMT_PAL8;
+            else if (avctx->codec_descriptor->props & AV_CODEC_PROP_TEXT_SUB)
+                frame->format = AV_PIX_FMT_NONE;
+
+            frame->sub_start_display = sub.start_display_time ? av_rescale_q(sub.start_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q)
+                                                              : AV_NOPTS_VALUE;
+            frame->sub_end_display   = sub.end_display_time   ? av_rescale_q(sub.end_display_time,   av_make_q(1, 1000), AV_TIME_BASE_Q)
+                                                              : AV_NOPTS_VALUE;
+
+            /* Allocate sub_nb_rects AVFrameSubtitleRectangle */
+            err_ret = av_frame_get_buffer(frame, 0);
+            if (err_ret < 0) {
+                avsubtitle_free(&sub);
+                return err_ret;
+            }
+
+            /* Transfer data from AVSubtitleRect to AVFrameSubtitleRectangle */
+            for (i = 0; i < sub.num_rects; i++) {
+                AVSubtitleRect *src_rect = sub.rects[i];
+                AVFrameSubtitleRectangle *dst_rect = (AVFrameSubtitleRectangle *)frame->extended_data[i];
+
+                if (frame->format == AV_PIX_FMT_NONE) {
+                    dst_rect->text = src_rect->ass;
+                } else {
+                    dst_rect->x = src_rect->x;
+                    dst_rect->y = src_rect->y;
+                    dst_rect->w = src_rect->w;
+                    dst_rect->h = src_rect->h;
+                    memcpy(dst_rect->data,     src_rect->data,     sizeof(src_rect->data));
+                    memcpy(dst_rect->linesize, src_rect->linesize, sizeof(src_rect->linesize));
+                }
+                dst_rect->flags = src_rect->flags;
+
+                // we free and make sure it is set to NULL so the data inside
+                // is not freed after destructing the AVSubtitle
+                av_freep(&sub.rects[i]);
+            }
+            sub.num_rects = 0;
+            avsubtitle_free(&sub);
+
+            if (tmp.data != pkt_recoded.data) { // did we recode?
+                /* prevent from destroying side data from original packet */
+                pkt_recoded.side_data = NULL;
+                pkt_recoded.side_data_elems = 0;
+
+                av_packet_unref(&pkt_recoded);
+            }
+            avctx->internal->pkt = NULL;
+        }
+
+        if (did_split) {
+            av_packet_free_side_data(&tmp);
+            if (ret == tmp.size)
+                ret = avpkt->size;
+        }
+
+        if (*got_frame_ptr)
+            avctx->frame_number++;
+    }
+
+    return ret;
+}
+
 static int do_decode(AVCodecContext *avctx, AVPacket *pkt)
 {
     int got_frame;
@@ -2792,6 +3000,9 @@  static int do_decode(AVCodecContext *avctx, AVPacket *pkt)
     } else if (avctx->codec_type == AVMEDIA_TYPE_AUDIO) {
         ret = avcodec_decode_audio4(avctx, avctx->internal->buffer_frame,
                                     &got_frame, pkt);
+    } else if (avctx->codec_type == AVMEDIA_TYPE_SUBTITLE) {
+        ret = decode_subtitle_frame(avctx, avctx->internal->buffer_frame,
+                                    &got_frame, pkt);
     } else {
         ret = AVERROR(EINVAL);
     }
@@ -2941,6 +3152,9 @@  static int do_encode(AVCodecContext *avctx, const AVFrame *frame, int *got_packe
     } else if (avctx->codec_type == AVMEDIA_TYPE_AUDIO) {
         ret = avcodec_encode_audio2(avctx, avctx->internal->buffer_pkt,
                                     frame, got_packet);
+    } else if (avctx->codec_type == AVMEDIA_TYPE_SUBTITLE) {
+        ret = encode_subtitle_frame(avctx, avctx->internal->buffer_pkt,
+                                    frame, got_packet);
     } else {
         ret = AVERROR(EINVAL);
     }
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 7ed4696..5972b27 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -329,6 +329,9 @@  OBJS-$(CONFIG_YUVTESTSRC_FILTER)             += vsrc_testsrc.o
 
 OBJS-$(CONFIG_NULLSINK_FILTER)               += vsink_nullsink.o
 
+# subtitle filters
+OBJS-$(CONFIG_SNULL_FILTER)                  += sf_snull.o
+
 # multimedia filters
 OBJS-$(CONFIG_ADRAWGRAPH_FILTER)             += f_drawgraph.o
 OBJS-$(CONFIG_AHISTOGRAM_FILTER)             += avf_ahistogram.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 82a65ee..e5d20cd 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -45,6 +45,7 @@  void avfilter_register_all(void)
         return;
     initialized = 1;
 
+    /* audio filters */
     REGISTER_FILTER(ABENCH,         abench,         af);
     REGISTER_FILTER(ACOMPRESSOR,    acompressor,    af);
     REGISTER_FILTER(ACROSSFADE,     acrossfade,     af);
@@ -138,6 +139,7 @@  void avfilter_register_all(void)
 
     REGISTER_FILTER(ANULLSINK,      anullsink,      asink);
 
+    /* video filters */
     REGISTER_FILTER(ALPHAEXTRACT,   alphaextract,   vf);
     REGISTER_FILTER(ALPHAMERGE,     alphamerge,     vf);
     REGISTER_FILTER(ASS,            ass,            vf);
@@ -345,6 +347,9 @@  void avfilter_register_all(void)
 
     REGISTER_FILTER(NULLSINK,       nullsink,       vsink);
 
+    /* subtitle filters */
+    REGISTER_FILTER(SNULL,          snull,          sf);
+
     /* multimedia filters */
     REGISTER_FILTER(ADRAWGRAPH,     adrawgraph,     avf);
     REGISTER_FILTER(AHISTOGRAM,     ahistogram,     avf);
@@ -367,10 +372,13 @@  void avfilter_register_all(void)
     /* those filters are part of public or internal API => registered
      * unconditionally */
     REGISTER_FILTER_UNCONDITIONAL(asrc_abuffer);
+    REGISTER_FILTER_UNCONDITIONAL(ssrc_sbuffer);
     REGISTER_FILTER_UNCONDITIONAL(vsrc_buffer);
     REGISTER_FILTER_UNCONDITIONAL(asink_abuffer);
+    REGISTER_FILTER_UNCONDITIONAL(ssink_sbuffer);
     REGISTER_FILTER_UNCONDITIONAL(vsink_buffer);
     REGISTER_FILTER_UNCONDITIONAL(af_afifo);
+    REGISTER_FILTER_UNCONDITIONAL(sf_sfifo);
     REGISTER_FILTER_UNCONDITIONAL(vf_fifo);
     ff_opencl_register_filter_kernel_code_all();
 }
diff --git a/libavfilter/avfilter.c b/libavfilter/avfilter.c
index 1d469c3..908ab2c 100644
--- a/libavfilter/avfilter.c
+++ b/libavfilter/avfilter.c
@@ -314,6 +314,12 @@  int avfilter_config_links(AVFilterContext *filter)
 
                 if (!link->time_base.num && !link->time_base.den)
                     link->time_base = (AVRational) {1, link->sample_rate};
+                break;
+
+            case AVMEDIA_TYPE_SUBTITLE:
+                if (!link->time_base.num && !link->time_base.den)
+                    link->time_base = inlink ? inlink->time_base : AV_TIME_BASE_Q;
+                break;
             }
 
             if (link->src->nb_inputs && link->src->inputs[0]->hw_frames_ctx &&
@@ -1201,6 +1207,8 @@  int ff_filter_frame(AVFilterLink *link, AVFrame *frame)
             av_assert1(frame->width               == link->w);
             av_assert1(frame->height               == link->h);
         }
+    } else if (link->type == AVMEDIA_TYPE_SUBTITLE) {
+        // TODO?
     } else {
         if (frame->format != link->format) {
             av_log(link->dst, AV_LOG_ERROR, "Format change is not supported\n");
diff --git a/libavfilter/buffersink.c b/libavfilter/buffersink.c
index 2feb56d..f38fb1a 100644
--- a/libavfilter/buffersink.c
+++ b/libavfilter/buffersink.c
@@ -388,6 +388,11 @@  static int asink_query_formats(AVFilterContext *ctx)
     return 0;
 }
 
+static av_cold int ssink_init(AVFilterContext *ctx, void *opaque)
+{
+    return common_init(ctx);
+}
+
 #define OFFSET(x) offsetof(BufferSinkContext, x)
 #define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
 static const AVOption buffersink_options[] = {
@@ -405,9 +410,11 @@  static const AVOption abuffersink_options[] = {
     { NULL },
 };
 #undef FLAGS
+static const AVOption sbuffersink_options[] = {NULL};
 
 AVFILTER_DEFINE_CLASS(buffersink);
 AVFILTER_DEFINE_CLASS(abuffersink);
+AVFILTER_DEFINE_CLASS(sbuffersink);
 
 static const AVFilterPad avfilter_vsink_buffer_inputs[] = {
     {
@@ -452,3 +459,24 @@  AVFilter ff_asink_abuffer = {
     .inputs      = avfilter_asink_abuffer_inputs,
     .outputs     = NULL,
 };
+
+static const AVFilterPad avfilter_ssink_sbuffer_inputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_SUBTITLE,
+        .filter_frame = filter_frame,
+    },
+    { NULL }
+};
+
+AVFilter ff_ssink_sbuffer = {
+    .name          = "sbuffersink",
+    .description   = NULL_IF_CONFIG_SMALL("Buffer subtitle frames, and make them available to the end of the filter graph."),
+    .priv_class    = &sbuffersink_class,
+    .priv_size     = sizeof(BufferSinkContext),
+    .init_opaque   = ssink_init,
+    .uninit        = uninit,
+    //.query_formats = ssink_query_formats,
+    .inputs        = avfilter_ssink_sbuffer_inputs,
+    .outputs       = NULL,
+};
diff --git a/libavfilter/buffersrc.c b/libavfilter/buffersrc.c
index 9294811..c20b7b2 100644
--- a/libavfilter/buffersrc.c
+++ b/libavfilter/buffersrc.c
@@ -63,6 +63,9 @@  typedef struct BufferSourceContext {
     uint64_t channel_layout;
     char    *channel_layout_str;
 
+    /* subtitle only */
+    int sub_format;
+
     int got_format_from_params;
     int eof;
 } BufferSourceContext;
@@ -128,6 +131,8 @@  int av_buffersrc_parameters_set(AVFilterContext *ctx, AVBufferSrcParameters *par
         if (param->channel_layout)
             s->channel_layout = param->channel_layout;
         break;
+    case AVMEDIA_TYPE_SUBTITLE:
+        break;
     default:
         return AVERROR_BUG;
     }
@@ -204,6 +209,8 @@  static int av_buffersrc_add_frame_internal(AVFilterContext *ctx,
         CHECK_AUDIO_PARAM_CHANGE(ctx, s, frame->sample_rate, frame->channel_layout,
                                  av_frame_get_channels(frame), frame->format);
         break;
+    case AVMEDIA_TYPE_SUBTITLE:
+        break;
     default:
         return AVERROR(EINVAL);
     }
@@ -271,6 +278,7 @@  unsigned av_buffersrc_get_nb_failed_requests(AVFilterContext *buffer_src)
 #define OFFSET(x) offsetof(BufferSourceContext, x)
 #define A AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_AUDIO_PARAM
 #define V AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
+#define S AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_SUBTITLE_PARAM
 
 static const AVOption buffer_options[] = {
     { "width",         NULL,                     OFFSET(w),                AV_OPT_TYPE_INT,      { .i64 = 0 }, 0, INT_MAX, V },
@@ -306,6 +314,17 @@  static const AVOption abuffer_options[] = {
 
 AVFILTER_DEFINE_CLASS(abuffer);
 
+static const AVOption sbuffer_options[] = {
+    { "time_base",      NULL, OFFSET(time_base),  AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, 0, INT_MAX, S },
+    { "format",         NULL, OFFSET(sub_format), AV_OPT_TYPE_INT,      { .i64 = -1 }, -1, 1, S, "format" },
+        { "unspecified", NULL, 0, AV_OPT_TYPE_CONST, {.i64=-1}, INT_MIN, INT_MAX, S, "format" },
+        { "bitmap",      NULL, 0, AV_OPT_TYPE_CONST, {.i64=0},  INT_MIN, INT_MAX, S, "format" },
+        { "text",        NULL, 0, AV_OPT_TYPE_CONST, {.i64=1},  INT_MIN, INT_MAX, S, "format" },
+    { NULL }
+};
+
+AVFILTER_DEFINE_CLASS(sbuffer);
+
 static av_cold int init_audio(AVFilterContext *ctx)
 {
     BufferSourceContext *s = ctx->priv;
@@ -397,6 +416,10 @@  static int query_formats(AVFilterContext *ctx)
         if ((ret = ff_set_common_channel_layouts(ctx, channel_layouts)) < 0)
             return ret;
         break;
+    case AVMEDIA_TYPE_SUBTITLE:
+        if ((ret = ff_add_format(&formats, c->sub_format)) < 0)
+            return ret;
+        break;
     default:
         return AVERROR(EINVAL);
     }
@@ -424,6 +447,8 @@  static int config_props(AVFilterLink *link)
         if (!c->channel_layout)
             c->channel_layout = link->channel_layout;
         break;
+    case AVMEDIA_TYPE_SUBTITLE:
+        break;
     default:
         return AVERROR(EINVAL);
     }
@@ -461,6 +486,15 @@  static int poll_frame(AVFilterLink *link)
     return size/sizeof(AVFrame*);
 }
 
+static av_cold int init_subtitle(AVFilterContext *ctx)
+{
+    BufferSourceContext *c = ctx->priv;
+
+    if (!(c->fifo = av_fifo_alloc(sizeof(AVFrame*))))
+        return AVERROR(ENOMEM);
+    return 0;
+}
+
 static const AVFilterPad avfilter_vsrc_buffer_outputs[] = {
     {
         .name          = "default",
@@ -510,3 +544,28 @@  AVFilter ff_asrc_abuffer = {
     .outputs   = avfilter_asrc_abuffer_outputs,
     .priv_class = &abuffer_class,
 };
+
+static const AVFilterPad avfilter_ssrc_sbuffer_outputs[] = {
+    {
+        .name          = "default",
+        .type          = AVMEDIA_TYPE_SUBTITLE,
+        .request_frame = request_frame,
+        .poll_frame    = poll_frame,
+        .config_props  = config_props,
+    },
+    { NULL }
+};
+
+AVFilter ff_ssrc_sbuffer = {
+    .name          = "sbuffer",
+    .description   = NULL_IF_CONFIG_SMALL("Buffer subtitle frames, and make them accessible to the filterchain."),
+    .priv_size     = sizeof(BufferSourceContext),
+    .query_formats = query_formats,
+
+    .init          = init_subtitle,
+    .uninit        = uninit,
+
+    .inputs        = NULL,
+    .outputs       = avfilter_ssrc_sbuffer_outputs,
+    .priv_class    = &sbuffer_class,
+};
diff --git a/libavfilter/fifo.c b/libavfilter/fifo.c
index abfbba1..feda5bd 100644
--- a/libavfilter/fifo.c
+++ b/libavfilter/fifo.c
@@ -313,3 +313,34 @@  AVFilter ff_af_afifo = {
     .inputs    = avfilter_af_afifo_inputs,
     .outputs   = avfilter_af_afifo_outputs,
 };
+
+static const AVFilterPad avfilter_sf_sfifo_inputs[] = {
+    {
+        .name             = "default",
+        .type             = AVMEDIA_TYPE_SUBTITLE,
+        .filter_frame     = add_to_queue,
+    },
+    { NULL }
+};
+
+static const AVFilterPad avfilter_sf_sfifo_outputs[] = {
+    {
+        .name          = "default",
+        .type          = AVMEDIA_TYPE_SUBTITLE,
+        .request_frame = request_frame,
+    },
+    { NULL }
+};
+
+AVFilter ff_sf_sfifo = {
+    .name        = "sfifo",
+    .description = NULL_IF_CONFIG_SMALL("Buffer input subtitles and send them when they are requested."),
+
+    .init      = init,
+    .uninit    = uninit,
+
+    .priv_size = sizeof(FifoContext),
+
+    .inputs    = avfilter_sf_sfifo_inputs,
+    .outputs   = avfilter_sf_sfifo_outputs,
+};
diff --git a/libavfilter/formats.c b/libavfilter/formats.c
index 20f45e3..66bf138 100644
--- a/libavfilter/formats.c
+++ b/libavfilter/formats.c
@@ -364,6 +364,10 @@  AVFilterFormats *ff_all_formats(enum AVMediaType type)
                 return NULL;
             fmt++;
         }
+    } else if (type == AVMEDIA_TYPE_SUBTITLE) {
+        if (ff_add_format(&ret, 0 /* bitmap */) < 0 ||
+            ff_add_format(&ret, 1 /* text */) < 0)
+            return NULL;
     }
 
     return ret;
diff --git a/libavfilter/internal.h b/libavfilter/internal.h
index 3856012..cd8db73 100644
--- a/libavfilter/internal.h
+++ b/libavfilter/internal.h
@@ -142,6 +142,14 @@  struct AVFilterPad {
      * input pads only.
      */
     int needs_writable;
+
+    /**
+     * Callback function to get a subtitle frame. If NULL, the filter system
+     * will use ff_default_get_subtitle_buffer().
+     *
+     * Input subtitle pads only.
+     */
+    AVFrame *(*get_subtitle_buffer)(AVFilterLink *link, int nb_rects);
 };
 
 struct AVFilterGraphInternal {
diff --git a/libavfilter/sf_snull.c b/libavfilter/sf_snull.c
new file mode 100644
index 0000000..c20afdd
--- /dev/null
+++ b/libavfilter/sf_snull.c
@@ -0,0 +1,48 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * null subtitle filter
+ */
+
+#include "avfilter.h"
+#include "internal.h"
+
+static const AVFilterPad sf_snull_inputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_SUBTITLE,
+    },
+    { NULL }
+};
+
+static const AVFilterPad sf_snull_outputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_SUBTITLE,
+    },
+    { NULL }
+};
+
+AVFilter ff_sf_snull = {
+    .name          = "snull",
+    .description   = NULL_IF_CONFIG_SMALL("Pass the source unchanged to the output."),
+    .inputs        = sf_snull_inputs,
+    .outputs       = sf_snull_outputs,
+};
diff --git a/libavfilter/subtitle.c b/libavfilter/subtitle.c
new file mode 100644
index 0000000..88e43d5
--- /dev/null
+++ b/libavfilter/subtitle.c
@@ -0,0 +1,60 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/common.h"
+
+#include "subtitle.h"
+#include "avfilter.h"
+#include "internal.h"
+
+AVFrame *ff_null_get_subtitle_buffer(AVFilterLink *link, int nb_rects)
+{
+    return ff_get_subtitle_buffer(link->dst->outputs[0], nb_rects);
+}
+
+AVFrame *ff_default_get_subtitle_buffer(AVFilterLink *link, int nb_rects)
+{
+    AVFrame *frame = av_frame_alloc();
+    int ret;
+
+    if (!frame)
+        return NULL;
+
+    frame->sub_nb_rects = nb_rects;
+    frame->format       = link->format;
+    ret = av_frame_get_buffer(frame, 0);
+    if (ret < 0) {
+        av_frame_free(&frame);
+        return NULL;
+    }
+
+    return frame;
+}
+
+AVFrame *ff_get_subtitle_buffer(AVFilterLink *link, int nb_rects)
+{
+    AVFrame *ret = NULL;
+
+    if (link->dstpad->get_subtitle_buffer)
+        ret = link->dstpad->get_subtitle_buffer(link, nb_rects);
+
+    if (!ret)
+        ret = ff_default_get_subtitle_buffer(link, nb_rects);
+
+    return ret;
+}
diff --git a/libavfilter/subtitle.h b/libavfilter/subtitle.h
new file mode 100644
index 0000000..1db9ebe
--- /dev/null
+++ b/libavfilter/subtitle.h
@@ -0,0 +1,31 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVFILTER_SUBTITLE_H
+#define AVFILTER_SUBTITLE_H
+
+#include "avfilter.h"
+#include "internal.h"
+
+AVFrame *ff_default_get_subtitle_buffer(AVFilterLink *link, int nb_rects);
+
+AVFrame *ff_null_get_subtitle_buffer(AVFilterLink *link, int nb_rects);
+
+AVFrame *ff_get_subtitle_buffer(AVFilterLink *link, int nb_rects);
+
+#endif /* AVFILTER_SUBTITLE_H */
diff --git a/libavfilter/vf_subtitles.c b/libavfilter/vf_subtitles.c
index 0f22644..20b3c77 100644
--- a/libavfilter/vf_subtitles.c
+++ b/libavfilter/vf_subtitles.c
@@ -305,6 +305,10 @@  static av_cold int init_subtitles(AVFilterContext *ctx)
     AVStream *st;
     AVPacket pkt;
     AssContext *ass = ctx->priv;
+    AVFrame *frame = av_frame_alloc();
+
+    if (!frame)
+        return AVERROR(ENOMEM);
 
     /* Init libass */
     ret = init(ctx);
@@ -447,32 +451,32 @@  static av_cold int init_subtitles(AVFilterContext *ctx)
     pkt.data = NULL;
     pkt.size = 0;
     while (av_read_frame(fmt, &pkt) >= 0) {
-        int i, got_subtitle;
-        AVSubtitle sub = {0};
-
         if (pkt.stream_index == sid) {
-            ret = avcodec_decode_subtitle2(dec_ctx, &sub, &got_subtitle, &pkt);
-            if (ret < 0) {
-                av_log(ctx, AV_LOG_WARNING, "Error decoding: %s (ignored)\n",
-                       av_err2str(ret));
-            } else if (got_subtitle) {
-                const int64_t start_time = av_rescale_q(sub.pts, AV_TIME_BASE_Q, av_make_q(1, 1000));
-                const int64_t duration   = sub.end_display_time;
-                for (i = 0; i < sub.num_rects; i++) {
-                    char *ass_line = sub.rects[i]->ass;
+            int i;
+
+            ret = avcodec_send_packet(dec_ctx, &pkt);
+            if (ret < 0)
+                break;
+            // XXX probably not correct api usage
+            ret = avcodec_receive_frame(dec_ctx, frame);
+            if (ret >= 0) {
+                // XXX: honor start/end display time
+                const int64_t start_time = av_rescale_q(frame->pts,          st->time_base, av_make_q(1, 1000));
+                const int64_t duration   = av_rescale_q(frame->pkt_duration, st->time_base, av_make_q(1, 1000));
+
+                for (i = 0; i < frame->sub_nb_rects; i++) {
+                    AVFrameSubtitleRectangle *rect = (AVFrameSubtitleRectangle *)frame->extended_data[i];
+                    char *ass_line = rect->text;
                     if (!ass_line)
                         break;
-                    if (LIBAVCODEC_VERSION_INT < AV_VERSION_INT(57,25,100))
-                        ass_process_data(ass->track, ass_line, strlen(ass_line));
-                    else
-                        ass_process_chunk(ass->track, ass_line, strlen(ass_line),
-                                          start_time, duration);
+                    ass_process_chunk(ass->track, ass_line, strlen(ass_line),
+                                      start_time, duration);
                 }
             }
+            av_packet_unref(&pkt);
         }
-        av_packet_unref(&pkt);
-        avsubtitle_free(&sub);
     }
+    av_frame_free(&frame);
 
 end:
     av_dict_free(&codec_opts);
diff --git a/libavformat/assenc.c b/libavformat/assenc.c
index d50f18f..97a8de8 100644
--- a/libavformat/assenc.c
+++ b/libavformat/assenc.c
@@ -164,6 +164,9 @@  static int write_packet(AVFormatContext *s, AVPacket *pkt)
     int hh2, mm2, ss2, ms2;
     DialogueLine *dialogue = av_mallocz(sizeof(*dialogue));
 
+    if (pkt->duration < 0)
+        end = INT64_MAX;
+
     if (!dialogue)
         return AVERROR(ENOMEM);
 
diff --git a/libavutil/frame.c b/libavutil/frame.c
index 53e6174..8bb3ed2 100644
--- a/libavutil/frame.c
+++ b/libavutil/frame.c
@@ -216,52 +216,33 @@  fail:
     return AVERROR(ENOMEM);
 }
 
-static int get_audio_buffer(AVFrame *frame, int align)
+static int get_data_buffer(AVFrame *frame, int n, size_t size)
 {
-    int channels;
-    int planar   = av_sample_fmt_is_planar(frame->format);
-    int planes;
-    int ret, i;
-
-    if (!frame->channels)
-        frame->channels = av_get_channel_layout_nb_channels(frame->channel_layout);
-
-    channels = frame->channels;
-    planes = planar ? channels : 1;
-
-    CHECK_CHANNELS_CONSISTENCY(frame);
-    if (!frame->linesize[0]) {
-        ret = av_samples_get_buffer_size(&frame->linesize[0], channels,
-                                         frame->nb_samples, frame->format,
-                                         align);
-        if (ret < 0)
-            return ret;
-    }
+    int i;
 
-    if (planes > AV_NUM_DATA_POINTERS) {
-        frame->extended_data = av_mallocz_array(planes,
-                                          sizeof(*frame->extended_data));
-        frame->extended_buf  = av_mallocz_array((planes - AV_NUM_DATA_POINTERS),
-                                          sizeof(*frame->extended_buf));
+    if (n > AV_NUM_DATA_POINTERS) {
+        frame->extended_data = av_mallocz_array(n, sizeof(*frame->extended_data));
+        frame->extended_buf  = av_mallocz_array(n - AV_NUM_DATA_POINTERS,
+                                                sizeof(*frame->extended_buf));
         if (!frame->extended_data || !frame->extended_buf) {
             av_freep(&frame->extended_data);
             av_freep(&frame->extended_buf);
             return AVERROR(ENOMEM);
         }
-        frame->nb_extended_buf = planes - AV_NUM_DATA_POINTERS;
+        frame->nb_extended_buf = n - AV_NUM_DATA_POINTERS;
     } else
         frame->extended_data = frame->data;
 
-    for (i = 0; i < FFMIN(planes, AV_NUM_DATA_POINTERS); i++) {
-        frame->buf[i] = av_buffer_alloc(frame->linesize[0]);
+    for (i = 0; i < FFMIN(n, AV_NUM_DATA_POINTERS); i++) {
+        frame->buf[i] = av_buffer_alloc(size);
         if (!frame->buf[i]) {
             av_frame_unref(frame);
             return AVERROR(ENOMEM);
         }
         frame->extended_data[i] = frame->data[i] = frame->buf[i]->data;
     }
-    for (i = 0; i < planes - AV_NUM_DATA_POINTERS; i++) {
-        frame->extended_buf[i] = av_buffer_alloc(frame->linesize[0]);
+    for (i = 0; i < n - AV_NUM_DATA_POINTERS; i++) {
+        frame->extended_buf[i] = av_buffer_alloc(size);
         if (!frame->extended_buf[i]) {
             av_frame_unref(frame);
             return AVERROR(ENOMEM);
@@ -269,11 +250,50 @@  static int get_audio_buffer(AVFrame *frame, int align)
         frame->extended_data[i + AV_NUM_DATA_POINTERS] = frame->extended_buf[i]->data;
     }
     return 0;
+}
+
+static int get_audio_buffer(AVFrame *frame, int align)
+{
+    int channels;
+    int planar   = av_sample_fmt_is_planar(frame->format);
+    int planes;
+    int ret;
+
+    if (!frame->channels)
+        frame->channels = av_get_channel_layout_nb_channels(frame->channel_layout);
 
+    channels = frame->channels;
+    planes = planar ? channels : 1;
+
+    CHECK_CHANNELS_CONSISTENCY(frame);
+    if (!frame->linesize[0]) {
+        ret = av_samples_get_buffer_size(&frame->linesize[0], channels,
+                                         frame->nb_samples, frame->format,
+                                         align);
+        if (ret < 0)
+            return ret;
+    }
+
+    return get_data_buffer(frame, planes, frame->linesize[0]);
+}
+
+static int get_subtitle_buffer(AVFrame *frame)
+{
+    int i, ret;
+
+    ret = get_data_buffer(frame, frame->sub_nb_rects, sizeof(AVFrameSubtitleRectangle));
+    if (ret < 0)
+        return ret;
+    for (i = 0; i < frame->sub_nb_rects; i++)
+        memset(frame->extended_data[i], 0, sizeof(AVFrameSubtitleRectangle));
+    return 0;
 }
 
 int av_frame_get_buffer(AVFrame *frame, int align)
 {
+    if (frame->sub_nb_rects)
+        return get_subtitle_buffer(frame);
+
     if (frame->format < 0)
         return AVERROR(EINVAL);
 
@@ -320,6 +340,8 @@  FF_ENABLE_DEPRECATION_WARNINGS
     dst->colorspace             = src->colorspace;
     dst->color_range            = src->color_range;
     dst->chroma_location        = src->chroma_location;
+    dst->sub_start_display      = src->sub_start_display;
+    dst->sub_end_display        = src->sub_end_display;
 
     av_dict_copy(&dst->metadata, src->metadata, 0);
 
@@ -561,6 +583,7 @@  int av_frame_make_writable(AVFrame *frame)
     tmp.channels       = frame->channels;
     tmp.channel_layout = frame->channel_layout;
     tmp.nb_samples     = frame->nb_samples;
+    tmp.sub_nb_rects   = frame->sub_nb_rects;
     ret = av_frame_get_buffer(&tmp, 32);
     if (ret < 0)
         return ret;
@@ -716,11 +739,57 @@  static int frame_copy_audio(AVFrame *dst, const AVFrame *src)
     return 0;
 }
 
+static int frame_copy_subtitle(AVFrame *dst, const AVFrame *src)
+{
+    int i, ret;
+
+    for (i = 0; i < dst->sub_nb_rects; i++) {
+        const AVFrameSubtitleRectangle *src_rect = (const AVFrameSubtitleRectangle *)src->extended_data[i];
+        AVFrameSubtitleRectangle       *dst_rect = (AVFrameSubtitleRectangle *)dst->extended_data[i];
+
+        memset(dst_rect, 0, sizeof(*dst_rect));
+
+        dst_rect->flags = src_rect->flags;
+
+        if (src->format != AV_PIX_FMT_NONE) {
+            /* (Alloc and) copy bitmap subtitle */
+
+            av_assert1(!src_rect->text);
+
+            dst_rect->x = src_rect->x;
+            dst_rect->y = src_rect->y;
+            dst_rect->w = src_rect->w;
+            dst_rect->h = src_rect->h;
+
+            ret = av_image_alloc(dst_rect->data, dst_rect->linesize,
+                                 dst_rect->w, dst_rect->h,
+                                 dst->format, 0);
+            if (ret < 0)
+                return ret;
+            av_image_copy(dst_rect->data, dst_rect->linesize,
+                          src_rect->data, src_rect->linesize,
+                          dst->format, dst_rect->w, dst_rect->h);
+        } else {
+            /* Copy text subtitle */
+
+            av_assert1(!src_rect->x && !src_rect->y);
+            av_assert1(!src_rect->w && !src_rect->h);
+
+            dst_rect->text = av_strdup(src_rect->text);
+            if (!dst_rect->text)
+                return AVERROR(ENOMEM);
+        }
+    }
+    return 0;
+}
+
 int av_frame_copy(AVFrame *dst, const AVFrame *src)
 {
-    if (dst->format != src->format || dst->format < 0)
+    if (dst->format != src->format || (dst->format < 0 && !dst->sub_nb_rects))
         return AVERROR(EINVAL);
 
+    if (dst->sub_nb_rects)
+        return frame_copy_subtitle(dst, src);
     if (dst->width > 0 && dst->height > 0)
         return frame_copy_video(dst, src);
     else if (dst->nb_samples > 0 && dst->channel_layout)
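
For early reviewers, a minimal usage sketch of the new allocation path
(not part of the patch; alloc_sub_frame() is an illustrative name):
setting sub_nb_rects before av_frame_get_buffer() routes the call into
get_subtitle_buffer(), which yields one zeroed AVFrameSubtitleRectangle
per extended_data entry.

    #include <libavutil/frame.h>

    static AVFrame *alloc_sub_frame(int nb_rects)
    {
        AVFrame *frame = av_frame_alloc();

        if (!frame)
            return NULL;
        /* a non-zero sub_nb_rects selects the subtitle path */
        frame->sub_nb_rects = nb_rects;
        if (av_frame_get_buffer(frame, 0) < 0) {
            av_frame_free(&frame);
            return NULL;
        }
        /* each extended_data[i] now points to a zeroed
         * AVFrameSubtitleRectangle */
        return frame;
    }
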
diff --git a/libavutil/frame.h b/libavutil/frame.h
index 8e51361..d531cfa 100644
--- a/libavutil/frame.h
+++ b/libavutil/frame.h
@@ -148,6 +148,25 @@  typedef struct AVFrameSideData {
     AVBufferRef *buf;
 } AVFrameSideData;
 
+#define AV_NUM_DATA_POINTERS 8
+
+/**
+ * This structure describes a decoded subtitle rectangle.
+ */
+typedef struct AVFrameSubtitleRectangle {
+    int x, y;
+    int w, h;
+
+    /* image data for bitmap subtitles, in AVFrame.format (AVPixelFormat) */
+    uint8_t *data[AV_NUM_DATA_POINTERS];
+    int linesize[AV_NUM_DATA_POINTERS];
+
+    /* decoded text for text subtitles, in ASS */
+    char *text;
+
+    int flags;
+} AVFrameSubtitleRectangle;
+
 /**
  * This structure describes decoded (raw) audio or video data.
  *
@@ -182,7 +201,6 @@  typedef struct AVFrameSideData {
  * for AVFrame can be obtained from avcodec_get_frame_class()
  */
 typedef struct AVFrame {
-#define AV_NUM_DATA_POINTERS 8
     /**
      * pointer to the picture/channel planes.
      * This might be different from the first allocated byte
@@ -224,9 +242,12 @@  typedef struct AVFrame {
      * For packed audio, there is just one data pointer, and linesize[0]
      * contains the total size of the buffer for all channels.
      *
+     * For subtitles, each pointer corresponds to an AVFrameSubtitleRectangle.
+     *
      * Note: Both data and extended_data should always be set in a valid frame,
-     * but for planar audio with more channels that can fit in data,
-     * extended_data must be used in order to access all channels.
+     * but for subtitles and planar audio with more channels than can fit in
+     * data, extended_data must be used in order to access all audio channels
+     * or subtitle rectangles.
      */
     uint8_t **extended_data;
 
@@ -359,10 +380,10 @@  typedef struct AVFrame {
      * also be non-NULL for all j < i.
      *
      * There may be at most one AVBuffer per data plane, so for video this array
-     * always contains all the references. For planar audio with more than
-     * AV_NUM_DATA_POINTERS channels, there may be more buffers than can fit in
-     * this array. Then the extra AVBufferRef pointers are stored in the
-     * extended_buf array.
+     * always contains all the references. For planar audio or subtitles with
+     * more than AV_NUM_DATA_POINTERS channels or rectangles, there may be more
+     * buffers than can fit in this array. Then the extra AVBufferRef pointers
+     * are stored in the extended_buf array.
      */
     AVBufferRef *buf[AV_NUM_DATA_POINTERS];
 
@@ -532,6 +553,29 @@  typedef struct AVFrame {
      * AVHWFramesContext describing the frame.
      */
     AVBufferRef *hw_frames_ctx;
+
+    /**
+     * Number of rectangles available in extended_data
+     * - encoding: set by user (XXX: no encoding yet)
+     * - decoding: set by libavcodec, read by user
+     */
+    int sub_nb_rects;
+
+    /**
+     * Subtitle start display time, in AV_TIME_BASE units, relative to pts
+     * XXX: semantics for <= 0?
+     * - encoding: unused
+     * - decoding: set by libavcodec, read by user.
+     */
+    int64_t sub_start_display;
+
+    /**
+     * Subtitle end display time, in AV_TIME_BASE units, relative to pts
+     * XXX: semantics for <= 0?
+     * - encoding: unused
+     * - decoding: set by libavcodec, read by user.
+     */
+    int64_t sub_end_display;
 } AVFrame;
 
 /**
@@ -641,6 +685,7 @@  void av_frame_move_ref(AVFrame *dst, AVFrame *src);
  * - format (pixel format for video, sample format for audio)
  * - width and height for video
  * - nb_samples and channel_layout for audio
+ * - sub_nb_rects for subtitles
  *
  * This function will fill AVFrame.data and AVFrame.buf arrays and, if
  * necessary, allocate and fill AVFrame.extended_data and AVFrame.extended_buf.
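
Consumer side, under the same assumptions (dump_sub_frame() is again an
illustrative name): bitmap and text rectangles share the
AVFrameSubtitleRectangle type and are told apart through AVFrame.format,
exactly like frame_copy_subtitle() does above.

    #include <stdio.h>

    #include <libavutil/frame.h>

    static void dump_sub_frame(const AVFrame *frame)
    {
        int i;

        for (i = 0; i < frame->sub_nb_rects; i++) {
            const AVFrameSubtitleRectangle *r =
                (const AVFrameSubtitleRectangle *)frame->extended_data[i];

            if (frame->format != AV_PIX_FMT_NONE)
                /* bitmap rectangle: pixels in r->data/r->linesize */
                printf("bitmap %dx%d at (%d,%d)\n", r->w, r->h, r->x, r->y);
            else
                /* text rectangle: decoded ASS payload */
                printf("text: %s\n", r->text);
        }
    }
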
diff --git a/tests/ref/fate/sub-charenc b/tests/ref/fate/sub-charenc
index a056cd1..050807b 100644
--- a/tests/ref/fate/sub-charenc
+++ b/tests/ref/fate/sub-charenc
@@ -23,7 +23,7 @@  Dialogue: 0,0:02:31.61,0:02:33.90,Default,,0,0,0,,- А ти как си?\N- До
 Dialogue: 0,0:02:40.16,0:02:42.79,Default,,0,0,0,,Монахът Дзенг каза,\Nче си в планината Удан.
 Dialogue: 0,0:02:43.04,0:02:46.54,Default,,0,0,0,,Каза, че практикуваш\Nдълбока медитация.
 Dialogue: 0,0:02:48.84,0:02:50.68,Default,,0,0,0,,Сигурно в планината\Nе много спокойно.
-Dialogue: 0,0:02:51.25,0:02:53.46,Default,,0,0,0,,Завиждам ти.
+Dialogue: 0,0:02:51.26,0:02:53.47,Default,,0,0,0,,Завиждам ти.
 Dialogue: 0,0:02:53.67,0:02:58.34,Default,,0,0,0,,Имам толкова много работа,\Nпочти не ми остава\Nвреме за почивка.
 Dialogue: 0,0:03:00.26,0:03:03.89,Default,,0,0,0,,Оставих обучението рано.
 Dialogue: 0,0:03:05.69,0:03:11.28,Default,,0,0,0,,Защо? Ти си боец на Удан.\NОбучението е всичко.
@@ -40,7 +40,7 @@  Dialogue: 0,0:03:53.40,0:03:56.57,Default,,0,0,0,,Не можах да издъ
 Dialogue: 0,0:03:57.49,0:03:59.74,Default,,0,0,0,,Прекъснах медитацията си.
 Dialogue: 0,0:03:59.95,0:04:02.24,Default,,0,0,0,,Не можах да продължа.
 Dialogue: 0,0:04:03.20,0:04:07.79,Default,,0,0,0,,Нещо...\Nме дърпаше назад.
-Dialogue: 0,0:04:09.62,0:04:10.91,Default,,0,0,0,,Какво беше?
+Dialogue: 0,0:04:09.63,0:04:10.92,Default,,0,0,0,,Какво беше?
 Dialogue: 0,0:04:15.46,0:04:18.00,Default,,0,0,0,,Нещо, от което не\Nмога да се освободя.
 Dialogue: 0,0:04:23.39,0:04:24.68,Default,,0,0,0,,Скоро ли ще тръгваш?
 Dialogue: 0,0:04:26.77,0:04:30.27,Default,,0,0,0,,Подготвяме охрана\Nза една доставка...
@@ -52,7 +52,7 @@  Dialogue: 0,0:04:48.37,0:04:52.67,Default,,0,0,0,,Да. Той винаги е 
 Dialogue: 0,0:04:52.88,0:04:56.55,Default,,0,0,0,,Не разбирам.\NКак можеш да се разделиш с него?
 Dialogue: 0,0:04:56.76,0:04:59.93,Default,,0,0,0,,Той винаги е бил с теб.
 Dialogue: 0,0:05:01.18,0:05:05.52,Default,,0,0,0,,Твърде много хора са\Nзагинали от това острие.
-Dialogue: 0,0:05:09.68,0:05:14.52,Default,,0,0,0,,Чисто е единствено защото\Nкръвта се отмива лесно.
+Dialogue: 0,0:05:09.69,0:05:14.53,Default,,0,0,0,,Чисто е единствено защото\Nкръвта се отмива лесно.
 Dialogue: 0,0:05:15.40,0:05:20.61,Default,,0,0,0,,Ти го използваш справедливо.\NДостоен си за него.
 Dialogue: 0,0:05:23.66,0:05:27.37,Default,,0,0,0,,Дойде време\Nда го оставя.
 Dialogue: 0,0:05:27.58,0:05:31.21,Default,,0,0,0,,Е, какво ще правиш\Nот сега нататък?
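
Finally, a sketch of how I expect callers to use the display-time fields
(sub_display_window() is an illustrative helper, and the semantics for
values <= 0 are still an open question, see the XXX notes above): both
fields are in AV_TIME_BASE units, relative to the frame pts.

    #include <libavutil/frame.h>
    #include <libavutil/mathematics.h>

    /* absolute display window of a subtitle frame, in AV_TIME_BASE units,
     * with the frame pts expressed in the time base tb */
    static void sub_display_window(const AVFrame *frame, AVRational tb,
                                   int64_t *start, int64_t *end)
    {
        int64_t pts = av_rescale_q(frame->pts, tb, AV_TIME_BASE_Q);

        *start = pts + frame->sub_start_display;
        *end   = pts + frame->sub_end_display;
    }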