Message ID: 20161102220934.26010-1-u@pkh.me
2016-11-02 23:09 GMT+01:00 Clément Bœsch <u@pkh.me>:
> - find a test case for dvbsub
Please elaborate;-)
Thanks for working on this, Carl Eugen
On Wed, Nov 02, 2016 at 11:11:06PM +0100, Carl Eugen Hoyos wrote:
> 2016-11-02 23:09 GMT+01:00 Clément Bœsch <u@pkh.me>:
> > - find a test case for dvbsub
>
> Please elaborate;-)

I need a FATE test involving dvb transcoding.
On Wed, 2 Nov 2016, Clément Bœsch wrote:
> ---
> So this is just a prototype that is starting to work. It needs a split
> (and completion of the long TODO list) before review but I wanted to
> share it for early feedback.

Great, thanks!

> So long story short: AVSubtitle → AVFrame, you have to use the new M:N
> API, and it's integrated into lavfi.
>
> Now a bit longer: the AVFrame->extended_data are
> AVFrameSubtitleRectangle (declared in lavu/frame).

Wouldn't it make sense to simply use video AVFrames here instead of a
separate struct?

Thanks,
Marton
Le duodi 12 brumaire, an CCXXV, Clement Boesch a écrit :
> ---
> So this is just a prototype that is starting to work. It needs a split
> (and completion of the long TODO list) before review but I wanted to
> share it for early feedback.
>
> So long story short: AVSubtitle → AVFrame, you have to use the new M:N
> API, and it's integrated into lavfi.

Yay!

> Now a bit longer: the AVFrame->extended_data are
> AVFrameSubtitleRectangle (declared in lavu/frame). Frames are
> refcounted. So far this is not really meaningful, as a frame ref will
> alloc and copy every rectangle, but that's a limitation we will have to
> deal with for now, and it allows proper integration into the framework.

Can you briefly explain how you handled the duration and sparseness?

Also, I see you kept the text field a simple string with ASS syntax. Is
it final?

> It actually simplifies the workflow a lot, so I'm optimistic about the
> end result. Unfortunately, there is a lot of work remaining.
>
> End note: despite my obsession with subtitles, I'd like to remind
> everyone that I actually hate working with them; I just believe it's
> really important to get that historical burden out of sight. That means,
> if anyone wants to help with that, they are extremely welcome.
>
> Thanks,
>
> TODO:
> - what to do about sub2video?

The plan was always to remove it as soon as a proper solution in
libavfilter was implemented and does the same job. Preferably with a
short transitional period for the users, but not if it is too
inconvenient.

> - properly deal with -ss and -t (need strim filter?)
> - sub_start_display/sub_end_display needs to be honored
> - find a test case for dvbsub as it's likely broken (the ffmpeg.c hack
>   is removed and should be replaced by EAGAIN logic in lavc/utils.c)
> - make it pass FATE:
>   * fix cc/subcc
>   * broke various other stuff
> - split commit
> - Changelog/APIchanges
> - proper API doxy
> - update lavfi/subtitles?

Regards,
On Thu, Nov 03, 2016 at 01:05:01AM +0100, Marton Balint wrote:
[...]
> > So long story short: AVSubtitle → AVFrame, you have to use the new M:N
> > API, and it's integrated into lavfi.
> >
> > Now a bit longer: the AVFrame->extended_data are
> > AVFrameSubtitleRectangle (declared in lavu/frame).
>
> Wouldn't it make sense to simply use video AVFrames here instead of a
> separate struct?

Mmh, an AVFrame for each rectangle? Then we would have AVFrames inside
AVFrames, and would need to add fields such as x and y to AVFrame. I'm
not too fond of this, but it could be considered, yes.
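To make the two layouts being debated concrete, here is a plain-Python sketch (a toy model, not the lavu API; all field names are hypothetical stand-ins) of a subtitle frame carrying dedicated rectangle structs, including the refcounting limitation mentioned earlier where a frame ref allocates and copies every rectangle:

```python
# Plain-Python sketch (not the lavu API) of the proposed layout: a subtitle
# frame whose extended_data holds dedicated rectangle structs.  Field names
# are guesses for illustration only.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Rect:                        # models the proposed AVFrameSubtitleRectangle
    x: int
    y: int
    w: int
    h: int
    data: bytes = b""              # bitmap payload for graphic subtitles
    ass: Optional[str] = None      # ASS-markup text for text subtitles

@dataclass
class SubtitleFrame:               # models an AVFrame holding rectangles
    pts: int
    rects: List[Rect] = field(default_factory=list)

    def ref(self):
        # the limitation discussed above: "referencing" such a frame
        # currently implies allocating and copying every rectangle
        return SubtitleFrame(self.pts,
                             [Rect(r.x, r.y, r.w, r.h, r.data, r.ass)
                              for r in self.rects])

f = SubtitleFrame(pts=10, rects=[Rect(0, 0, 64, 16, ass="Short.")])
g = f.ref()
print(g.rects[0].ass)  # prints "Short." from the copied rectangle
```

The alternative raised here (one video AVFrame per rectangle, plus new x/y fields on AVFrame) would replace `Rect` with a nested frame type; the trade-off is exactly the nesting Clément is wary of.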
(Sorry for initially replying in private by mistake.)

On Thu, Nov 03, 2016 at 07:36:29PM +0100, Nicolas George wrote:
[...]
> Can you briefly explain how you handled the duration and sparseness?

I didn't. Duration is somehow broken currently, for some reason. I did
nothing for sparseness: the reason I added basic support in lavfi is
that it was much simpler to handle at the ffmpeg.c level, so it's
currently just a passthrough mechanism.

> Also, I see you kept the text field a simple string with ASS syntax. Is
> it final?

I didn't like having multiple fields for text-based data. If we want to
decode into another form, we can still add an option to print out
verbatim text instead of ASS markup.

> > TODO:
> > - what to do about sub2video?
>
> The plan was always to remove it as soon as a proper solution in
> libavfilter was implemented and does the same job. Preferably with a
> short transitional period for the users, but not if it is too
> inconvenient.

Yeah, I guess I'll need to write a blending filter for the final
patchset submission.
Le septidi 17 brumaire, an CCXXV, Clement Boesch a écrit :
> I didn't. Duration is somehow broken currently, for some reason. I did
> nothing for sparseness: the reason I added basic support in lavfi is
> that it was much simpler to handle at the ffmpeg.c level, so it's
> currently just a passthrough mechanism.

A decision will need to be made pretty soon, though, or it will not help
anything much. Sparseness will make things break badly, OOM-killer-style,
if someone tries to use subtitles coming from the same multiplexed file
as the corresponding video. Duration affects the visible API; a decision
on it is more or less final. Here are the results of my current thoughts:

For sparseness, the solution is to use heartbeat frames: if you have a
subtitle event at 10s and the next one at 50s, emit a frame with no
payload and just a timestamp at 10.1, 10.2, ..., 49.1.

Whoever is supposed to emit the frame can be decided later. The simplest
idea is to connect sbuffersrc to a master vbuffersrc: when a frame is
added on the master, consider generating heartbeat frames on the slaves.

The code needs to be ready immediately to handle the heartbeat frames.
At least have a way of expressing them: maybe data[0] == NULL? And the
code needs to not segfault on them.

The duration issue is more tricky, because there are so many cases. Here
is a scheme I think should work:

Each subtitle screen decodes into two subtitle frames: one to show it,
one to clear it. The clear frame needs to reference the corresponding
start frame, to allow for overlap.

For example, the following ASS dialogue:

Dialogue: 0,0:00:10.00,0:00:15.00,,,,,,,Long dialogue line.
Dialogue: 0,0:00:12.00,0:00:13.00,,,,,,,Short.

would decode into:

pts=10 id=42 text="Long dialogue line."
pts=12 id=43 text="Short."
pts=13 id=43 clear
pts=15 id=42 clear

When the duration is entirely reliable (text files read all at once and
correctly processed), the decoder generates both frames immediately and
keeps the clear frame in a reorder buffer.

When the duration is not entirely reliable, the decoder should generate
the clear frame when it gets the corresponding packet (either the clear
packet or the next start packet). If the duration is known but not
reliable (dvdsub, dvbsub), the decoder should use it as a cap while
waiting for the actual end.

The decoder needs some kind of heartbeat flush API to get the pending
clear frames. We may want an option to disable internal reordering and
get clear frames immediately.

When the duration is not known at all, we may set some kind of
"expiration timestamp" on the start frame, but I am not sure it is
necessary.

Whether the duration is reliable or not is a property of both the codec
and the format. For example, mkvmerge does not de-overlap events when
muxing SRT into Matroska, therefore the duration is not known. On the
other hand, when lavf reads directly from an SRT file, it can de-overlap
easily. I suppose it means AVCodecParameters needs an extra field.

> I didn't like having multiple fields for text-based data. If we want to
> decode into another form, we can still add an option to print out
> verbatim text instead of ASS markup.

I think we are not talking about the same thing. A long time ago, we
considered replacing the ASS markup with a simple text field plus
styling in a separate, non-text structure. Did you discard that idea?

For CSS-based subtitles, a richer data structure would make it slightly
less hard to preserve the structure of the styling information.

> Yeah, I guess I'll need to write a blending filter for the final
> patchset submission.

Or just a sub->video filter.

Regards,
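The show/clear scheme in this mail can be modeled in a few lines of plain Python (a toy sketch, not FFmpeg code; it covers only the reliable-duration case, and the ids simply start at 42 to match the example above):

```python
import heapq

# Toy model of the proposed scheme: each subtitle event decodes into a
# "show" frame at its start time and a "clear" frame at its end time.
# A reorder buffer (a min-heap keyed by pts) holds the pending clear
# frames so that output frames come out in presentation order.
def decode_events(events):
    """events: list of (start_pts, end_pts, text) with reliable durations.
    Returns frames as (pts, id, kind, text_or_None) in pts order."""
    frames = []
    reorder = []            # pending clear frames, ordered by pts
    next_id = 42            # arbitrary first id, matching the example above
    for start, end, text in sorted(events):
        # emit any pending clear frame that is due before this show frame
        while reorder and reorder[0][0] <= start:
            frames.append(heapq.heappop(reorder))
        frames.append((start, next_id, "show", text))
        heapq.heappush(reorder, (end, next_id, "clear", None))
        next_id += 1
    while reorder:          # flush remaining clear frames at end of stream
        frames.append(heapq.heappop(reorder))
    return frames

dialogue = [(10, 15, "Long dialogue line."), (12, 13, "Short.")]
for frame in decode_events(dialogue):
    print(frame)
```

Run on the two ASS dialogue lines above, this produces exactly the four frames listed in the mail: show 42 at 10, show 43 at 12, clear 43 at 13, clear 42 at 15. The unreliable-duration cases would replace the heap flush with generation on the corresponding packet, as described.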
On Fri, 11 Nov 2016 12:45:03 +0100 Nicolas George <george@nsup.org> wrote:
[...]
> For sparseness, the solution is to use heartbeat frames: if you have a
> subtitle event at 10s and the next one at 50s, emit a frame with no
> payload and just a timestamp at 10.1, 10.2, ..., 49.1.

Sounds like a bad idea.
Le primidi 21 brumaire, an CCXXV, wm4 a écrit :
> Sounds like a bad idea.
Do you have a better one?
On Fri, 11 Nov 2016 12:49:31 +0100 Nicolas George <george@nsup.org> wrote:
> Le primidi 21 brumaire, an CCXXV, wm4 a écrit :
> > Sounds like a bad idea.
>
> Do you have a better one?

No, but I'm confident that you can find one after having thought about
the problem for so long.
Le primidi 21 brumaire, an CCXXV, wm4 a écrit :
> No, but I'm confident that you can find one after having thought about
> the problem for so long.

Well, I have thought about it for long, and not only is this the
solution I have found, but I do not think it is a "bad idea" at all.

And I would really appreciate it if in the future you refrained from
that kind of useless empty remark. You can raise practical concerns,
ask for explanations or rationales, of course. But a purely negative
reply that took you all of three minutes, in answer to the result of
years of design, is just incredibly rude.

Regards,
On Fri, 11 Nov 2016 13:45:54 +0100 Nicolas George <george@nsup.org> wrote:
> Le primidi 21 brumaire, an CCXXV, wm4 a écrit :
> > No, but I'm confident that you can find one after having thought about
> > the problem for so long.
>
> Well, I have thought about it for long, and not only is this the
> solution I have found, but I do not think it is a "bad idea" at all.

OK, let's think about alternative approaches. How about not trying to
let libavfilter synchronize subtitles and other streams in all cases?
Why not just send nothing to the subtitle buffer sources if the
subtitles are sparse and there is no new packet? If it's sparse, make
it sparse.

I assume the whole point of this exercise is to prevent excessive
buffering (e.g. not trying to read the next subtitle packet, which
might read most of the file and necessitate buffering it in memory).
E.g. if you overlay video on subtitles, you'd normally require new
frames for both subtitles and video. If you treated subtitles like
video and audio, you'd have to try to read the next subtitle packet to
know, well, whether there's a new subtitle or whether to use the
previous one.

If I understood this correctly, you want to send empty frames every
100ms if the duration of a subtitle is unknown. Why is it 100ms? Does
it make a difference if it's 100ms or a few seconds (or until EOF)
until the subtitle is naturally terminated? Why not 1ms? This seems
like a very "approximate" solution - sure, it works, but it's akin to
using polling and sleep calls in I/O or multithreaded code.

Maybe a heartbeat on every video frame? What if there's no video
stream? Video can change timestamps and frame count, and video could
go sparse or un-sparse at certain points.

There is this related idea of slaving sbuffersrc to a master
vbuffersrc. This approach works on the video output, while sparseness
is really a problem at the source (i.e. the demuxer). It's questionable
how this would work with subtitles demuxed from separate files (which
might also have A/V streams). It also works on the video output, while
the issue of subtitles with unknown duration is mostly a demuxing
issue. What happens if there's a video->sub filter - how would it send
heartbeats? Would it require a new libavfilter graph syntax for filters
that generate subtitles within the graph, and would it require users to
explicitly specify a "companion" video source?

The whole problem is that it's hard to determine whether a new subtitle
frame should be available at a certain point inside libavfilter, which
in turn is hard because of how generic libavfilter wants to be.

It seems to me that it is not libavfilter that should handle this, but
whatever feeds libavfilter. If it feeds a new video frame to it, but no
subtitle frame, it means there is no new subtitle yet due to
sparseness. There is actually no need for weird heartbeat frames. The
libavfilter API user is in the position to know whether the subtitle
demuxer/decoder can produce a new packet/frame. It would be crazy if
the API user had to send heartbeat frames in these situations, and had
to care about how many heartbeats are sent when.

In complex cases (where audio/video/subs are connected in a non-trivial
way, possibly converting to each other at certain points), the user
would have to be careful which buffersinks to read in order not to
trigger excessive readahead. Also, the user would possibly have to
disable "automagic" synchronization mechanisms in other parts of
libavfilter. Even then, you would need to "filter" sparse frames
(update their timestamps, produce new ones, etc.). This sounds very
complex.

What about filtering subtitles alone? This should be possible? Why
would libavfilter in general be responsible for syncing subtitles and
video anyway? It should do that only in filters which have both
subtitle and video inputs or so.

Why does this need decoder enhancements anyway? How about just using
the API in its current extent, which applications have handled for
years? Again: special snowflake libavfilter/ffmpeg.c.

Btw. video frames can also be sparse (think of mp4s that contain slide
shows). Are we going to get video heartbeat frames? How are all the
filters going to handle that?

Even for not-sparse video, there seem to be cases (possibly fixed now)
where libavfilter just excessively buffers when using ffmpeg.c. I'm
still fighting such cases with my own libavfilter API use.
(Interestingly, this often involves sparse video.)

(Oh, and I don't claim to have understood the problem in its whole
extent. But I do have a lot of experience with subtitles, and
unfortunately also with using libavfilter in a "generic" way.)

> And I would really appreciate it if in the future you refrained from
> that kind of useless empty remark. You can raise practical concerns,
> ask for explanations or rationales, of course. But a purely negative
> reply that took you all of three minutes, in answer to the result of
> years of design, is just incredibly rude.

I find your conduct incredibly rude as well. It's not nice to take
every reply as an offense, instead of e.g. starting a discussion.
It's also not nice to call my remarks "useless".

No, pointing out that a solution is sub-optimal is not rudeness.

Why are you asking me for a better solution? You're the one who wants
subtitles in libavfilter, not me. Thus it's your responsibility to come
up with a good design. If there's no good design, then it's a sure sign
that it's not a good idea to have subtitles in libavfilter. Indeed,
subtitles are incredibly complex, there are many cases that somehow
need to be handled in a generic way, and it's not necessarily a good
idea to add this complexity to libavfilter, which is designed to handle
audio and video data (and even after years of work isn't that good at
handling audio and video at the same time). It's like trying to push a
square peg through a round hole. I haven't come to the conclusion yet
that this is the case, so hold that thought.

Please try not to reply to every paragraph separately or so. This makes
it hard to follow discussions. In fact, I won't take part in a
discussion of that kind. It wastes so much time because you can get
lost in meaningless details.
Le primidi 21 brumaire, an CCXXV, wm4 a écrit :
> OK, let's think about alternative approaches. How about not trying to
> let libavfilter synchronize subtitles and other streams in all cases?
> Why not just send nothing to the subtitle buffer sources if the
> subtitles are sparse and there is no new packet? If it's sparse, make
> it sparse.

Sometimes a filter needs sync. Therefore the framework must be able to
provide it. Your suggestion does not allow that.

> I assume the whole point of this exercise is to prevent excessive
> buffering [...]

You understand the issue.

> If I understood this correctly, you want to send empty frames every
> 100ms if the duration of a subtitle is unknown. Why is it 100ms? [...]

That was just an example to help readers understand.

> Maybe a heartbeat on every video frame?

That is exactly the idea; you did not read the mail carefully enough
before judging it a "bad idea".

> What if there's no video stream?

Then there is probably no need to sync.

> There is this related idea of slaving sbuffersrc to a master
> vbuffersrc. This approach works on the video output,

I have no idea why you start talking about output.

> while sparseness is really a problem at the source (i.e. the demuxer).
> It's questionable how this would work with subtitles demuxed from
> separate files

When the subtitles are demuxed from a file separate from the stream
they need to sync with, the next frame is available on demand, and
therefore the heartbeat frames are not needed, nor generated.

> (which might also have A/V streams).

If that happens, it seems like a very strange and pathological use
case. FFmpeg can not perform miracles.

> It also works on the video output, while the issue of subtitles with
> unknown duration is mostly a demuxing issue.

Again, I do not know why you speak about output.

> What happens if there's a video->sub filter - how would it send
> heartbeats?

I do not know; please tell me what it does exactly and I will be able
to answer you.

> Would it require a new libavfilter graph syntax for filters that
> generate subtitles within the graph, and would it require users to
> explicitly specify a "companion" video source?

Probably not.

> The whole problem is that it's hard to determine whether a new
> subtitle frame should be available at a certain point inside
> libavfilter, which in turn is hard because of how generic libavfilter
> wants to be.

Exactly.

> It seems to me that it is not libavfilter that should handle this, but
> whatever feeds libavfilter.

Its collaboration is mandatory. But the bulk of the work should be done
by the library, not duplicated in all applications.

> If it feeds a new video frame to it, but no subtitle frame, it means
> there is no new subtitle yet due to sparseness.

Libavfilter can not know when there is "no subtitle frame", only when
there is one. Hence the need to tell it explicitly:

> There is actually no need for weird heartbeat frames.

... with a heartbeat frame.

> The libavfilter API user is in the position to know whether the
> subtitle demuxer/decoder can produce a new packet/frame. It would be
> crazy if
  ^^^^^

Disparaging judgement without grounds.

> the API user had to send heartbeat frames in these situations, and had
> to care about how many heartbeats are sent when.

Good thing it does not, then.

> In complex cases (where audio/video/subs are connected in a
> non-trivial way, possibly converting to each other at certain points),
> the user would have to be careful which buffersinks to read in order
> not to trigger excessive readahead. [...] Even then, you would need to
> "filter" sparse frames (update their timestamps, produce new ones,
> etc.). This sounds very complex.

Yes, it is complex. The problem is complex; the solution has little
chance of being simple.

> What about filtering subtitles alone? This should be possible?

Yes.

> Why would libavfilter in general be responsible for syncing subtitles
> and video anyway? It should do that only in filters which have both
> subtitle and video inputs or so.

I do not understand why you are trying to make a distinction between
lavfi and individual filters. The sparseness issue is not about where
the code resides but about what information is available or not.

> Why does this need decoder enhancements anyway? How about just using
> the API in its current extent, which applications have handled for
> years? Again: special snowflake libavfilter/ffmpeg.c.

Mu.

> Btw. video frames can also be sparse (think of mp4s that contain slide
> shows). Are we going to get video heartbeat frames? How are all the
> filters going to handle that?

Currently they do not, but this is a separate issue.

> Even for not-sparse video, there seem to be cases (possibly fixed now)
> where libavfilter just excessively buffers when using ffmpeg.c. I'm
> still fighting such cases with my own libavfilter API use.
> (Interestingly, this often involves sparse video.)

The case with not-sparse video involves sparse video?

> No, pointing out that a solution is sub-optimal is not rudeness.

Indeed. Next time, do that. Give arguments, ask questions. Refrain from
a disparaging blanket judgement.

> Why are you asking me for a better solution?

Because you are the one who claimed the solution I propose is bad.
Unless the w in your nickname actually means Winston, claiming that a
solution is bad when there is no better one is just a waste of time.

> You're the one who wants subtitles in libavfilter, not me.

You could have noticed that I am far from being the only one.

> Thus it's your responsibility to come up with a good design.

I did.

> If there's no good design, then it's a sure sign that it's not a good
> idea to have subtitles in libavfilter.

Or it is a sure sign that your aesthetic considerations are skewed.

> Please try not to reply to every paragraph separately or so. This
> makes it hard to follow discussions. In fact, I won't take part in a
> discussion of that kind. It wastes so much time because you can get
> lost in meaningless details.

This is exactly what I did, because this is exactly what needed to be
done for that kind of discussion.
On Fri, 11 Nov 2016 15:36:04 +0100 Nicolas George <george@nsup.org> wrote:
> > It seems to me that it is not libavfilter that should handle this,
> > but whatever feeds libavfilter.
>
> Its collaboration is mandatory. But the bulk of the work should be done
> by the library, not duplicated in all applications.

I'm saying that a good solution is application-specific. It depends on
the exact data flow, which depends on what kind of filter graph is used.

> Libavfilter can not know when there is "no subtitle frame", only when
> there is one. Hence the need to tell it explicitly:
>
> > There is actually no need for weird heartbeat frames.
>
> ... with a heartbeat frame.

libavfilter still doesn't know whether there are subtitle frames at
future timestamps if you send heartbeat frames. What I'm suggesting is
that the API user has to be aware of the requirements his filter chain
might have on subtitle input. Thus, if the user requests output for
which a subtitle frame would have been required, but didn't send one,
libavfilter assumes that there is none.

If you want your heartbeat frames to work, they would have to go in the
reverse direction: send the timestamp up to the source filters, let the
user check the timestamp, and then send a heartbeat frame if no
subtitle frame will be available for this _specific_ timestamp. But
that is also complex.

> > Please try not to reply to every paragraph separately or so. [...]
>
> This is exactly what I did, because this is exactly what needed to be
> done for that kind of discussion.

I consider this reply very disrespectful. Maybe read the CoC?
Le primidi 21 brumaire, an CCXXV, wm4 a écrit : > I'm saying that a good solution is application-specific. It depends on > the exact data flow depending on what kind of filter graph is used. The library should still try to make things as simple as possible. > libavfilter still doesn't know whether there are subtitle frames at > a future timestamps if you send heartbeat frames. Care to explain that statement? Apparently you still have not understood how it is supposed to work. Here is the problem, once and for all. Consider a subtitle_hardburn filter with a video input and a subtitle input. The last subtitle segment ended at t=500, the next one starts at t=1500. A video frames arrives at t=1000. In order to process that frame, the filter needs to know that there is no subtitle to hardburn at that time, and for that, it needs the next subtitle frame; thus, it buffers the video frame and requests a subtitle one. If the subtitle comes from a separate file, the frame at t=1500 is read, injected to the filter, and it works. If the subtitle comes from the same file, trying to read that frame cause the video at t=1000.1 to be read, then at t=1000.2... until t=1499.9, thus buffering 5000 video frames before being able to process. With heartbeat frames, the video frame at t=1000.1 generates one. When it arrives to the filter, the filter can do its work. Hence, the heartbeat frames reduced the latency to huge and unbounded to just one. > What I'm suggesting is that the API user has to be aware of the > requirements his filter chain might have on subtitle input. Thus, if > the user requests output for which a subtitle frame would have been > required, but it didn't send one, libavfilter assumes that there is > none. Therefore, you are trying to turn the request_frame() call into the equivalent of the heartbeat frames. Sorry, it can not work that way. Data in libavfilter has to flow forwards, not backwards. 
> If you want your heartbeat frames to work,

They work the way I described.

> they would have to go into
> the reverse direction: send the timestamp up to the source filters, let
> the user check the timestamp, and then send a heartbeat frame if for
> this _specific_ timestamp no subtitle frame will be available. But that
> is also complex.

Maybe that could work, but I am very dubious because of the big picture. But even if it does, it would mean a huge change throughout the whole API. I like my idea better, and at least I know it works.

> I consider this reply very disrespectful.

It is no more disrespectful than trying to dictate how I should answer in the first place.

> Maybe read the CoC?

Please do.
On Wed, 2 Nov 2016 23:09:34 +0100 Clément Bœsch <u@pkh.me> wrote: > --- > So this is just a prototype that is starting to work. It needs a split > (and completion of the long TODO list) before review but I wanted to > share it for early feedbacks. > > So long story short: AVSubtitle → AVFrame, you have to use the new M:N > API, and it's integrated into lavfi. > > Now a bit longer: the AVFrame->extended_data are > AVFrameSubtitleRectangle (declared in lavu/frame). Frames are > refcounted. So far this is not really meaningful as a frame ref will > alloc and copy every rectangle but that's a limitation we will have to > deal with for now but it allows proper integration into the framework. > > It actually simplifies a lot the workflow so I'm optimistic about the > end result. Unfortunately, there is a lot of work remaining. > > End note: despite my obsession with subtitles, I'd like to remind > everyone that I actually hate working with them, I just believe it's > really important to get that historical burden out of sight. That means, > if anyone wants to help with that, they are extremelly welcome. > > Thanks, > > TODO: > - what to do about sub2video? > - properly deal with -ss and -t (need strim filter?) > - sub_start_display/sub_end_display needs to be honored > - find a test case for dvbsub as it's likely broken (ffmpeg.c hack is > removed and should be replaced by a EAGAIN logic in lavc/utils.c) > - make it pass FATE: > * fix cc/subcc > * broke various other stuff > - split commit > - Changelog/APIchanges > - proper API doxy > - update lavfi/subtitles? 
> --- > ffmpeg.c | 231 ++++++++++++++++----------------------------- > ffmpeg_filter.c | 94 +++++++++++++++++- > ffmpeg_opt.c | 33 ++++++- > libavcodec/avcodec.h | 9 ++ > libavcodec/utils.c | 214 +++++++++++++++++++++++++++++++++++++++++ > libavfilter/Makefile | 3 + > libavfilter/allfilters.c | 8 ++ > libavfilter/avfilter.c | 8 ++ > libavfilter/buffersink.c | 28 ++++++ > libavfilter/buffersrc.c | 59 ++++++++++++ > libavfilter/fifo.c | 31 ++++++ > libavfilter/formats.c | 4 + > libavfilter/internal.h | 8 ++ > libavfilter/sf_snull.c | 48 ++++++++++ > libavfilter/subtitle.c | 60 ++++++++++++ > libavfilter/subtitle.h | 31 ++++++ > libavfilter/vf_subtitles.c | 42 +++++---- > libavformat/assenc.c | 3 + > libavutil/frame.c | 131 +++++++++++++++++++------ > libavutil/frame.h | 59 ++++++++++-- > tests/ref/fate/sub-charenc | 6 +- > 21 files changed, 895 insertions(+), 215 deletions(-) > create mode 100644 libavfilter/sf_snull.c > create mode 100644 libavfilter/subtitle.c > create mode 100644 libavfilter/subtitle.h > > diff --git a/ffmpeg.c b/ffmpeg.c > index 36a921a..8222b8b 100644 > --- a/ffmpeg.c > +++ b/ffmpeg.c Can't comment on those. > diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h > index 211112f..0912bff 100644 > --- a/libavcodec/avcodec.h > +++ b/libavcodec/avcodec.h > @@ -4838,7 +4838,10 @@ int avcodec_decode_video2(AVCodecContext *avctx, AVFrame *picture, > * must be freed with avsubtitle_free if *got_sub_ptr is set. > * @param[in,out] got_sub_ptr Zero if no subtitle could be decompressed, otherwise, it is nonzero. > * @param[in] avpkt The input AVPacket containing the input buffer. > + * > + * @deprecated Use avcodec_send_packet() and avcodec_receive_frame(). 
> */ > +attribute_deprecated > int avcodec_decode_subtitle2(AVCodecContext *avctx, AVSubtitle *sub, > int *got_sub_ptr, > AVPacket *avpkt); > @@ -5322,6 +5325,12 @@ attribute_deprecated > int avcodec_encode_video2(AVCodecContext *avctx, AVPacket *avpkt, > const AVFrame *frame, int *got_packet_ptr); > > +/** > + * Encode a subtitle. > + * > + * @deprecated use avcodec_send_frame()/avcodec_receive_packet() instead > + */ > +attribute_deprecated > int avcodec_encode_subtitle(AVCodecContext *avctx, uint8_t *buf, int buf_size, > const AVSubtitle *sub); > > diff --git a/libavcodec/utils.c b/libavcodec/utils.c > index 87de15f..add7edf 100644 > --- a/libavcodec/utils.c > +++ b/libavcodec/utils.c > @@ -2050,6 +2050,86 @@ int avcodec_encode_subtitle(AVCodecContext *avctx, uint8_t *buf, int buf_size, > return ret; > } > > +// TODO: delete this compatibility code when all subtitles encoders moved > +// to send_frame > +static int encode_subtitle_frame(AVCodecContext *avctx, > + AVPacket *avpkt, > + const AVFrame *frame, > + int *got_packet_ptr) > +{ > + int i, ret; > + > +FF_DISABLE_DEPRECATION_WARNINGS > + AVPacket tmp_pkt; > + AVSubtitle subtitle; > + > + get_subtitle_defaults(&subtitle); > + > + /* Allocate (once) a temporary large output subtitle buffer */ > + if (!avctx->internal->byte_buffer) { > + const int subtitle_out_max_size = 1024 * 1024; > + uint8_t *subtitle_out = av_malloc(subtitle_out_max_size); > + if (!subtitle_out) > + return AVERROR(ENOMEM); > + > + avctx->internal->byte_buffer = subtitle_out; > + avctx->internal->byte_buffer_size = subtitle_out_max_size; > + } This is used for encoding below... does the old API really not have a better way for this? I would have thought 1MB is a bit small, but then again I looked at ffmpeg.c, and it's using the same fixed size buffer. 
> +
> +    /* Craft an AVSubtitle from the AVFrame */
> +    subtitle.format = frame->format == AV_PIX_FMT_NONE;
> +    subtitle.rects = av_mallocz_array(frame->sub_nb_rects, sizeof(*subtitle.rects));
> +    if (!subtitle.rects)
> +        return AVERROR(ENOMEM);
> +    subtitle.num_rects = frame->sub_nb_rects;
> +    subtitle.pts = frame->pts;
> +
> +    for (i = 0; i < frame->sub_nb_rects; i++) {
> +        const AVFrameSubtitleRectangle *src_rect = (AVFrameSubtitleRectangle *)frame->extended_data[i];
> +        AVSubtitleRect *dst_rect;
> +
> +        dst_rect = av_mallocz(sizeof(*subtitle.rects[i]));
> +        if (!dst_rect)
> +            return AVERROR(ENOMEM);

Leaks at least subtitle.rects (if you care...).

> +        subtitle.rects[i] = dst_rect;
> +
> +        if (subtitle.format) {
> +            dst_rect->type = SUBTITLE_ASS;
> +            dst_rect->ass  = src_rect->text;
> +        } else {
> +            dst_rect->type = SUBTITLE_BITMAP;
> +            dst_rect->x = src_rect->x;
> +            dst_rect->y = src_rect->y;
> +            dst_rect->w = src_rect->w;
> +            dst_rect->h = src_rect->h;
> +            memcpy(dst_rect->data, src_rect->data, sizeof(dst_rect->data));
> +            memcpy(dst_rect->linesize, src_rect->linesize, sizeof(dst_rect->linesize));
> +        }
> +        dst_rect->flags = src_rect->flags;
> +    }
> +
> +    /* Send it to the old API */
> +    ret = avcodec_encode_subtitle(avctx,
> +                                  avctx->internal->byte_buffer,
> +                                  avctx->internal->byte_buffer_size,
> +                                  &subtitle);
> +    if (ret < 0)
> +        return ret;
> +
> +    /* Wrap the encoded buffer into a ref-counted AVPacket */
> +    av_init_packet(&tmp_pkt);
> +    tmp_pkt.data = avctx->internal->byte_buffer;
> +    tmp_pkt.size = ret;
> +    av_packet_ref(avpkt, &tmp_pkt);

Can fail.

> +    avpkt->pts      = frame->pts;
> +    avpkt->dts      = frame->pkt_dts;
> +    avpkt->duration = frame->pkt_duration;
> +FF_ENABLE_DEPRECATION_WARNINGS
> +
> +    *got_packet_ptr = 1;
> +    return ret;
> +}
> +
>  /**
>   * Attempt to guess proper monotonic timestamps for decoded video frames
>   * which might have incorrect times. Input timestamps may wrap around, in
> @@ -2764,6 +2844,134 @@ void avsubtitle_free(AVSubtitle *sub)
>      memset(sub, 0, sizeof(AVSubtitle));
>  }
>
> +static int decode_subtitle_frame(AVCodecContext *avctx,
> +                                 AVFrame *frame,
> +                                 int *got_frame_ptr,
> +                                 const AVPacket *avpkt)
> +{
> +    int i, ret = 0;
> +
> +    *got_frame_ptr = 0;
> +
> +    if (!avctx->codec)
> +        return AVERROR(EINVAL);
> +
> +    if (!avpkt->data && avpkt->size) {
> +        av_log(avctx, AV_LOG_ERROR, "invalid packet: NULL data, size != 0\n");
> +        return AVERROR(EINVAL);
> +    }

Didn't vobsubs have 0-sized packets at some point? Also, I think the caller should check this (avcodec_send_packet).

> +
> +    av_assert0(avctx->codec->type == AVMEDIA_TYPE_SUBTITLE);
> +
> +    av_frame_unref(frame);

It's already guaranteed to be unreffed.

> +
> +    if ((avctx->codec->capabilities & AV_CODEC_CAP_DELAY) || avpkt->size) {
> +        AVPacket pkt_recoded;
> +        AVPacket tmp = *avpkt;
> +        int did_split = av_packet_split_side_data(&tmp);
> +
> +        if (did_split) {
> +            /* FFMIN() prevents overflow in case the packet wasn't allocated with
> +             * proper padding.
> +             * If the side data is smaller than the buffer padding size, the
> +             * remaining bytes should have already been filled with zeros by the
> +             * original packet allocation anyway. */
> +            memset(tmp.data + tmp.size, 0,
> +                   FFMIN(avpkt->size - tmp.size, AV_INPUT_BUFFER_PADDING_SIZE));
> +        }
> +        pkt_recoded = tmp;
> +
> +        ret = recode_subtitle(avctx, &pkt_recoded, &tmp);
> +        if (ret >= 0) {

IMO early exit would be nicer... This is mostly the old code copy&pasted, I assume.
> + int err_ret; > + AVSubtitle sub = {0}; > + > + avctx->internal->pkt = &pkt_recoded; > + > + ret = avctx->codec->decode(avctx, &sub, got_frame_ptr, &pkt_recoded); > + av_assert1((ret >= 0) >= !!*got_frame_ptr && > + !!*got_frame_ptr >= !!sub.num_rects); > + > + for (i = 0; i < sub.num_rects; i++) { > + if (sub.rects[i]->ass && !utf8_check(sub.rects[i]->ass)) { > + av_log(avctx, AV_LOG_ERROR, > + "Invalid UTF-8 in decoded subtitles text; " > + "maybe missing -sub_charenc option\n"); > + avsubtitle_free(&sub); > + return AVERROR_INVALIDDATA; > + } > + } Hate. > + > + frame->pts = > + frame->best_effort_timestamp = avpkt->pts; > + frame->pkt_duration = avpkt->duration; > + frame->pkt_pos = avpkt->pos; > + frame->sub_nb_rects = sub.num_rects; > + > + if (avctx->codec_descriptor->props & AV_CODEC_PROP_BITMAP_SUB) > + frame->format = AV_PIX_FMT_PAL8; > + else if (avctx->codec_descriptor->props & AV_CODEC_PROP_TEXT_SUB) > + frame->format = AV_PIX_FMT_NONE; AFAIK these are just hints, not the definitive format. Why not use sub.format? > + > + frame->sub_start_display = sub.start_display_time ? av_rescale_q(sub.start_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q) > + : AV_NOPTS_VALUE; > + frame->sub_end_display = sub.end_display_time ? av_rescale_q(sub.end_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q) > + : AV_NOPTS_VALUE; end_display_time can be UINT32_MAX for "infinite". I think we should clean this up here, and differentiate between "0", "unknown", and "until next event". 
> + /* Allocate sub_nb_rects AVFrameSubtitleRectangle */ > + err_ret = av_frame_get_buffer(frame, 0); > + if (err_ret < 0) { > + avsubtitle_free(&sub); > + return err_ret; > + } > + > + /* Transfer data from AVSubtitleRect to AVFrameSubtitleRectangle */ > + for (i = 0; i < sub.num_rects; i++) { > + AVSubtitleRect *src_rect = sub.rects[i]; > + AVFrameSubtitleRectangle *dst_rect = (AVFrameSubtitleRectangle *)frame->extended_data[i]; > + > + if (frame->format == AV_PIX_FMT_NONE) { > + dst_rect->text = src_rect->ass; > + } else { > + dst_rect->x = src_rect->x; > + dst_rect->y = src_rect->y; > + dst_rect->w = src_rect->w; > + dst_rect->h = src_rect->h; > + memcpy(dst_rect->data, src_rect->data, sizeof(src_rect->data)); > + memcpy(dst_rect->linesize, src_rect->linesize, sizeof(src_rect->linesize)); > + } > + dst_rect->flags = src_rect->flags; > + > + // we free and make sure it is set to NULL so the data inside > + // is not freed after destructing the AVSubtitle > + av_freep(&sub.rects[i]); > + } > + sub.num_rects = 0; > + avsubtitle_free(&sub); > + > + if (tmp.data != pkt_recoded.data) { // did we recode? > + /* prevent from destroying side data from original packet */ > + pkt_recoded.side_data = NULL; > + pkt_recoded.side_data_elems = 0; > + > + av_packet_unref(&pkt_recoded); > + } > + avctx->internal->pkt = NULL; > + } > + > + if (did_split) { > + av_packet_free_side_data(&tmp); This leaks if the code above returns due to an error, I think? 
> + if(ret == tmp.size) > + ret = avpkt->size; > + } > + > + if (*got_frame_ptr) > + avctx->frame_number++; > + } > + > + return ret; > +} > + > static int do_decode(AVCodecContext *avctx, AVPacket *pkt) > { > int got_frame; > @@ -2792,6 +3000,9 @@ static int do_decode(AVCodecContext *avctx, AVPacket *pkt) > } else if (avctx->codec_type == AVMEDIA_TYPE_AUDIO) { > ret = avcodec_decode_audio4(avctx, avctx->internal->buffer_frame, > &got_frame, pkt); > + } else if (avctx->codec_type == AVMEDIA_TYPE_SUBTITLE) { > + ret = decode_subtitle_frame(avctx, avctx->internal->buffer_frame, > + &got_frame, pkt); > } else { > ret = AVERROR(EINVAL); > } > @@ -2941,6 +3152,9 @@ static int do_encode(AVCodecContext *avctx, const AVFrame *frame, int *got_packe > } else if (avctx->codec_type == AVMEDIA_TYPE_AUDIO) { > ret = avcodec_encode_audio2(avctx, avctx->internal->buffer_pkt, > frame, got_packet); > + } else if (avctx->codec_type == AVMEDIA_TYPE_SUBTITLE) { > + ret = encode_subtitle_frame(avctx, avctx->internal->buffer_pkt, > + frame, got_packet); > } else { > ret = AVERROR(EINVAL); > } > diff --git a/libavfilter/Makefile b/libavfilter/Makefile > index 7ed4696..5972b27 100644 > --- a/libavfilter/Makefile > +++ b/libavfilter/Makefile > @@ -329,6 +329,9 @@ OBJS-$(CONFIG_YUVTESTSRC_FILTER) += vsrc_testsrc.o > > OBJS-$(CONFIG_NULLSINK_FILTER) += vsink_nullsink.o > > +# subtitle filters > +OBJS-$(CONFIG_SNULL_FILTER) += sf_snull.o > + > # multimedia filters > OBJS-$(CONFIG_ADRAWGRAPH_FILTER) += f_drawgraph.o > OBJS-$(CONFIG_AHISTOGRAM_FILTER) += avf_ahistogram.o > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c > index 82a65ee..e5d20cd 100644 > --- a/libavfilter/allfilters.c > +++ b/libavfilter/allfilters.c > @@ -45,6 +45,7 @@ void avfilter_register_all(void) > return; > initialized = 1; > > + /* audio filters */ > REGISTER_FILTER(ABENCH, abench, af); > REGISTER_FILTER(ACOMPRESSOR, acompressor, af); > REGISTER_FILTER(ACROSSFADE, acrossfade, af); > @@ -138,6 +139,7 @@ 
void avfilter_register_all(void) > > REGISTER_FILTER(ANULLSINK, anullsink, asink); > > + /* video filters */ > REGISTER_FILTER(ALPHAEXTRACT, alphaextract, vf); > REGISTER_FILTER(ALPHAMERGE, alphamerge, vf); > REGISTER_FILTER(ASS, ass, vf); > @@ -345,6 +347,9 @@ void avfilter_register_all(void) > > REGISTER_FILTER(NULLSINK, nullsink, vsink); > > + /* subtitle filters */ > + REGISTER_FILTER(SNULL, snull, sf); > + > /* multimedia filters */ > REGISTER_FILTER(ADRAWGRAPH, adrawgraph, avf); > REGISTER_FILTER(AHISTOGRAM, ahistogram, avf); > @@ -367,10 +372,13 @@ void avfilter_register_all(void) > /* those filters are part of public or internal API => registered > * unconditionally */ > REGISTER_FILTER_UNCONDITIONAL(asrc_abuffer); > + REGISTER_FILTER_UNCONDITIONAL(ssrc_sbuffer); > REGISTER_FILTER_UNCONDITIONAL(vsrc_buffer); > REGISTER_FILTER_UNCONDITIONAL(asink_abuffer); > + REGISTER_FILTER_UNCONDITIONAL(ssink_sbuffer); > REGISTER_FILTER_UNCONDITIONAL(vsink_buffer); > REGISTER_FILTER_UNCONDITIONAL(af_afifo); > + REGISTER_FILTER_UNCONDITIONAL(sf_sfifo); > REGISTER_FILTER_UNCONDITIONAL(vf_fifo); > ff_opencl_register_filter_kernel_code_all(); > } > diff --git a/libavfilter/avfilter.c b/libavfilter/avfilter.c > index 1d469c3..908ab2c 100644 > --- a/libavfilter/avfilter.c > +++ b/libavfilter/avfilter.c > @@ -314,6 +314,12 @@ int avfilter_config_links(AVFilterContext *filter) > > if (!link->time_base.num && !link->time_base.den) > link->time_base = (AVRational) {1, link->sample_rate}; > + break; > + > + case AVMEDIA_TYPE_SUBTITLE: > + if (!link->time_base.num && !link->time_base.den) > + link->time_base = inlink ? inlink->time_base : AV_TIME_BASE_Q; > + break; > } > > if (link->src->nb_inputs && link->src->inputs[0]->hw_frames_ctx && > @@ -1201,6 +1207,8 @@ int ff_filter_frame(AVFilterLink *link, AVFrame *frame) > av_assert1(frame->width == link->w); > av_assert1(frame->height == link->h); > } > + } else if (link->type == AVMEDIA_TYPE_SUBTITLE) { > + // TODO? 
> } else { > if (frame->format != link->format) { > av_log(link->dst, AV_LOG_ERROR, "Format change is not supported\n"); > diff --git a/libavfilter/buffersink.c b/libavfilter/buffersink.c > index 2feb56d..f38fb1a 100644 > --- a/libavfilter/buffersink.c > +++ b/libavfilter/buffersink.c > @@ -388,6 +388,11 @@ static int asink_query_formats(AVFilterContext *ctx) > return 0; > } > > +static av_cold int ssink_init(AVFilterContext *ctx, void *opaque) > +{ > + return common_init(ctx); > +} > + > #define OFFSET(x) offsetof(BufferSinkContext, x) > #define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM > static const AVOption buffersink_options[] = { > @@ -405,9 +410,11 @@ static const AVOption abuffersink_options[] = { > { NULL }, > }; > #undef FLAGS > +static const AVOption sbuffersink_options[] = {NULL}; > > AVFILTER_DEFINE_CLASS(buffersink); > AVFILTER_DEFINE_CLASS(abuffersink); > +AVFILTER_DEFINE_CLASS(sbuffersink); > > static const AVFilterPad avfilter_vsink_buffer_inputs[] = { > { > @@ -452,3 +459,24 @@ AVFilter ff_asink_abuffer = { > .inputs = avfilter_asink_abuffer_inputs, > .outputs = NULL, > }; > + > +static const AVFilterPad avfilter_ssink_sbuffer_inputs[] = { > + { > + .name = "default", > + .type = AVMEDIA_TYPE_SUBTITLE, > + .filter_frame = filter_frame, > + }, > + { NULL } > +}; > + > +AVFilter ff_ssink_sbuffer = { > + .name = "sbuffersink", > + .description = NULL_IF_CONFIG_SMALL("Buffer subtitle frames, and make them available to the end of the filter graph."), > + .priv_class = &sbuffersink_class, > + .priv_size = sizeof(BufferSinkContext), > + .init_opaque = ssink_init, > + .uninit = uninit, > + //.query_formats = ssink_query_formats, > + .inputs = avfilter_ssink_sbuffer_inputs, > + .outputs = NULL, > +}; > diff --git a/libavfilter/buffersrc.c b/libavfilter/buffersrc.c > index 9294811..c20b7b2 100644 > --- a/libavfilter/buffersrc.c > +++ b/libavfilter/buffersrc.c > @@ -63,6 +63,9 @@ typedef struct BufferSourceContext { > uint64_t channel_layout; 
> char *channel_layout_str; > > + /* subtitle only */ > + int sub_format; > + > int got_format_from_params; > int eof; > } BufferSourceContext; > @@ -128,6 +131,8 @@ int av_buffersrc_parameters_set(AVFilterContext *ctx, AVBufferSrcParameters *par > if (param->channel_layout) > s->channel_layout = param->channel_layout; > break; > + case AVMEDIA_TYPE_SUBTITLE: > + break; > default: > return AVERROR_BUG; > } > @@ -204,6 +209,8 @@ static int av_buffersrc_add_frame_internal(AVFilterContext *ctx, > CHECK_AUDIO_PARAM_CHANGE(ctx, s, frame->sample_rate, frame->channel_layout, > av_frame_get_channels(frame), frame->format); > break; > + case AVMEDIA_TYPE_SUBTITLE: > + break; > default: > return AVERROR(EINVAL); > } > @@ -271,6 +278,7 @@ unsigned av_buffersrc_get_nb_failed_requests(AVFilterContext *buffer_src) > #define OFFSET(x) offsetof(BufferSourceContext, x) > #define A AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_AUDIO_PARAM > #define V AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM > +#define S AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_SUBTITLE_PARAM > > static const AVOption buffer_options[] = { > { "width", NULL, OFFSET(w), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INT_MAX, V }, > @@ -306,6 +314,17 @@ static const AVOption abuffer_options[] = { > > AVFILTER_DEFINE_CLASS(abuffer); > > +static const AVOption sbuffer_options[] = { > + { "time_base", NULL, OFFSET(time_base), AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, 0, INT_MAX, S }, > + { "format", NULL, OFFSET(sub_format), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 1, S, "format" }, > + { "unspecified", NULL, 0, AV_OPT_TYPE_CONST, {.i64=-1}, INT_MIN, INT_MAX, S, "format" }, > + { "bitmap", NULL, 0, AV_OPT_TYPE_CONST, {.i64=0}, INT_MIN, INT_MAX, S, "format" }, > + { "text", NULL, 0, AV_OPT_TYPE_CONST, {.i64=1}, INT_MIN, INT_MAX, S, "format" }, Implies no mixed subs. I'm fine with that, but might be restrictive. 
> + { NULL } > +}; > + > +AVFILTER_DEFINE_CLASS(sbuffer); > + > static av_cold int init_audio(AVFilterContext *ctx) > { > BufferSourceContext *s = ctx->priv; > @@ -397,6 +416,10 @@ static int query_formats(AVFilterContext *ctx) > if ((ret = ff_set_common_channel_layouts(ctx, channel_layouts)) < 0) > return ret; > break; > + case AVMEDIA_TYPE_SUBTITLE: > + if ((ret = ff_add_format(&formats, c->sub_format)) < 0) > + return ret; > + break; > default: > return AVERROR(EINVAL); > } > @@ -424,6 +447,8 @@ static int config_props(AVFilterLink *link) > if (!c->channel_layout) > c->channel_layout = link->channel_layout; > break; > + case AVMEDIA_TYPE_SUBTITLE: > + break; > default: > return AVERROR(EINVAL); > } > @@ -461,6 +486,15 @@ static int poll_frame(AVFilterLink *link) > return size/sizeof(AVFrame*); > } > > +static av_cold int init_subtitle(AVFilterContext *ctx) > +{ > + BufferSourceContext *c = ctx->priv; > + > + if (!(c->fifo = av_fifo_alloc(sizeof(AVFrame*)))) > + return AVERROR(ENOMEM); > + return 0; > +} > + > static const AVFilterPad avfilter_vsrc_buffer_outputs[] = { > { > .name = "default", > @@ -510,3 +544,28 @@ AVFilter ff_asrc_abuffer = { > .outputs = avfilter_asrc_abuffer_outputs, > .priv_class = &abuffer_class, > }; > + > +static const AVFilterPad avfilter_ssrc_sbuffer_outputs[] = { > + { > + .name = "default", > + .type = AVMEDIA_TYPE_SUBTITLE, > + .request_frame = request_frame, > + .poll_frame = poll_frame, > + .config_props = config_props, > + }, > + { NULL } > +}; > + > +AVFilter ff_ssrc_sbuffer = { > + .name = "sbuffer", > + .description = NULL_IF_CONFIG_SMALL("Buffer subtitle frames, and make them accessible to the filterchain."), > + .priv_size = sizeof(BufferSourceContext), > + .query_formats = query_formats, > + > + .init = init_subtitle, > + .uninit = uninit, > + > + .inputs = NULL, > + .outputs = avfilter_ssrc_sbuffer_outputs, > + .priv_class = &sbuffer_class, > +}; > diff --git a/libavfilter/fifo.c b/libavfilter/fifo.c > index 
abfbba1..feda5bd 100644 > --- a/libavfilter/fifo.c > +++ b/libavfilter/fifo.c > @@ -313,3 +313,34 @@ AVFilter ff_af_afifo = { > .inputs = avfilter_af_afifo_inputs, > .outputs = avfilter_af_afifo_outputs, > }; > + > +static const AVFilterPad avfilter_sf_sfifo_inputs[] = { > + { > + .name = "default", > + .type = AVMEDIA_TYPE_SUBTITLE, > + .filter_frame = add_to_queue, > + }, > + { NULL } > +}; > + > +static const AVFilterPad avfilter_sf_sfifo_outputs[] = { > + { > + .name = "default", > + .type = AVMEDIA_TYPE_SUBTITLE, > + .request_frame = request_frame, > + }, > + { NULL } > +}; > + > +AVFilter ff_sf_sfifo = { > + .name = "sfifo", > + .description = NULL_IF_CONFIG_SMALL("Buffer input subtitles and send them when they are requested."), > + > + .init = init, > + .uninit = uninit, > + > + .priv_size = sizeof(FifoContext), > + > + .inputs = avfilter_sf_sfifo_inputs, > + .outputs = avfilter_sf_sfifo_outputs, > +}; > diff --git a/libavfilter/formats.c b/libavfilter/formats.c > index 20f45e3..66bf138 100644 > --- a/libavfilter/formats.c > +++ b/libavfilter/formats.c > @@ -364,6 +364,10 @@ AVFilterFormats *ff_all_formats(enum AVMediaType type) > return NULL; > fmt++; > } > + } else if (type == AVMEDIA_TYPE_SUBTITLE) { > + if (ff_add_format(&ret, 0 /* bitmap */) < 0 || > + ff_add_format(&ret, 1 /* text */) < 0) > + return NULL; Questionable, is probably preliminary code. > } > > return ret; > diff --git a/libavfilter/internal.h b/libavfilter/internal.h > index 3856012..cd8db73 100644 > --- a/libavfilter/internal.h > +++ b/libavfilter/internal.h > @@ -142,6 +142,14 @@ struct AVFilterPad { > * input pads only. > */ > int needs_writable; > + > + /** > + * Callback function to get a subtitle frame. If NULL, the filter system > + * will use ff_default_get_subtitle_buffer(). > + * > + * Input subtitle pads only. 
> + */ > + AVFrame *(*get_subtitle_buffer)(AVFilterLink *link, int nb_rects); > }; > > struct AVFilterGraphInternal { > diff --git a/libavfilter/sf_snull.c b/libavfilter/sf_snull.c > new file mode 100644 > index 0000000..c20afdd > --- /dev/null > +++ b/libavfilter/sf_snull.c > @@ -0,0 +1,48 @@ > +/* > + * This file is part of FFmpeg. > + * > + * FFmpeg is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; either > + * version 2.1 of the License, or (at your option) any later version. > + * > + * FFmpeg is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. > + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with FFmpeg; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > + */ > + > +/** > + * @file > + * null subtitle filter > + */ > + > +#include "avfilter.h" > +#include "internal.h" > + > +static const AVFilterPad sf_snull_inputs[] = { > + { > + .name = "default", > + .type = AVMEDIA_TYPE_SUBTITLE, > + }, > + { NULL } > +}; > + > +static const AVFilterPad sf_snull_outputs[] = { > + { > + .name = "default", > + .type = AVMEDIA_TYPE_SUBTITLE, > + }, > + { NULL } > +}; > + > +AVFilter ff_sf_snull = { > + .name = "snull", > + .description = NULL_IF_CONFIG_SMALL("Pass the source unchanged to the output."), > + .inputs = sf_snull_inputs, > + .outputs = sf_snull_outputs, > +}; > diff --git a/libavfilter/subtitle.c b/libavfilter/subtitle.c > new file mode 100644 > index 0000000..88e43d5 > --- /dev/null > +++ b/libavfilter/subtitle.c > @@ -0,0 +1,60 @@ > +/* > + * This file is part of FFmpeg. 
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "libavutil/common.h"
> +
> +#include "subtitle.h"
> +#include "avfilter.h"
> +#include "internal.h"
> +
> +AVFrame *ff_null_get_subtitle_buffer(AVFilterLink *link, int nb_rects)
> +{
> +    return ff_get_subtitle_buffer(link->dst->outputs[0], nb_rects);
> +}
> +
> +AVFrame *ff_default_get_subtitle_buffer(AVFilterLink *link, int nb_rects)
> +{
> +    AVFrame *frame = av_frame_alloc();
> +    int ret;
> +
> +    if (!frame)
> +        return NULL;
> +
> +    frame->sub_nb_rects = nb_rects;
> +    frame->format       = link->format;
> +    ret = av_frame_get_buffer(frame, 0);
> +    if (ret < 0) {
> +        av_frame_free(&frame);
> +        return NULL;
> +    }
> +
> +    return frame;
> +}
> +
> +AVFrame *ff_get_subtitle_buffer(AVFilterLink *link, int nb_rects)
> +{
> +    AVFrame *ret = NULL;
> +
> +    if (link->dstpad->get_subtitle_buffer)
> +        ret = link->dstpad->get_subtitle_buffer(link, nb_rects);
> +
> +    if (!ret)
> +        ret = ff_default_get_subtitle_buffer(link, nb_rects);
> +
> +    return ret;
> +}
> diff --git a/libavfilter/subtitle.h b/libavfilter/subtitle.h
> new file mode 100644
> index 0000000..1db9ebe
> --- /dev/null
> +++ b/libavfilter/subtitle.h
> @@ -0,0 +1,31 @@
> +/*
> + * This file is part of FFmpeg.
> + * > + * FFmpeg is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; either > + * version 2.1 of the License, or (at your option) any later version. > + * > + * FFmpeg is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. > + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with FFmpeg; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > + */ > + > +#ifndef AVFILTER_SUBTITLE_H > +#define AVFILTER_SUBTITLE_H > + > +#include "avfilter.h" > +#include "internal.h" > + > +AVFrame *ff_default_get_subtitle_buffer(AVFilterLink *link, int nb_rects); > + > +AVFrame *ff_null_get_subtitle_buffer(AVFilterLink *link, int nb_rects); > + > +AVFrame *ff_get_subtitle_buffer(AVFilterLink *link, int nb_rects); > + > +#endif /* AVFILTER_SUBTITLE_H */ > diff --git a/libavfilter/vf_subtitles.c b/libavfilter/vf_subtitles.c > index 0f22644..20b3c77 100644 > --- a/libavfilter/vf_subtitles.c > +++ b/libavfilter/vf_subtitles.c > @@ -305,6 +305,10 @@ static av_cold int init_subtitles(AVFilterContext *ctx) > AVStream *st; > AVPacket pkt; > AssContext *ass = ctx->priv; > + AVFrame *frame = av_frame_alloc(); > + > + if (!frame) > + return AVERROR(ENOMEM); > > /* Init libass */ > ret = init(ctx); > @@ -447,32 +451,32 @@ static av_cold int init_subtitles(AVFilterContext *ctx) > pkt.data = NULL; > pkt.size = 0; > while (av_read_frame(fmt, &pkt) >= 0) { > - int i, got_subtitle; > - AVSubtitle sub = {0}; > - > if (pkt.stream_index == sid) { > - ret = avcodec_decode_subtitle2(dec_ctx, &sub, &got_subtitle, &pkt); > - if (ret < 0) { > - av_log(ctx, AV_LOG_WARNING, "Error decoding: %s 
(ignored)\n", > - av_err2str(ret)); > - } else if (got_subtitle) { > - const int64_t start_time = av_rescale_q(sub.pts, AV_TIME_BASE_Q, av_make_q(1, 1000)); > - const int64_t duration = sub.end_display_time; > - for (i = 0; i < sub.num_rects; i++) { > - char *ass_line = sub.rects[i]->ass; > + int i; > + > + ret = avcodec_send_packet(dec_ctx, &pkt); > + if (ret < 0) > + break; > + // XXX probably not correct api usage > + ret = avcodec_receive_frame(dec_ctx, frame); > + if (ret >= 0) { I've found that converting code which used to old API to the new API is very messy. I think that's mostly because the data flow is rather different. Correct API usage would be something like: while (1) { if (av_codec_send_packet() == AVERROR_EAGAIN) add_packet_back_to_demuxer_queue() while (1) { ret = avcodec_receive_frame if (ret == AVERROR_EAGAIN) break; // new packet needed /* process frame */ } } Plus EOF and error handling. avcodec_send_packet() returning EAGAIN in this specific situation can probably be reasonably treated as error instead. Unless the decoder violates the declared API rules. 
> + // XXX: honor start/end display time > + const int64_t start_time = av_rescale_q(frame->pts, st->time_base, av_make_q(1, 1000)); > + const int64_t duration = av_rescale_q(frame->pkt_duration, st->time_base, av_make_q(1, 1000)); > + > + for (i = 0; i < frame->sub_nb_rects; i++) { > + AVSubtitleRect *rect = (AVSubtitleRect *)frame->extended_data[i]; // XXX move to lavu > + char *ass_line = rect->ass; > if (!ass_line) > break; > - if (LIBAVCODEC_VERSION_INT < AV_VERSION_INT(57,25,100)) > - ass_process_data(ass->track, ass_line, strlen(ass_line)); > - else > - ass_process_chunk(ass->track, ass_line, strlen(ass_line), > - start_time, duration); > + ass_process_chunk(ass->track, ass_line, strlen(ass_line), > + start_time, duration); > } > } > + av_packet_unref(&pkt); > } > - av_packet_unref(&pkt); > - avsubtitle_free(&sub); > } > + av_frame_free(&frame); > > end: > av_dict_free(&codec_opts); > diff --git a/libavformat/assenc.c b/libavformat/assenc.c > index d50f18f..97a8de8 100644 > --- a/libavformat/assenc.c > +++ b/libavformat/assenc.c > @@ -164,6 +164,9 @@ static int write_packet(AVFormatContext *s, AVPacket *pkt) > int hh2, mm2, ss2, ms2; > DialogueLine *dialogue = av_mallocz(sizeof(*dialogue)); > > + if (pkt->duration < 0) > + end = INT64_MAX; Wait what. > + > if (!dialogue) > return AVERROR(ENOMEM); > > diff --git a/libavutil/frame.c b/libavutil/frame.c > index 53e6174..8bb3ed2 100644 > --- a/libavutil/frame.c > +++ b/libavutil/frame.c > @@ -216,52 +216,33 @@ fail: > return AVERROR(ENOMEM); > } > > -static int get_audio_buffer(AVFrame *frame, int align) > +static int get_data_buffer(AVFrame *frame, int n, size_t size) > { > - int channels; > - int planar = av_sample_fmt_is_planar(frame->format); > - int planes; > - int ret, i; > - > - if (!frame->channels) > - frame->channels = av_get_channel_layout_nb_channels(frame->channel_layout); > - > - channels = frame->channels; > - planes = planar ? 
channels : 1; > - > - CHECK_CHANNELS_CONSISTENCY(frame); > - if (!frame->linesize[0]) { > - ret = av_samples_get_buffer_size(&frame->linesize[0], channels, > - frame->nb_samples, frame->format, > - align); > - if (ret < 0) > - return ret; > - } > + int i; > > - if (planes > AV_NUM_DATA_POINTERS) { > - frame->extended_data = av_mallocz_array(planes, > - sizeof(*frame->extended_data)); > - frame->extended_buf = av_mallocz_array((planes - AV_NUM_DATA_POINTERS), > - sizeof(*frame->extended_buf)); > + if (n > AV_NUM_DATA_POINTERS) { > + frame->extended_data = av_mallocz_array(n, sizeof(*frame->extended_data)); > + frame->extended_buf = av_mallocz_array(n - AV_NUM_DATA_POINTERS, > + sizeof(*frame->extended_buf)); > if (!frame->extended_data || !frame->extended_buf) { > av_freep(&frame->extended_data); > av_freep(&frame->extended_buf); > return AVERROR(ENOMEM); > } > - frame->nb_extended_buf = planes - AV_NUM_DATA_POINTERS; > + frame->nb_extended_buf = n - AV_NUM_DATA_POINTERS; > } else > frame->extended_data = frame->data; > > - for (i = 0; i < FFMIN(planes, AV_NUM_DATA_POINTERS); i++) { > - frame->buf[i] = av_buffer_alloc(frame->linesize[0]); > + for (i = 0; i < FFMIN(n, AV_NUM_DATA_POINTERS); i++) { > + frame->buf[i] = av_buffer_alloc(size); > if (!frame->buf[i]) { > av_frame_unref(frame); > return AVERROR(ENOMEM); > } > frame->extended_data[i] = frame->data[i] = frame->buf[i]->data; > } > - for (i = 0; i < planes - AV_NUM_DATA_POINTERS; i++) { > - frame->extended_buf[i] = av_buffer_alloc(frame->linesize[0]); > + for (i = 0; i < n - AV_NUM_DATA_POINTERS; i++) { > + frame->extended_buf[i] = av_buffer_alloc(size); > if (!frame->extended_buf[i]) { > av_frame_unref(frame); > return AVERROR(ENOMEM); > @@ -269,11 +250,50 @@ static int get_audio_buffer(AVFrame *frame, int align) > frame->extended_data[i + AV_NUM_DATA_POINTERS] = frame->extended_buf[i]->data; > } > return 0; > +} > + > +static int get_audio_buffer(AVFrame *frame, int align) > +{ > + int channels; > + int 
planar = av_sample_fmt_is_planar(frame->format); > + int planes; > + int ret; > + > + if (!frame->channels) > + frame->channels = av_get_channel_layout_nb_channels(frame->channel_layout); > > + channels = frame->channels; > + planes = planar ? channels : 1; > + > + CHECK_CHANNELS_CONSISTENCY(frame); > + if (!frame->linesize[0]) { > + ret = av_samples_get_buffer_size(&frame->linesize[0], channels, > + frame->nb_samples, frame->format, > + align); > + if (ret < 0) > + return ret; > + } > + > + return get_data_buffer(frame, planes, frame->linesize[0]); > +} > + > +static int get_subtitle_buffer(AVFrame *frame) > +{ > + int i, ret; > + > + ret = get_data_buffer(frame, frame->sub_nb_rects, sizeof(AVFrameSubtitleRectangle)); > + if (ret < 0) > + return ret; > + for (i = 0; i < frame->sub_nb_rects; i++) > + memset(frame->extended_data[i], 0, sizeof(AVFrameSubtitleRectangle)); > + return 0; I'm noticing it doesn't allocate the actual subtitle bitmap or text data, which is a difference from audio/video. > } > > int av_frame_get_buffer(AVFrame *frame, int align) > { > + if (frame->sub_nb_rects) > + return get_subtitle_buffer(frame); No format set is valid for subtitles? 
> + > if (frame->format < 0) > return AVERROR(EINVAL); > > @@ -320,6 +340,8 @@ FF_ENABLE_DEPRECATION_WARNINGS > dst->colorspace = src->colorspace; > dst->color_range = src->color_range; > dst->chroma_location = src->chroma_location; > + dst->sub_start_display = src->sub_start_display; > + dst->sub_end_display = src->sub_end_display; > > av_dict_copy(&dst->metadata, src->metadata, 0); > > @@ -561,6 +583,7 @@ int av_frame_make_writable(AVFrame *frame) > tmp.channels = frame->channels; > tmp.channel_layout = frame->channel_layout; > tmp.nb_samples = frame->nb_samples; > + tmp.sub_nb_rects = frame->sub_nb_rects; > ret = av_frame_get_buffer(&tmp, 32); > if (ret < 0) > return ret; > @@ -716,11 +739,57 @@ static int frame_copy_audio(AVFrame *dst, const AVFrame *src) > return 0; > } > > +static int frame_copy_subtitle(AVFrame *dst, const AVFrame *src) > +{ > + int i, ret; > + > + for (i = 0; i < dst->sub_nb_rects; i++) { > + const AVFrameSubtitleRectangle *src_rect = (const AVFrameSubtitleRectangle *)src->extended_data[i]; > + AVFrameSubtitleRectangle *dst_rect = (AVFrameSubtitleRectangle *)dst->extended_data[i]; > + > + memset(dst_rect, 0, sizeof(*dst_rect)); > + > + dst_rect->flags = src_rect->flags; > + > + if (src->format != AV_PIX_FMT_NONE) { > + /* (Alloc and) copy bitmap subtitle */ > + > + av_assert1(!src_rect->text); > + > + dst_rect->x = src_rect->x; > + dst_rect->y = src_rect->y; > + dst_rect->w = src_rect->w; > + dst_rect->h = src_rect->h; > + > + ret = av_image_alloc(dst_rect->data, dst_rect->linesize, > + dst_rect->w, dst_rect->h, > + dst->format, 0); > + if (ret < 0) > + return ret; > + av_image_copy(dst_rect->data, dst_rect->linesize, > + src_rect->data, src_rect->linesize, > + dst->format, dst_rect->w, dst_rect->h); > + } else { > + /* Copy text subtitle */ > + > + av_assert1(!src_rect->x && !src_rect->y); > + av_assert1(!src_rect->w && !src_rect->w); > + > + dst_rect->text = av_strdup(src_rect->text); > + if (!dst_rect->text) > + return AVERROR(ENOMEM); > 
+ } > + } > + return 0; > +} > + > int av_frame_copy(AVFrame *dst, const AVFrame *src) > { > - if (dst->format != src->format || dst->format < 0) > + if (dst->format != src->format || (dst->format < 0 && !dst->sub_nb_rects)) > return AVERROR(EINVAL); > > + if (dst->sub_nb_rects) > + return frame_copy_subtitle(dst, src); > if (dst->width > 0 && dst->height > 0) > return frame_copy_video(dst, src); > else if (dst->nb_samples > 0 && dst->channel_layout) Maybe I glanced over it, but it doesn't explicitly reset the subtitle fields. I think this should happen for unknown durations/start times at least. > diff --git a/libavutil/frame.h b/libavutil/frame.h > index 8e51361..d531cfa 100644 > --- a/libavutil/frame.h > +++ b/libavutil/frame.h > @@ -148,6 +148,25 @@ typedef struct AVFrameSideData { > AVBufferRef *buf; > } AVFrameSideData; > > +#define AV_NUM_DATA_POINTERS 8 > + > +/** > + * This structure describes decoded subtitle rectangle > + */ > +typedef struct AVFrameSubtitleRectangle { > + int x, y; > + int w, h; Might require clarification that this is strictly for bitmap subtitles only. > + > + /* image data for bitmap subtitles, in AVFrame.format (AVPixelFormat) */ > + uint8_t *data[AV_NUM_DATA_POINTERS]; > + int linesize[AV_NUM_DATA_POINTERS]; Does it define the format somewhere? > + > + /* decoded text for text subtitles, in ASS */ > + char *text; I'm fine with this - other would probably argue this should be in data[0] or so. > + > + int flags; Which flags (missing doc.)? > +} AVFrameSubtitleRectangle; The struct should probably be extendable (so its size is not part of the ABI). > + > /** > * This structure describes decoded (raw) audio or video data. > * > @@ -182,7 +201,6 @@ typedef struct AVFrameSideData { > * for AVFrame can be obtained from avcodec_get_frame_class() > */ > typedef struct AVFrame { > -#define AV_NUM_DATA_POINTERS 8 > /** > * pointer to the picture/channel planes. 
> * This might be different from the first allocated byte > @@ -224,9 +242,12 @@ typedef struct AVFrame { > * For packed audio, there is just one data pointer, and linesize[0] > * contains the total size of the buffer for all channels. > * > + * For subtitles, each pointer corresponds to an AVFrameSubtitleRectangle. > + * It should also mention how many data pointers there are. Btw. I'm not much of a fan of having to deal with extended_data, or of the duplication with data. (Does it even document what data[] contains for subtitles?) > * Note: Both data and extended_data should always be set in a valid frame, > - * but for planar audio with more channels that can fit in data, > - * extended_data must be used in order to access all channels. > + * but for subtitles and planar audio with more channels that can fit in > + * data, extended_data must be used in order to access all audio channels > + * or subtitles rectangles. > * > uint8_t **extended_data; > > @@ -359,10 +380,10 @@ typedef struct AVFrame { > * also be non-NULL for all j < i. > * > * There may be at most one AVBuffer per data plane, so for video this array > - * always contains all the references. For planar audio with more than > - * AV_NUM_DATA_POINTERS channels, there may be more buffers than can fit in > - * this array. Then the extra AVBufferRef pointers are stored in the > - * extended_buf array. > + * always contains all the references. For planar audio or subtitles with > + * more than AV_NUM_DATA_POINTERS channels or rectangles, there may be more > + * buffers than can fit in this array. Then the extra AVBufferRef pointers > + * are stored in the extended_buf array. > */ > AVBufferRef *buf[AV_NUM_DATA_POINTERS]; Also to note: this refcounts the AVFrameSubtitleRectangle structs, but not the contained data. This is not sane. In particular, I think AVBufferRef refcounted data can't sanely contain pointers (as they describe byte data).
However the hwframescontext stuff already uses pointers in AVBufferRef'ed structs, so we should at least make behavior consistent. This means: if the AVBuffer is unreferenced and destroyed, its destructor needs to free the referenced subtitle data. I think this is not done yet, but I could be wrong. > > @@ -532,6 +553,29 @@ typedef struct AVFrame { > * AVHWFramesContext describing the frame. > */ > AVBufferRef *hw_frames_ctx; > + > + /** > + * Number of rectangles available in extended_data > + * - encoding: set by user (XXX: no encoding yet) > + * - decoding: set by libavcodec, read by user > + */ > + int sub_nb_rects; > + > + /** > + * Subtitle start display time in AV_TIME_BASE, relative to pts > + * XXX: semantic for <=0? > + * - encoding: unused > + * - decoding: set by libavcodec, read by user. > + */ > + int64_t sub_start_display; > + > + /** > + * Subtitle end display time in AV_TIME_BASE, relative to pts > + * XXX: semantic for <=0? > + * - encoding: unused > + * - decoding: set by libavcodec, read by user. > + */ > + int64_t sub_end_display; I suggest "0" means literally 0 (i.e. essentially not displayed). -1 could mean "unknown". What I don't like is that this doesn't define the semantics of whether a subtitle frame replaces a previous frame. Normally you would do the latter if the duration is unknown - but some subtitle formats or containers can mix them and signal an incorrect, overly long duration. Then the next frame shouldn't overlap with the previous frame. > } AVFrame; > > /** > @@ -641,6 +685,7 @@ void av_frame_move_ref(AVFrame *dst, AVFrame *src); > * - format (pixel format for video, sample format for audio) > * - width and height for video > * - nb_samples and channel_layout for audio > + * - sub_nb_rects for subtitle > > * This function will fill AVFrame.data and AVFrame.buf arrays and, if > * necessary, allocate and fill AVFrame.extended_data and AVFrame.extended_buf.
> diff --git a/tests/ref/fate/sub-charenc b/tests/ref/fate/sub-charenc > index a056cd1..050807b 100644 > --- a/tests/ref/fate/sub-charenc > +++ b/tests/ref/fate/sub-charenc > @@ -23,7 +23,7 @@ Dialogue: 0,0:02:31.61,0:02:33.90,Default,,0,0,0,,- А ти как си?\N- До > Dialogue: 0,0:02:40.16,0:02:42.79,Default,,0,0,0,,Монахът Дзенг каза,\Nче си в планината Удан. > Dialogue: 0,0:02:43.04,0:02:46.54,Default,,0,0,0,,Каза, че практикуваш\Nдълбока медитация. > Dialogue: 0,0:02:48.84,0:02:50.68,Default,,0,0,0,,Сигурно в планината\Nе много спокойно. > -Dialogue: 0,0:02:51.25,0:02:53.46,Default,,0,0,0,,Завиждам ти. > +Dialogue: 0,0:02:51.26,0:02:53.47,Default,,0,0,0,,Завиждам ти. > Dialogue: 0,0:02:53.67,0:02:58.34,Default,,0,0,0,,Имам толкова много работа,\Nпочти не ми остава\Nвреме за почивка. > Dialogue: 0,0:03:00.26,0:03:03.89,Default,,0,0,0,,Оставих обучението рано. > Dialogue: 0,0:03:05.69,0:03:11.28,Default,,0,0,0,,Защо? Ти си боец на Удан.\NОбучението е всичко. > @@ -40,7 +40,7 @@ Dialogue: 0,0:03:53.40,0:03:56.57,Default,,0,0,0,,Не можах да издъ > Dialogue: 0,0:03:57.49,0:03:59.74,Default,,0,0,0,,Прекъснах медитацията си. > Dialogue: 0,0:03:59.95,0:04:02.24,Default,,0,0,0,,Не можах да продължа. > Dialogue: 0,0:04:03.20,0:04:07.79,Default,,0,0,0,,Нещо...\Nме дърпаше назад. > -Dialogue: 0,0:04:09.62,0:04:10.91,Default,,0,0,0,,Какво беше? > +Dialogue: 0,0:04:09.63,0:04:10.92,Default,,0,0,0,,Какво беше? > Dialogue: 0,0:04:15.46,0:04:18.00,Default,,0,0,0,,Нещо, от което не\Nмога да се освободя. > Dialogue: 0,0:04:23.39,0:04:24.68,Default,,0,0,0,,Скоро ли ще тръгваш? > Dialogue: 0,0:04:26.77,0:04:30.27,Default,,0,0,0,,Подготвяме охрана\Nза една доставка... > @@ -52,7 +52,7 @@ Dialogue: 0,0:04:48.37,0:04:52.67,Default,,0,0,0,,Да. Той винаги е > Dialogue: 0,0:04:52.88,0:04:56.55,Default,,0,0,0,,Не разбирам.\NКак можеш да се разделиш с него? > Dialogue: 0,0:04:56.76,0:04:59.93,Default,,0,0,0,,Той винаги е бил с теб. 
> Dialogue: 0,0:05:01.18,0:05:05.52,Default,,0,0,0,,Твърде много хора са\Nзагинали от това острие. > -Dialogue: 0,0:05:09.68,0:05:14.52,Default,,0,0,0,,Чисто е единствено защото\Nкръвта се отмива лесно. > +Dialogue: 0,0:05:09.69,0:05:14.53,Default,,0,0,0,,Чисто е единствено защото\Nкръвта се отмива лесно. > Dialogue: 0,0:05:15.40,0:05:20.61,Default,,0,0,0,,Ти го използваш справедливо.\NДостоен си за него. > Dialogue: 0,0:05:23.66,0:05:27.37,Default,,0,0,0,,Дойде време\Nда го оставя. > Dialogue: 0,0:05:27.58,0:05:31.21,Default,,0,0,0,,Е, какво ще правиш\Nот сега нататък?
On Fri, Nov 11, 2016 at 04:46:51PM +0100, wm4 wrote: [...] > > +// TODO: delete this compatibility code when all subtitles encoders moved > > +// to send_frame > > +static int encode_subtitle_frame(AVCodecContext *avctx, > > + AVPacket *avpkt, > > + const AVFrame *frame, > > + int *got_packet_ptr) > > +{ > > + int i, ret; > > + > > +FF_DISABLE_DEPRECATION_WARNINGS > > + AVPacket tmp_pkt; > > + AVSubtitle subtitle; > > + > > + get_subtitle_defaults(&subtitle); > > + > > + /* Allocate (once) a temporary large output subtitle buffer */ > > + if (!avctx->internal->byte_buffer) { > > + const int subtitle_out_max_size = 1024 * 1024; > > + uint8_t *subtitle_out = av_malloc(subtitle_out_max_size); > > + if (!subtitle_out) > > + return AVERROR(ENOMEM); > > + > > + avctx->internal->byte_buffer = subtitle_out; > > + avctx->internal->byte_buffer_size = subtitle_out_max_size; > > + } > > This is used for encoding below... does the old API really not have a > better way for this? > Yeah, apparently. The subtitle encoding API never got updated; there is no number at the end of "avcodec_encode_subtitle", which gives you an idea about its era. > I would have thought 1MB is a bit small, but then again I looked at > ffmpeg.c, and it's using the same fixed size buffer. > That's indeed where I took the value. Note that this function exists for compatibility. Later on, we will likely make something smarter, but for now I'm just making a wrapper: the priority is to make it clean and transparent for the user (no AVSubtitle visible), then we can take the time to clean up our mess internally (making codecs handle AVFrame directly instead of AVSubtitle). [...] (the allocation issues raised have been fixed locally) [...]
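The lazy one-time allocation in the quoted hunk reduces to the following pattern. This is a plain-C model with hypothetical names (ScratchCtx, get_scratch), not the internal AVCodecInternal fields:

```c
#include <stdlib.h>
#include <stddef.h>

/* Model of the one-shot byte_buffer: allocated on first use, then reused
 * by every later encode call for the lifetime of the context. */
typedef struct ScratchCtx {
    unsigned char *byte_buffer;
    size_t byte_buffer_size;
} ScratchCtx;

static unsigned char *get_scratch(ScratchCtx *ctx, size_t size)
{
    if (!ctx->byte_buffer) {
        ctx->byte_buffer = malloc(size);
        if (!ctx->byte_buffer)
            return NULL;
        ctx->byte_buffer_size = size;
    }
    /* NOTE: like the fixed 1 MiB buffer inherited from ffmpeg.c, this
     * never grows; a smarter version would realloc when
     * size > byte_buffer_size. */
    return ctx->byte_buffer;
}

static void free_scratch(ScratchCtx *ctx)
{
    free(ctx->byte_buffer);
    ctx->byte_buffer = NULL;
    ctx->byte_buffer_size = 0;
}
```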
> > +static int decode_subtitle_frame(AVCodecContext *avctx, > > + AVFrame *frame, > > + int *got_frame_ptr, > > + const AVPacket *avpkt) > > +{ > > + int i, ret = 0; > > + > > + *got_frame_ptr = 0; > > + > > + if (!avctx->codec) > > + return AVERROR(EINVAL); > > + > > + if (!avpkt->data && avpkt->size) { > > + av_log(avctx, AV_LOG_ERROR, "invalid packet: NULL data, size != 0\n"); > > + return AVERROR(EINVAL); > > + } > > didn't vobsubs have 0-sized packets at some point? Also, I think the > caller should check this (avcodec_send_packet). > That's the first check in avcodec_decode_subtitle2() so I suppose that's valid. It's also present in avcodec_decode_audio4(). As you guessed later, that function is mostly a copy/paste from avcodec_decode_subtitle2(). The difference being that there is additionally a convert from AVSubtitle to AVFrame inside. > > + > > + av_assert0(avctx->codec->type == AVMEDIA_TYPE_SUBTITLE); > > + > > + av_frame_unref(frame); > > It's already guaranteed to be unreffed. > OK, dropped > > + > > + if ((avctx->codec->capabilities & AV_CODEC_CAP_DELAY) || avpkt->size) { > > + AVPacket pkt_recoded; > > + AVPacket tmp = *avpkt; > > + int did_split = av_packet_split_side_data(&tmp); > > + > > + if (did_split) { > > + /* FFMIN() prevents overflow in case the packet wasn't allocated with > > + * proper padding. > > + * If the side data is smaller than the buffer padding size, the > > + * remaining bytes should have already been filled with zeros by the > > + * original packet allocation anyway. */ > > + memset(tmp.data + tmp.size, 0, > > + FFMIN(avpkt->size - tmp.size, AV_INPUT_BUFFER_PADDING_SIZE)); > > + } > > + pkt_recoded = tmp; > > + > > + ret = recode_subtitle(avctx, &pkt_recoded, &tmp); > > + if (ret >= 0) { > > IMO early exit would be nicer... > > This is mostly the old code copy&pasted, I assume. > Yes, I'd like to limit the differences with the original function since this is kind of a large work with many risks of getting things wrong. 
But after reading your review below, I realized I could just call avcodec_decode_subtitle2() in that function and do the conversion afterwards, which is what I just did locally, addressing as a result a bunch of your comments later on. [...] > > + > > + frame->sub_start_display = sub.start_display_time ? av_rescale_q(sub.start_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q) > > + : AV_NOPTS_VALUE; > > + frame->sub_end_display = sub.end_display_time ? av_rescale_q(sub.end_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q) > > + : AV_NOPTS_VALUE; > > end_display_time can be UINT32_MAX for "infinite". I think we should > clean this up here, and differentiate between "0", "unknown", and > "until next event". > I don't even know what the convention is today, so this part is currently mostly ignored in this first prototype. What would be the difference between "infinite" and "until next (potentially clear) event"? [...] > > +static const AVOption sbuffer_options[] = { > > + { "time_base", NULL, OFFSET(time_base), AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, 0, INT_MAX, S }, > > + { "format", NULL, OFFSET(sub_format), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 1, S, "format" }, > > + { "unspecified", NULL, 0, AV_OPT_TYPE_CONST, {.i64=-1}, INT_MIN, INT_MAX, S, "format" }, > > + { "bitmap", NULL, 0, AV_OPT_TYPE_CONST, {.i64=0}, INT_MIN, INT_MAX, S, "format" }, > > + { "text", NULL, 0, AV_OPT_TYPE_CONST, {.i64=1}, INT_MIN, INT_MAX, S, "format" }, > > Implies no mixed subs. I'm fine with that, but might be restrictive. > Yeah, so far we don't have such things AFAIK. But we could replace this with a mask later on. It was simpler for the initial format negotiation that way. [...] > > + } else if (type == AVMEDIA_TYPE_SUBTITLE) { > > + if (ff_add_format(&ret, 0 /* bitmap */) < 0 || > > + ff_add_format(&ret, 1 /* text */) < 0) > > + return NULL; > > Questionable, is probably preliminary code. > Yes, preliminary code.
It involves making a decision about what you just raised earlier so I make a quick experiment with something as basic as what we have today. [...] > > while (av_read_frame(fmt, &pkt) >= 0) { > > - int i, got_subtitle; > > - AVSubtitle sub = {0}; > > - > > if (pkt.stream_index == sid) { > > - ret = avcodec_decode_subtitle2(dec_ctx, &sub, &got_subtitle, &pkt); > > - if (ret < 0) { > > - av_log(ctx, AV_LOG_WARNING, "Error decoding: %s (ignored)\n", > > - av_err2str(ret)); > > - } else if (got_subtitle) { > > - const int64_t start_time = av_rescale_q(sub.pts, AV_TIME_BASE_Q, av_make_q(1, 1000)); > > - const int64_t duration = sub.end_display_time; > > - for (i = 0; i < sub.num_rects; i++) { > > - char *ass_line = sub.rects[i]->ass; > > + int i; > > + > > + ret = avcodec_send_packet(dec_ctx, &pkt); > > + if (ret < 0) > > + break; > > + // XXX probably not correct api usage > > + ret = avcodec_receive_frame(dec_ctx, frame); > > + if (ret >= 0) { > > I've found that converting code which used to old API to the new API is > very messy. I think that's mostly because the data flow is rather > different. > > Correct API usage would be something like: > > while (1) { > if (av_codec_send_packet() == AVERROR_EAGAIN) > add_packet_back_to_demuxer_queue() > while (1) { > ret = avcodec_receive_frame > if (ret == AVERROR_EAGAIN) > break; // new packet needed > /* process frame */ > } > } > > Plus EOF and error handling. > > avcodec_send_packet() returning EAGAIN in this specific situation can > probably be reasonably treated as error instead. Unless the decoder > violates the declared API rules. > I might actually not clean that code but mark it as deprecated instead, and make the filter take a stream directly (instead of a filename). It was an initial attempt to make some tests without updating ffmpeg.c, integrating into lavfi, etc. Now I can do it properly, so that code will go away. [...] 
> > diff --git a/libavformat/assenc.c b/libavformat/assenc.c > > index d50f18f..97a8de8 100644 > > --- a/libavformat/assenc.c > > +++ b/libavformat/assenc.c > > @@ -164,6 +164,9 @@ static int write_packet(AVFormatContext *s, AVPacket *pkt) > > int hh2, mm2, ss2, ms2; > > DialogueLine *dialogue = av_mallocz(sizeof(*dialogue)); > > > > + if (pkt->duration < 0) > > + end = INT64_MAX; > > Wait what. > This needs to be in a separate patch with a longer explanation. IIRC I did that because the last SAMI subtitle has no duration, so it needs to be muxed as "til the end". [...] > > +static int get_subtitle_buffer(AVFrame *frame) > > +{ > > + int i, ret; > > + > > + ret = get_data_buffer(frame, frame->sub_nb_rects, sizeof(AVFrameSubtitleRectangle)); > > + if (ret < 0) > > + return ret; > > + for (i = 0; i < frame->sub_nb_rects; i++) > > + memset(frame->extended_data[i], 0, sizeof(AVFrameSubtitleRectangle)); > > + return 0; > > I'm noticing it doesn't allocate the actual subtitle bitmap or text > data, which is a difference from audio/video. > Yes, exactly. This is tricky. It's really tricky because, contrary to audio and video, we cannot know the size required for every (not yet allocated) rectangle from the raw format information alone. For video, we have the pixel format and dimensions. For audio, we have the sample format and number of samples. But for subtitles, we can at most have the number of rectangles. To work around this problem, I just made the get_buffer callback allocate the rectangles holder. The decoder will then allocate each rectangle's data (of different sizes) manually, be it text or an actual bitmap rectangle. But indeed this means that every refcounting operation will mean a new allocation and copy of the rectangles. As subtitles are meant to be small and sparse, I'm assuming it's not exactly a problem. Do you have a better suggestion?
Maybe we could have the user (decoders) allocate the rectangles holder and fill all the sizes instead, and then call the get buffer, but this might be annoying as it will require most of them to do it in two passes instead of one as today. > > } > > > > int av_frame_get_buffer(AVFrame *frame, int align) > > { > > + if (frame->sub_nb_rects) > > + return get_subtitle_buffer(frame); > > No format set is valid for subtitles? > Yes, no format set means text. If the format is set, it defines the format used by the rectangles. [...] > > + /* image data for bitmap subtitles, in AVFrame.format (AVPixelFormat) */ > > + uint8_t *data[AV_NUM_DATA_POINTERS]; > > + int linesize[AV_NUM_DATA_POINTERS]; > > Does it define the format somewhere? > in AVFrame.format > > + > > + /* decoded text for text subtitles, in ASS */ > > + char *text; > > I'm fine with this - other would probably argue this should be in > data[0] or so. > > + > > + int flags; > > Which flags (missing doc.)? > That's the current flags, I need to move/copy them to lavu. > > +} AVFrameSubtitleRectangle; > > The struct should probably be extendable (so its size is not part of > the ABI). > In the current design, there is no need to ever allocate them: you set the number of rectangles in your AVFrame, then call av_frame_get_buffer() which will allocate them. > > + > > /** > > * This structure describes decoded (raw) audio or video data. > > * > > @@ -182,7 +201,6 @@ typedef struct AVFrameSideData { > > * for AVFrame can be obtained from avcodec_get_frame_class() > > */ > > typedef struct AVFrame { > > -#define AV_NUM_DATA_POINTERS 8 > > /** > > * pointer to the picture/channel planes. > > * This might be different from the first allocated byte > > @@ -224,9 +242,12 @@ typedef struct AVFrame { > > * For packed audio, there is just one data pointer, and linesize[0] > > * contains the total size of the buffer for all channels. > > *
> > + * For subtitles, each pointer corresponds to an AVFrameSubtitleRectangle. > > + * > It should also mention how many data pointers there are. > > Btw. I'm very not much of a fan to have to deal with extended_data. And > the duplication with data. (Does it even document what data[] contains > for subtitles?) > It's exactly like audio: data[] points to the first extended_data[] entries. [...] I'll try to address every other comment and send a new and more complete version later. Thanks a lot for the review.
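The audio-style aliasing described here — data[] mirroring the head of extended_data — can be shown in isolation. NUM_POINTERS and Frame below are simplified stand-ins for AV_NUM_DATA_POINTERS and AVFrame, and the wiring is a sketch of what get_data_buffer() in the patch does, not the real implementation:

```c
#include <stdlib.h>

#define NUM_POINTERS 8 /* stand-in for AV_NUM_DATA_POINTERS */

typedef struct Frame {
    unsigned char *data[NUM_POINTERS];
    unsigned char **extended_data; /* always covers all n entries */
} Frame;

/* With n <= 8 entries, extended_data simply aliases data; with more,
 * it is a separate array whose first 8 entries are duplicated into
 * data[] so both views stay consistent. */
static int setup_pointers(Frame *f, int n)
{
    if (n > NUM_POINTERS) {
        f->extended_data = calloc(n, sizeof(*f->extended_data));
        if (!f->extended_data)
            return -1;
    } else {
        f->extended_data = f->data;
    }
    for (int i = 0; i < n; i++) {
        f->extended_data[i] = malloc(1); /* placeholder payload */
        if (!f->extended_data[i])
            return -1;
        if (i < NUM_POINTERS)
            f->data[i] = f->extended_data[i]; /* keep the alias in sync */
    }
    return 0;
}
```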
On Fri, Nov 11, 2016 at 12:45:03PM +0100, Nicolas George wrote: > Le septidi 17 brumaire, an CCXXV, Clement Boesch a écrit : > > I didn't. Duration is somehow broken currently for some reason. I did > > nothing for sparseness: the reason I added basic support in lavfi is > > because it was much simpler to handle at ffmpeg.c level, so it's currently > > just a passthrough mechanism. > > A decision will need to be made pretty soon, though, or it will not help > anything much. Sparseness will make things break badly OOM-killer-style > if someone tries to use subtitles coming from the same multiplexed file > than the corresponding video. Duration affects the visible API, a > decision on it is more or less final. Here are the results of my current > thoughts: > > For sparseness, the solution is to use heartbeat frames: if you have a > subtitle event at 10s and the next one at 50s, emit a frame with no > payload and just a timestamp at 10.1, 10.2, ... 49.1. > [...] Right, I'll probably need that for the sub2video filter. I'll think about it unless you want to implement it yourself. > > I did't like having multiple fields for text based data. If we want to > > decode in another form, we can still add an option to print out verbatim > > text instead of ASS markup. > > I think we are not talking about the same thing. A long time ago, we > considered replacing the ASS markup with a simple text field with > styling in a separate, non-text, structure. Did you discard that idea? > Ah, yeah, I discarded that idea. First, I think it's too large a change for now, and I just won't be able to finish this patchset ever if I start going that way. Second, apparently it's considered over-engineering by the API users I talked with. They're fine using libass for markup or a simple freetype-like custom engine for when there is no markup. If we later decide to export as an AST, we will add a field to that rectangle and that will do the trick.
For now, this is already such a huge amount of work that I'm not willing to stick that in (and probably won't ever, TBH). It's already raising a ton of questions wrt design. [...]
2016-11-20 13:52 GMT+01:00 Clément Bœsch <u@pkh.me>: >> > +static const AVOption sbuffer_options[] = { >> > + { "time_base", NULL, OFFSET(time_base), AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, 0, INT_MAX, S }, >> > + { "format", NULL, OFFSET(sub_format), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 1, S, "format" }, >> > + { "unspecified", NULL, 0, AV_OPT_TYPE_CONST, {.i64=-1}, INT_MIN, INT_MAX, S, "format" }, >> > + { "bitmap", NULL, 0, AV_OPT_TYPE_CONST, {.i64=0}, INT_MIN, INT_MAX, S, "format" }, >> > + { "text", NULL, 0, AV_OPT_TYPE_CONST, {.i64=1}, INT_MIN, INT_MAX, S, "format" }, >> >> Implies no mixed subs. I'm fine with that, but might be restrictive. >> > > Yeah, so far we don't have such things AFAIK. But we could replace this > with a mask later on. It was simpler for the initial format negotiation > that way. Will this work with our current external teletext decoder? (Will it work with a hypothetical future teletext decoder that provides both bitmap and text when decoding?) Thank you, Carl Eugen
On Sun, 20 Nov 2016 14:24:00 +0100 Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote: > 2016-11-20 13:52 GMT+01:00 Clément Bœsch <u@pkh.me>: > > >> > +static const AVOption sbuffer_options[] = { > >> > + { "time_base", NULL, OFFSET(time_base), AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, 0, INT_MAX, S }, > >> > + { "format", NULL, OFFSET(sub_format), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 1, S, "format" }, > >> > + { "unspecified", NULL, 0, AV_OPT_TYPE_CONST, {.i64=-1}, INT_MIN, INT_MAX, S, "format" }, > >> > + { "bitmap", NULL, 0, AV_OPT_TYPE_CONST, {.i64=0}, INT_MIN, INT_MAX, S, "format" }, > >> > + { "text", NULL, 0, AV_OPT_TYPE_CONST, {.i64=1}, INT_MIN, INT_MAX, S, "format" }, > >> > >> Implies no mixed subs. I'm fine with that, but might be restrictive. > >> > > > > Yeah, so far we don't have such things AFAIK. But we could replace this > > with a mask later on. It was simpler for the initial format negotiation > > that way. > > Will this work with our current external teletext decoder? > (Will it work with a hypothetical future teletext decoder that provides both > bitmap and text when decoding?) That's a different case. I was talking about subtitles with need both text and images for correct display. Different output formats can be handled with get_format or a private option or so.
2016-11-21 12:30 GMT+01:00 wm4 <nfxjfg@googlemail.com>: > On Sun, 20 Nov 2016 14:24:00 +0100 > Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote: > >> 2016-11-20 13:52 GMT+01:00 Clément Bœsch <u@pkh.me>: >> >> >> > +static const AVOption sbuffer_options[] = { >> >> > + { "time_base", NULL, OFFSET(time_base), AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, 0, INT_MAX, S }, >> >> > + { "format", NULL, OFFSET(sub_format), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 1, S, "format" }, >> >> > + { "unspecified", NULL, 0, AV_OPT_TYPE_CONST, {.i64=-1}, INT_MIN, INT_MAX, S, "format" }, >> >> > + { "bitmap", NULL, 0, AV_OPT_TYPE_CONST, {.i64=0}, INT_MIN, INT_MAX, S, "format" }, >> >> > + { "text", NULL, 0, AV_OPT_TYPE_CONST, {.i64=1}, INT_MIN, INT_MAX, S, "format" }, >> >> >> >> Implies no mixed subs. I'm fine with that, but might be restrictive. >> >> >> > >> > Yeah, so far we don't have such things AFAIK. But we could replace this >> > with a mask later on. It was simpler for the initial format negotiation >> > that way. >> >> Will this work with our current external teletext decoder? >> (Will it work with a hypothetical future teletext decoder that provides both >> bitmap and text when decoding?) > > That's a different case. I was talking about subtitles with need both > text and images for correct display. Different output formats can be > handled with get_format or a private option or so. So it does already work with our current external teletext decoder and a hypothetical future teletext decoder that provides both bitmap and text when decoding? Thank you, Carl Eugen
On Sun, 20 Nov 2016 13:52:02 +0100 Clément Bœsch <u@pkh.me> wrote: > On Fri, Nov 11, 2016 at 04:46:51PM +0100, wm4 wrote: > [...] > > > + > > > + frame->sub_start_display = sub.start_display_time ? av_rescale_q(sub.start_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q) > > > + : AV_NOPTS_VALUE; > > > + frame->sub_end_display = sub.end_display_time ? av_rescale_q(sub.end_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q) > > > + : AV_NOPTS_VALUE; > > > > end_display_time can be UINT32_MAX for "infinite". I think we should > > clean this up here, and differentiate between "0", "unknown", and > > "until next event". > > > > I don't even know what's the convention today, so this part is currently > mostly ignored from this first prototype. dvdsubdec.c leaves end_display_time simply at 0 in some cases (which means show until next event), while pgssubdec.c always sets it to UINT32_MAX (which apparently means the same). I don't know why it was done this way (see 0c911c8fbc). > What would be the difference between "infinite" and "until next > (potentially clear) event"? Probably none. The important part for non-infinite end time values is whether the subtitles get strictly replaced by other (later) subtitles, or whether they overlap. With the existing API, you have to guess by the codec ID. > > [...] > > > +static int get_subtitle_buffer(AVFrame *frame) > > > +{ > > > + int i, ret; > > > + > > > + ret = get_data_buffer(frame, frame->sub_nb_rects, sizeof(AVFrameSubtitleRectangle)); > > > + if (ret < 0) > > > + return ret; > > > + for (i = 0; i < frame->sub_nb_rects; i++) > > > + memset(frame->extended_data[i], 0, sizeof(AVFrameSubtitleRectangle)); > > > + return 0; > > > > I'm noticing it doesn't allocate the actual subtitle bitmap or text > > data, which is a difference from audio/video. > > > > Yes, exactly. This is tricky. 
> It's really tricky because, unlike audio
> and video, we cannot know the size required for every (not yet allocated)
> rectangle from the raw format information alone.
>
> For video, we have the pixel format and dimensions. For audio, we have the
> sample format and number of samples. But for subtitles, we can at most
> have the number of rectangles.
>
> To work around this problem, I just made the get_buffer callback allocate
> the rectangles holder. The decoder will then allocate every rectangle's data
> (of different sizes) manually, be it text or an actual bitmap rectangle.
>
> But indeed this means that every ref counting will mean a new allocation
> and copy of the rectangles. As subtitles are meant to be small and sparse,
> I'm assuming it's not exactly a problem.
>
> Do you have a better suggestion?
>
> Maybe we could have the user (decoders) allocate the rectangles holder and
> fill in all the sizes instead, and then call the get buffer, but this might
> be annoying as it will require most of them to do it in two passes
> instead of one as of today.
>

Seems to me that av_frame_get_buffer() would just have confusing
semantics here, and it'd be better to have dedicated allocation
functions. But the important part is that the subtitle data is actually
properly refcounted/freed, and that things like copy-on-write actually
work.

> > >  }
> > >
> > >  int av_frame_get_buffer(AVFrame *frame, int align)
> > >  {
> > > +    if (frame->sub_nb_rects)
> > > +        return get_subtitle_buffer(frame);
> >
> > No format set is valid for subtitles?
> >
>
> Yes, no format set means text. If the format is set, it defines the format
> used by the rectangles.

Seems slightly questionable.

> [...]
> > > +    /* image data for bitmap subtitles, in AVFrame.format (AVPixelFormat) */
> > > +    uint8_t *data[AV_NUM_DATA_POINTERS];
> > > +    int linesize[AV_NUM_DATA_POINTERS];
> >
> > Does it define the format somewhere?
> >
>
> in AVFrame.format

Right. 
> > > +} AVFrameSubtitleRectangle;
> >
> > The struct should probably be extendable (so its size is not part of
> > the ABI).
> >
>
> In the current design, there is no need to ever allocate them: you set the
> number of rectangles in your AVFrame, then call av_frame_get_buffer()
> which will allocate them.

You still need to make this explicit. Also, allocation isn't the only
operation you can do on a struct.

> > > @@ -224,9 +242,12 @@ typedef struct AVFrame {
> > >      * For packed audio, there is just one data pointer, and linesize[0]
> > >      * contains the total size of the buffer for all channels.
> > >      *
> > > +    * For subtitles, each pointer corresponds to an AVFrameSubtitleRectangle.
> > > +    *
> >
> > It should also mention how many data pointers there are.
> >
> > Btw. I'm really not much of a fan of having to deal with extended_data.
> > And the duplication with data. (Does it even document what data[] contains
> > for subtitles?)
> >
>
> It's exactly like audio: data[] points to the first extended_data[]
> entries.

Well, for audio and video, I have my own structs instead of using
AVFrame (because AVFrame is very restrictive, not extensible, and hard
to use). And wiring up these extended* things is terrible. Actually,
the worst offenders are buf/extended_buf because of their weird
semantics.
Le decadi 30 brumaire, an CCXXV, Clement Boesch wrote:
> Right, I'll probably need that for the sub2video filter. I'll think about
> it unless you want to implement it yourself.

As always, for me, time is the limiting factor. For now, I believe the
only urgent thing to do is to decide what these heartbeat frames look
like: do they just have data[0] = NULL? format = -1?

> Ah, yeah, I discarded that idea. First, I think it's too large a change
> for now, and I just won't be able to finish this patchset ever if I start
> going that way. Second, apparently it's considered over-engineering by the
> API users I talked with. They're fine using libass for markup or a simple
> freetype-like custom engine for when there is no markup.
>
> If we later decide to export as an AST, we will add a field to that
> rectangle and that will do the trick. For now, this is already such a
> huge amount of work that I'm not willing to stick that in (and probably
> won't ever, TBH). It's already raising a ton of questions wrt design.

Ok.

Regards,
diff --git a/ffmpeg.c b/ffmpeg.c index 36a921a..8222b8b 100644 --- a/ffmpeg.c +++ b/ffmpeg.c @@ -856,6 +856,50 @@ static int check_recording_time(OutputStream *ost) return 1; } +static void do_subtitle_out(OutputFile *of, + OutputStream *ost, + AVFrame *frame) +{ + AVCodecContext *enc = ost->enc_ctx; + AVPacket pkt; + int ret; + + av_init_packet(&pkt); + pkt.data = NULL; + pkt.size = 0; + + if (!check_recording_time(ost)) + return; + + ost->frames_encoded++; + + av_assert0(pkt.size || !pkt.data); + update_benchmark(NULL); + + ret = avcodec_send_frame(enc, frame); + if (ret < 0) + goto error; + + while (1) { + ret = avcodec_receive_packet(enc, &pkt); + if (ret == AVERROR(EAGAIN)) + break; + if (ret < 0) + goto error; + + update_benchmark("encode_subtitle %d.%d", ost->file_index, ost->index); + + av_packet_rescale_ts(&pkt, enc->time_base, ost->st->time_base); + + output_packet(of, &pkt, ost); + } + + return; +error: + av_log(NULL, AV_LOG_FATAL, "Subtitle encoding failed\n"); + exit_program(1); +} + static void do_audio_out(OutputFile *of, OutputStream *ost, AVFrame *frame) { @@ -916,89 +960,6 @@ error: exit_program(1); } -static void do_subtitle_out(OutputFile *of, - OutputStream *ost, - AVSubtitle *sub) -{ - int subtitle_out_max_size = 1024 * 1024; - int subtitle_out_size, nb, i; - AVCodecContext *enc; - AVPacket pkt; - int64_t pts; - - if (sub->pts == AV_NOPTS_VALUE) { - av_log(NULL, AV_LOG_ERROR, "Subtitle packets must have a pts\n"); - if (exit_on_error) - exit_program(1); - return; - } - - enc = ost->enc_ctx; - - if (!subtitle_out) { - subtitle_out = av_malloc(subtitle_out_max_size); - if (!subtitle_out) { - av_log(NULL, AV_LOG_FATAL, "Failed to allocate subtitle_out\n"); - exit_program(1); - } - } - - /* Note: DVB subtitle need one packet to draw them and one other - packet to clear them */ - /* XXX: signal it in the codec context ? 
*/ - if (enc->codec_id == AV_CODEC_ID_DVB_SUBTITLE) - nb = 2; - else - nb = 1; - - /* shift timestamp to honor -ss and make check_recording_time() work with -t */ - pts = sub->pts; - if (output_files[ost->file_index]->start_time != AV_NOPTS_VALUE) - pts -= output_files[ost->file_index]->start_time; - for (i = 0; i < nb; i++) { - unsigned save_num_rects = sub->num_rects; - - ost->sync_opts = av_rescale_q(pts, AV_TIME_BASE_Q, enc->time_base); - if (!check_recording_time(ost)) - return; - - sub->pts = pts; - // start_display_time is required to be 0 - sub->pts += av_rescale_q(sub->start_display_time, (AVRational){ 1, 1000 }, AV_TIME_BASE_Q); - sub->end_display_time -= sub->start_display_time; - sub->start_display_time = 0; - if (i == 1) - sub->num_rects = 0; - - ost->frames_encoded++; - - subtitle_out_size = avcodec_encode_subtitle(enc, subtitle_out, - subtitle_out_max_size, sub); - if (i == 1) - sub->num_rects = save_num_rects; - if (subtitle_out_size < 0) { - av_log(NULL, AV_LOG_FATAL, "Subtitle encoding failed\n"); - exit_program(1); - } - - av_init_packet(&pkt); - pkt.data = subtitle_out; - pkt.size = subtitle_out_size; - pkt.pts = av_rescale_q(sub->pts, AV_TIME_BASE_Q, ost->st->time_base); - pkt.duration = av_rescale_q(sub->end_display_time, (AVRational){ 1, 1000 }, ost->st->time_base); - if (enc->codec_id == AV_CODEC_ID_DVB_SUBTITLE) { - /* XXX: the pts correction is handled here. 
Maybe handling - it in the codec would be better */ - if (i == 0) - pkt.pts += 90 * sub->start_display_time; - else - pkt.pts += 90 * sub->end_display_time; - } - pkt.dts = pkt.pts; - output_packet(of, &pkt, ost); - } -} - static void do_video_out(OutputFile *of, OutputStream *ost, AVFrame *next_picture, @@ -1431,6 +1392,8 @@ static int reap_filters(int flush) filtered_frame->pts = av_rescale_q(filtered_frame->pts, filter->inputs[0]->time_base, enc->time_base) - av_rescale_q(start_time, AV_TIME_BASE_Q, enc->time_base); + filtered_frame->pkt_duration = + av_rescale_q(filtered_frame->pkt_duration, filter->inputs[0]->time_base, enc->time_base); } //if (ost->source_index >= 0) // *filtered_frame= *input_streams[ost->source_index]->decoded_frame; //for me_threshold @@ -1458,8 +1421,10 @@ static int reap_filters(int flush) } do_audio_out(of, ost, filtered_frame); break; + case AVMEDIA_TYPE_SUBTITLE: + do_subtitle_out(of, ost, filtered_frame); + break; default: - // TODO support subtitle filters av_assert0(0); } @@ -2081,6 +2046,36 @@ static int send_frame_to_filters(InputStream *ist, AVFrame *decoded_frame) return ret; } +static int decode_subtitle(InputStream *ist, AVPacket *pkt, int *got_output) +{ + int i, ret, err; + AVFrame *decoded_frame; + + if (!ist->decoded_frame && !(ist->decoded_frame = av_frame_alloc())) + return AVERROR(ENOMEM); + if (!ist->filter_frame && !(ist->filter_frame = av_frame_alloc())) + return AVERROR(ENOMEM); + decoded_frame = ist->decoded_frame; + + update_benchmark(NULL); + ret = decode(ist->dec_ctx, decoded_frame, got_output, pkt); + update_benchmark("decode_subtitle %d.%d", ist->file_index, ist->st->index); + + if (ret != AVERROR_EOF) + check_decode_result(ist, got_output, ret); + + if (!*got_output || ret < 0) + return ret; + + ist->frames_decoded++; + + err = send_frame_to_filters(ist, decoded_frame); + + av_frame_unref(ist->filter_frame); + av_frame_unref(decoded_frame); + return err < 0 ? 
err : ret; +} + static int decode_audio(InputStream *ist, AVPacket *pkt, int *got_output) { AVFrame *decoded_frame; @@ -2335,65 +2330,6 @@ fail: return err < 0 ? err : ret; } -static int transcode_subtitles(InputStream *ist, AVPacket *pkt, int *got_output) -{ - AVSubtitle subtitle; - int i, ret = avcodec_decode_subtitle2(ist->dec_ctx, - &subtitle, got_output, pkt); - - check_decode_result(NULL, got_output, ret); - - if (ret < 0 || !*got_output) { - if (!pkt->size) - sub2video_flush(ist); - return ret; - } - - if (ist->fix_sub_duration) { - int end = 1; - if (ist->prev_sub.got_output) { - end = av_rescale(subtitle.pts - ist->prev_sub.subtitle.pts, - 1000, AV_TIME_BASE); - if (end < ist->prev_sub.subtitle.end_display_time) { - av_log(ist->dec_ctx, AV_LOG_DEBUG, - "Subtitle duration reduced from %d to %d%s\n", - ist->prev_sub.subtitle.end_display_time, end, - end <= 0 ? ", dropping it" : ""); - ist->prev_sub.subtitle.end_display_time = end; - } - } - FFSWAP(int, *got_output, ist->prev_sub.got_output); - FFSWAP(int, ret, ist->prev_sub.ret); - FFSWAP(AVSubtitle, subtitle, ist->prev_sub.subtitle); - if (end <= 0) - goto out; - } - - if (!*got_output) - return ret; - - sub2video_update(ist, &subtitle); - - if (!subtitle.num_rects) - goto out; - - ist->frames_decoded++; - - for (i = 0; i < nb_output_streams; i++) { - OutputStream *ost = output_streams[i]; - - if (!check_output_constraints(ist, ost) || !ost->encoding_needed - || ost->enc->type != AVMEDIA_TYPE_SUBTITLE) - continue; - - do_subtitle_out(output_files[ost->file_index], ost, &subtitle); - } - -out: - avsubtitle_free(&subtitle); - return ret; -} - static int send_filter_eof(InputStream *ist) { int i, ret; @@ -2477,11 +2413,7 @@ static int process_input_packet(InputStream *ist, const AVPacket *pkt, int no_eo ist->next_pts += duration; //FIXME the duration is not correct in some cases break; case AVMEDIA_TYPE_SUBTITLE: - if (repeating) - break; - ret = transcode_subtitles(ist, &avpkt, &got_output); - if (!pkt && ret 
>= 0) - ret = AVERROR_EOF; + ret = decode_subtitle(ist, repeating ? NULL : &avpkt, &got_output); break; default: return -1; @@ -3293,7 +3225,8 @@ static int transcode_init(void) #endif if ((enc_ctx->codec_type == AVMEDIA_TYPE_VIDEO || - enc_ctx->codec_type == AVMEDIA_TYPE_AUDIO) && + enc_ctx->codec_type == AVMEDIA_TYPE_AUDIO || + enc_ctx->codec_type == AVMEDIA_TYPE_SUBTITLE) && filtergraph_is_simple(ost->filter->graph)) { FilterGraph *fg = ost->filter->graph; if (configure_filtergraph(fg)) { diff --git a/ffmpeg_filter.c b/ffmpeg_filter.c index 27aeca0..743c96b 100644 --- a/ffmpeg_filter.c +++ b/ffmpeg_filter.c @@ -231,8 +231,10 @@ static void init_input_filter(FilterGraph *fg, AVFilterInOut *in) int i; // TODO: support other filter types - if (type != AVMEDIA_TYPE_VIDEO && type != AVMEDIA_TYPE_AUDIO) { - av_log(NULL, AV_LOG_FATAL, "Only video and audio filters supported " + if (type != AVMEDIA_TYPE_VIDEO && + type != AVMEDIA_TYPE_AUDIO && + type != AVMEDIA_TYPE_SUBTITLE) { + av_log(NULL, AV_LOG_FATAL, "Only video, audio and subtitle filters supported " "currently.\n"); exit_program(1); } @@ -633,6 +635,29 @@ static int configure_output_audio_filter(FilterGraph *fg, OutputFilter *ofilter, return 0; } +static int configure_output_subtitle_filter(FilterGraph *fg, OutputFilter *ofilter, AVFilterInOut *out) +{ + OutputStream *ost = ofilter->ost; + //OutputFile *of = output_files[ost->file_index]; + //AVCodecContext *codec = ost->enc_ctx; + AVFilterContext *last_filter = out->filter_ctx; + int pad_idx = out->pad_idx; + char name[255]; + int ret; + + snprintf(name, sizeof(name), "output stream %d:%d", ost->file_index, ost->index); + ret = avfilter_graph_create_filter(&ofilter->filter, + avfilter_get_by_name("sbuffersink"), + name, NULL, NULL, fg->graph); + if (ret < 0) + return ret; + + if ((ret = avfilter_link(last_filter, pad_idx, ofilter->filter, 0)) < 0) + return ret; + + return 0; +} + #define DESCRIBE_FILTER_LINK(f, inout, in) \ { \ AVFilterContext *ctx = 
inout->filter_ctx; \ @@ -663,6 +688,7 @@ int configure_output_filter(FilterGraph *fg, OutputFilter *ofilter, AVFilterInOu switch (avfilter_pad_get_type(out->filter_ctx->output_pads, out->pad_idx)) { case AVMEDIA_TYPE_VIDEO: return configure_output_video_filter(fg, ofilter, out); case AVMEDIA_TYPE_AUDIO: return configure_output_audio_filter(fg, ofilter, out); + case AVMEDIA_TYPE_SUBTITLE: return configure_output_subtitle_filter(fg, ofilter, out); default: av_assert0(0); } } @@ -957,6 +983,69 @@ static int configure_input_audio_filter(FilterGraph *fg, InputFilter *ifilter, return 0; } +static int configure_input_subtitle_filter(FilterGraph *fg, InputFilter *ifilter, + AVFilterInOut *in) +{ + AVFilterContext *last_filter; + const AVFilter *sbuffer_filt = avfilter_get_by_name("sbuffer"); + InputStream *ist = ifilter->ist; + const AVCodecContext *avctx = ist->dec_ctx; + //InputFile *f = input_files[ist->file_index]; + AVBPrint args; + char name[255]; + int ret; //, pad_idx = 0; + //int64_t tsoffset = 0; + const AVRational tb = ist->st->time_base; + + // XXX: codec should be opened here so we could access avctx->codec_descriptor + const AVCodecDescriptor *codec_desc = avcodec_descriptor_get(avctx->codec_id); + + if (avctx->codec_type != AVMEDIA_TYPE_SUBTITLE) { + av_log(NULL, AV_LOG_ERROR, "Cannot connect subtitle filter to %s input\n", + av_get_media_type_string(avctx->codec_type)); + return AVERROR(EINVAL); + } + + av_bprint_init(&args, 0, AV_BPRINT_SIZE_AUTOMATIC); + av_bprintf(&args, "time_base=%d/%d:format=", tb.num, tb.den); + + if (codec_desc->props & AV_CODEC_PROP_TEXT_SUB) + av_bprintf(&args, "text"); + else if (codec_desc->props & AV_CODEC_PROP_BITMAP_SUB) + av_bprintf(&args, "bitmap"); + else + av_bprintf(&args, "unspecified"); + + snprintf(name, sizeof(name), "graph %d input from stream %d:%d", fg->index, + ist->file_index, ist->st->index); + + if ((ret = avfilter_graph_create_filter(&ifilter->filter, sbuffer_filt, + name, args.str, NULL, + fg->graph)) < 0) + 
return ret; + last_filter = ifilter->filter; + +#if 0 + snprintf(name, sizeof(name), "trim for input stream %d:%d", + ist->file_index, ist->st->index); + if (copy_ts) { + tsoffset = f->start_time == AV_NOPTS_VALUE ? 0 : f->start_time; + if (!start_at_zero && f->ctx->start_time != AV_NOPTS_VALUE) + tsoffset += f->ctx->start_time; + } + ret = insert_trim(((f->start_time == AV_NOPTS_VALUE) || !f->accurate_seek) ? + AV_NOPTS_VALUE : tsoffset, f->recording_time, + &last_filter, &pad_idx, name); + if (ret < 0) + return ret; +#endif + + if ((ret = avfilter_link(last_filter, 0, in->filter_ctx, in->pad_idx)) < 0) + return ret; + + return 0; +} + static int configure_input_filter(FilterGraph *fg, InputFilter *ifilter, AVFilterInOut *in) { @@ -972,6 +1061,7 @@ static int configure_input_filter(FilterGraph *fg, InputFilter *ifilter, switch (avfilter_pad_get_type(in->filter_ctx->input_pads, in->pad_idx)) { case AVMEDIA_TYPE_VIDEO: return configure_input_video_filter(fg, ifilter, in); case AVMEDIA_TYPE_AUDIO: return configure_input_audio_filter(fg, ifilter, in); + case AVMEDIA_TYPE_SUBTITLE: return configure_input_subtitle_filter(fg, ifilter, in); default: av_assert0(0); } } diff --git a/ffmpeg_opt.c b/ffmpeg_opt.c index 4d25fff..a8d9a9f 100644 --- a/ffmpeg_opt.c +++ b/ffmpeg_opt.c @@ -1487,8 +1487,13 @@ static char *get_ost_filters(OptionsContext *o, AVFormatContext *oc, else if (ost->filters) return av_strdup(ost->filters); - return av_strdup(st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO ? 
- "null" : "anull"); + switch (st->codecpar->codec_type) { + case AVMEDIA_TYPE_VIDEO: return av_strdup("null"); + case AVMEDIA_TYPE_AUDIO: return av_strdup("anull"); + case AVMEDIA_TYPE_SUBTITLE: return av_strdup("snull"); + default: + av_assert0(0); + } } static void check_streamcopy_filters(OptionsContext *o, AVFormatContext *oc, @@ -1824,6 +1829,9 @@ static OutputStream *new_subtitle_stream(OptionsContext *o, AVFormatContext *oc, MATCH_PER_STREAM_OPT(copy_initial_nonkeyframes, i, ost->copy_initial_nonkeyframes, oc, st); + MATCH_PER_STREAM_OPT(filter_scripts, str, ost->filters_script, oc, st); + MATCH_PER_STREAM_OPT(filters, str, ost->filters, oc, st); + if (!ost->stream_copy) { char *frame_size = NULL; @@ -1832,8 +1840,15 @@ static OutputStream *new_subtitle_stream(OptionsContext *o, AVFormatContext *oc, av_log(NULL, AV_LOG_FATAL, "Invalid frame size: %s.\n", frame_size); exit_program(1); } + + ost->avfilter = get_ost_filters(o, oc, ost); + if (!ost->avfilter) + exit_program(1); } + if (ost->stream_copy) + check_streamcopy_filters(o, oc, ost, AVMEDIA_TYPE_SUBTITLE); + return ost; } @@ -1961,6 +1976,7 @@ static void init_output_filter(OutputFilter *ofilter, OptionsContext *o, switch (ofilter->type) { case AVMEDIA_TYPE_VIDEO: ost = new_video_stream(o, oc, -1); break; case AVMEDIA_TYPE_AUDIO: ost = new_audio_stream(o, oc, -1); break; + case AVMEDIA_TYPE_SUBTITLE: ost = new_subtitle_stream(o, oc, -1); break; default: av_log(NULL, AV_LOG_FATAL, "Only video and audio filters are supported " "currently.\n"); @@ -1980,7 +1996,7 @@ static void init_output_filter(OutputFilter *ofilter, OptionsContext *o, } if (ost->avfilter && (ost->filters || ost->filters_script)) { - const char *opt = ost->filters ? "-vf/-af/-filter" : "-filter_script"; + const char *opt = ost->filters ? 
"-vf/-af/-sf/-filter" : "-filter_script"; av_log(NULL, AV_LOG_ERROR, "%s '%s' was specified through the %s option " "for output stream %d:%d, which is fed from a complex filtergraph.\n" @@ -2388,7 +2404,8 @@ loop_end: ist->decoding_needed |= DECODING_FOR_OST; if (ost->st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO || - ost->st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) { + ost->st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO || + ost->st->codecpar->codec_type == AVMEDIA_TYPE_SUBTITLE) { err = init_simple_filtergraph(ist, ost); if (err < 0) { av_log(NULL, AV_LOG_ERROR, @@ -2929,6 +2946,12 @@ static int opt_audio_filters(void *optctx, const char *opt, const char *arg) return parse_option(o, "filter:a", arg, options); } +static int opt_subtitle_filters(void *optctx, const char *opt, const char *arg) +{ + OptionsContext *o = optctx; + return parse_option(o, "filter:s", arg, options); +} + static int opt_vsync(void *optctx, const char *opt, const char *arg) { if (!av_strcasecmp(arg, "cfr")) video_sync_method = VSYNC_CFR; @@ -3540,6 +3563,8 @@ const OptionDef options[] = { "fix subtitles duration" }, { "canvas_size", OPT_SUBTITLE | HAS_ARG | OPT_STRING | OPT_SPEC | OPT_INPUT, { .off = OFFSET(canvas_sizes) }, "set canvas size (WxH or abbreviation)", "size" }, + { "sf", OPT_SUBTITLE | HAS_ARG | OPT_PERFILE | OPT_OUTPUT, { .func_arg = opt_subtitle_filters }, + "set subtitle filters", "filter_graph" }, /* grab options */ { "vc", HAS_ARG | OPT_EXPERT | OPT_VIDEO, { .func_arg = opt_video_channel }, diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h index 211112f..0912bff 100644 --- a/libavcodec/avcodec.h +++ b/libavcodec/avcodec.h @@ -4838,7 +4838,10 @@ int avcodec_decode_video2(AVCodecContext *avctx, AVFrame *picture, * must be freed with avsubtitle_free if *got_sub_ptr is set. * @param[in,out] got_sub_ptr Zero if no subtitle could be decompressed, otherwise, it is nonzero. * @param[in] avpkt The input AVPacket containing the input buffer. 
+ * + * @deprecated Use avcodec_send_packet() and avcodec_receive_frame(). */ +attribute_deprecated int avcodec_decode_subtitle2(AVCodecContext *avctx, AVSubtitle *sub, int *got_sub_ptr, AVPacket *avpkt); @@ -5322,6 +5325,12 @@ attribute_deprecated int avcodec_encode_video2(AVCodecContext *avctx, AVPacket *avpkt, const AVFrame *frame, int *got_packet_ptr); +/** + * Encode a subtitle. + * + * @deprecated use avcodec_send_frame()/avcodec_receive_packet() instead + */ +attribute_deprecated int avcodec_encode_subtitle(AVCodecContext *avctx, uint8_t *buf, int buf_size, const AVSubtitle *sub); diff --git a/libavcodec/utils.c b/libavcodec/utils.c index 87de15f..add7edf 100644 --- a/libavcodec/utils.c +++ b/libavcodec/utils.c @@ -2050,6 +2050,86 @@ int avcodec_encode_subtitle(AVCodecContext *avctx, uint8_t *buf, int buf_size, return ret; } +// TODO: delete this compatibility code when all subtitles encoders moved +// to send_frame +static int encode_subtitle_frame(AVCodecContext *avctx, + AVPacket *avpkt, + const AVFrame *frame, + int *got_packet_ptr) +{ + int i, ret; + +FF_DISABLE_DEPRECATION_WARNINGS + AVPacket tmp_pkt; + AVSubtitle subtitle; + + get_subtitle_defaults(&subtitle); + + /* Allocate (once) a temporary large output subtitle buffer */ + if (!avctx->internal->byte_buffer) { + const int subtitle_out_max_size = 1024 * 1024; + uint8_t *subtitle_out = av_malloc(subtitle_out_max_size); + if (!subtitle_out) + return AVERROR(ENOMEM); + + avctx->internal->byte_buffer = subtitle_out; + avctx->internal->byte_buffer_size = subtitle_out_max_size; + } + + /* Craft an AVSubtitle from the AVFrame */ + subtitle.format = frame->format == AV_PIX_FMT_NONE; + subtitle.rects = av_mallocz_array(frame->sub_nb_rects, sizeof(*subtitle.rects)); + if (!subtitle.rects) + return AVERROR(ENOMEM); + subtitle.num_rects = frame->sub_nb_rects; + subtitle.pts = frame->pts; + + for (i = 0; i < frame->sub_nb_rects; i++) { + const AVFrameSubtitleRectangle *src_rect = (AVFrameSubtitleRectangle 
*)frame->extended_data[i]; + AVSubtitleRect *dst_rect; + + dst_rect = av_mallocz(sizeof(*subtitle.rects[i])); + if (!dst_rect) + return AVERROR(ENOMEM); + subtitle.rects[i] = dst_rect; + + if (subtitle.format) { + dst_rect->type = SUBTITLE_ASS; + dst_rect->ass = src_rect->text; + } else { + dst_rect->type = SUBTITLE_BITMAP; + dst_rect->x = src_rect->x; + dst_rect->y = src_rect->y; + dst_rect->w = src_rect->w; + dst_rect->h = src_rect->h; + memcpy(dst_rect->data, src_rect->data, sizeof(dst_rect->data)); + memcpy(dst_rect->linesize, src_rect->linesize, sizeof(dst_rect->linesize)); + } + dst_rect->flags = src_rect->flags; + } + + /* Send it to the old API */ + ret = avcodec_encode_subtitle(avctx, + avctx->internal->byte_buffer, + avctx->internal->byte_buffer_size, + &subtitle); + if (ret < 0) + return ret; + + /* Wrap the encoded buffer into a ref-counted AVPacket */ + av_init_packet(&tmp_pkt); + tmp_pkt.data = avctx->internal->byte_buffer; + tmp_pkt.size = ret; + av_packet_ref(avpkt, &tmp_pkt); + avpkt->pts = frame->pts; + avpkt->dts = frame->pkt_dts; + avpkt->duration = frame->pkt_duration; +FF_ENABLE_DEPRECATION_WARNINGS + + *got_packet_ptr = 1; + return ret; +} + /** * Attempt to guess proper monotonic timestamps for decoded video frames * which might have incorrect times. 
Input timestamps may wrap around, in @@ -2764,6 +2844,134 @@ void avsubtitle_free(AVSubtitle *sub) memset(sub, 0, sizeof(AVSubtitle)); } +static int decode_subtitle_frame(AVCodecContext *avctx, + AVFrame *frame, + int *got_frame_ptr, + const AVPacket *avpkt) +{ + int i, ret = 0; + + *got_frame_ptr = 0; + + if (!avctx->codec) + return AVERROR(EINVAL); + + if (!avpkt->data && avpkt->size) { + av_log(avctx, AV_LOG_ERROR, "invalid packet: NULL data, size != 0\n"); + return AVERROR(EINVAL); + } + + av_assert0(avctx->codec->type == AVMEDIA_TYPE_SUBTITLE); + + av_frame_unref(frame); + + if ((avctx->codec->capabilities & AV_CODEC_CAP_DELAY) || avpkt->size) { + AVPacket pkt_recoded; + AVPacket tmp = *avpkt; + int did_split = av_packet_split_side_data(&tmp); + + if (did_split) { + /* FFMIN() prevents overflow in case the packet wasn't allocated with + * proper padding. + * If the side data is smaller than the buffer padding size, the + * remaining bytes should have already been filled with zeros by the + * original packet allocation anyway. 
*/ + memset(tmp.data + tmp.size, 0, + FFMIN(avpkt->size - tmp.size, AV_INPUT_BUFFER_PADDING_SIZE)); + } + pkt_recoded = tmp; + + ret = recode_subtitle(avctx, &pkt_recoded, &tmp); + if (ret >= 0) { + int err_ret; + AVSubtitle sub = {0}; + + avctx->internal->pkt = &pkt_recoded; + + ret = avctx->codec->decode(avctx, &sub, got_frame_ptr, &pkt_recoded); + av_assert1((ret >= 0) >= !!*got_frame_ptr && + !!*got_frame_ptr >= !!sub.num_rects); + + for (i = 0; i < sub.num_rects; i++) { + if (sub.rects[i]->ass && !utf8_check(sub.rects[i]->ass)) { + av_log(avctx, AV_LOG_ERROR, + "Invalid UTF-8 in decoded subtitles text; " + "maybe missing -sub_charenc option\n"); + avsubtitle_free(&sub); + return AVERROR_INVALIDDATA; + } + } + + frame->pts = + frame->best_effort_timestamp = avpkt->pts; + frame->pkt_duration = avpkt->duration; + frame->pkt_pos = avpkt->pos; + frame->sub_nb_rects = sub.num_rects; + + if (avctx->codec_descriptor->props & AV_CODEC_PROP_BITMAP_SUB) + frame->format = AV_PIX_FMT_PAL8; + else if (avctx->codec_descriptor->props & AV_CODEC_PROP_TEXT_SUB) + frame->format = AV_PIX_FMT_NONE; + + frame->sub_start_display = sub.start_display_time ? av_rescale_q(sub.start_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q) + : AV_NOPTS_VALUE; + frame->sub_end_display = sub.end_display_time ? 
av_rescale_q(sub.end_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q) + : AV_NOPTS_VALUE; + + /* Allocate sub_nb_rects AVFrameSubtitleRectangle */ + err_ret = av_frame_get_buffer(frame, 0); + if (err_ret < 0) { + avsubtitle_free(&sub); + return err_ret; + } + + /* Transfer data from AVSubtitleRect to AVFrameSubtitleRectangle */ + for (i = 0; i < sub.num_rects; i++) { + AVSubtitleRect *src_rect = sub.rects[i]; + AVFrameSubtitleRectangle *dst_rect = (AVFrameSubtitleRectangle *)frame->extended_data[i]; + + if (frame->format == AV_PIX_FMT_NONE) { + dst_rect->text = src_rect->ass; + } else { + dst_rect->x = src_rect->x; + dst_rect->y = src_rect->y; + dst_rect->w = src_rect->w; + dst_rect->h = src_rect->h; + memcpy(dst_rect->data, src_rect->data, sizeof(src_rect->data)); + memcpy(dst_rect->linesize, src_rect->linesize, sizeof(src_rect->linesize)); + } + dst_rect->flags = src_rect->flags; + + // we free and make sure it is set to NULL so the data inside + // is not freed after destructing the AVSubtitle + av_freep(&sub.rects[i]); + } + sub.num_rects = 0; + avsubtitle_free(&sub); + + if (tmp.data != pkt_recoded.data) { // did we recode? 
+ /* prevent from destroying side data from original packet */ + pkt_recoded.side_data = NULL; + pkt_recoded.side_data_elems = 0; + + av_packet_unref(&pkt_recoded); + } + avctx->internal->pkt = NULL; + } + + if (did_split) { + av_packet_free_side_data(&tmp); + if(ret == tmp.size) + ret = avpkt->size; + } + + if (*got_frame_ptr) + avctx->frame_number++; + } + + return ret; +} + static int do_decode(AVCodecContext *avctx, AVPacket *pkt) { int got_frame; @@ -2792,6 +3000,9 @@ static int do_decode(AVCodecContext *avctx, AVPacket *pkt) } else if (avctx->codec_type == AVMEDIA_TYPE_AUDIO) { ret = avcodec_decode_audio4(avctx, avctx->internal->buffer_frame, &got_frame, pkt); + } else if (avctx->codec_type == AVMEDIA_TYPE_SUBTITLE) { + ret = decode_subtitle_frame(avctx, avctx->internal->buffer_frame, + &got_frame, pkt); } else { ret = AVERROR(EINVAL); } @@ -2941,6 +3152,9 @@ static int do_encode(AVCodecContext *avctx, const AVFrame *frame, int *got_packe } else if (avctx->codec_type == AVMEDIA_TYPE_AUDIO) { ret = avcodec_encode_audio2(avctx, avctx->internal->buffer_pkt, frame, got_packet); + } else if (avctx->codec_type == AVMEDIA_TYPE_SUBTITLE) { + ret = encode_subtitle_frame(avctx, avctx->internal->buffer_pkt, + frame, got_packet); } else { ret = AVERROR(EINVAL); } diff --git a/libavfilter/Makefile b/libavfilter/Makefile index 7ed4696..5972b27 100644 --- a/libavfilter/Makefile +++ b/libavfilter/Makefile @@ -329,6 +329,9 @@ OBJS-$(CONFIG_YUVTESTSRC_FILTER) += vsrc_testsrc.o OBJS-$(CONFIG_NULLSINK_FILTER) += vsink_nullsink.o +# subtitle filters +OBJS-$(CONFIG_SNULL_FILTER) += sf_snull.o + # multimedia filters OBJS-$(CONFIG_ADRAWGRAPH_FILTER) += f_drawgraph.o OBJS-$(CONFIG_AHISTOGRAM_FILTER) += avf_ahistogram.o diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c index 82a65ee..e5d20cd 100644 --- a/libavfilter/allfilters.c +++ b/libavfilter/allfilters.c @@ -45,6 +45,7 @@ void avfilter_register_all(void) return; initialized = 1; + /* audio filters */ 
     REGISTER_FILTER(ABENCH,        abench,        af);
     REGISTER_FILTER(ACOMPRESSOR,   acompressor,   af);
     REGISTER_FILTER(ACROSSFADE,    acrossfade,    af);
@@ -138,6 +139,7 @@ void avfilter_register_all(void)
     REGISTER_FILTER(ANULLSINK,     anullsink,     asink);
 
+    /* video filters */
     REGISTER_FILTER(ALPHAEXTRACT,  alphaextract,  vf);
     REGISTER_FILTER(ALPHAMERGE,    alphamerge,    vf);
     REGISTER_FILTER(ASS,           ass,           vf);
@@ -345,6 +347,9 @@ void avfilter_register_all(void)
     REGISTER_FILTER(NULLSINK,      nullsink,      vsink);
 
+    /* subtitle filters */
+    REGISTER_FILTER(SNULL,         snull,         sf);
+
     /* multimedia filters */
     REGISTER_FILTER(ADRAWGRAPH,    adrawgraph,    avf);
     REGISTER_FILTER(AHISTOGRAM,    ahistogram,    avf);
@@ -367,10 +372,13 @@ void avfilter_register_all(void)
     /* those filters are part of public or internal API => registered
      * unconditionally */
     REGISTER_FILTER_UNCONDITIONAL(asrc_abuffer);
+    REGISTER_FILTER_UNCONDITIONAL(ssrc_sbuffer);
     REGISTER_FILTER_UNCONDITIONAL(vsrc_buffer);
     REGISTER_FILTER_UNCONDITIONAL(asink_abuffer);
+    REGISTER_FILTER_UNCONDITIONAL(ssink_sbuffer);
     REGISTER_FILTER_UNCONDITIONAL(vsink_buffer);
     REGISTER_FILTER_UNCONDITIONAL(af_afifo);
+    REGISTER_FILTER_UNCONDITIONAL(sf_sfifo);
     REGISTER_FILTER_UNCONDITIONAL(vf_fifo);
     ff_opencl_register_filter_kernel_code_all();
 }
diff --git a/libavfilter/avfilter.c b/libavfilter/avfilter.c
index 1d469c3..908ab2c 100644
--- a/libavfilter/avfilter.c
+++ b/libavfilter/avfilter.c
@@ -314,6 +314,12 @@ int avfilter_config_links(AVFilterContext *filter)
 
             if (!link->time_base.num && !link->time_base.den)
                 link->time_base = (AVRational) {1, link->sample_rate};
+            break;
+
+        case AVMEDIA_TYPE_SUBTITLE:
+            if (!link->time_base.num && !link->time_base.den)
+                link->time_base = inlink ? inlink->time_base : AV_TIME_BASE_Q;
+            break;
         }
 
         if (link->src->nb_inputs && link->src->inputs[0]->hw_frames_ctx &&
@@ -1201,6 +1207,8 @@ int ff_filter_frame(AVFilterLink *link, AVFrame *frame)
             av_assert1(frame->width  == link->w);
             av_assert1(frame->height == link->h);
         }
+    } else if (link->type == AVMEDIA_TYPE_SUBTITLE) {
+        // TODO?
     } else {
         if (frame->format != link->format) {
             av_log(link->dst, AV_LOG_ERROR, "Format change is not supported\n");
diff --git a/libavfilter/buffersink.c b/libavfilter/buffersink.c
index 2feb56d..f38fb1a 100644
--- a/libavfilter/buffersink.c
+++ b/libavfilter/buffersink.c
@@ -388,6 +388,11 @@ static int asink_query_formats(AVFilterContext *ctx)
     return 0;
 }
 
+static av_cold int ssink_init(AVFilterContext *ctx, void *opaque)
+{
+    return common_init(ctx);
+}
+
 #define OFFSET(x) offsetof(BufferSinkContext, x)
 #define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
 static const AVOption buffersink_options[] = {
@@ -405,9 +410,11 @@ static const AVOption abuffersink_options[] = {
     { NULL },
 };
 #undef FLAGS
+static const AVOption sbuffersink_options[] = {NULL};
 
 AVFILTER_DEFINE_CLASS(buffersink);
 AVFILTER_DEFINE_CLASS(abuffersink);
+AVFILTER_DEFINE_CLASS(sbuffersink);
 
 static const AVFilterPad avfilter_vsink_buffer_inputs[] = {
     {
@@ -452,3 +459,24 @@ AVFilter ff_asink_abuffer = {
     .inputs      = avfilter_asink_abuffer_inputs,
     .outputs     = NULL,
 };
+
+static const AVFilterPad avfilter_ssink_sbuffer_inputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_SUBTITLE,
+        .filter_frame = filter_frame,
+    },
+    { NULL }
+};
+
+AVFilter ff_ssink_sbuffer = {
+    .name        = "sbuffersink",
+    .description = NULL_IF_CONFIG_SMALL("Buffer subtitle frames, and make them available to the end of the filter graph."),
+    .priv_class  = &sbuffersink_class,
+    .priv_size   = sizeof(BufferSinkContext),
+    .init_opaque = ssink_init,
+    .uninit      = uninit,
+    //.query_formats = ssink_query_formats,
+    .inputs      = avfilter_ssink_sbuffer_inputs,
+    .outputs     = NULL,
+};
diff --git a/libavfilter/buffersrc.c b/libavfilter/buffersrc.c
index 9294811..c20b7b2 100644
--- a/libavfilter/buffersrc.c
+++ b/libavfilter/buffersrc.c
@@ -63,6 +63,9 @@ typedef struct BufferSourceContext {
     uint64_t channel_layout;
     char    *channel_layout_str;
 
+    /* subtitle only */
+    int sub_format;
+
     int got_format_from_params;
     int eof;
 } BufferSourceContext;
@@ -128,6 +131,8 @@ int av_buffersrc_parameters_set(AVFilterContext *ctx, AVBufferSrcParameters *par
         if (param->channel_layout)
             s->channel_layout = param->channel_layout;
         break;
+    case AVMEDIA_TYPE_SUBTITLE:
+        break;
     default:
         return AVERROR_BUG;
     }
@@ -204,6 +209,8 @@ static int av_buffersrc_add_frame_internal(AVFilterContext *ctx,
         CHECK_AUDIO_PARAM_CHANGE(ctx, s, frame->sample_rate, frame->channel_layout,
                                  av_frame_get_channels(frame), frame->format);
         break;
+    case AVMEDIA_TYPE_SUBTITLE:
+        break;
     default:
         return AVERROR(EINVAL);
     }
@@ -271,6 +278,7 @@ unsigned av_buffersrc_get_nb_failed_requests(AVFilterContext *buffer_src)
 #define OFFSET(x) offsetof(BufferSourceContext, x)
 #define A AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_AUDIO_PARAM
 #define V AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
+#define S AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_SUBTITLE_PARAM
 
 static const AVOption buffer_options[] = {
     { "width",     NULL, OFFSET(w),     AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INT_MAX, V },
@@ -306,6 +314,17 @@ static const AVOption abuffer_options[] = {
 
 AVFILTER_DEFINE_CLASS(abuffer);
 
+static const AVOption sbuffer_options[] = {
+    { "time_base",   NULL, OFFSET(time_base),  AV_OPT_TYPE_RATIONAL, { .dbl = 0 },   0, INT_MAX, S },
+    { "format",      NULL, OFFSET(sub_format), AV_OPT_TYPE_INT,      { .i64 = -1 }, -1, 1, S, "format" },
+        { "unspecified", NULL, 0, AV_OPT_TYPE_CONST, {.i64=-1}, INT_MIN, INT_MAX, S, "format" },
+        { "bitmap",      NULL, 0, AV_OPT_TYPE_CONST, {.i64=0},  INT_MIN, INT_MAX, S, "format" },
+        { "text",        NULL, 0, AV_OPT_TYPE_CONST, {.i64=1},  INT_MIN, INT_MAX, S, "format" },
+    { NULL }
+};
+
+AVFILTER_DEFINE_CLASS(sbuffer);
+
 static av_cold int init_audio(AVFilterContext *ctx)
 {
     BufferSourceContext *s = ctx->priv;
@@ -397,6 +416,10 @@ static int query_formats(AVFilterContext *ctx)
         if ((ret = ff_set_common_channel_layouts(ctx, channel_layouts)) < 0)
             return ret;
         break;
+    case AVMEDIA_TYPE_SUBTITLE:
+        if ((ret = ff_add_format(&formats, c->sub_format)) < 0)
+            return ret;
+        break;
     default:
         return AVERROR(EINVAL);
     }
@@ -424,6 +447,8 @@ static int config_props(AVFilterLink *link)
         if (!c->channel_layout)
             c->channel_layout = link->channel_layout;
         break;
+    case AVMEDIA_TYPE_SUBTITLE:
+        break;
     default:
         return AVERROR(EINVAL);
     }
@@ -461,6 +486,15 @@ static int poll_frame(AVFilterLink *link)
     return size/sizeof(AVFrame*);
 }
 
+static av_cold int init_subtitle(AVFilterContext *ctx)
+{
+    BufferSourceContext *c = ctx->priv;
+
+    if (!(c->fifo = av_fifo_alloc(sizeof(AVFrame*))))
+        return AVERROR(ENOMEM);
+    return 0;
+}
+
 static const AVFilterPad avfilter_vsrc_buffer_outputs[] = {
     {
         .name          = "default",
@@ -510,3 +544,28 @@ AVFilter ff_asrc_abuffer = {
     .outputs    = avfilter_asrc_abuffer_outputs,
     .priv_class = &abuffer_class,
 };
+
+static const AVFilterPad avfilter_ssrc_sbuffer_outputs[] = {
+    {
+        .name          = "default",
+        .type          = AVMEDIA_TYPE_SUBTITLE,
+        .request_frame = request_frame,
+        .poll_frame    = poll_frame,
+        .config_props  = config_props,
+    },
+    { NULL }
+};
+
+AVFilter ff_ssrc_sbuffer = {
+    .name          = "sbuffer",
+    .description   = NULL_IF_CONFIG_SMALL("Buffer subtitle frames, and make them accessible to the filterchain."),
+    .priv_size     = sizeof(BufferSourceContext),
+    .query_formats = query_formats,
+
+    .init          = init_subtitle,
+    .uninit        = uninit,
+
+    .inputs        = NULL,
+    .outputs       = avfilter_ssrc_sbuffer_outputs,
+    .priv_class    = &sbuffer_class,
+};
diff --git a/libavfilter/fifo.c b/libavfilter/fifo.c
index abfbba1..feda5bd 100644
--- a/libavfilter/fifo.c
+++ b/libavfilter/fifo.c
@@ -313,3 +313,34 @@ AVFilter ff_af_afifo = {
     .inputs    = avfilter_af_afifo_inputs,
     .outputs   = avfilter_af_afifo_outputs,
 };
+
+static const AVFilterPad avfilter_sf_sfifo_inputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_SUBTITLE,
+        .filter_frame = add_to_queue,
+    },
+    { NULL }
+};
+
+static const AVFilterPad avfilter_sf_sfifo_outputs[] = {
+    {
+        .name          = "default",
+        .type          = AVMEDIA_TYPE_SUBTITLE,
+        .request_frame = request_frame,
+    },
+    { NULL }
+};
+
+AVFilter ff_sf_sfifo = {
+    .name        = "sfifo",
+    .description = NULL_IF_CONFIG_SMALL("Buffer input subtitles and send them when they are requested."),
+
+    .init      = init,
+    .uninit    = uninit,
+
+    .priv_size = sizeof(FifoContext),
+
+    .inputs    = avfilter_sf_sfifo_inputs,
+    .outputs   = avfilter_sf_sfifo_outputs,
+};
diff --git a/libavfilter/formats.c b/libavfilter/formats.c
index 20f45e3..66bf138 100644
--- a/libavfilter/formats.c
+++ b/libavfilter/formats.c
@@ -364,6 +364,10 @@ AVFilterFormats *ff_all_formats(enum AVMediaType type)
                 return NULL;
             fmt++;
         }
+    } else if (type == AVMEDIA_TYPE_SUBTITLE) {
+        if (ff_add_format(&ret, 0 /* bitmap */) < 0 ||
+            ff_add_format(&ret, 1 /* text */) < 0)
+            return NULL;
     }
 
     return ret;
diff --git a/libavfilter/internal.h b/libavfilter/internal.h
index 3856012..cd8db73 100644
--- a/libavfilter/internal.h
+++ b/libavfilter/internal.h
@@ -142,6 +142,14 @@ struct AVFilterPad {
      * input pads only.
      */
     int needs_writable;
+
+    /**
+     * Callback function to get a subtitle frame. If NULL, the filter system
+     * will use ff_default_get_subtitle_buffer().
+     *
+     * Input subtitle pads only.
+     */
+    AVFrame *(*get_subtitle_buffer)(AVFilterLink *link, int nb_rects);
 };
 
 struct AVFilterGraphInternal {
diff --git a/libavfilter/sf_snull.c b/libavfilter/sf_snull.c
new file mode 100644
index 0000000..c20afdd
--- /dev/null
+++ b/libavfilter/sf_snull.c
@@ -0,0 +1,48 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * null subtitle filter
+ */
+
+#include "avfilter.h"
+#include "internal.h"
+
+static const AVFilterPad sf_snull_inputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_SUBTITLE,
+    },
+    { NULL }
+};
+
+static const AVFilterPad sf_snull_outputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_SUBTITLE,
+    },
+    { NULL }
+};
+
+AVFilter ff_sf_snull = {
+    .name        = "snull",
+    .description = NULL_IF_CONFIG_SMALL("Pass the source unchanged to the output."),
+    .inputs      = sf_snull_inputs,
+    .outputs     = sf_snull_outputs,
+};
diff --git a/libavfilter/subtitle.c b/libavfilter/subtitle.c
new file mode 100644
index 0000000..88e43d5
--- /dev/null
+++ b/libavfilter/subtitle.c
@@ -0,0 +1,60 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/common.h"
+
+#include "subtitle.h"
+#include "avfilter.h"
+#include "internal.h"
+
+AVFrame *ff_null_get_subtitle_buffer(AVFilterLink *link, int nb_rects)
+{
+    return ff_get_subtitle_buffer(link->dst->outputs[0], nb_rects);
+}
+
+AVFrame *ff_default_get_subtitle_buffer(AVFilterLink *link, int nb_rects)
+{
+    AVFrame *frame = av_frame_alloc();
+    int ret;
+
+    if (!frame)
+        return NULL;
+
+    frame->sub_nb_rects = nb_rects;
+    frame->format       = link->format;
+
+    ret = av_frame_get_buffer(frame, 0);
+    if (ret < 0) {
+        av_frame_free(&frame);
+        return NULL;
+    }
+
+    return frame;
+}
+
+AVFrame *ff_get_subtitle_buffer(AVFilterLink *link, int nb_rects)
+{
+    AVFrame *ret = NULL;
+
+    if (link->dstpad->get_subtitle_buffer)
+        ret = link->dstpad->get_subtitle_buffer(link, nb_rects);
+
+    if (!ret)
+        ret = ff_default_get_subtitle_buffer(link, nb_rects);
+
+    return ret;
+}
diff --git a/libavfilter/subtitle.h b/libavfilter/subtitle.h
new file mode 100644
index 0000000..1db9ebe
--- /dev/null
+++ b/libavfilter/subtitle.h
@@ -0,0 +1,31 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVFILTER_SUBTITLE_H
+#define AVFILTER_SUBTITLE_H
+
+#include "avfilter.h"
+#include "internal.h"
+
+AVFrame *ff_default_get_subtitle_buffer(AVFilterLink *link, int nb_rects);
+
+AVFrame *ff_null_get_subtitle_buffer(AVFilterLink *link, int nb_rects);
+
+AVFrame *ff_get_subtitle_buffer(AVFilterLink *link, int nb_rects);
+
+#endif /* AVFILTER_SUBTITLE_H */
diff --git a/libavfilter/vf_subtitles.c b/libavfilter/vf_subtitles.c
index 0f22644..20b3c77 100644
--- a/libavfilter/vf_subtitles.c
+++ b/libavfilter/vf_subtitles.c
@@ -305,6 +305,10 @@ static av_cold int init_subtitles(AVFilterContext *ctx)
     AVStream *st;
     AVPacket pkt;
     AssContext *ass = ctx->priv;
+    AVFrame *frame = av_frame_alloc();
+
+    if (!frame)
+        return AVERROR(ENOMEM);
 
     /* Init libass */
     ret = init(ctx);
@@ -447,32 +451,32 @@ static av_cold int init_subtitles(AVFilterContext *ctx)
     pkt.data = NULL;
     pkt.size = 0;
     while (av_read_frame(fmt, &pkt) >= 0) {
-        int i, got_subtitle;
-        AVSubtitle sub = {0};
-        if (pkt.stream_index == sid) {
-            ret = avcodec_decode_subtitle2(dec_ctx, &sub, &got_subtitle, &pkt);
-            if (ret < 0) {
-                av_log(ctx, AV_LOG_WARNING, "Error decoding: %s (ignored)\n",
-                       av_err2str(ret));
-            } else if (got_subtitle) {
-                const int64_t start_time = av_rescale_q(sub.pts, AV_TIME_BASE_Q, av_make_q(1, 1000));
-                const int64_t duration   = sub.end_display_time;
-                for (i = 0; i < sub.num_rects; i++) {
-                    char *ass_line = sub.rects[i]->ass;
+        int i;
+
+        ret = avcodec_send_packet(dec_ctx, &pkt);
+        if (ret < 0)
+            break;
+        // XXX probably not correct api usage
+        ret = avcodec_receive_frame(dec_ctx, frame);
+        if (ret >= 0) {
+            // XXX: honor start/end display time
+            const int64_t start_time = av_rescale_q(frame->pts, st->time_base, av_make_q(1, 1000));
+            const int64_t duration   = av_rescale_q(frame->pkt_duration, st->time_base, av_make_q(1, 1000));
+
+            for (i = 0; i < frame->sub_nb_rects; i++) {
+                AVSubtitleRect *rect = (AVSubtitleRect *)frame->extended_data[i]; // XXX move to lavu
+                char *ass_line = rect->ass;
                 if (!ass_line)
                     break;
-                if (LIBAVCODEC_VERSION_INT < AV_VERSION_INT(57,25,100))
-                    ass_process_data(ass->track, ass_line, strlen(ass_line));
-                else
-                    ass_process_chunk(ass->track, ass_line, strlen(ass_line),
-                                      start_time, duration);
+                ass_process_chunk(ass->track, ass_line, strlen(ass_line),
+                                  start_time, duration);
             }
         }
+        av_packet_unref(&pkt);
     }
-        av_packet_unref(&pkt);
-        avsubtitle_free(&sub);
-    }
+    av_frame_free(&frame);
 
 end:
     av_dict_free(&codec_opts);
diff --git a/libavformat/assenc.c b/libavformat/assenc.c
index d50f18f..97a8de8 100644
--- a/libavformat/assenc.c
+++ b/libavformat/assenc.c
@@ -164,6 +164,9 @@ static int write_packet(AVFormatContext *s, AVPacket *pkt)
         int hh2, mm2, ss2, ms2;
         DialogueLine *dialogue = av_mallocz(sizeof(*dialogue));
 
+        if (pkt->duration < 0)
+            end = INT64_MAX;
+
         if (!dialogue)
             return AVERROR(ENOMEM);
diff --git a/libavutil/frame.c b/libavutil/frame.c
index 53e6174..8bb3ed2 100644
--- a/libavutil/frame.c
+++ b/libavutil/frame.c
@@ -216,52 +216,33 @@ fail:
     return AVERROR(ENOMEM);
 }
 
-static int get_audio_buffer(AVFrame *frame, int align)
+static int get_data_buffer(AVFrame *frame, int n, size_t size)
 {
-    int channels;
-    int planar   = av_sample_fmt_is_planar(frame->format);
-    int planes;
-    int ret, i;
-
-    if (!frame->channels)
-        frame->channels = av_get_channel_layout_nb_channels(frame->channel_layout);
-
-    channels = frame->channels;
-    planes   = planar ? channels : 1;
-
-    CHECK_CHANNELS_CONSISTENCY(frame);
-    if (!frame->linesize[0]) {
-        ret = av_samples_get_buffer_size(&frame->linesize[0], channels,
-                                         frame->nb_samples, frame->format,
-                                         align);
-        if (ret < 0)
-            return ret;
-    }
+    int i;
 
-    if (planes > AV_NUM_DATA_POINTERS) {
-        frame->extended_data = av_mallocz_array(planes,
-                                                sizeof(*frame->extended_data));
-        frame->extended_buf  = av_mallocz_array((planes - AV_NUM_DATA_POINTERS),
-                                                sizeof(*frame->extended_buf));
+    if (n > AV_NUM_DATA_POINTERS) {
+        frame->extended_data = av_mallocz_array(n, sizeof(*frame->extended_data));
+        frame->extended_buf  = av_mallocz_array(n - AV_NUM_DATA_POINTERS,
+                                                sizeof(*frame->extended_buf));
         if (!frame->extended_data || !frame->extended_buf) {
            av_freep(&frame->extended_data);
            av_freep(&frame->extended_buf);
            return AVERROR(ENOMEM);
        }
-        frame->nb_extended_buf = planes - AV_NUM_DATA_POINTERS;
+        frame->nb_extended_buf = n - AV_NUM_DATA_POINTERS;
     } else
         frame->extended_data = frame->data;
 
-    for (i = 0; i < FFMIN(planes, AV_NUM_DATA_POINTERS); i++) {
-        frame->buf[i] = av_buffer_alloc(frame->linesize[0]);
+    for (i = 0; i < FFMIN(n, AV_NUM_DATA_POINTERS); i++) {
+        frame->buf[i] = av_buffer_alloc(size);
         if (!frame->buf[i]) {
             av_frame_unref(frame);
             return AVERROR(ENOMEM);
         }
         frame->extended_data[i] = frame->data[i] = frame->buf[i]->data;
     }
-    for (i = 0; i < planes - AV_NUM_DATA_POINTERS; i++) {
-        frame->extended_buf[i] = av_buffer_alloc(frame->linesize[0]);
+    for (i = 0; i < n - AV_NUM_DATA_POINTERS; i++) {
+        frame->extended_buf[i] = av_buffer_alloc(size);
         if (!frame->extended_buf[i]) {
             av_frame_unref(frame);
             return AVERROR(ENOMEM);
@@ -269,11 +250,50 @@ static int get_audio_buffer(AVFrame *frame, int align)
         frame->extended_data[i + AV_NUM_DATA_POINTERS] = frame->extended_buf[i]->data;
     }
 
     return 0;
+}
+
+static int get_audio_buffer(AVFrame *frame, int align)
+{
+    int channels;
+    int planar = av_sample_fmt_is_planar(frame->format);
+    int planes;
+    int ret;
+
+    if (!frame->channels)
+        frame->channels = av_get_channel_layout_nb_channels(frame->channel_layout);
+
+    channels = frame->channels;
+    planes   = planar ? channels : 1;
+
+    CHECK_CHANNELS_CONSISTENCY(frame);
+    if (!frame->linesize[0]) {
+        ret = av_samples_get_buffer_size(&frame->linesize[0], channels,
+                                         frame->nb_samples, frame->format,
+                                         align);
+        if (ret < 0)
+            return ret;
+    }
+
+    return get_data_buffer(frame, planes, frame->linesize[0]);
+}
+
+static int get_subtitle_buffer(AVFrame *frame)
+{
+    int i, ret;
+
+    ret = get_data_buffer(frame, frame->sub_nb_rects, sizeof(AVFrameSubtitleRectangle));
+    if (ret < 0)
+        return ret;
+    for (i = 0; i < frame->sub_nb_rects; i++)
+        memset(frame->extended_data[i], 0, sizeof(AVFrameSubtitleRectangle));
+    return 0;
 }
 
 int av_frame_get_buffer(AVFrame *frame, int align)
 {
+    if (frame->sub_nb_rects)
+        return get_subtitle_buffer(frame);
+
     if (frame->format < 0)
         return AVERROR(EINVAL);
 
@@ -320,6 +340,8 @@ FF_ENABLE_DEPRECATION_WARNINGS
     dst->colorspace              = src->colorspace;
     dst->color_range             = src->color_range;
     dst->chroma_location         = src->chroma_location;
+    dst->sub_start_display       = src->sub_start_display;
+    dst->sub_end_display         = src->sub_end_display;
 
     av_dict_copy(&dst->metadata, src->metadata, 0);
 
@@ -561,6 +583,7 @@ int av_frame_make_writable(AVFrame *frame)
     tmp.channels       = frame->channels;
     tmp.channel_layout = frame->channel_layout;
     tmp.nb_samples     = frame->nb_samples;
+    tmp.sub_nb_rects   = frame->sub_nb_rects;
     ret = av_frame_get_buffer(&tmp, 32);
     if (ret < 0)
         return ret;
@@ -716,11 +739,57 @@ static int frame_copy_audio(AVFrame *dst, const AVFrame *src)
     return 0;
 }
 
+static int frame_copy_subtitle(AVFrame *dst, const AVFrame *src)
+{
+    int i, ret;
+
+    for (i = 0; i < dst->sub_nb_rects; i++) {
+        const AVFrameSubtitleRectangle *src_rect = (const AVFrameSubtitleRectangle *)src->extended_data[i];
+        AVFrameSubtitleRectangle *dst_rect = (AVFrameSubtitleRectangle *)dst->extended_data[i];
+
+        memset(dst_rect, 0, sizeof(*dst_rect));
+
+        dst_rect->flags = src_rect->flags;
+
+        if (src->format != AV_PIX_FMT_NONE) {
+            /* (Alloc and) copy bitmap subtitle */
+
+            av_assert1(!src_rect->text);
+
+            dst_rect->x = src_rect->x;
+            dst_rect->y = src_rect->y;
+            dst_rect->w = src_rect->w;
+            dst_rect->h = src_rect->h;
+
+            ret = av_image_alloc(dst_rect->data, dst_rect->linesize,
+                                 dst_rect->w, dst_rect->h,
+                                 dst->format, 0);
+            if (ret < 0)
+                return ret;
+            av_image_copy(dst_rect->data, dst_rect->linesize,
+                          src_rect->data, src_rect->linesize,
+                          dst->format, dst_rect->w, dst_rect->h);
+        } else {
+            /* Copy text subtitle */
+
+            av_assert1(!src_rect->x && !src_rect->y);
+            av_assert1(!src_rect->w && !src_rect->h);
+
+            dst_rect->text = av_strdup(src_rect->text);
+            if (!dst_rect->text)
+                return AVERROR(ENOMEM);
+        }
+    }
+
+    return 0;
+}
+
 int av_frame_copy(AVFrame *dst, const AVFrame *src)
 {
-    if (dst->format != src->format || dst->format < 0)
+    if (dst->format != src->format || (dst->format < 0 && !dst->sub_nb_rects))
         return AVERROR(EINVAL);
 
+    if (dst->sub_nb_rects)
+        return frame_copy_subtitle(dst, src);
+
     if (dst->width > 0 && dst->height > 0)
         return frame_copy_video(dst, src);
     else if (dst->nb_samples > 0 && dst->channel_layout)
diff --git a/libavutil/frame.h b/libavutil/frame.h
index 8e51361..d531cfa 100644
--- a/libavutil/frame.h
+++ b/libavutil/frame.h
@@ -148,6 +148,25 @@ typedef struct AVFrameSideData {
     AVBufferRef *buf;
 } AVFrameSideData;
 
+#define AV_NUM_DATA_POINTERS 8
+
+/**
+ * This structure describes a decoded subtitle rectangle.
+ */
+typedef struct AVFrameSubtitleRectangle {
+    int x, y;
+    int w, h;
+
+    /* image data for bitmap subtitles, in AVFrame.format (AVPixelFormat) */
+    uint8_t *data[AV_NUM_DATA_POINTERS];
+    int linesize[AV_NUM_DATA_POINTERS];
+
+    /* decoded text for text subtitles, in ASS */
+    char *text;
+
+    int flags;
+} AVFrameSubtitleRectangle;
+
 /**
  * This structure describes decoded (raw) audio or video data.
  *
@@ -182,7 +201,6 @@ typedef struct AVFrameSideData {
  * for AVFrame can be obtained from avcodec_get_frame_class()
  */
 typedef struct AVFrame {
-#define AV_NUM_DATA_POINTERS 8
     /**
      * pointer to the picture/channel planes.
      * This might be different from the first allocated byte
@@ -224,9 +242,12 @@ typedef struct AVFrame {
      * For packed audio, there is just one data pointer, and linesize[0]
      * contains the total size of the buffer for all channels.
      *
+     * For subtitles, each pointer corresponds to an AVFrameSubtitleRectangle.
+     *
      * Note: Both data and extended_data should always be set in a valid frame,
-     * but for planar audio with more channels that can fit in data,
-     * extended_data must be used in order to access all channels.
+     * but for subtitles and planar audio with more channels than can fit in
+     * data, extended_data must be used in order to access all audio channels
+     * or subtitle rectangles.
      */
     uint8_t **extended_data;
 
@@ -359,10 +380,10 @@ typedef struct AVFrame {
     * also be non-NULL for all j < i.
     *
     * There may be at most one AVBuffer per data plane, so for video this array
-     * always contains all the references. For planar audio with more than
-     * AV_NUM_DATA_POINTERS channels, there may be more buffers than can fit in
-     * this array. Then the extra AVBufferRef pointers are stored in the
-     * extended_buf array.
+     * always contains all the references. For planar audio or subtitles with
+     * more than AV_NUM_DATA_POINTERS channels or rectangles, there may be more
+     * buffers than can fit in this array. Then the extra AVBufferRef pointers
+     * are stored in the extended_buf array.
     */
     AVBufferRef *buf[AV_NUM_DATA_POINTERS];
 
@@ -532,6 +553,29 @@ typedef struct AVFrame {
      * AVHWFramesContext describing the frame.
      */
     AVBufferRef *hw_frames_ctx;
+
+    /**
+     * Number of rectangles available in extended_data
+     * - encoding: set by user (XXX: no encoding yet)
+     * - decoding: set by libavcodec, read by user
+     */
+    int sub_nb_rects;
+
+    /**
+     * Subtitle start display time in AV_TIME_BASE units, relative to pts
+     * XXX: semantic for <=0?
+     * - encoding: unused
+     * - decoding: set by libavcodec, read by user.
+     */
+    int64_t sub_start_display;
+
+    /**
+     * Subtitle end display time in AV_TIME_BASE units, relative to pts
+     * XXX: semantic for <=0?
+     * - encoding: unused
+     * - decoding: set by libavcodec, read by user.
+     */
+    int64_t sub_end_display;
 } AVFrame;
 
 /**
@@ -641,6 +685,7 @@ void av_frame_move_ref(AVFrame *dst, AVFrame *src);
  * - format (pixel format for video, sample format for audio)
  * - width and height for video
  * - nb_samples and channel_layout for audio
+ * - sub_nb_rects for subtitle
  *
  * This function will fill AVFrame.data and AVFrame.buf arrays and, if
  * necessary, allocate and fill AVFrame.extended_data and AVFrame.extended_buf.
diff --git a/tests/ref/fate/sub-charenc b/tests/ref/fate/sub-charenc
index a056cd1..050807b 100644
--- a/tests/ref/fate/sub-charenc
+++ b/tests/ref/fate/sub-charenc
@@ -23,7 +23,7 @@ Dialogue: 0,0:02:31.61,0:02:33.90,Default,,0,0,0,,- А ти как си?\N- До
 Dialogue: 0,0:02:40.16,0:02:42.79,Default,,0,0,0,,Монахът Дзенг каза,\Nче си в планината Удан.
 Dialogue: 0,0:02:43.04,0:02:46.54,Default,,0,0,0,,Каза, че практикуваш\Nдълбока медитация.
 Dialogue: 0,0:02:48.84,0:02:50.68,Default,,0,0,0,,Сигурно в планината\Nе много спокойно.
-Dialogue: 0,0:02:51.25,0:02:53.46,Default,,0,0,0,,Завиждам ти.
+Dialogue: 0,0:02:51.26,0:02:53.47,Default,,0,0,0,,Завиждам ти.
 Dialogue: 0,0:02:53.67,0:02:58.34,Default,,0,0,0,,Имам толкова много работа,\Nпочти не ми остава\Nвреме за почивка.
 Dialogue: 0,0:03:00.26,0:03:03.89,Default,,0,0,0,,Оставих обучението рано.
 Dialogue: 0,0:03:05.69,0:03:11.28,Default,,0,0,0,,Защо? Ти си боец на Удан.\NОбучението е всичко.
@@ -40,7 +40,7 @@ Dialogue: 0,0:03:53.40,0:03:56.57,Default,,0,0,0,,Не можах да издъ
 Dialogue: 0,0:03:57.49,0:03:59.74,Default,,0,0,0,,Прекъснах медитацията си.
 Dialogue: 0,0:03:59.95,0:04:02.24,Default,,0,0,0,,Не можах да продължа.
 Dialogue: 0,0:04:03.20,0:04:07.79,Default,,0,0,0,,Нещо...\Nме дърпаше назад.
-Dialogue: 0,0:04:09.62,0:04:10.91,Default,,0,0,0,,Какво беше?
+Dialogue: 0,0:04:09.63,0:04:10.92,Default,,0,0,0,,Какво беше?
 Dialogue: 0,0:04:15.46,0:04:18.00,Default,,0,0,0,,Нещо, от което не\Nмога да се освободя.
 Dialogue: 0,0:04:23.39,0:04:24.68,Default,,0,0,0,,Скоро ли ще тръгваш?
 Dialogue: 0,0:04:26.77,0:04:30.27,Default,,0,0,0,,Подготвяме охрана\Nза една доставка...
@@ -52,7 +52,7 @@ Dialogue: 0,0:04:48.37,0:04:52.67,Default,,0,0,0,,Да. Той винаги е
 Dialogue: 0,0:04:52.88,0:04:56.55,Default,,0,0,0,,Не разбирам.\NКак можеш да се разделиш с него?
 Dialogue: 0,0:04:56.76,0:04:59.93,Default,,0,0,0,,Той винаги е бил с теб.
 Dialogue: 0,0:05:01.18,0:05:05.52,Default,,0,0,0,,Твърде много хора са\Nзагинали от това острие.
-Dialogue: 0,0:05:09.68,0:05:14.52,Default,,0,0,0,,Чисто е единствено защото\Nкръвта се отмива лесно.
+Dialogue: 0,0:05:09.69,0:05:14.53,Default,,0,0,0,,Чисто е единствено защото\Nкръвта се отмива лесно.
 Dialogue: 0,0:05:15.40,0:05:20.61,Default,,0,0,0,,Ти го използваш справедливо.\NДостоен си за него.
 Dialogue: 0,0:05:23.66,0:05:27.37,Default,,0,0,0,,Дойде време\Nда го оставя.
 Dialogue: 0,0:05:27.58,0:05:31.21,Default,,0,0,0,,Е, какво ще правиш\Nот сега нататък?