Message ID | DM8P223MB03658B8D2E1841A6D1D76552BA6C9@DM8P223MB0365.NAMP223.PROD.OUTLOOK.COM
---|---
State | Superseded, archived
Series | [FFmpeg-devel,v20,01/20] avcodec, avutil: Move enum AVSubtitleType to avutil, add new and deprecate old values

Context | Check | Description
---|---|---
andriy/configurex86 | warning | Failed to apply patch
andriy/configureppc | warning | Failed to apply patch
5 Dec 2021, 17:23 by softworkz@hotmail.com:

> @@ -491,6 +499,39 @@ typedef struct AVFrame {
>      */
>     uint64_t channel_layout;
>
> +    /**
> +     * Display start time, relative to packet pts, in ms.
> +     */
> +    uint32_t subtitle_start_time;
> +
> +    /**
> +     * Display end time, relative to packet pts, in ms.
> +     */
> +    uint32_t subtitle_end_time;

Milliseconds? Our entire API's based around timestamps with time bases. Plus, we all know what happened when Matroska settled onto milliseconds and ruined a perfectly complex but good container.
Make this relative to the PTS field, with the same timebase as the PTS field. There's even a new AVFrame->time_base field for you to set so you wouldn't forget it.

> +    /**
> +     * Number of items in the @ref subtitle_areas array.
> +     */
> +    unsigned num_subtitle_areas;
> +
> +    /**
> +     * Array of subtitle areas, may be empty.
> +     */
> +    AVSubtitleArea **subtitle_areas;

There's no reason why this cannot be handled using the buffer and data fields. If you need more space there, you're free to bump the maximum number of pointers, plus this removes the horrid malloc(1) hack. We've discussed this, and I couldn't follow why this was bad in the email discussion you linked.

> +    /**
> +     * Usually the same as packet pts, in AV_TIME_BASE.
> +     *
> +     * @deprecated This is kept for compatibility reasons and corresponds to
> +     *             AVSubtitle->pts. Might be removed in the future.
> +     */
> +    int64_t subtitle_pts;

I'm not going to accept a field which is instantly deprecated. As we've discussed multiple times, please merge this into the regular frame PTS field. We already have _2_ necessary start/end fields.

> +    /**
> +     * Header containing style information for text subtitles.
> +     */
> +    AVBufferRef *subtitle_header;

Good, I like this.
On Sun, Dec 5, 2021 at 5:40 PM Lynne <dev@lynne.ee> wrote:

> > +    /**
> > +     * Usually the same as packet pts, in AV_TIME_BASE.
> > +     *
> > +     * @deprecated This is kept for compatibility reasons and corresponds to
> > +     *             AVSubtitle->pts. Might be removed in the future.
> > +     */
> > +    int64_t subtitle_pts;
>
> I'm not going to accept a field which is instantly deprecated. As we've discussed multiple times, please merge this into the regular frame PTS field. We already have _2_ necessary start/end fields.

I agree with this entirely. Even ignoring the fact that adding a new field that's deprecated is instantly a disqualification, AVSubtitle had one pts field, AVFrame already has one pts field - both are even documented to have the same semantic. They should just contain the exact same data; that's how you achieve compatibility, not by claiming you need a new field for compatibility reasons.

- Hendrik
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Lynne
> Sent: Sunday, December 5, 2021 5:40 PM
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame for subtitle handling
>
> [...]
>
> Milliseconds? Our entire API's based around timestamps with time bases. [...]
> Make this relative to the PTS field, with the same timebase as the PTS field. There's even a new AVFrame->time_base field for you to set so you wouldn't forget it.

The internal format for text subtitles is ASS, and this is using a timebase of milliseconds.

All existing decoders and encoders are using this, and I'm afraid I will not go and change them all.

> There's no reason why this cannot be handled using the buffer and data fields. If you need more space there, you're free to bump the maximum number of pointers, plus this removes the horrid malloc(1) hack. We've discussed this, and I couldn't follow why this was bad in the email discussion you linked.

There are reasons. Some of those I had mentioned in an earlier discussion with Hendrik.
The effort alone to relate the buffers to subtitle areas (which area 'owns' which buffer) is not reasonable. Too many cases to consider - what about when there are 3 areas and the second area doesn't have any buffer? The convention is that the buffers should be used contiguously. Managing those relations is error-prone and would require a lot of code.

> I'm not going to accept a field which is instantly deprecated. As we've discussed multiple times, please merge this into the regular frame PTS field. We already have _2_ necessary start/end fields.
>
> I agree with this entirely. Even ignoring the fact that adding a new field that's deprecated is instantly a disqualification, AVSubtitle had one pts field, AVFrame already has one pts field - both are even documented to have the same semantic. They should just contain the exact same data; that's how you achieve compatibility, not by claiming you need a new field for compatibility reasons.
>
> - Hendrik

I think the mistake is to declare subtitle_pts as deprecated. I had added the deprecation at a very early point in time, when I still thought that it could be eliminated.

Even though we are driving subtitle data through the graph attached to AVFrame, the behavior involved is very different from audio and video frames. Actually there's not one but many different ways in which subtitle data can appear in a source and travel through a filtergraph:

- Sometimes, subtitle events are muxed into a stream many seconds ahead of display time. In this case, AVFrame.pts is the mux position and AVFrame.subtitle_pts is the actual presentation time.
When filtering subtitles to modify something, it would still be desirable to retain the offset between mux time and display start.

- Sometimes, subtitle events occur in the mux "live" - right at the moment when they are meant to be shown. An example of this are closed captions, and when extracting those via the new splitcc filter, the subtitle_pts is equal to the frame.pts value.
But CC events do not come regularly, while downstream filters might expect exactly that in order to proceed. That's why the splitcc filter can emit subtitle frames at a certain framerate. It does so by re-sending the most recent subtitle frame - basically not much different from the main heartbeat mechanism (which cannot be used here because the stream has its origin inside the filtergraph, as the secondary output of splitcc).
Now when splitcc sends such a repeated frame, it needs to adjust the frame's pts in order to match the configured output framerate. But that's for keeping the graph running; the actual display start time (subtitle_pts) must not be changed.
It then depends on the downstream filters how to handle those frames. For example, a filter could detect repeated subtitle events (based on subtitle_pts). Then it can skip processing and use a previously cached result (some of the filters are actually doing that).

- I already mentioned the subtitle_heartbeat mechanism that I have kept. The same applies here: it needs to resend frames to retain a certain frequency of frames, and for this purpose it needs to send duplicate frames. Those frames need a new pts, but subtitle_pts must remain at its original value.

- There exist extreme cases, like when the complete set of ASS subtitles is muxed at the beginning of the stream with pkt-pts values like 1, 2, 3, ... The pkt-pts is what subtitle frames get as AVFrame.pts, while the actual display start time is in subtitle_pts. The desired way to handle such situations surely varies from case to case.
Having those values separate is an important key to keeping a wide range of options open for handling this - for current and future subtitle filters.

My patchset offers a really wide range of possibilities for subtitle filtering. Those are real-world production scenarios that are working. This is not based on just a few command line experiments.

Not every scenario that would be desirable is working. With subtitles, there are a lot of different cases, for which I have given just a few examples above, and certain scenarios will need some work to get them working - by some tweaking, adding a new filter, or adding additional options to existing filters (as you could see in the version history). My experience from doing just that has made me realize one important point: with the proposed architecture, it's possible to get almost any desired scenario working (including those that might not work yet).

It can all be done with filters, and the next change I'd want to make after the merge is to move the heartbeat functionality into a filter, so as to allow much better control over its functionality (and the possibility to choose at which point in the graph the heartbeat should originate). It will also make sense to have a counterpart filter to this, which eliminates duplicated subtitle frames.

Having both pts and subtitle_pts is an important element for all these things to work. If they were merged into a single value, probably half of the example scenarios of my patchset would no longer work.

Kind regards,
softworkz
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Hendrik Leppkes
> Sent: Sunday, December 5, 2021 5:45 PM
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame for subtitle handling
>
> On Sun, Dec 5, 2021 at 5:40 PM Lynne <dev@lynne.ee> wrote:
> > I'm not going to accept a field which is instantly deprecated. As we've discussed multiple times, please merge this into the regular frame PTS field. We already have _2_ necessary start/end fields.
>
> I agree with this entirely. Even ignoring the fact that adding a new field that's deprecated is instantly a disqualification, AVSubtitle had one pts field, AVFrame already has one pts field

The short answer to why it has worked before with a single pts field is: because there was no filtering.

Kind regards,
softworkz
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Lynne
> Sent: Sunday, December 5, 2021 5:40 PM
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame for subtitle handling
>
> [...]
>
> Milliseconds? Our entire API's based around timestamps with time bases. [...] Make this relative to the PTS field, with the same timebase as the PTS field.
>
> [...]
>
> There's no reason why this cannot be handled using the buffer and data fields. [...]

Hi Lynne,

let me add another note on these two topics.
One of the reasons why this subtitle filtering patchset has proceeded rather smoothly so far, causing only a small number of regressions that were easy to find and fix, is that I have kept as much as possible from the current AVSubtitle implementation, projecting it onto the new frame-based subtitle implementation while keeping it mostly compatible with the current logic.

Interestingly, an earlier attempt at subtitle filtering got stuck right at the point of storing subtitle data in AVFrame buffers. I decided to focus on functionality rather than academic kinds of details. What has been in place and working for such a long time can't be so wrong that it can't be accepted.

We also need to consider that it's not my patchset that is deliberately deciding to store data in pointers associated with rects/areas and to store start and end times in milliseconds: it's all the decoders and encoders that are doing this. I'm just adapting to the status quo.

These two things - changing the timebase of the ms fields and changing the location of the buffer pointers - are a gigantic task that would require changing everything that this patchset covers and implements, plus even more code that this patchset hasn't even touched. It would require changes everywhere subtitles are concerned. It is a task that is risky as hell, is guaranteed to cause plenty of regressions, requires extensive testing, and might even drive this whole endeavor into a dead end.

As I said before, please understand that I'm not ready to enter that path.

Thanks,
softworkz
5 Dec 2021, 20:21 by softworkz@hotmail.com:

> [...]
>
> These two things - changing the timebase of the ms-fields and changing the location of the buffer pointers - is a gigantic task [...] It is a task that is risky as hell, will cause plenty of regressions under guarantee, require extensive testing and might even drive this whole endeavor into a dead end.
>
> As I said before, please understand that I'm not ready to enter that path.

You don't have to break anything to make the start/end times proper timestamps. We have timebases for that! If you'd like your timestamps to have millisecond precision, you use a millisecond timebase! So what if we use ASS internally, which rounds timestamps? We have bitmap formats too, and those are generally following the video timestamps.
You absolutely don't want the timestamps being rounded in that case, or you risk having your subtitles be a frame late or a frame early. You could add a separate timebase for the subtitle start/end fields, in case the packet's timebase doesn't match the subtitle format's native timebase. I won't object to that.

You also don't have to risk that much by removing the subtitle_pts field. Once the start/end times are converted to proper timestamps, the AVFrame->pts field can do whatever it wants to, whether it mirrors the demuxer's timestamp, is constant, non-existent, or even goes backwards. If you just want to have an empty subtitle rect, you can simply zero the subtitle struct's data and buffer fields.

Calling these issues too academic when the frame API is so very straightforward and cut-and-paste is like calling computers too academic when you have several thousand large numbers to add. I'd really rather have a subtitle API that's usable, maintainable and easy to extend, even if it's momentarily broken for certain corner cases and weird files.

We can easily fix broken files one by one. We've got plenty of heuristics, and have mostly abstracted and documented them well. We can add more. We cannot remove cruft from an API that's designed to be crufty, however, because the cruft is the only thing holding it together.
On Sun, Dec 5, 2021 at 6:58 PM Soft Works <softworkz@hotmail.com> wrote:

> [...]
>
> I think the mistake is to declare subtitle_pts as deprecated. I had added the deprecation at a very early point in time where I had still thought that it can be eliminated.
>
> Even though we are driving subtitle data through the graph attached to AVFrame, the behavior involved is very different from audio and video frames. Actually there's not one but many different ways how subtitle data can appear in a source and travel through a filtergraph:
>
> - Sometimes, subtitle events are muxed into a stream many seconds ahead of display time. In this case, AVFrame.pts is the mux position and AVFrame.subtitle_pts is the actual presentation time.
> When filtering subtitles to modify something, it would be still desired to retain the offset between mux time and display start

This information was impossible to convey in AVSubtitle, as it had exactly one pts field, so where does it come from now? Also, isn't that kind of offset what subtitle_start_time is for - a delta on top of the pts?

Sounds like it's just duplicating logic once again.

pts is also, by definition and name, the "presentation time stamp"; if a timestamp of another meaning can convey additional information, we do have a dts field in AVFrame, for the decoding timestamp.

> Now when splitcc sends such a repeated frame, it needs to adjust the frame's pts in order to match the configured output framerate. But that's for keeping the graph running, the actual display start time (subtitle_pts) must not be changed.

This sounds like an implementation detail, and not something we should have in a core public structure like AVFrame.

- Hendrik
On Sun, Dec 5, 2021 at 11:38 PM Hendrik Leppkes <h.leppkes@gmail.com> wrote:

> On Sun, Dec 5, 2021 at 6:58 PM Soft Works <softworkz@hotmail.com> wrote:
> > [...]
>
> This information was impossible to convey in AVSubtitle, as it had exactly one pts field, so where does it come from now? Also, isn't that kind of offset what subtitle_start_time is for, a delta on top of the pts?
>
> Sounds like its just duplicating logic once again.
>
> pts is also by definition and name the "presentation time stamp", if a timestamp of another meaning can convey additional information, we do have a dts field in AVFrame, for the decoding timestamp.

To illustrate this, let's just look at the descriptions of the timestamp fields in AVFrame:

pts: Presentation timestamp in time_base units (time when frame should be shown to user).
pkt_dts: DTS copied from the AVPacket that triggered returning this frame.

These comments are pretty clear on the meaning of those fields, and shall not be violated by subtitles. pts is the time when the frame is going to be displayed, and nothing else. We probably shouldn't even tolerate subtitle_start_time, as it's a stupid field when it should just get baked into the timestamp (with duration being used for the end_time field).
pkt_dts can be used to convey an additional timestamp that was copied from the demuxer, if for some reason you want to preserve the interleaving position of the frame - which in itself is already questionable, as that should be up to the muxer or the interleaving code to decide, but the field exists, so it's fine.

subtitle_pts has to go, and these existing fields have to be used within their definition.

- Hendrik
> -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Soft Works > Sent: Sunday, December 5, 2021 6:58 PM > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame > for subtitle handling > > > > > -----Original Message----- > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Lynne > > Sent: Sunday, December 5, 2021 5:40 PM > > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > > Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame > > for subtitle handling > > > > 5 Dec 2021, 17:23 by softworkz@hotmail.com: > > > > > @@ -491,6 +499,39 @@ typedef struct AVFrame { > > > */ > > > uint64_t channel_layout; > > > > > > + /** > > > + * Display start time, relative to packet pts, in ms. > > > + */ > > > + uint32_t subtitle_start_time; > > > + > > > + /** > > > + * Display end time, relative to packet pts, in ms. > > > + */ > > > + uint32_t subtitle_end_time; > > > > > > > Milliseconds? Our entire API's based around timestamps > > with time bases. Plus, we all know what happened when > > Matroska settled onto milliseconds and ruined a perfectly > > complex but good container. > > Make this relative to the PTS field, with the same timebase > > as the PTS field. > > There's even a new AVFrame->time_base field for you to > > set so you wouldn't forget it. > > The internal format for text subtitles is ASS, and this is > using a timebase of milliseconds. > > All existing decoders and encoders are using this and I'm > afraid, but I will not go and change them all. > > > > + /** > > > + * Number of items in the @ref subtitle_areas array. > > > + */ > > > + unsigned num_subtitle_areas; > > > + > > > + /** > > > + * Array of subtitle areas, may be empty. 
> > > +     */
> > > +    AVSubtitleArea **subtitle_areas;
> > >
> >
> > There's no reason why this cannot be handled using the buffer
> > and data fields. If you need more space there, you're free to bump
> > the maximum number of pointers, plus this removes the horrid
> > malloc(1) hack. We've discussed this, and I couldn't follow why
> > this was bad in the email discussion you linked.
>
> There are reasons. Some of those I had mentioned in an earlier
> discussion with Hendrik.
> The effort alone to relate the buffers to subtitle areas (which area
> 'owns' which buffer) is not reasonable. There are too many cases to
> consider - what about when there are 3 areas and the second area doesn't
> have any buffer? The convention is that the buffers should be used
> contiguously. Managing those relations is error-prone and would require
> a lot of code.
>
> > > +    /**
> > > +     * Usually the same as packet pts, in AV_TIME_BASE.
> > > +     *
> > > +     * @deprecated This is kept for compatibility reasons and corresponds to
> > > +     * AVSubtitle->pts. Might be removed in the future.
> > > +     */
> > > +    int64_t subtitle_pts;
> > >
> >
> > I'm not going to accept a field which is instantly deprecated.
> > As we've discussed multiple times, please merge this into
> > the regular frame PTS field. We already have _2_ necessary
> > start/end fields.
> > --
>
> > I agree with this entirely. Even ignoring the fact that adding a new
> > field that's deprecated is instantly a disqualification, AVSubtitle had
> > one pts field, AVFrame already has one pts field - both are even
> > documented to have the same semantic. They should just contain the
> > exact same data, that's how you achieve compatibility, not by claiming
> > you need a new field for compatibility reasons.
> >
> > - Hendrik
>
> I think the mistake was to declare subtitle_pts as deprecated. I had
> added the deprecation at a very early point in time, when I still
> thought that it could be eliminated.
> Even though we are driving subtitle data through the graph attached
> to AVFrame, the behavior involved is very different from audio and
> video frames. Actually, there's not one but many different ways in which
> subtitle data can appear in a source and travel through a filtergraph:
>
> - Sometimes, subtitle events are muxed into a stream many seconds
> ahead of display time. In this case, AVFrame.pts is the mux position
> and AVFrame.subtitle_pts is the actual presentation time.
> When filtering subtitles to modify something, it would still be desired
> to retain the offset between mux time and display start.
>
> - Sometimes, subtitle events occur in the mux "live" - right in
> the moment when they are meant to be shown. An example for this are
> closed captions, and when extracting those via the new splitcc filter,
> the subtitle_pts is equal to the frame.pts value.
> But CC events do not come regularly, while downstream filters might
> expect exactly that in order to proceed. That's why the splitcc filter
> can emit subtitle frames at a certain framerate. It does so by
> re-sending the most recent subtitle frame - basically not much different
> than the main heartbeat mechanism (which cannot be used here because
> the stream has its origin inside the filtergraph (secondary output
> of splitcc)).
> Now when splitcc sends such a repeated frame, it needs to adjust the
> frame's pts in order to match the configured output framerate.
> But that's for keeping the graph running; the actual display start
> time (subtitle_pts) must not be changed.
> It is then up to the downstream filters how to handle those frames.
> For example, a filter could detect (based on subtitle_pts) repeated
> subtitle events. Then it can skip processing and use a previously cached
> result (some of the filters are actually doing that).
>
> - I already mentioned the subtitle_heartbeat mechanism that I have kept.
> The same applies here: it needs to resend frames to retain a certain
> frequency of frames, and for this purpose it needs to send duplicate
> frames. Those frames need a new pts, but subtitle_pts must remain
> at its original value.
>
> - There exist extreme cases, like where the complete set of ASS subtitles
> is muxed at the beginning of the stream with pkt-pts values like 1,2,3,..
> The pkt-pts is what subtitle frames get as AVFrame.pts, while the actual
> display start time is in subtitle_pts.
> The desired way to handle such situations surely varies from case to case.
> Having those values separate is an important key for keeping a wide range
> of options open to handle this - for current and future subtitle filters.
>
> My patchset offers a really wide range of possibilities for subtitle
> filtering. Those are real-world production scenarios that are working.
> This is not based on just a few command line experiments.
>
> Not every scenario that would be desirable is working. With subtitles,
> there are a lot of different cases, for which I have given just a few
> examples above, and certain scenarios will need some work to get them
> working - by tweaking, adding a new filter or adding additional
> options to existing filters (as you could see in the version history).
> My experience from doing just that has made me realize one important
> point: with the proposed architecture, it's possible to get almost
> any desired scenario working.

Maybe this doesn't sound all that substantial; let me illustrate what I mean by looking at an example. Last week, I needed to convert closed captions during a live transcoding and deliver these as regular text subtitles (ass, srt, vtt). As explained above, the problem with CCs is that they are not muxed ahead of time, as is usually the case with other subtitles - they appear in the stream right in the moment when they are meant to be presented. And even worse: there is no duration included.
The duration is solely determined by the occurrence of the following update (can be an addition to the current text or a completely new text (or nothing). The splitcc filter uses the existing cc_dec decoder, which can work in two different modes (splitcc exposes the decoder's options): - Default On each CC update, it saves the data and waits (for a certain time) until the next update happens. Then it emits the saved event, setting the duration from the time difference. - Realtime Emits subtitle events immediately with the duration set to infinite. A minimum interval between two subsequent outputs can be specified for this mode. None of those modes is really useful for the desired conversion: In the default mode, subtitles are shown too late, with the delay not even being constant but varying as the decoder doesn’t delay by a fixed timespan but by the interval between two events which can range between milliseconds and several seconds. The display start times are correct though (not delayed), and that can lead to situations where for example an event with a 3 second duration arrives at the client at a time where 2.5s of the display time have elapsed already and the remaining 0.5s display duration appears like a flicker. In turn, that mode is surely suitable for offline conversions but not for live transcoding. When using the real_time mode on the other side, we don't have any durations, and when we convert that to a text subtitle format, the lines will just fill up the screen to the max. Setting a fixed duration doesn’t help either: the display start times are varying arbitrarily and when we don't specify the accurate duration, the events will overlap in time and texts will be shown stacked on overlap, which looks like jumping up and down. 
I've been at the same point before in the context of burning-in CCs where I had discussed the issue with the guys from libass: https://github.com/libass/libass/issues/562 The solution there was the 'render_latest_only' option that I ended up adding to the overlaytextsubs and textsub2video filters. Unfortunately this can only work when implemented at the final end of the chain (in the former case: the libass renderer), and that option isn't available when converting to a subtitle output stream which will be processed by a client we don't have control of. Eventually I came to a new idea for which I added the scatter_realtime_output option to splitcc. It basically works like this: - The splitcc output is no longer driven by the cc_decoder's output - Instead, splitcc is emitting subtitle events at a fixed "frame"-rate based on the configured real_time_latency_msec value (e.g. 250msec) - all subtitle frames/events have a fixed duration set to this value - subsequent events have start display times (subtitle_pts) increasing by this (fixed) value - if there has been an update from the split_cc decoder between two of those events, that update is taken and emitted with the next output. The decoded event's start time is replaced by the output event time (means: start times are quantized to 250ms intervals) - if there hasn't been a change, the previous content is re-sent (but with the timings changed to the next 250ms interval) Essentially, closed caption events are quantized and split ("scattered") to match that fixed time raster and the filter output is decoupled from the CC decoder's output timings. But that's not all. There's another, additional level of decoupling in the case of the splitcc filter: This filter has one input (video) and two outputs (0: video, 1: text subs) where the second output works in a special way. The first output is just for passthrough of the video frames from which we take the CC side data without further manipulation. 
It's very simple behavior: you push a frame to the input, and the same frame is synchronously output at output pin 0.

The second output uses the request_frame api instead, which works as a kind of pull model. Frames are requested according to the configured output framerate, and we need to increase the frame.pts value each time by the reciprocal of that output framerate.

For splitcc, this is set to 5 fps by default. In our example, we have a 250ms interval, which makes 4 fps. That means that we need to duplicate every 4th output frame. The duplicated frame needs to have its subtitle_pts unchanged, so it can be identified as a duplicate and removed/ignored downstream, while frame.pts needs to follow the graph timing to keep it all going.

Effectively, these are all different kinds of heartbeat cases. Those beats are required to drive subtitle filtering, because the subtitles' own timings alone are not capable of driving a constant filtering flow. All earlier discussions about subtitle filtering reached that conclusion, by the way: a heartbeat mechanism is required for subtitle filtering.

The purpose of that heartbeat mechanism is to ensure the filtering flow, and this is driven by AVFrame.pts; that's why it's exactly that field that needs to be used for this and no other one - even when it might not exactly fit the doc comment. It can only work like that.

Kind regards,
softworkz
On Mon, Dec 6, 2021 at 12:23 AM Soft Works <softworkz@hotmail.com> wrote: > > It can only work like that. > You literally designed subtitle filtering. You defined how it works. All your explanations come down to some fancy form of "because I made it required". All I'm hearing is that some internal mechanisms in avfilter need it. Why is it a public field in one of the most key structures in our entire codebase? You understand this concern, right? Any public field should first (and only) answer the question, what information does this convey to the user? Does the user need this information? I'm not seeing any of that in your explanations. All I'm seeing is internal avfilter workings - cemented in public API. You are not integrating into an existing infrastructure, you are literally trying to add the whole thing, so you can change it to not need this redundant field. - Hendrik
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Hendrik
> Leppkes
> Sent: Monday, December 6, 2021 12:37 AM
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame
> for subtitle handling
>
> On Mon, Dec 6, 2021 at 12:23 AM Soft Works <softworkz@hotmail.com> wrote:
> >
> > It can only work like that.
> >
>
> You literally designed subtitle filtering. You defined how it works.
> All your explanations come down to some fancy form of "because I made
> it required".

The origin was in the context of keeping sub2video and the existing heartbeat functionality working. And in the end, it's still about heartbeat and filter flow.

Removing the field and replacing it with the main pts field makes 18 fate tests fail. Those tests are not testing the new filtering. I guess that shouldn't happen if the field were redundant.

Thanks,
sw
> -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Hendrik > Leppkes > Sent: Monday, December 6, 2021 12:37 AM > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame > for subtitle handling > > On Mon, Dec 6, 2021 at 12:23 AM Soft Works <softworkz@hotmail.com> wrote: > > > > It can only work like that. > > > > You literally designed subtitle filtering. You defined how it works. > All your explanations come down to some fancy form of "because I made > it required". > > All I'm hearing is that some internal mechanisms in avfilter need it. > Why is it a public field in one of the most key structures in our > entire codebase? You understand this concern, right? Any public field > should first (and only) answer the question, what information does > this convey to the user? Does the user need this information? I'm not > seeing any of that in your explanations. All I'm seeing is internal > avfilter workings - cemented in public API. > > You are not integrating into an existing infrastructure, you are > literally trying to add the whole thing, so you can change it to not > need this redundant field. > > - Hendrik OK, maybe this is better to understand. Very simple burn-down: I send a subtitle frame into a filtergraph. pts = 10 subtitle_pts = 10 Now I need to re-send a duplicate of that frame into the graph (=> heartbeat - to keep the graph flowing) Now, it's not possible (by lavf design, not mine) to send another frame without increasing the pts of the frame. So we send it like this: pts = 11 subtitle_pts = 10 The pts needs to be increased. But the start-display time of the subtitle event must not be changed - it's still the same frame like before, just a duplicate. softworkz
On Mon, Dec 6, 2021 at 1:24 AM Soft Works <softworkz@hotmail.com> wrote: > > > > > -----Original Message----- > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Hendrik > > Leppkes > > Sent: Monday, December 6, 2021 12:37 AM > > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > > Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame > > for subtitle handling > > > > On Mon, Dec 6, 2021 at 12:23 AM Soft Works <softworkz@hotmail.com> wrote: > > > > > > It can only work like that. > > > > > > > You literally designed subtitle filtering. You defined how it works. > > All your explanations come down to some fancy form of "because I made > > it required". > > > > All I'm hearing is that some internal mechanisms in avfilter need it. > > Why is it a public field in one of the most key structures in our > > entire codebase? You understand this concern, right? Any public field > > should first (and only) answer the question, what information does > > this convey to the user? Does the user need this information? I'm not > > seeing any of that in your explanations. All I'm seeing is internal > > avfilter workings - cemented in public API. > > > > You are not integrating into an existing infrastructure, you are > > literally trying to add the whole thing, so you can change it to not > > need this redundant field. > > > > - Hendrik > > OK, maybe this is better to understand. Very simple burn-down: > > I send a subtitle frame into a filtergraph. > > pts = 10 > subtitle_pts = 10 > > Now I need to re-send a duplicate of that frame into the graph > (=> heartbeat - to keep the graph flowing) > > Now, it's not possible (by lavf design, not mine) to send another frame > without increasing the pts of the frame. So we send it like this: > > pts = 11 > subtitle_pts = 10 > > The pts needs to be increased. 
But the start-display time of the > subtitle event must not be changed - it's still the same frame like > before, just a duplicate. > You just keep re-iterating that avfilter internals need it. And I'll just keep saying that avfilter internals shall not define public API design. I have not heard anything that justifies yet another field that has no benefit to the external public API - to the contrary even, a user would assume pts is pts, and he would be wrong. - Hendrik
> -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Hendrik > Leppkes > Sent: Monday, December 6, 2021 1:37 AM > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame > for subtitle handling > > On Mon, Dec 6, 2021 at 1:24 AM Soft Works <softworkz@hotmail.com> wrote: > > > > > > > > > -----Original Message----- > > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Hendrik > > > Leppkes > > > Sent: Monday, December 6, 2021 12:37 AM > > > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > > > Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare > AVFrame > > > for subtitle handling > > > > > > On Mon, Dec 6, 2021 at 12:23 AM Soft Works <softworkz@hotmail.com> wrote: > > > > > > > > It can only work like that. > > > > > > > > > > You literally designed subtitle filtering. You defined how it works. > > > All your explanations come down to some fancy form of "because I made > > > it required". > > > > > > All I'm hearing is that some internal mechanisms in avfilter need it. > > > Why is it a public field in one of the most key structures in our > > > entire codebase? You understand this concern, right? Any public field > > > should first (and only) answer the question, what information does > > > this convey to the user? Does the user need this information? I'm not > > > seeing any of that in your explanations. All I'm seeing is internal > > > avfilter workings - cemented in public API. > > > > > > You are not integrating into an existing infrastructure, you are > > > literally trying to add the whole thing, so you can change it to not > > > need this redundant field. > > > > > > - Hendrik > > > > OK, maybe this is better to understand. Very simple burn-down: > > > > I send a subtitle frame into a filtergraph. 
> > > > pts = 10 > > subtitle_pts = 10 > > > > Now I need to re-send a duplicate of that frame into the graph > > (=> heartbeat - to keep the graph flowing) > > > > Now, it's not possible (by lavf design, not mine) to send another frame > > without increasing the pts of the frame. So we send it like this: > > > > pts = 11 > > subtitle_pts = 10 > > > > The pts needs to be increased. But the start-display time of the > > subtitle event must not be changed - it's still the same frame like > > before, just a duplicate. > > > > You just keep re-iterating that avfilter internals need it. And I'll > just keep saying that avfilter internals shall not define public API > design. I have not heard anything that justifies yet another field > that has no benefit to the external public API - to the contrary even, > a user would assume pts is pts, and he would be wrong. So, what solution would you suggest, then? sw
> -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Hendrik > Leppkes > Sent: Monday, December 6, 2021 1:37 AM > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame > for subtitle handling > > On Mon, Dec 6, 2021 at 1:24 AM Soft Works <softworkz@hotmail.com> wrote: > > > > > > > > > -----Original Message----- > > > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Hendrik > > > Leppkes > > > Sent: Monday, December 6, 2021 12:37 AM > > > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > > > Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare > AVFrame > > > for subtitle handling > > > > > > On Mon, Dec 6, 2021 at 12:23 AM Soft Works <softworkz@hotmail.com> wrote: > > > > > > > > It can only work like that. > > > > > > > > > > You literally designed subtitle filtering. You defined how it works. > > > All your explanations come down to some fancy form of "because I made > > > it required". > > > > > > All I'm hearing is that some internal mechanisms in avfilter need it. > > > Why is it a public field in one of the most key structures in our > > > entire codebase? You understand this concern, right? Any public field > > > should first (and only) answer the question, what information does > > > this convey to the user? Does the user need this information? I'm not > > > seeing any of that in your explanations. All I'm seeing is internal > > > avfilter workings - cemented in public API. > > > > > > You are not integrating into an existing infrastructure, you are > > > literally trying to add the whole thing, so you can change it to not > > > need this redundant field. > > > > > > - Hendrik > > > > OK, maybe this is better to understand. Very simple burn-down: > > > > I send a subtitle frame into a filtergraph. 
> >
> > pts = 10
> > subtitle_pts = 10
> >
> > Now I need to re-send a duplicate of that frame into the graph
> > (=> heartbeat - to keep the graph flowing)
> >
> > Now, it's not possible (by lavf design, not mine) to send another frame
> > without increasing the pts of the frame. So we send it like this:
> >
> > pts = 11
> > subtitle_pts = 10
> >
> > The pts needs to be increased. But the start-display time of the
> > subtitle event must not be changed - it's still the same frame like
> > before, just a duplicate.
> >
>
> You just keep re-iterating that avfilter internals need it. And I'll
> just keep saying that avfilter internals shall not define public API
> design. I have not heard anything that justifies yet another field
> that has no benefit to the external public API - to the contrary even,
> a user would assume pts is pts, and he would be wrong.

Which use cases of the public API are we actually talking about? Applications similar to the ffmpeg CLI? Applications that want to do subtitle filtering? Applications that work with subtitle AVFrames?

=> The heartbeat mechanism is an intrinsic requirement for performing subtitle filtering.

Whether you want to send frames into a filtergraph or process subtitle frames coming out of a filtergraph: such applications will often need to deal with both fields, and when just one, it might be either the one or the other, depending on the use case.

The ffmpeg CLI needs to deal with both of these fields. That's already a good indication that other consumers will need the same. Essentially, this is not an implementation detail that needs to be hidden; it's an important element of the subtitle filtering functionality.

I have tried to explain the functionality and the importance of having these fields separate, in detail and by examples. And it's not only the use of those fields in the current code: I have quite a number of future use cases in mind, and many of them wouldn't be possible without the duality of those fields.
I admit that the meaning of those fields in a subtitle context, and their relation to each other, might not be easy to understand immediately. But that's just one small complexity among many in FFmpeg, and surely not among the worst. Maybe some improvement could be achieved by renaming the fields or clarifying the doc texts; I'm open to suggestions.

Kind regards,
softworkz
Hendrik Leppkes (2021-12-06):
> You just keep re-iterating that avfilter internals need it. And I'll
> just keep saying that avfilter internals shall not define public API
> design. I have not heard anything that justifies yet another field
> that has no benefit to the external public API - to the contrary even,
> a user would assume pts is pts, and he would be wrong.

Thank you. I need to add that libavfilter internals need it only because it was designed in a very naive way, by duplicating the frame instead of using dummy frames without content.

This is not the only part of these patch series that is improperly designed. Having two different filters to overlay subs depending on whether they are text or bitmap is unacceptable. Not having any utility filters (to shift or scale the times, to discard a segment, to concatenate subtitles along with audio and video, to print diagnostic information, etc.) is also a big issue — and of course, triplicating the boilerplate code for these filters is a big no. The way the overlaytextsubs filter synchronizes its inputs is completely broken; it works with the obvious test case, of course, but it will fail in any complicated case.

I have made these comments before. I also have offered to help do things properly. But instead of welcoming my help, Soft Works has rudely rejected it. So now I am just looking with annoyance as people waste their time helping Soft Works polish patches whose design is wrong from the core.

What everybody should be telling Soft Works on this subject is just this: You do not try to redesign a major API of the project when you have been in the project for just a few months and are still struggling with our coding conventions and with configuring a mail client. Especially not something that has been in the project for more than a decade. There is a reason I have not yet implemented subtitle filtering, or offered it as a Summer of Code project: it is hard and has a ton of prerequisites.
Now, if I knew I had the support of the core developers here, I could serenely get on with implementing FATE tests for the negotiation system and refactor it, which is the necessary first step before extending it for a third media type. Regards,
Quoting Lynne (2021-12-05 22:30:59) > 5 Dec 2021, 20:21 by softworkz@hotmail.com: > > One of the reasons why this subtitle filtering patchset has proceeded > > rather smoothly so far, causing only a small amount of regressions > > that were easy to find and fix, is that I have kept as much as possible > > from the current AVSubtitle implementation, projecting it to the > > new frame-based subtitle implementation, while keeping it mostly > > compatible with the current logic. > > Interestingly, an earlier attempt towards subtitle filtering has > > gotten stuck right at the point about storing subtitle data in > > AVFrame buffers. I decided to focus on functionality rather than > > academic kind of details. What has been in place and working > > for such a long time can't be so wrong that it can't be accepted. > > > > We also need to consider that it's not my patchset that is deliberately > > deciding to store data in pointers associate with rects/area and > > to store start- and end-times in milliseconds: it's all the decoders > > and encoders that are doing this. I'm just adapting to the status > > quo. > > > > These two things - changing the timebase of the ms-fields and > > changing the location of the buffer pointers - is a gigantic task > > that would require to change everything that this patchset covers > > and implements - plus even more code that this patchset hasn't even > > touched. It would require changes to everything everywhere where > > subtitles are concerned. > > It is a task that is risky as hell, will cause plenty of > > regressions under guarantee, require extensive testing and might even > > drive this whole endeavor into a dead end. > > > > As I said before, please understand that I'm not ready to enter that > > path. > > > > You don't have to break anything to make the start/end times proper > timestamps. We have timebases for that! If you'd like your timestamps > to have millisecond precision, you use a millisecond timebase! 
> > So what if we use ASS internally which rounds timestamps? We have > bitmap formats too, and those are generally following the video > timestamps. You absolutely don't want the timestamps being rounded > in that case, or you risk having your subtitles be a frame late or > a frame early. > > You could add a separate timebase for the subtitle start/end fields, > in case the packet's timebase doesn't match the subtitle format's > native timebase. I won't object to that. > > You also don't have to risk that much by removing the subtitle_pts > field. Once the start/end times are converted to proper timestamps, > then the AVFrame->pts field can do whatever it wants to, whether > it mirrors the demuxer's timestamp, be constant, non-existent or even > go backwards. > > If you just want to have an empty subtitle rect, you can simply > zero the subtitle struct's data and buffer fields. > > Calling these issues too academic when the frame API is so very > straightforward and cut-and-paste is like calling computers > too academic when you have several thousand large numbers to add. > > I'd really rather have a subtitle API that's usable, maintainable and easy > to extend, even if it's momentarily broken for certain corner cases > and weird files. > We can easily fix broken files one by one. We've got plenty of heuristics, > and have mostly abstracted and documented them well. We can add more. > We cannot remove cruft from an API that's designed to be crufty, however, > because the cruft is the only thing holding it together. +1 This set is a very significant change that will likely stay with us for many years, if not decades. So it is of utmost importance that the API design is clean and well-defined, because changing it later will be extremely hard. Keeping all the edge cases working flawlessly is IMO secondary here. It would be nice of course, but they can always be fixed later. The API cannot be, at least not without major pain.
> -----Original Message----- > From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Anton > Khirnov > Sent: Monday, December 6, 2021 4:52 PM > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame > for subtitle handling > > Quoting Lynne (2021-12-05 22:30:59) > > 5 Dec 2021, 20:21 by softworkz@hotmail.com: > > > One of the reasons why this subtitle filtering patchset has proceeded > > > rather smoothly so far, causing only a small amount of regressions > > > that were easy to find and fix, is that I have kept as much as possible > > > from the current AVSubtitle implementation, projecting it to the > > > new frame-based subtitle implementation, while keeping it mostly > > > compatible with the current logic. > > > Interestingly, an earlier attempt towards subtitle filtering has > > > gotten stuck right at the point about storing subtitle data in > > > AVFrame buffers. I decided to focus on functionality rather than > > > academic kind of details. What has been in place and working > > > for such a long time can't be so wrong that it can't be accepted. > > > > > > We also need to consider that it's not my patchset that is deliberately > > > deciding to store data in pointers associate with rects/area and > > > to store start- and end-times in milliseconds: it's all the decoders > > > and encoders that are doing this. I'm just adapting to the status > > > quo. > > > > > > These two things - changing the timebase of the ms-fields and > > > changing the location of the buffer pointers - is a gigantic task > > > that would require to change everything that this patchset covers > > > and implements - plus even more code that this patchset hasn't even > > > touched. It would require changes to everything everywhere where > > > subtitles are concerned. 
> > > It is a task that is risky as hell, will cause plenty of
> > > regressions under guarantee, require extensive testing and might even
> > > drive this whole endeavor into a dead end.
> > >
> > > As I said before, please understand that I'm not ready to enter that
> > > path.
> >
> > You don't have to break anything to make the start/end times proper
> > timestamps. We have timebases for that! If you'd like your timestamps
> > to have millisecond precision, you use a millisecond timebase!
> >
> > So what if we use ASS internally which rounds timestamps? We have
> > bitmap formats too, and those are generally following the video
> > timestamps. You absolutely don't want the timestamps being rounded
> > in that case, or you risk having your subtitles be a frame late or
> > a frame early.
> >
> > You could add a separate timebase for the subtitle start/end fields,
> > in case the packet's timebase doesn't match the subtitle format's
> > native timebase. I won't object to that.
> >
> > You also don't have to risk that much by removing the subtitle_pts
> > field. Once the start/end times are converted to proper timestamps,
> > then the AVFrame->pts field can do whatever it wants to, whether
> > it mirrors the demuxer's timestamp, be constant, non-existent or even
> > go backwards.
> >
> > If you just want to have an empty subtitle rect, you can simply
> > zero the subtitle struct's data and buffer fields.
> >
> > Calling these issues too academic when the frame API is so very
> > straightforward and cut-and-paste is like calling computers
> > too academic when you have several thousand large numbers to add.
> >
> > I'd really rather have a subtitle API that's usable, maintainable and easy
> > to extend, even if it's momentarily broken for certain corner cases
> > and weird files.
> > We can easily fix broken files one by one. We've got plenty of heuristics,
> > and have mostly abstracted and documented them well. We can add more.
> > We cannot remove cruft from an API that's designed to be crufty, however,
> > because the cruft is the only thing holding it together.
>
> +1
>
> This set is a very significant change that will likely stay with us for
> many years, if not decades. So it is of utmost importance that the API
> design is clean and well-defined, because changing it later will be
> extremely hard.
>
> Keeping all the edge cases working flawlessly is IMO secondary here.
> It would be nice of course, but they can always be fixed later. The API
> cannot be, at least not without major pain.

From the perspective of that logic, the problem is that almost all cases
are "edge cases".

Kind regards,
softworkz
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Nicolas
> George
> Sent: Monday, December 6, 2021 2:43 PM
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame
> for subtitle handling

[..]

> Regards,
> --
> Nicolas George

I have stripped those points that don't appear to be worth responding to.

If you have an example of a scenario that isn't working with my patchset
(and I'm sure such cases can be found), then I'll look into it and try
to address it.

Regards,
softworkz
On Mon, Dec 06, 2021 at 02:42:58PM +0100, Nicolas George wrote:
> Hendrik Leppkes (12021-12-06):
> > You just keep re-iterating that avfilter internals need it. And I'll
> > just keep saying that avfilter internals shall not define public API
> > design. I have not heard anything that justifies yet another field
> > that has no benefit to the external public API - to the contrary even,
> > a user would assume pts is pts, and he would be wrong.
>
> Thank you.
>
> I need to add that libavfilter internals need it only because it was
> designed in a very naive way, by duplicating the frame instead of using
> dummy frames without content.
>
> This is not the only part of these patch series that is improperly
> designed.

> Having two different filters to overlay subs depending on
> whether they are text or bitmap is unacceptable.

i do agree that is inconvenient
i would not call it unacceptable

> Not having any utility
> filters (to shift or scale the times, to discard a segment, to
> concatenate subtitles along with audio and video, to print diagnostic
> information, etc.) is also a big issue

i agree too

[...]

> Now, if I knew I had the support of the core developers here, I could
> serenely get on with implementing FATE tests for the negotiation system
> and refactor it, which is the necessary first step before extending it
> for a third media type.

Anyone who does that without triggering hostilities and (either works
together with other developers OR finds any other solution all involved
are happy with) has my support.

But i am a bit scared of more fights, hostilities between 2 cooks trying
to cook one soup. And myself having more headaches from reading more
escalating threads.
Basically i dont want to end up hiding depressed in a corner after some
quite hard headed developers fail to treat each other respectfully and
cooperate.

Also judging from your question here i guess that you have time and want
to work on FFmpeg?
If you rather want to work on something else within FFmpeg, I have both
easy and hard things that need to be done.

Easy one: fix ssl support on fate.ffmpeg.org (that is, set up letsencrypt).

Hard one: can low-latency support in libavfilter be improved?
Consider a use case like realtime streaming, cloud gaming or some
cloud-based metaverse. Any filtering done for these would benefit from
really low latency, because the end user interacts with the remote side,
so any filtering adds to the latency of this loop.
Is libavfilter optimal for this use case? Would some different threading
be better, or maybe some information exchange between filters about
which lines from the previous filter are already available for the next
to process?

thx

[...]
diff --git a/doc/APIchanges b/doc/APIchanges
index a466a2ec22..8c63ea0311 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -14,6 +14,16 @@ libavutil:     2021-04-27
 
 API changes, most recent first:
 
+2021-12-05 - xxxxxxxxxx - lavu 58.1.100 - frame.h
+  Add AVMediaType field to AVFrame
+  Add fields for carrying subtitle data to AVFrame
+  (subtitle_areas, subtitle_header, subtitle_pts, start/end time, etc.)
+  Add av_frame_get_buffer2() and deprecate av_frame_get_buffer()
+
+2021-12-05 - xxxxxxxxxx - lavu 58.1.100 - subfmt.h
+  Add struct AVSubtitleArea (replaces AVSubtitle)
+  Add av_get_subtitle_fmt_name() and av_get_subtitle_fmt()
+
 2021-12-05 - xxxxxxxxxx - lavu 58.1.100 - subfmt.h
   Add enum AVSubtitleType (moved from lavc), add new values, deprecate existing
 
diff --git a/libavutil/Makefile b/libavutil/Makefile
index c7843db1e4..7e79936876 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -160,6 +160,7 @@ OBJS = adler32.o                                                \
        slicethread.o                                            \
        spherical.o                                              \
        stereo3d.o                                               \
+       subfmt.o                                                 \
        threadmessage.o                                          \
        time.o                                                   \
        timecode.o                                               \
diff --git a/libavutil/frame.c b/libavutil/frame.c
index 64eed5b0dd..5da082adda 100644
--- a/libavutil/frame.c
+++ b/libavutil/frame.c
@@ -26,6 +26,7 @@
 #include "imgutils.h"
 #include "mem.h"
 #include "samplefmt.h"
+#include "subfmt.h"
 #include "hwcontext.h"
 
 #define CHECK_CHANNELS_CONSISTENCY(frame) \
@@ -50,6 +51,9 @@ const char *av_get_colorspace_name(enum AVColorSpace val)
     return name[val];
 }
 #endif
+
+static int frame_copy_subtitles(AVFrame *dst, const AVFrame *src, int copy_data);
+
 static void get_frame_defaults(AVFrame *frame)
 {
     if (frame->extended_data != frame->data)
@@ -73,7 +77,12 @@ static void get_frame_defaults(AVFrame *frame)
     frame->colorspace          = AVCOL_SPC_UNSPECIFIED;
     frame->color_range         = AVCOL_RANGE_UNSPECIFIED;
     frame->chroma_location     = AVCHROMA_LOC_UNSPECIFIED;
-    frame->flags               = 0;
+    frame->subtitle_start_time = 0;
+    frame->subtitle_end_time   = 0;
+    frame->num_subtitle_areas  = 0;
+    frame->subtitle_areas      = NULL;
+    frame->subtitle_pts        = 0;
+    frame->subtitle_header     = NULL;
 }
 
 static void free_side_data(AVFrameSideData **ptr_sd)
@@ -244,23 +253,55 @@ static int get_audio_buffer(AVFrame *frame, int align)
 }
 
+static int get_subtitle_buffer(AVFrame *frame)
+{
+    // Buffers in AVFrame->buf[] are not used in case of subtitle frames.
+    // To accommodate existing code, which checks ->buf[0] to determine
+    // whether a frame is ref-counted or has data, we're adding a 1-byte
+    // buffer here, which marks the subtitle frame to contain data.
+    frame->buf[0] = av_buffer_alloc(1);
+    if (!frame->buf[0]) {
+        av_frame_unref(frame);
+        return AVERROR(ENOMEM);
+    }
+
+    frame->extended_data = frame->data;
+
+    return 0;
+}
+
 int av_frame_get_buffer(AVFrame *frame, int align)
+{
+    if (frame->width > 0 && frame->height > 0)
+        frame->type = AVMEDIA_TYPE_VIDEO;
+    else if (frame->nb_samples > 0 && (frame->channel_layout || frame->channels > 0))
+        frame->type = AVMEDIA_TYPE_AUDIO;
+
+    return av_frame_get_buffer2(frame, align);
+}
+
+int av_frame_get_buffer2(AVFrame *frame, int align)
 {
     if (frame->format < 0)
         return AVERROR(EINVAL);
 
-    if (frame->width > 0 && frame->height > 0)
+    switch(frame->type) {
+    case AVMEDIA_TYPE_VIDEO:
         return get_video_buffer(frame, align);
-    else if (frame->nb_samples > 0 && (frame->channel_layout || frame->channels > 0))
+    case AVMEDIA_TYPE_AUDIO:
         return get_audio_buffer(frame, align);
-
-    return AVERROR(EINVAL);
+    case AVMEDIA_TYPE_SUBTITLE:
+        return get_subtitle_buffer(frame);
+    default:
+        return AVERROR(EINVAL);
+    }
 }
 
 static int frame_copy_props(AVFrame *dst, const AVFrame *src, int force_copy)
 {
     int ret, i;
 
+    dst->type                   = src->type;
     dst->key_frame              = src->key_frame;
     dst->pict_type              = src->pict_type;
     dst->sample_aspect_ratio    = src->sample_aspect_ratio;
@@ -292,6 +333,12 @@ static int frame_copy_props(AVFrame *dst, const AVFrame *src, int force_copy)
     dst->colorspace             = src->colorspace;
     dst->color_range            = src->color_range;
     dst->chroma_location        = src->chroma_location;
+    dst->subtitle_start_time    = src->subtitle_start_time;
+    dst->subtitle_end_time      = src->subtitle_end_time;
+    dst->subtitle_pts           = src->subtitle_pts;
+
+    if ((ret = av_buffer_replace(&dst->subtitle_header, src->subtitle_header)) < 0)
+        return ret;
 
     av_dict_copy(&dst->metadata, src->metadata, 0);
 
@@ -333,6 +380,7 @@ int av_frame_ref(AVFrame *dst, const AVFrame *src)
     av_assert1(dst->width == 0 && dst->height == 0);
     av_assert1(dst->channels == 0);
 
+    dst->type   = src->type;
     dst->format = src->format;
     dst->width  = src->width;
     dst->height = src->height;
@@ -346,7 +394,7 @@ int av_frame_ref(AVFrame *dst, const AVFrame *src)
 
     /* duplicate the frame data if it's not refcounted */
     if (!src->buf[0]) {
-        ret = av_frame_get_buffer(dst, 0);
+        ret = av_frame_get_buffer2(dst, 0);
         if (ret < 0)
             goto fail;
 
@@ -368,6 +416,10 @@ int av_frame_ref(AVFrame *dst, const AVFrame *src)
         }
     }
 
+    /* copy subtitle areas */
+    if (src->type == AVMEDIA_TYPE_SUBTITLE)
+        frame_copy_subtitles(dst, src, 0);
+
     if (src->extended_buf) {
         dst->extended_buf = av_calloc(src->nb_extended_buf,
                                       sizeof(*dst->extended_buf));
@@ -438,7 +490,7 @@ AVFrame *av_frame_clone(const AVFrame *src)
 
 void av_frame_unref(AVFrame *frame)
 {
-    int i;
+    unsigned i, n;
 
     if (!frame)
         return;
@@ -457,6 +509,21 @@ void av_frame_unref(AVFrame *frame)
     av_buffer_unref(&frame->opaque_ref);
     av_buffer_unref(&frame->private_ref);
 
+    av_buffer_unref(&frame->subtitle_header);
+
+    for (i = 0; i < frame->num_subtitle_areas; i++) {
+        AVSubtitleArea *area = frame->subtitle_areas[i];
+
+        for (n = 0; n < FF_ARRAY_ELEMS(area->buf); n++)
+            av_buffer_unref(&area->buf[n]);
+
+        av_freep(&area->text);
+        av_freep(&area->ass);
+        av_free(area);
+    }
+
+    av_freep(&frame->subtitle_areas);
+
     get_frame_defaults(frame);
 }
 
@@ -474,18 +541,28 @@ void av_frame_move_ref(AVFrame *dst, AVFrame *src)
 
 int av_frame_is_writable(AVFrame *frame)
 {
-    int i, ret = 1;
+    AVSubtitleArea *area;
+    int ret = 1;
 
     /* assume non-refcounted frames are not writable */
     if (!frame->buf[0])
         return 0;
 
-    for (i = 0; i < FF_ARRAY_ELEMS(frame->buf); i++)
+    for (unsigned i = 0; i < FF_ARRAY_ELEMS(frame->buf); i++)
         if (frame->buf[i])
             ret &= !!av_buffer_is_writable(frame->buf[i]);
-    for (i = 0; i < frame->nb_extended_buf; i++)
+    for (unsigned i = 0; i < frame->nb_extended_buf; i++)
         ret &= !!av_buffer_is_writable(frame->extended_buf[i]);
 
+    for (unsigned i = 0; i < frame->num_subtitle_areas; i++) {
+        area = frame->subtitle_areas[i];
+        if (area) {
+            for (unsigned n = 0; n < FF_ARRAY_ELEMS(area->buf); n++)
+                if (area->buf[n])
+                    ret &= !!av_buffer_is_writable(area->buf[n]);
+        }
+    }
+
     return ret;
 }
 
@@ -501,6 +578,7 @@ int av_frame_make_writable(AVFrame *frame)
         return 0;
 
     memset(&tmp, 0, sizeof(tmp));
+    tmp.type           = frame->type;
     tmp.format         = frame->format;
     tmp.width          = frame->width;
     tmp.height         = frame->height;
@@ -511,7 +589,7 @@ int av_frame_make_writable(AVFrame *frame)
     if (frame->hw_frames_ctx)
         ret = av_hwframe_get_buffer(frame->hw_frames_ctx, &tmp, 0);
     else
-        ret = av_frame_get_buffer(&tmp, 0);
+        ret = av_frame_get_buffer2(&tmp, 0);
     if (ret < 0)
         return ret;
 
@@ -546,14 +624,22 @@ AVBufferRef *av_frame_get_plane_buffer(AVFrame *frame, int plane)
     uint8_t *data;
     int planes, i;
 
-    if (frame->nb_samples) {
-        int channels = frame->channels;
-        if (!channels)
-            return NULL;
-        CHECK_CHANNELS_CONSISTENCY(frame);
-        planes = av_sample_fmt_is_planar(frame->format) ? channels : 1;
-    } else
+    switch(frame->type) {
+    case AVMEDIA_TYPE_VIDEO:
         planes = 4;
+        break;
+    case AVMEDIA_TYPE_AUDIO:
+        {
+            int channels = frame->channels;
+            if (!channels)
+                return NULL;
+            CHECK_CHANNELS_CONSISTENCY(frame);
+            planes = av_sample_fmt_is_planar(frame->format) ? channels : 1;
+            break;
+        }
+    default:
+        return NULL;
+    }
 
     if (plane < 0 || plane >= planes || !frame->extended_data[plane])
         return NULL;
@@ -677,17 +763,98 @@ static int frame_copy_audio(AVFrame *dst, const AVFrame *src)
     return 0;
 }
 
+static int av_subtitle_area2area(AVSubtitleArea *dst, const AVSubtitleArea *src, int copy_data)
+{
+    dst->x         = src->x;
+    dst->y         = src->y;
+    dst->w         = src->w;
+    dst->h         = src->h;
+    dst->nb_colors = src->nb_colors;
+    dst->type      = src->type;
+    dst->flags     = src->flags;
+
+    switch (dst->type) {
+    case AV_SUBTITLE_FMT_BITMAP:
+
+        for (unsigned i = 0; i < AV_NUM_BUFFER_POINTERS; i++) {
+            if (src->h > 0 && src->w > 0 && src->buf[i]) {
+                dst->buf[i] = av_buffer_ref(src->buf[i]);
+                if (!dst->buf[i])
+                    return AVERROR(ENOMEM);
+
+                if (copy_data) {
+                    const int ret = av_buffer_make_writable(&dst->buf[i]);
+                    if (ret < 0)
+                        return ret;
+                }
+
+                dst->linesize[i] = src->linesize[i];
+            }
+        }
+
+        memcpy(&dst->pal[0], &src->pal[0], sizeof(src->pal[0]) * 256);
+
+        break;
+    case AV_SUBTITLE_FMT_TEXT:
+
+        if (src->text) {
+            dst->text = av_strdup(src->text);
+            if (!dst->text)
+                return AVERROR(ENOMEM);
+        }
+
+        break;
+    case AV_SUBTITLE_FMT_ASS:
+
+        if (src->ass) {
+            dst->ass = av_strdup(src->ass);
+            if (!dst->ass)
+                return AVERROR(ENOMEM);
+        }
+
+        break;
+    default:
+
+        av_log(NULL, AV_LOG_ERROR, "Subtitle area has invalid format: %d\n", dst->type);
+        return AVERROR(EINVAL);
+    }
+
+    return 0;
+}
+
+static int frame_copy_subtitles(AVFrame *dst, const AVFrame *src, int copy_data)
+{
+    dst->format = src->format;
+
+    dst->num_subtitle_areas = src->num_subtitle_areas;
+
+    if (src->num_subtitle_areas) {
+        dst->subtitle_areas = av_malloc_array(src->num_subtitle_areas, sizeof(AVSubtitleArea *));
+
+        for (unsigned i = 0; i < src->num_subtitle_areas; i++) {
+            dst->subtitle_areas[i] = av_mallocz(sizeof(AVSubtitleArea));
+            av_subtitle_area2area(dst->subtitle_areas[i], src->subtitle_areas[i], copy_data);
+        }
+    }
+
+    return 0;
+}
+
 int av_frame_copy(AVFrame *dst, const AVFrame *src)
 {
     if (dst->format != src->format || dst->format < 0)
         return AVERROR(EINVAL);
 
-    if (dst->width > 0 && dst->height > 0)
+    switch(dst->type) {
+    case AVMEDIA_TYPE_VIDEO:
         return frame_copy_video(dst, src);
-    else if (dst->nb_samples > 0 && dst->channels > 0)
+    case AVMEDIA_TYPE_AUDIO:
         return frame_copy_audio(dst, src);
-
-    return AVERROR(EINVAL);
+    case AVMEDIA_TYPE_SUBTITLE:
        return frame_copy_subtitles(dst, src, 1);
+    default:
+        return AVERROR(EINVAL);
+    }
 }
 
 void av_frame_remove_side_data(AVFrame *frame, enum AVFrameSideDataType type)
diff --git a/libavutil/frame.h b/libavutil/frame.h
index 3f295f6b9e..f4ca23781b 100644
--- a/libavutil/frame.h
+++ b/libavutil/frame.h
@@ -34,6 +34,7 @@
 #include "rational.h"
 #include "samplefmt.h"
 #include "pixfmt.h"
+#include "subfmt.h"
 #include "version.h"
 
 
@@ -278,7 +279,7 @@ typedef struct AVRegionOfInterest {
 } AVRegionOfInterest;
 
 /**
- * This structure describes decoded (raw) audio or video data.
+ * This structure describes decoded (raw) audio, video or subtitle data.
  *
  * AVFrame must be allocated using av_frame_alloc(). Note that this only
  * allocates the AVFrame itself, the buffers for the data must be managed
@@ -309,6 +310,13 @@ typedef struct AVRegionOfInterest {
  */
 typedef struct AVFrame {
 #define AV_NUM_DATA_POINTERS 8
+    /**
+     * Media type of the frame (audio, video, subtitles..)
+     *
+     * See AVMEDIA_TYPE_xxx
+     */
+    enum AVMediaType type;
+
     /**
      * pointer to the picture/channel planes.
      * This might be different from the first allocated byte. For video,
@@ -392,7 +400,7 @@ typedef struct AVFrame {
     /**
      * format of the frame, -1 if unknown or unset
      * Values correspond to enum AVPixelFormat for video frames,
-     * enum AVSampleFormat for audio)
+     * enum AVSampleFormat for audio, AVSubtitleType for subtitles)
      */
     int format;
 
@@ -491,6 +499,39 @@ typedef struct AVFrame {
     */
     uint64_t channel_layout;
 
+    /**
+     * Display start time, relative to packet pts, in ms.
+     */
+    uint32_t subtitle_start_time;
+
+    /**
+     * Display end time, relative to packet pts, in ms.
+     */
+    uint32_t subtitle_end_time;
+
+    /**
+     * Number of items in the @ref subtitle_areas array.
+     */
+    unsigned num_subtitle_areas;
+
+    /**
+     * Array of subtitle areas, may be empty.
+     */
+    AVSubtitleArea **subtitle_areas;
+
+    /**
+     * Usually the same as packet pts, in AV_TIME_BASE.
+     *
+     * @deprecated This is kept for compatibility reasons and corresponds to
+     * AVSubtitle->pts. Might be removed in the future.
+     */
+    int64_t subtitle_pts;
+
+    /**
+     * Header containing style information for text subtitles.
+     */
+    AVBufferRef *subtitle_header;
+
     /**
      * AVBuffer references backing the data for this frame. All the pointers in
      * data and extended_data must point inside one of the buffers in buf or
@@ -750,6 +791,8 @@ void av_frame_move_ref(AVFrame *dst, AVFrame *src);
 /**
  * Allocate new buffer(s) for audio or video data.
  *
+ * Note: For subtitle data, use av_frame_get_buffer2
+ *
  * The following fields must be set on frame before calling this function:
  * - format (pixel format for video, sample format for audio)
  * - width and height for video
@@ -769,9 +812,39 @@ void av_frame_move_ref(AVFrame *dst, AVFrame *src);
  *           recommended to pass 0 here unless you know what you are doing.
  *
  * @return 0 on success, a negative AVERROR on error.
+ *
+ * @deprecated Use @ref av_frame_get_buffer2 instead and set @ref AVFrame.type
+ * before calling.
  */
+attribute_deprecated
 int av_frame_get_buffer(AVFrame *frame, int align);
 
+/**
+ * Allocate new buffer(s) for audio, video or subtitle data.
+ *
+ * The following fields must be set on frame before calling this function:
+ * - format (pixel format for video, sample format for audio)
+ * - width and height for video
+ * - nb_samples and channel_layout for audio
+ * - type (AVMediaType)
+ *
+ * This function will fill AVFrame.data and AVFrame.buf arrays and, if
+ * necessary, allocate and fill AVFrame.extended_data and AVFrame.extended_buf.
+ * For planar formats, one buffer will be allocated for each plane.
+ *
+ * @warning: if frame already has been allocated, calling this function will
+ *           leak memory. In addition, undefined behavior can occur in certain
+ *           cases.
+ *
+ * @param frame frame in which to store the new buffers.
+ * @param align Required buffer size alignment. If equal to 0, alignment will be
+ *              chosen automatically for the current CPU. It is highly
+ *              recommended to pass 0 here unless you know what you are doing.
+ *
+ * @return 0 on success, a negative AVERROR on error.
+ */
+int av_frame_get_buffer2(AVFrame *frame, int align);
+
 /**
  * Check if the frame data is writable.
  *
diff --git a/libavutil/subfmt.c b/libavutil/subfmt.c
new file mode 100644
index 0000000000..3998a6d486
--- /dev/null
+++ b/libavutil/subfmt.c
@@ -0,0 +1,50 @@
+/*
+ * Copyright (c) 2021 softworkz
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <string.h>
+#include "common.h"
+#include "subfmt.h"
+
+typedef struct SubtitleFmtInfo {
+    enum AVSubtitleType fmt;
+    const char* name;
+} SubtitleFmtInfo;
+
+static const char sub_fmt_info[AV_SUBTITLE_FMT_NB][24] = {
+    [AV_SUBTITLE_FMT_UNKNOWN] = "Unknown subtitle format",
+    [AV_SUBTITLE_FMT_BITMAP]  = "Graphical subtitles",
+    [AV_SUBTITLE_FMT_TEXT]    = "Text subtitles (plain)",
+    [AV_SUBTITLE_FMT_ASS]     = "Text subtitles (ass)",
+};
+
+const char *av_get_subtitle_fmt_name(enum AVSubtitleType sub_fmt)
+{
+    if (sub_fmt < 0 || sub_fmt >= AV_SUBTITLE_FMT_NB)
+        return NULL;
+    return sub_fmt_info[sub_fmt];
+}
+
+enum AVSubtitleType av_get_subtitle_fmt(const char *name)
+{
+    for (int i = 0; i < AV_SUBTITLE_FMT_NB; i++)
+        if (!strcmp(sub_fmt_info[i], name))
+            return i;
+    return AV_SUBTITLE_FMT_NONE;
+}
diff --git a/libavutil/subfmt.h b/libavutil/subfmt.h
index 88078a0d14..4e0250cd23 100644
--- a/libavutil/subfmt.h
+++ b/libavutil/subfmt.h
@@ -21,6 +21,10 @@
 #ifndef AVUTIL_SUBFMT_H
 #define AVUTIL_SUBFMT_H
 
+#include <stdint.h>
+
+#include "buffer.h"
+
 enum AVSubtitleType {
 
     /**
@@ -63,4 +67,48 @@ enum AVSubtitleType {
     AV_SUBTITLE_FMT_NB,  ///< number of subtitle formats, DO NOT USE THIS if you want to link with shared libav* because the number of formats might differ between versions.
 };
 
+typedef struct AVSubtitleArea {
+#define AV_NUM_BUFFER_POINTERS 1
+
+    enum AVSubtitleType type;
+    int flags;
+
+    int x;         ///< top left corner of area.
+    int y;         ///< top left corner of area.
+    int w;         ///< width of area.
+    int h;         ///< height of area.
+    int nb_colors; ///< number of colors in bitmap palette (@ref pal).
+
+    /**
+     * Buffers and line sizes for the bitmap of this subtitle.
+     *
+     * @{
+     */
+    AVBufferRef *buf[AV_NUM_BUFFER_POINTERS];
+    int linesize[AV_NUM_BUFFER_POINTERS];
+    /**
+     * @}
+     */
+
+    uint32_t pal[256]; ///< RGBA palette for the bitmap.
+
+    char *text;        ///< 0-terminated plain UTF-8 text
+    char *ass;         ///< 0-terminated ASS/SSA compatible event line.
+
+} AVSubtitleArea;
+
+/**
+ * Return the name of sub_fmt, or NULL if sub_fmt is not
+ * recognized.
+ */
+const char *av_get_subtitle_fmt_name(enum AVSubtitleType sub_fmt);
+
+/**
+ * Return a subtitle format corresponding to name, or AV_SUBTITLE_FMT_NONE
+ * on error.
+ *
+ * @param name Subtitle format name.
+ */
+enum AVSubtitleType av_get_subtitle_fmt(const char *name);
+
 #endif /* AVUTIL_SUBFMT_H */
Root commit for adding subtitle filtering capabilities.

In detail:

- Add type (AVMediaType) field to AVFrame
  Replaces the previous way of distinction, which was based on checking
  width and height to determine whether a frame is audio or video
- Add subtitle fields to AVFrame
- Add new struct AVSubtitleArea, similar to AVSubtitleRect, but with
  different allocation logic. Cannot and must not be used
  interchangeably, hence the new struct

Signed-off-by: softworkz <softworkz@hotmail.com>
---
 doc/APIchanges     |  10 +++
 libavutil/Makefile |   1 +
 libavutil/frame.c  | 211 ++++++++++++++++++++++++++++++++++++++++-----
 libavutil/frame.h  |  77 ++++++++++++++-
 libavutil/subfmt.c |  50 +++++++++++
 libavutil/subfmt.h |  48 +++++++++++
 6 files changed, 373 insertions(+), 24 deletions(-)
 create mode 100644 libavutil/subfmt.c