[FFmpeg-devel] yadif frame doubling - incorrect closed captioning

Submitted by Gabriel Blanchard on Jan. 13, 2019, 10:26 p.m.

Details

Message ID CACW-bOJOVSGb-7StwLot3LSJMuKbSY=tBrTtJ1v2SLMj7WLTjg@mail.gmail.com
State New
Headers show

Commit Message

Gabriel Blanchard Jan. 13, 2019, 10:26 p.m.
When frame doubling using yadif/bwdif closed captioning gets copied to the
second frame - as a result the closed captioning text is garbage.

I've attached a very simple patch that fixes this issue. Very similar to
what vf_fps.c already does around line 253.

-Gabe

Comments

Michael Niedermayer Jan. 14, 2019, 8:31 a.m.
On Sun, Jan 13, 2019 at 05:26:49PM -0500, Gabriel Blanchard wrote:
> When frame doubling using yadif/bwdif closed captioning gets copied to the
> second frame - as a result the closed captioning text is garbage.
> 
> I've attached a very simple patch that fixes this issue. Very similar to
> what vf_fps.c already does around line 253.
> 
> -Gabe

>  yadif_common.c |    3 +++
>  1 file changed, 3 insertions(+)
> b01ce870aa048fbe25c6ae69ec1b611d6a782865  0001-fix-closed-captioning-when-frame-doubling.patch
> From 0f6d3c31842ae33eaa3d5d91600bcd80c9c0a6b9 Mon Sep 17 00:00:00 2001
> From: Gabriel Blanchard <gblanchard@start.ca>
> Date: Sun, 13 Jan 2019 17:10:01 -0500
> Subject: [PATCH 1/1] fix closed captioning when frame doubling
> 
> ---
>  libavfilter/yadif_common.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/libavfilter/yadif_common.c b/libavfilter/yadif_common.c
> index a10cf7a17f..fbb4289b80 100644
> --- a/libavfilter/yadif_common.c
> +++ b/libavfilter/yadif_common.c
> @@ -43,6 +43,9 @@ static int return_frame(AVFilterContext *ctx, int is_second)
>              return AVERROR(ENOMEM);
>  
>          av_frame_copy_props(yadif->out, yadif->cur);
> +        // Don't copy Closed Captioning
> +        av_frame_remove_side_data(yadif->out, AV_FRAME_DATA_A53_CC);

This also applies to AV_FRAME_DATA_MOTION_VECTORS which too become invalid
on a duplicated frame

Thus a new function should be added which does all this, and that then
be used

thx
[...]
Carl Eugen Hoyos Jan. 14, 2019, 11:28 a.m.
2019-01-14 9:31 GMT+01:00, Michael Niedermayer <michael@niedermayer.cc>:
> On Sun, Jan 13, 2019 at 05:26:49PM -0500, Gabriel Blanchard wrote:
>> When frame doubling using yadif/bwdif closed captioning gets copied to
>> the
>> second frame - as a result the closed captioning text is garbage.
>>
>> I've attached a very simple patch that fixes this issue. Very similar to
>> what vf_fps.c already does around line 253.
>>
>> -Gabe
>
>>  yadif_common.c |    3 +++
>>  1 file changed, 3 insertions(+)
>> b01ce870aa048fbe25c6ae69ec1b611d6a782865
>> 0001-fix-closed-captioning-when-frame-doubling.patch
>> From 0f6d3c31842ae33eaa3d5d91600bcd80c9c0a6b9 Mon Sep 17 00:00:00 2001
>> From: Gabriel Blanchard <gblanchard@start.ca>
>> Date: Sun, 13 Jan 2019 17:10:01 -0500
>> Subject: [PATCH 1/1] fix closed captioning when frame doubling
>>
>> ---
>>  libavfilter/yadif_common.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/libavfilter/yadif_common.c b/libavfilter/yadif_common.c
>> index a10cf7a17f..fbb4289b80 100644
>> --- a/libavfilter/yadif_common.c
>> +++ b/libavfilter/yadif_common.c
>> @@ -43,6 +43,9 @@ static int return_frame(AVFilterContext *ctx, int
>> is_second)
>>              return AVERROR(ENOMEM);
>>
>>          av_frame_copy_props(yadif->out, yadif->cur);
>> +        // Don't copy Closed Captioning
>> +        av_frame_remove_side_data(yadif->out, AV_FRAME_DATA_A53_CC);
>
> This also applies to AV_FRAME_DATA_MOTION_VECTORS which
> too become invalid on a duplicated frame

> Thus a new function should be added which does all this, and that then
> be used

Could you explain?
I first thought that you mean a function "remove a particular kind of
side data" but that already exists.
Do you mean a new function that exactly removes the kind of side
data that we decide in advance should always be removed from
duplicated frames? Will this be correct for all use cases? (I don't
doubt it, just asking.)

Carl Eugen
Nicolas George Jan. 14, 2019, 12:12 p.m.
Carl Eugen Hoyos (12019-01-14):
> Do you mean a new function that exactly removes the kind of side
> data that we decide in advance should always be removed from
> duplicated frames?

That seems like a good idea. Applications may want to duplicate frames
too, and they will not be updated when we add a new type of side data.

>		     Will this be correct for all use cases? (I don't
> doubt it, just asking.)

We can add "unsigned flags" to the function prototype just for
extensibility.

Regards,
Carl Eugen Hoyos Jan. 14, 2019, 12:36 p.m.
2019-01-14 13:12 GMT+01:00, Nicolas George <george@nsup.org>:
> Carl Eugen Hoyos (12019-01-14):
>> Do you mean a new function that exactly removes the kind of side
>> data that we decide in advance should always be removed from
>> duplicated frames?
>
> That seems like a good idea. Applications may want to duplicate frames
> too, and they will not be updated when we add a new type of side data.
>
>>		     Will this be correct for all use cases? (I don't
>> doubt it, just asking.)

> We can add "unsigned flags" to the function prototype just for
> extensibility.

Then we would have the same function as now, no?

Carl Eugen
Nicolas George Jan. 14, 2019, 12:51 p.m.
Carl Eugen Hoyos (12019-01-14):
> Then we would have the same function as now, no?

No, the function we have now removes side data based on its type, the
new function would remove it based on circumstances, whatever that means
to be determined at a later point.

Regards,
Devin Heitmueller Jan. 14, 2019, 2:21 p.m.
On Mon, Jan 14, 2019 at 3:31 AM Michael Niedermayer
<michael@niedermayer.cc> wrote:
> Thus a new function should be added which does all this, and that then
> be used

For what it's worth, the fix is actually incorrect both here and in
vf_vps.  When doubling the framerate, the correct behavior is to split
the content across both frames and cut the cc_count in half (it's
actually more tricky than that because certain entries have to be
moved to the front of the array).

That said, using a centralized function is a step in the right
direction.  It should probably be in libavutil, given it will need to
be available to both filters and formats (and potentially codecs as
well if we wanted them to proactively fix cases where they receive an
invalid cc_count).  That said, in order to do the transformation
properly it would need to receive the target framerate as well as be
able to maintain some state (since reducing the framerate requires
content to be cached in order to combine one or more entries).

In short, a centralized function would be good, but we probably need
to think through what the API looks like so we don't have to introduce
a new API in libavutil and then deprecate it once we want to make the
splitting/combining logic work according to the spec.

Devin
Michael Niedermayer Jan. 14, 2019, 3:26 p.m.
On Mon, Jan 14, 2019 at 09:21:10AM -0500, Devin Heitmueller wrote:
> On Mon, Jan 14, 2019 at 3:31 AM Michael Niedermayer
> <michael@niedermayer.cc> wrote:
> > Thus a new function should be added which does all this, and that then
> > be used
> 
> For what it's worth, the fix is actually incorrect both here and in
> vf_vps.  When doubling the framerate, the correct behavior is to split
> the content across both frames and cut the cc_count in half (it's
> actually more tricky than that because certain entries have to be
> moved to the front of the array).

thanks for explaining, i had already suspected that this would become
more complex ...


> 
> That said, using a centralized function is a step in the right
> direction.  It should probably be in libavutil, given it will need to
> be available to both filters and formats (and potentially codecs as
> well if we wanted them to proactively fix cases where they receive an
> invalid cc_count).  That said, in order to do the transformation
> properly it would need to receive the target framerate as well as be
> able to maintain some state (since reducing the framerate requires
> content to be cached in order to combine one or more entries).
> 

> In short, a centralized function would be good, but we probably need
> to think through what the API looks like so we don't have to introduce
> a new API in libavutil and then deprecate it once we want to make the
> splitting/combining logic work according to the spec.

I fear that no matter how hard we try we will likely eventually run
into cases that it cannot handle

also maybe 2 functions would keep this simpler
one to deal with temporal transformations (frame drop, duplicate, interpolate,
combine)
one to deal with spatial transformations, crop, pad, scale, rotate

[...]
Devin Heitmueller Jan. 14, 2019, 4:11 p.m.
> > In short, a centralized function would be good, but we probably need
> > to think through what the API looks like so we don't have to introduce
> > a new API in libavutil and then deprecate it once we want to make the
> > splitting/combining logic work according to the spec.
>
> I fear that no matter how hard we try we will likely eventually run
> into cases that it cannot handle

Oh, I completely agree.  Some captions are just going to be too
screwed up to render.  However I think there are cases we can recover
from, and in particular I would like to make sure that streams created
with versions of ffmpeg before this patch continue to play properly
(i.e. where the stream only has caption data on every other frame and
the cc_count is 2x what it is supposed to be).

> also maybe 2 functions would keep this simpler
> one to deal with temporal transformations (frame drop, duplicate, interpolate,
> combine)
> one to deal with spatial transformations, crop, pad, scale, rotate

There shouldn't be any need for a function for spatial
transformations.  The expected cc_count is unrelated to the resolution
of the video.  It's tied exclusively to the framerate and whether you
are doing frame or field-level encoding when the video is interlaced.
The underlying goal was to ensure that the bitrate of the caption data
is a constant 9600bps and thus they wanted it to be spread evenly
across the frames/fields.  When people encounter problems going from
1080i to 720p for example, it's because that also involves framerate
conversion, not because the resolution of the individual video frames
is being changed.

Devin
Michael Niedermayer Jan. 14, 2019, 5:45 p.m.
On Mon, Jan 14, 2019 at 11:11:59AM -0500, Devin Heitmueller wrote:
> > > In short, a centralized function would be good, but we probably need
> > > to think through what the API looks like so we don't have to introduce
> > > a new API in libavutil and then deprecate it once we want to make the
> > > splitting/combining logic work according to the spec.
> >
> > I fear that no matter how hard we try we will likely eventually run
> > into cases that it cannot handle
> 
> Oh, I completely agree.  Some captions are just going to be too
> screwed up to render.  However I think there are cases we can recover
> from, and in particular I would like to make sure that streams created
> with versions of ffmpeg before this patch continue to play properly
> (i.e. where the stream only has caption data on every other frame and
> the cc_count is 2x what it is supposed to be).
> 
> > also maybe 2 functions would keep this simpler
> > one to deal with temporal transformations (frame drop, duplicate, interpolate,
> > combine)
> > one to deal with spatial transformations, crop, pad, scale, rotate
> 
> There shouldn't be any need for a function for spatial
> transformations.  The expected cc_count is unrelated to the resolution
> of the video.  It's tied exclusively to the framerate and whether you
> are doing frame or field-level encoding when the video is interlaced.
> The underlying goal was to ensure that the bitrate of the caption data
> is a constant 9600bps and thus they wanted it to be spread evenly
> across the frames/fields.  When people encounter problems going from
> 1080i to 720p for example, it's because that also involves framerate
> conversion, not because the resolution of the individual video frames
> is being changed.

well, there are other types of side data too. for example motion
vectors would need to be updated on crop depending on which rectangle
is croped out

[...]
Devin Heitmueller Jan. 14, 2019, 6:43 p.m.
On Mon, Jan 14, 2019 at 12:45 PM Michael Niedermayer
<michael@niedermayer.cc> wrote:
> well, there are other types of side data too. for example motion
> vectors would need to be updated on crop depending on which rectangle
> is croped out

Ah, yes, of course.  I thought we were just discussing captions.

So are you proposing a single general utility function which you would
pass AVFrames, and it would perform changes to any side data that is
present (e.g. captions, MVs, etc)?  I had assumed we would introduce a
function specific to captions, which can be reused without having an
underlying AVFrame.

Devin
Michael Niedermayer Jan. 14, 2019, 7:22 p.m.
On Mon, Jan 14, 2019 at 01:43:45PM -0500, Devin Heitmueller wrote:
> On Mon, Jan 14, 2019 at 12:45 PM Michael Niedermayer
> <michael@niedermayer.cc> wrote:
> > well, there are other types of side data too. for example motion
> > vectors would need to be updated on crop depending on which rectangle
> > is croped out
> 
> Ah, yes, of course.  I thought we were just discussing captions.
> 
> So are you proposing a single general utility function which you would
> pass AVFrames, and it would perform changes to any side data that is
> present (e.g. captions, MVs, etc)?  I had assumed we would introduce a
> function specific to captions, which can be reused without having an
> underlying AVFrame.

we need to update all side data that requires an update so a function
doing that is needed
we can of course have a seperate function which does the update for
captions. I think that could be a internal/static function though

thx

[...]

Patch hide | download patch | download mbox

From 0f6d3c31842ae33eaa3d5d91600bcd80c9c0a6b9 Mon Sep 17 00:00:00 2001
From: Gabriel Blanchard <gblanchard@start.ca>
Date: Sun, 13 Jan 2019 17:10:01 -0500
Subject: [PATCH 1/1] fix closed captioning when frame doubling

---
 libavfilter/yadif_common.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libavfilter/yadif_common.c b/libavfilter/yadif_common.c
index a10cf7a17f..fbb4289b80 100644
--- a/libavfilter/yadif_common.c
+++ b/libavfilter/yadif_common.c
@@ -43,6 +43,9 @@  static int return_frame(AVFilterContext *ctx, int is_second)
             return AVERROR(ENOMEM);
 
         av_frame_copy_props(yadif->out, yadif->cur);
+        // Don't copy Closed Captioning
+        av_frame_remove_side_data(yadif->out, AV_FRAME_DATA_A53_CC);
+
         yadif->out->interlaced_frame = 0;
         if (yadif->current_field == YADIF_FIELD_BACK_END)
             yadif->current_field = YADIF_FIELD_END;
-- 
2.19.0