
[FFmpeg-devel,v1] avcodec/v210enc: add yuv420p/yuv420p10 input pixel format support

Message ID 20190920155517.5854-1-lance.lmwang@gmail.com
State Superseded
Headers show

Commit Message

Lance Wang Sept. 20, 2019, 3:55 p.m. UTC
From: Limin Wang <lance.lmwang@gmail.com>

Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
---
 libavcodec/v210_template.c | 20 ++++++++++++++++++++
 libavcodec/v210enc.c       |  8 +++++---
 2 files changed, 25 insertions(+), 3 deletions(-)

Comments

Michael Niedermayer Sept. 20, 2019, 4:10 p.m. UTC | #1
On Fri, Sep 20, 2019 at 11:55:17PM +0800, lance.lmwang@gmail.com wrote:
> From: Limin Wang <lance.lmwang@gmail.com>
> 
> Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
> ---
>  libavcodec/v210_template.c | 20 ++++++++++++++++++++
>  libavcodec/v210enc.c       |  8 +++++---
>  2 files changed, 25 insertions(+), 3 deletions(-)

Adding a nearest neighbor scaler or in fact any scaler
into an encoder is not ok

This belongs in swscale and it is already there.

Thanks

[...]
Devin Heitmueller Sept. 20, 2019, 7:19 p.m. UTC | #2
Hello Michael,


> On Sep 20, 2019, at 12:10 PM, Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> On Fri, Sep 20, 2019 at 11:55:17PM +0800, lance.lmwang@gmail.com wrote:
>> From: Limin Wang <lance.lmwang@gmail.com>
>> 
>> Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
>> ---
>> libavcodec/v210_template.c | 20 ++++++++++++++++++++
>> libavcodec/v210enc.c       |  8 +++++---
>> 2 files changed, 25 insertions(+), 3 deletions(-)
> 
> Adding a nearest neighbor scaler or in fact any scaler
> into an encoder is not ok
> 
> This belongs in swscale and it is already there.


Just to be clear, there is no scaling going on here.  The patch just expands 4:2:0 to 4:2:2 while properly supporting interlaced chroma.  It avoids having to auto insert the swscale filter in the case where there is no scaling required (e.g. H.264 4:2:0 video being output to decklink in its original resolution).
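For illustration, the expansion being described amounts to reusing each 4:2:0 chroma line for two output lines. A minimal sketch (hypothetical 8-bit helper for the progressive case, not the patch code itself):

```c
#include <string.h>

/* Expand a 4:2:0 chroma plane to 4:2:2 by duplicating each source line.
 * Progressive chroma only; illustrative, not the actual v210enc code. */
static void expand_chroma_420_to_422(const unsigned char *src, unsigned char *dst,
                                     int chroma_width, int src_lines)
{
    for (int h = 0; h < src_lines; h++) {
        /* each 4:2:0 chroma line feeds two 4:2:2 output lines */
        memcpy(dst + (2 * h)     * chroma_width, src + h * chroma_width, chroma_width);
        memcpy(dst + (2 * h + 1) * chroma_width, src + h * chroma_width, chroma_width);
    }
}
```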

Regards,

Devin
Lance Wang Sept. 20, 2019, 11:53 p.m. UTC | #3
On Fri, Sep 20, 2019 at 06:10:59PM +0200, Michael Niedermayer wrote:
> On Fri, Sep 20, 2019 at 11:55:17PM +0800, lance.lmwang@gmail.com wrote:
> > From: Limin Wang <lance.lmwang@gmail.com>
> > 
> > Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
> > ---
> >  libavcodec/v210_template.c | 20 ++++++++++++++++++++
> >  libavcodec/v210enc.c       |  8 +++++---
> >  2 files changed, 25 insertions(+), 3 deletions(-)
> 
> Adding a nearest neighbor scaler or in fact any scaler
> into an encoder is not ok
> 
> This belongs in swscale and it is already there.

It tries to improve performance; below are the benchmark results:
./ffmpeg  -benchmark -y -lavfi testsrc2=s=4096x3072:r=10:d=10 -pix_fmt yuv420p -c:v v210 -f null -
master:
[Parsed_testsrc2_0 @ 0x7fd922f01940] EOF timestamp not reliable
frame=  100 fps= 30 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=3.02x
bench: utime=2.762s stime=0.539s rtime=3.308s
bench: maxrss=93372416kB

with the patch:
frame=  100 fps= 36 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=3.57x
video:3302400kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=2.258s stime=0.536s rtime=2.803s
bench: maxrss=80809984kB

 ./ffmpeg  -benchmark -y -lavfi testsrc2=s=4096x3072:r=10:d=10 -pix_fmt yuv420p10 -c:v v210 -f null -
master:
frame=  100 fps= 26 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=2.61x
video:3302400kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=3.257s stime=0.559s rtime=3.827s
bench: maxrss=152371200kB

with the patch
frame=  100 fps= 31 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=3.11x
video:3302400kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=2.625s stime=0.573s rtime=3.212s
bench: maxrss=127197184kB


> 
> Thanks
> 
> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> No snowflake in an avalanche ever feels responsible. -- Voltaire



> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Michael Niedermayer Sept. 21, 2019, 8:44 p.m. UTC | #4
On Fri, Sep 20, 2019 at 03:19:52PM -0400, Devin Heitmueller wrote:
> Hello Michael,
> 
> 
> > On Sep 20, 2019, at 12:10 PM, Michael Niedermayer <michael@niedermayer.cc> wrote:
> > 
> > On Fri, Sep 20, 2019 at 11:55:17PM +0800, lance.lmwang@gmail.com wrote:
> >> From: Limin Wang <lance.lmwang@gmail.com>
> >> 
> >> Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
> >> ---
> >> libavcodec/v210_template.c | 20 ++++++++++++++++++++
> >> libavcodec/v210enc.c       |  8 +++++---
> >> 2 files changed, 25 insertions(+), 3 deletions(-)
> > 
> > Adding a nearest neighbor scaler or in fact any scaler
> > into an encoder is not ok
> > 
> > This belongs in swscale and it is already there.
> 
> 
> Just to be clear, there is no scaling going on here. 

> The patch just expands 4:2:0 to 4:2:2 while properly supporting interlaced chroma.  

4:2:0 and 4:2:2 have chroma planes with different resolutions.
Converting between planes of different resolutions is what I called scaling.
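Concretely, for a w x h frame the two formats differ only in the vertical chroma resolution (illustrative helper, not FFmpeg API):

```c
/* Chroma plane dimensions for a w x h frame; hypothetical helper
 * illustrating why a 4:2:0 -> 4:2:2 conversion changes chroma resolution. */
static void chroma_dims(int w, int h,
                        int *cw420, int *ch420, int *cw422, int *ch422)
{
    *cw420 = w / 2; *ch420 = h / 2; /* 4:2:0: halved horizontally and vertically */
    *cw422 = w / 2; *ch422 = h;     /* 4:2:2: halved horizontally only */
}
```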


> It avoids having to auto insert the swscale filter in the case where there is no scaling required (e.g. H.264 4:2:0 video being output to decklink in its original resolution).

Yes, doing an operation in the encoder avoids a filter being inserted which
does that operation.
That's true for every encoder and every filter.

Also, replacing interpolation with a nearest-neighbor implementation
is, quite expectedly, faster.

One problem is that
the user can set up the scale filter with high quality in mind, or with
low quality and speed in mind.
But after this patch she always gets low quality, because the low-quality
conversion code is hardcoded in the encoder, which pretends to support 4:2:0.
The outside code has no chance to know it shouldn't feed 4:2:0 if high quality
is wanted.

Also, why should this be in one encoder and not be available to other
encoders supporting 4:2:2 input?
A solution should work for all of them.

I am not sure what the best solution is, but simply hardcoding this in
one encoder feels rather wrong.

Thanks
Devin Heitmueller Sept. 22, 2019, 2:49 a.m. UTC | #5
> On Sep 21, 2019, at 4:44 PM, Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
>> The patch just expands 4:2:0 to 4:2:2 while properly supporting interlaced chroma.  
> 
> 4:2:0 and 4:2:2 have a chroma plane with different resolution.
> converting between planes of different resolution is what i called scaling.
> 
> 
>> It avoids having to auto insert the swscale filter in the case where there is no scaling required (e.g. H.264 4:2:0 video being output to decklink in its original resolution).
> 
> yes, doing an operation in the encoder avoids a filter being inserted which
> does that operation.
> Thats true for every encoder and every filter.

The key thing here is that the encoder is already touching every pixel, so avoiding the need for the filter allows the conversion to happen at essentially zero cost (as we repack the pixels into the requisite v210 layout).
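The v210 layout referred to here packs three 10-bit components into each little-endian 32-bit word. A sketch of the packing arithmetic (hypothetical helper, not the encoder's actual code):

```c
#include <stdint.h>

/* Pack three 10-bit components into one v210-style 32-bit word;
 * the top 2 bits of the word stay unused. Illustrative only. */
static uint32_t pack_v210_word(uint32_t c0, uint32_t c1, uint32_t c2)
{
    return (c0 & 0x3ff) | ((c1 & 0x3ff) << 10) | ((c2 & 0x3ff) << 20);
}
```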

> Also replacing interpolation by a nearest neighbor implementation
> is quite expectedly faster.

Yes, and we can certainly argue about whether doing interpolation of chroma when doing 4:2:0 to 4:2:2 actually has any visible benefit.  I can however say the cost of having swscaler in the pipeline is considerable.  In fact I didn’t appreciate it myself until I was trying to deliver 1080p60 in realtime to four decklink outputs and couldn’t keep up on my target platform.  And because filters generally aren’t threaded, I got hit with one of those cases where I had to break out the profiler and ask “why on Earth is the main ffmpeg thread so busy?"


> one problem is
> the user can setup the scale filter with high quality in mind or with 
> low quality and speed in mind.
> But after this patch she always gets low quality because the low quality
> convertion code is hardcoded in the encoder which pretends to support 420.
> The outside code has no chance to know it shouldnt feed 420 if high quality
> is wanted.

The user can still insert a scaler explicitly or use the pix_fmt argument so the format filter gets put into the pipeline.

> 
> Also why should this be in one encoder and not be available to other
> encoders supporting 4:2:2 input ?
> A solution should work for all of them

I would assume this would really only be helpful in encoders which only support 4:2:2 and not 4:2:0, since typical encoders that accept 4:2:0 would preserve that in their resulting encoding (i.e. they wouldn’t blindly upscale 4:2:0 to 4:2:2 for no good reason).

I did actually consider doing a separate filter which just does packed/planar conversion and 4:2:0 to 4:2:2 (as opposed to swscaler).  In this case though, the additional modularity of such a filter was outweighed by my goal of minimizing the number of times I'm copying the frame data.  Combining it with the v210 encoding meant only a single pass over the data.

> 
> Iam not sure what is the best solution but simply hardcoding this in
> one encoder feels rather wrong

The scale filter performs three basic roles:
1.  Scaling
2.  Packed to planar conversion (or vice versa)
3.  Colorspace conversion

I suppose someone could potentially redesign swscale to include the option of not taking the slow path for cases where scaling isn't actually required (i.e. cases where only 2 and 3 are needed).

Just so we’re all on the same page - this wasn’t a case of random or premature optimization.  I have a specific use case where I’m decoding four instances of 1080p60 video and the platform can’t keep up without this change.  It’s the result of actually profiling the entire pipeline as opposed to some unit test with a benchmark.  In fact I don’t particularly agree with Limin's numbers where he used the benchmark option, since that fails to take into account caching behavior or memory bandwidth on a platform which is constrained (a problem which is exacerbated when running multiple instances).  In a perfect world we would have very small operations which each perform some discrete function, and we can combine all of those in a pipeline.  In the real world though, significant benefits can be gained by combining certain operations to avoid copying the same pixels over and over again.

Devin
Paul B Mahol Sept. 22, 2019, 8:05 a.m. UTC | #6
On 9/20/19, lance.lmwang@gmail.com <lance.lmwang@gmail.com> wrote:
> From: Limin Wang <lance.lmwang@gmail.com>
>
> Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
> ---
>  libavcodec/v210_template.c | 20 ++++++++++++++++++++
>  libavcodec/v210enc.c       |  8 +++++---
>  2 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/libavcodec/v210_template.c b/libavcodec/v210_template.c
> index 9e1d9f9..083a9f1 100644
> --- a/libavcodec/v210_template.c
> +++ b/libavcodec/v210_template.c
> @@ -43,11 +43,31 @@ static void RENAME(v210_enc)(AVCodecContext *avctx,
>      const TYPE *y = (const TYPE *)pic->data[0];
>      const TYPE *u = (const TYPE *)pic->data[1];
>      const TYPE *v = (const TYPE *)pic->data[2];
> +    const TYPE *u_even = u;
> +    const TYPE *v_even = v;
>      const int sample_size = 6 * s->RENAME(sample_factor);
>      const int sample_w    = avctx->width / sample_size;
>
>      for (h = 0; h < avctx->height; h++) {
>          uint32_t val;
> +
> +        if (pic->format == AV_PIX_FMT_YUV420P10 ||
> +            pic->format == AV_PIX_FMT_YUV420P) {
> +            int mod = pic->interlaced_frame == 1 ? 4 : 2;
> +            if (h % mod == 0) {
> +                u_even = u;
> +                v_even = v;
> +            } else {
> +                /* progressive chroma */
> +                if (mod == 2) {
> +                    u = u_even;
> +                    v = v_even;
> +                } else if (h % 4 == 2) {
> +                    u = u_even;
> +                    v = v_even;
> +                }
> +            }
> +        }
>          w = sample_w * sample_size;
>          s->RENAME(pack_line)(y, u, v, dst, w);
>
> diff --git a/libavcodec/v210enc.c b/libavcodec/v210enc.c
> index 16e8810..2180737 100644
> --- a/libavcodec/v210enc.c
> +++ b/libavcodec/v210enc.c
> @@ -131,9 +131,9 @@ static int encode_frame(AVCodecContext *avctx, AVPacket
> *pkt,
>      }
>      dst = pkt->data;
>
> -    if (pic->format == AV_PIX_FMT_YUV422P10)
> +    if (pic->format == AV_PIX_FMT_YUV422P10 || pic->format ==
> AV_PIX_FMT_YUV420P10)
>          v210_enc_10(avctx, dst, pic);
> -    else if(pic->format == AV_PIX_FMT_YUV422P)
> +    else if(pic->format == AV_PIX_FMT_YUV422P || pic->format ==
> AV_PIX_FMT_YUV420P)
>          v210_enc_8(avctx, dst, pic);
>
>      side_data = av_frame_get_side_data(pic, AV_FRAME_DATA_A53_CC);
> @@ -165,5 +165,7 @@ AVCodec ff_v210_encoder = {
>      .priv_data_size = sizeof(V210EncContext),
>      .init           = encode_init,
>      .encode2        = encode_frame,
> -    .pix_fmts       = (const enum AVPixelFormat[]){ AV_PIX_FMT_YUV422P10,
> AV_PIX_FMT_YUV422P, AV_PIX_FMT_NONE },
> +    .pix_fmts       = (const enum AVPixelFormat[]){ AV_PIX_FMT_YUV422P10,
> AV_PIX_FMT_YUV422P,
> +                                                    AV_PIX_FMT_YUV420P10,
> AV_PIX_FMT_YUV420P,
> +                                                    AV_PIX_FMT_NONE },
>  };
> --
> 2.6.4
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".


Why do people constantly try to do this? First Cinepak, now this?

Obviously not accepted.
Lance Wang Sept. 22, 2019, 12:35 p.m. UTC | #7
On Sun, Sep 22, 2019 at 10:05:10AM +0200, Paul B Mahol wrote:
> On 9/20/19, lance.lmwang@gmail.com <lance.lmwang@gmail.com> wrote:
> > From: Limin Wang <lance.lmwang@gmail.com>
> >
> > Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
> > ---
> >  libavcodec/v210_template.c | 20 ++++++++++++++++++++
> >  libavcodec/v210enc.c       |  8 +++++---
> >  2 files changed, 25 insertions(+), 3 deletions(-)
> >
> > diff --git a/libavcodec/v210_template.c b/libavcodec/v210_template.c
> > index 9e1d9f9..083a9f1 100644
> > --- a/libavcodec/v210_template.c
> > +++ b/libavcodec/v210_template.c
> > @@ -43,11 +43,31 @@ static void RENAME(v210_enc)(AVCodecContext *avctx,
> >      const TYPE *y = (const TYPE *)pic->data[0];
> >      const TYPE *u = (const TYPE *)pic->data[1];
> >      const TYPE *v = (const TYPE *)pic->data[2];
> > +    const TYPE *u_even = u;
> > +    const TYPE *v_even = v;
> >      const int sample_size = 6 * s->RENAME(sample_factor);
> >      const int sample_w    = avctx->width / sample_size;
> >
> >      for (h = 0; h < avctx->height; h++) {
> >          uint32_t val;
> > +
> > +        if (pic->format == AV_PIX_FMT_YUV420P10 ||
> > +            pic->format == AV_PIX_FMT_YUV420P) {
> > +            int mod = pic->interlaced_frame == 1 ? 4 : 2;
> > +            if (h % mod == 0) {
> > +                u_even = u;
> > +                v_even = v;
> > +            } else {
> > +                /* progressive chroma */
> > +                if (mod == 2) {
> > +                    u = u_even;
> > +                    v = v_even;
> > +                } else if (h % 4 == 2) {
> > +                    u = u_even;
> > +                    v = v_even;
> > +                }
> > +            }
> > +        }
> >          w = sample_w * sample_size;
> >          s->RENAME(pack_line)(y, u, v, dst, w);
> >
> > diff --git a/libavcodec/v210enc.c b/libavcodec/v210enc.c
> > index 16e8810..2180737 100644
> > --- a/libavcodec/v210enc.c
> > +++ b/libavcodec/v210enc.c
> > @@ -131,9 +131,9 @@ static int encode_frame(AVCodecContext *avctx, AVPacket
> > *pkt,
> >      }
> >      dst = pkt->data;
> >
> > -    if (pic->format == AV_PIX_FMT_YUV422P10)
> > +    if (pic->format == AV_PIX_FMT_YUV422P10 || pic->format ==
> > AV_PIX_FMT_YUV420P10)
> >          v210_enc_10(avctx, dst, pic);
> > -    else if(pic->format == AV_PIX_FMT_YUV422P)
> > +    else if(pic->format == AV_PIX_FMT_YUV422P || pic->format ==
> > AV_PIX_FMT_YUV420P)
> >          v210_enc_8(avctx, dst, pic);
> >
> >      side_data = av_frame_get_side_data(pic, AV_FRAME_DATA_A53_CC);
> > @@ -165,5 +165,7 @@ AVCodec ff_v210_encoder = {
> >      .priv_data_size = sizeof(V210EncContext),
> >      .init           = encode_init,
> >      .encode2        = encode_frame,
> > -    .pix_fmts       = (const enum AVPixelFormat[]){ AV_PIX_FMT_YUV422P10,
> > AV_PIX_FMT_YUV422P, AV_PIX_FMT_NONE },
> > +    .pix_fmts       = (const enum AVPixelFormat[]){ AV_PIX_FMT_YUV422P10,
> > AV_PIX_FMT_YUV422P,
> > +                                                    AV_PIX_FMT_YUV420P10,
> > AV_PIX_FMT_YUV420P,
> > +                                                    AV_PIX_FMT_NONE },
> >  };
> > --
> > 2.6.4
> >
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel@ffmpeg.org
> > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >
> > To unsubscribe, visit link above, or email
> > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
> 
> 
> Why people constantly try to do this? First cinepak, now this?
> 
> Obviously not accepted.

This suggests that the project's big-name experts should consider the global design,
improve the current framework, and solve the actual needs, rather than simply
rejecting performance improvements in individual modules. I believe kernel
zero-copy technology was also forced into existence this way. Technology exists to change
the world, not to be perfect.
Kieran Kunhya Sept. 22, 2019, 4:18 p.m. UTC | #8
>
> This means that several of your big-name experts should consider the
> global design,
> improve the current system framework, and solve the actual needs, rather
> than simply
> rejecting the performance improvement of different modules. I believe that
> the kernel
> zero copy technology is also forced to come out like this. Technology is
> to change
> the world, not to be perfect.
>

And just like in the kernel, zero-copy technology is maintained outside.
Good design is the separation of different pieces of functionality; otherwise the
number of codepath combinations explodes.

Patch rejected

Regards,
Kieran Kunhya
Michael Niedermayer Sept. 22, 2019, 4:59 p.m. UTC | #9
On Sat, Sep 21, 2019 at 10:49:21PM -0400, Devin Heitmueller wrote:
> 
> > On Sep 21, 2019, at 4:44 PM, Michael Niedermayer <michael@niedermayer.cc> wrote:
> > 
> >> The patch just expands 4:2:0 to 4:2:2 while properly supporting interlaced chroma.  
> > 
> > 4:2:0 and 4:2:2 have a chroma plane with different resolution.
> > converting between planes of different resolution is what i called scaling.
> > 
> > 
> >> It avoids having to auto insert the swscale filter in the case where there is no scaling required (e.g. H.264 4:2:0 video being output to decklink in its original resolution).
> > 
> > yes, doing an operation in the encoder avoids a filter being inserted which
> > does that operation.
> > Thats true for every encoder and every filter.
> 
> The key thing here is the encoder is already touching every pixel, so avoiding having the need for the filter essentially allows the conversion to happen at essentially zero cost (as we repack the pixels into the requisite v210 layout).
> 
> > Also replacing interpolation by a nearest neighbor implementation
> > is quite expectedly faster.
> 
> Yes, and we can certainly argue about whether doing interpolation of chroma when doing 4:2:0 to 4:2:2 actually has any visible benefit.  I can however say the cost of having swscaler in the pipeline is considerable.  In fact I didn’t appreciate it myself until I was trying to deliver 1080p60 in realtime to four decklink outputs and couldn’t keep up on my target platform.  And because filters generally aren’t threaded, I got hit with one of those cases where I had to break out the profiler and ask “why on Earth is the main ffmpeg thread so busy?"
> 
> 
> > one problem is
> > the user can setup the scale filter with high quality in mind or with 
> > low quality and speed in mind.
> > But after this patch she always gets low quality because the low quality
> > convertion code is hardcoded in the encoder which pretends to support 420.
> > The outside code has no chance to know it shouldnt feed 420 if high quality
> > is wanted.
> 

> The user can still insert a scaler explicitly or use the pix_fmt argument so the format filter gets put into the pipeline.

The problem is that the user first has to know about the in-encoder conversion and
fully understand its implications. Only then can she insert a scaler and
format filter manually (unless she is copying and pasting this from somewhere).
That is a lot of knowledge for the average user.


> 
> > 
> > Also why should this be in one encoder and not be available to other
> > encoders supporting 4:2:2 input ?
> > A solution should work for all of them
> 
> I would assume this would really only be helpful in encoders which only support 4:2:2 and not 4:2:0, since typical encoders that accept 4:2:0 would preserve that in their resulting encoding (i.e. they wouldn’t blindly upscale 4:2:0 to 4:2:2 for no good reason).

Maybe. But then there are specifications that demand 4:2:2 in cases where the
underlying encoder supports 4:2:0. In these cases 4:2:2 would need to be
delivered, and delivering 4:2:0 directly to the encoder would not produce
a compliant result (with current encoders).


> 
> I did actually consider doing a separate filter which just does packed/planer conversion and 4:2:0 to 4:2:2 (as opposed to swscaler).  In this case though the additional modularity in such a filter was outweighed by my goal to minimize the number of times I’m copying the frame data.  Combining it with the v210 encoding meant only a single pass over the data.
> 
> > 
> > Iam not sure what is the best solution but simply hardcoding this in
> > one encoder feels rather wrong
> 
> The scale filter performs three basic roles:
> 1.  Scaling
> 2.  Packed to planer conversion (or vice versa)
> 3.  Colorspace conversion
> 

> I supposed potentially someone could redesign swscale to include the option to not take the slow path for cases where scaling isn’t actually required (i.e. cases where only 2 and 3 are needed).

For this, look at ff_get_unscaled_swscale(); a new 420->422 converter can easily be
added there. It would not be zero-copy, though. But it certainly would
make sense for the cases where this codepath is used.


> 
> Just so we’re all on the same page - this wasn’t a case of random or premature optimization.  I have a specific use case where I’m decoding four instances of 1080p60 video and the platform can’t keep up without this change.  It’s the result of actually profiling the entire pipeline as opposed to some unit test with a benchmark.  In fact I don’t particularly agree with Limin's numbers where he used the benchmark option, since that fails to take into account caching behavior or memory bandwidth on a platform which is constrained (a problem which is exacerbated when running multiple instances).  In a perfect world we would have very small operations which each perform some discrete function, and we can combine all of those in a pipeline.  In the real world though, significant benefits can be gained by combining certain operations to avoid copying the same pixels over and over again.

I certainly see the benefit this zero-copy 420-to-422 conversion has. And I am not
against such an optimization. But it needs to be done in a
cleaner way. I am not sure how to do that ATM.

There could possibly be a function which produces a list of line pointers,
so that for an image, a pointer is produced for each plane and line.
Encoders which use this would have a capability flag set.

This would then mean that an encoder supporting 4:2:2 input
could be fed with 4:2:0, and an encoder supporting 4:2:0 could be
fed with 4:2:2, converted by just different values in the pointer list.

Also, this feature would be known from the outside by the flag and
be controlled from the outside. Such an encoder would never claim support of
other pixel formats.
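The line-pointer idea above could be sketched roughly like this: a 4:2:0 chroma plane presented as 4:2:2 by listing each source line pointer twice, with no pixel copies (hypothetical code; the names are not FFmpeg API):

```c
#include <stdlib.h>

/* Build a line-pointer list that presents a 4:2:0 chroma plane as 4:2:2
 * without copying: each source line pointer appears twice.
 * Hypothetical sketch of the proposal; caller frees the returned array. */
static const unsigned char **make_chroma_line_ptrs(const unsigned char *plane,
                                                   int stride, int src_lines)
{
    const unsigned char **rows = malloc(2 * (size_t)src_lines * sizeof(*rows));
    if (!rows)
        return NULL;
    for (int h = 0; h < src_lines; h++) {
        rows[2 * h]     = plane + h * stride; /* original chroma line */
        rows[2 * h + 1] = plane + h * stride; /* duplicated, zero-copy */
    }
    return rows;
}
```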

Thanks

[...]
Derek Buitenhuis Sept. 22, 2019, 6:15 p.m. UTC | #10
On 22/09/2019 17:18, Kieran Kunhya wrote:
> And just like in the kernel, zero copy technology is maintained outside.
> Good design is separation of different functionality, otherwise the
> combinations of codepaths is exponentiated.

Agreed with everyone else here, this code doesn't belong in an encoder.

libavcodec is a general use codec library, and this doesn't fit that
design or philosophy at all. It's hideous.

- Derek
Lance Wang Sept. 23, 2019, 1:09 a.m. UTC | #11
On Sun, Sep 22, 2019 at 07:15:53PM +0100, Derek Buitenhuis wrote:
> On 22/09/2019 17:18, Kieran Kunhya wrote:
> > And just like in the kernel, zero copy technology is maintained outside.
> > Good design is separation of different functionality, otherwise the
> > combinations of codepaths is exponentiated.
> 
> Agreed with everyone else here, this code doesn't belong in an encoder.
> 
> libavcodec is a general use codec library, and this doesn't fit that
> design or philosophy at all. It's hideous.

I agree with you guys. Can I submit a patch to remove AV_PIX_FMT_YUV422P, since it
doesn't belong in the v210 encoder either?  It made me think it was acceptable to add other formats.


> 
> - Derek
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Kieran Kunhya Sept. 23, 2019, 7:36 a.m. UTC | #12
>
> I agree with your guys. Can I submit a patch to remove AV_PIX_FMT_YUV422P,
> it's not belong
> to v210 encoder also?  It make me misunderstanding it's acceptable to add
> other format.
>

Nope, YUV422P is perfectly fine: it's the correct pixel format, and there are
no conversions from another pixel format going on.
There's only one way to convert YUV422P to v210; there are numerous ways of
converting YUV420P to v210.

Kieran

Patch

diff --git a/libavcodec/v210_template.c b/libavcodec/v210_template.c
index 9e1d9f9..083a9f1 100644
--- a/libavcodec/v210_template.c
+++ b/libavcodec/v210_template.c
@@ -43,11 +43,31 @@  static void RENAME(v210_enc)(AVCodecContext *avctx,
     const TYPE *y = (const TYPE *)pic->data[0];
     const TYPE *u = (const TYPE *)pic->data[1];
     const TYPE *v = (const TYPE *)pic->data[2];
+    const TYPE *u_even = u;
+    const TYPE *v_even = v;
     const int sample_size = 6 * s->RENAME(sample_factor);
     const int sample_w    = avctx->width / sample_size;
 
     for (h = 0; h < avctx->height; h++) {
         uint32_t val;
+
+        if (pic->format == AV_PIX_FMT_YUV420P10 ||
+            pic->format == AV_PIX_FMT_YUV420P) {
+            int mod = pic->interlaced_frame == 1 ? 4 : 2;
+            if (h % mod == 0) {
+                u_even = u;
+                v_even = v;
+            } else {
+                /* progressive chroma */
+                if (mod == 2) {
+                    u = u_even;
+                    v = v_even;
+                } else if (h % 4 == 2) {
+                    u = u_even;
+                    v = v_even;
+                }
+            }
+        }
         w = sample_w * sample_size;
         s->RENAME(pack_line)(y, u, v, dst, w);
 
diff --git a/libavcodec/v210enc.c b/libavcodec/v210enc.c
index 16e8810..2180737 100644
--- a/libavcodec/v210enc.c
+++ b/libavcodec/v210enc.c
@@ -131,9 +131,9 @@  static int encode_frame(AVCodecContext *avctx, AVPacket *pkt,
     }
     dst = pkt->data;
 
-    if (pic->format == AV_PIX_FMT_YUV422P10)
+    if (pic->format == AV_PIX_FMT_YUV422P10 || pic->format == AV_PIX_FMT_YUV420P10)
         v210_enc_10(avctx, dst, pic);
-    else if(pic->format == AV_PIX_FMT_YUV422P)
+    else if(pic->format == AV_PIX_FMT_YUV422P || pic->format == AV_PIX_FMT_YUV420P)
         v210_enc_8(avctx, dst, pic);
 
     side_data = av_frame_get_side_data(pic, AV_FRAME_DATA_A53_CC);
@@ -165,5 +165,7 @@  AVCodec ff_v210_encoder = {
     .priv_data_size = sizeof(V210EncContext),
     .init           = encode_init,
     .encode2        = encode_frame,
-    .pix_fmts       = (const enum AVPixelFormat[]){ AV_PIX_FMT_YUV422P10, AV_PIX_FMT_YUV422P, AV_PIX_FMT_NONE },
+    .pix_fmts       = (const enum AVPixelFormat[]){ AV_PIX_FMT_YUV422P10, AV_PIX_FMT_YUV422P,
+                                                    AV_PIX_FMT_YUV420P10, AV_PIX_FMT_YUV420P,
+                                                    AV_PIX_FMT_NONE },
 };