[FFmpeg-devel,v2] avcodec/v210enc: add yuv420p/yuv420p10 input pixel format support

Message ID 20190921124235.31729-1-lance.lmwang@gmail.com
State New

Commit Message

Lance Wang Sept. 21, 2019, 12:42 p.m. UTC
From: Limin Wang <lance.lmwang@gmail.com>

With the patch, we simply reuse the same source chroma line for each pair
of lines in the output, so the yuv420p and yuv420p10 output of a decoder
can go to the v210 encoder directly, without the pixels having to pass
through an automatically inserted swscale filter.

Without the patch, the end effect is that the swscale filter runs the full
scaling algorithm against lines that don't require any scaling or bit depth
conversion, where a simple memcpy() could achieve the same result.
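
The chroma line reuse described above can be sketched as a row mapping. This is an illustrative sketch, not code from the patch: simulate_patch_rows() and chroma_row() are hypothetical names, and the sketch assumes the encoder loop advances the chroma pointers by one line after every output line, as it would for 4:2:2 input. The first function replays the u_even/v_even bookkeeping using row indices instead of pointers; the second is the closed form it appears to reduce to.

```c
/*
 * Replay the u_even/v_even bookkeeping with row indices instead of pointers.
 * Assumption: the surrounding loop moves the chroma pointers forward by one
 * line after each output line (modelled here by c++).
 */
static void simulate_patch_rows(int height, int interlaced, int *rows_out)
{
    int c      = 0;                   /* stands in for the u/v pointers   */
    int c_even = 0;                   /* stands in for u_even/v_even      */
    int mod    = interlaced ? 4 : 2;

    for (int h = 0; h < height; h++) {
        if (h % mod == 0)
            c_even = c;               /* remember the line starting a group */
        else if (mod == 2 || h % 4 == 2)
            c = c_even;               /* rewind to reuse that chroma line   */
        rows_out[h] = c;              /* chroma row packed with luma row h  */
        c++;
    }
}

/* Closed form of the same mapping (hypothetical helper). */
static int chroma_row(int h, int interlaced)
{
    /* Interlaced: each field reuses its own chroma line within 4 lines. */
    return interlaced ? (h / 4) * 2 + (h % 2) : h / 2;
}
```

Under these assumptions, output lines 0..7 of a progressive frame read chroma rows 0,0,1,1,2,2,3,3, while an interlaced frame reads 0,1,0,1,2,3,2,3, so each field repeats its own chroma line.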

This raises encoding throughput by roughly 20% (fps) in the following benchmarks:
1. yuv420p
./ffmpeg  -benchmark -y -lavfi testsrc2=s=4096x3072:r=10:d=10 -pix_fmt yuv420p -c:v v210 -f null -
master:
frame=  100 fps= 30 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=3.02x
bench: utime=2.762s stime=0.539s rtime=3.308s
bench: maxrss=93372416kB

with the patch applied:
frame=  100 fps= 36 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=3.57x
video:3302400kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=2.258s stime=0.536s rtime=2.803s
bench: maxrss=80809984kB

2. yuv420p10
./ffmpeg  -benchmark -y -lavfi testsrc2=s=4096x3072:r=10:d=10 -pix_fmt yuv420p10 -c:v v210 -f null -
master:
frame=  100 fps= 26 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=2.61x
video:3302400kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=3.257s stime=0.559s rtime=3.827s
bench: maxrss=152371200kB

with the patch applied:
frame=  100 fps= 31 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=3.11x
video:3302400kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=2.625s stime=0.573s rtime=3.212s
bench: maxrss=127197184kB

Signed-off-by: Devin Heitmueller <dheitmueller@ltnglobal.com>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
---
 libavcodec/v210_template.c | 20 ++++++++++++++++++++
 libavcodec/v210enc.c       |  8 +++++---
 2 files changed, 25 insertions(+), 3 deletions(-)

Comments

Carl Eugen Hoyos Sept. 21, 2019, 1:13 p.m. UTC | #1
On Sat, 21 Sept 2019 at 14:42, <lance.lmwang@gmail.com> wrote:
>
> From: Limin Wang <lance.lmwang@gmail.com>
>
> With the patch, we simply reuse the same source chroma line for each pair
> of lines in the output

Is there really no quality hit for a 20% speed-up?

Carl Eugen
Lance Wang Sept. 21, 2019, 2:48 p.m. UTC | #2
On Sat, Sep 21, 2019 at 03:13:46PM +0200, Carl Eugen Hoyos wrote:
> On Sat, 21 Sept 2019 at 14:42, <lance.lmwang@gmail.com> wrote:
> >
> > From: Limin Wang <lance.lmwang@gmail.com>
> >
> > With the patch, we simply reuse the same source chroma line for each pair
> > of lines in the output
> 
> Is there really no quality hit for a 20% speed-up?

The quality looks OK in my testing of the 420 to 422 conversion, but I don't
know how to measure the difference between the two paths; Michael may have
more suggestions on that.

However, the patch is meant to avoid the automatically inserted scaler. If a user
prefers the swscale conversion, they can still get it by setting pix_fmt explicitly.

The following commands show both paths; the second one selects the old conversion by swscale.
no swscale:
./ffmpeg -y -lavfi testsrc2=s=4096x3072:r=10:d=10,format=pix_fmts=yuv420p -c:v v210 -f null -

swscale:
./ffmpeg -y -lavfi testsrc2=s=4096x3072:r=10:d=10,format=pix_fmts=yuv420p -c:v v210 -pix_fmt yuv422p -f null -


Thanks,
Limin
Carl Eugen Hoyos Sept. 21, 2019, 9:02 p.m. UTC | #3
On Sat, 21 Sept 2019 at 16:49, Limin Wang <lance.lmwang@gmail.com> wrote:

> However, the patch is meant to avoid the automatically inserted scaler. If a user
> prefers the swscale conversion, they can still get it by setting pix_fmt explicitly.

This seems like a very bad argument assuming there is a quality
hit and the speed gain is very limited.

Carl Eugen

Patch

diff --git a/libavcodec/v210_template.c b/libavcodec/v210_template.c
index 9e1d9f9..083a9f1 100644
--- a/libavcodec/v210_template.c
+++ b/libavcodec/v210_template.c
@@ -43,11 +43,31 @@  static void RENAME(v210_enc)(AVCodecContext *avctx,
     const TYPE *y = (const TYPE *)pic->data[0];
     const TYPE *u = (const TYPE *)pic->data[1];
     const TYPE *v = (const TYPE *)pic->data[2];
+    const TYPE *u_even = u;
+    const TYPE *v_even = v;
     const int sample_size = 6 * s->RENAME(sample_factor);
     const int sample_w    = avctx->width / sample_size;
 
     for (h = 0; h < avctx->height; h++) {
         uint32_t val;
+
+        if (pic->format == AV_PIX_FMT_YUV420P10 ||
+            pic->format == AV_PIX_FMT_YUV420P) {
+            int mod = pic->interlaced_frame == 1 ? 4 : 2;
+            if (h % mod == 0) {
+                u_even = u;
+                v_even = v;
+            } else {
+                /* progressive chroma */
+                if (mod == 2) {
+                    u = u_even;
+                    v = v_even;
+                } else if (h % 4 == 2) {
+                    u = u_even;
+                    v = v_even;
+                }
+            }
+        }
         w = sample_w * sample_size;
         s->RENAME(pack_line)(y, u, v, dst, w);
 
diff --git a/libavcodec/v210enc.c b/libavcodec/v210enc.c
index 16e8810..2180737 100644
--- a/libavcodec/v210enc.c
+++ b/libavcodec/v210enc.c
@@ -131,9 +131,9 @@  static int encode_frame(AVCodecContext *avctx, AVPacket *pkt,
     }
     dst = pkt->data;
 
-    if (pic->format == AV_PIX_FMT_YUV422P10)
+    if (pic->format == AV_PIX_FMT_YUV422P10 || pic->format == AV_PIX_FMT_YUV420P10)
         v210_enc_10(avctx, dst, pic);
-    else if(pic->format == AV_PIX_FMT_YUV422P)
+    else if(pic->format == AV_PIX_FMT_YUV422P || pic->format == AV_PIX_FMT_YUV420P)
         v210_enc_8(avctx, dst, pic);
 
     side_data = av_frame_get_side_data(pic, AV_FRAME_DATA_A53_CC);
@@ -165,5 +165,7 @@  AVCodec ff_v210_encoder = {
     .priv_data_size = sizeof(V210EncContext),
     .init           = encode_init,
     .encode2        = encode_frame,
-    .pix_fmts       = (const enum AVPixelFormat[]){ AV_PIX_FMT_YUV422P10, AV_PIX_FMT_YUV422P, AV_PIX_FMT_NONE },
+    .pix_fmts       = (const enum AVPixelFormat[]){ AV_PIX_FMT_YUV422P10, AV_PIX_FMT_YUV422P,
+                                                    AV_PIX_FMT_YUV420P10, AV_PIX_FMT_YUV420P,
+                                                    AV_PIX_FMT_NONE },
 };