Message ID | 1567007116-9088-3-git-send-email-lance.lmwang@gmail.com |
---|---|
State | Superseded |
On Wed, 28 Aug 2019 at 17:46, <lance.lmwang@gmail.com> wrote:
>
> From: Limin Wang <lance.lmwang@gmail.com>
>
> Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
> ---
>  libavcodec/v210enc.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/libavcodec/v210enc.c b/libavcodec/v210enc.c
> index 1b840b2..69a2efe 100644
> --- a/libavcodec/v210enc.c
> +++ b/libavcodec/v210enc.c
> @@ -43,12 +43,7 @@ static void v210_planar_pack_8_c(const uint8_t *y, const uint8_t *u,
>      uint32_t val;
>      int i;
>
> -    /* unroll this to match the assembly */
> -    for (i = 0; i < width - 11; i += 12) {
> -        WRITE_PIXELS(u, y, v, 8);
> -        WRITE_PIXELS(y, u, y, 8);
> -        WRITE_PIXELS(v, y, u, 8);
> -        WRITE_PIXELS(y, v, y, 8);
> +    for (i = 0; i < width - 5; i += 6) {
>          WRITE_PIXELS(u, y, v, 8);
>          WRITE_PIXELS(y, u, y, 8);
>          WRITE_PIXELS(v, y, u, 8);

Doesn't this have a performance impact?

Carl Eugen
On Wed, Aug 28, 2019 at 08:36:14PM +0200, Carl Eugen Hoyos wrote:
> [...]
>
> Doesn't this have a performance impact?

I think optimization is the compiler's job. However, I ran a quick benchmark with ff_v210enc_init_x86 removed (to force the C function). The results are below:

    ./ffmpeg -benchmark -y -f lavfi -i smptehdbars -sws_flags +accurate_rnd+bitexact -vf \
        scale=720x576,format=yuv422p -c:v v210 -frames 1000 -f framecrc -

master:

    frame= 1000 fps=754 q=-0.0 Lsize= 57kB time=00:00:40.00 bitrate= 11.6kbits/s speed=30.2x
    bench: utime=1.299s stime=0.013s rtime=1.326s
    bench: maxrss=16244736kB

patch applied:

    frame= 1000 fps=756 q=-0.0 Lsize= 57kB time=00:00:40.00 bitrate= 11.6kbits/s speed=30.2x
    bench: utime=1.300s stime=0.013s rtime=1.323s
    bench: maxrss=16166912kB

Thanks,
Limin
diff --git a/libavcodec/v210enc.c b/libavcodec/v210enc.c
index 1b840b2..69a2efe 100644
--- a/libavcodec/v210enc.c
+++ b/libavcodec/v210enc.c
@@ -43,12 +43,7 @@ static void v210_planar_pack_8_c(const uint8_t *y, const uint8_t *u,
     uint32_t val;
     int i;
 
-    /* unroll this to match the assembly */
-    for (i = 0; i < width - 11; i += 12) {
-        WRITE_PIXELS(u, y, v, 8);
-        WRITE_PIXELS(y, u, y, 8);
-        WRITE_PIXELS(v, y, u, 8);
-        WRITE_PIXELS(y, v, y, 8);
+    for (i = 0; i < width - 5; i += 6) {
         WRITE_PIXELS(u, y, v, 8);
         WRITE_PIXELS(y, u, y, 8);
         WRITE_PIXELS(v, y, u, 8);
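
For readers who want to convince themselves that the two loop shapes in this patch pack the same data, here is a minimal standalone sketch. It is not FFmpeg code: `PACK3` is a simplified stand-in for `WRITE_PIXELS` (no clipping, no explicit little-endian store), and `pack_step12`, `pack_step6`, and `packers_agree` are hypothetical names made up for this illustration. It only covers widths that are a multiple of 12, since handling of leftover pixels is outside the hunk shown above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for WRITE_PIXELS at 8-bit depth: three samples go
 * into one 32-bit word as 10-bit fields. Each read is its own statement
 * because the same pointer may be passed for both a and c. */
#define PACK3(a, b, c, dst)                 \
    do {                                    \
        uint32_t val_;                      \
        val_  = (uint32_t)*(a)++ << 2;      \
        val_ |= (uint32_t)*(b)++ << 12;     \
        val_ |= (uint32_t)*(c)++ << 22;     \
        *(dst)++ = val_;                    \
    } while (0)

enum { MAX_W = 48, MAX_WORDS = MAX_W / 6 * 4 };

/* Loop shape before the patch: 12 pixels (8 words) per iteration. */
static void pack_step12(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                        uint32_t *dst, int width)
{
    for (int i = 0; i < width - 11; i += 12) {
        PACK3(u, y, v, dst);
        PACK3(y, u, y, dst);
        PACK3(v, y, u, dst);
        PACK3(y, v, y, dst);
        PACK3(u, y, v, dst);
        PACK3(y, u, y, dst);
        PACK3(v, y, u, dst);
        PACK3(y, v, y, dst);
    }
}

/* Loop shape after the patch: 6 pixels (4 words) per iteration. */
static void pack_step6(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                       uint32_t *dst, int width)
{
    for (int i = 0; i < width - 5; i += 6) {
        PACK3(u, y, v, dst);
        PACK3(y, u, y, dst);
        PACK3(v, y, u, dst);
        PACK3(y, v, y, dst);
    }
}

/* Returns 1 if both loops produce identical words, 0 if they differ,
 * and -1 if width is not a positive multiple of 12 up to MAX_W. */
int packers_agree(int width)
{
    uint8_t y[MAX_W], u[MAX_W / 2], v[MAX_W / 2];
    uint32_t out12[MAX_WORDS], out6[MAX_WORDS];

    if (width <= 0 || width > MAX_W || width % 12 != 0)
        return -1;

    int words = width / 6 * 4;
    assert(words <= MAX_WORDS);

    /* Deterministic test pattern; the values themselves are arbitrary. */
    for (int i = 0; i < MAX_W; i++)
        y[i] = (uint8_t)(16 + 3 * i);
    for (int i = 0; i < MAX_W / 2; i++) {
        u[i] = (uint8_t)(64 + 5 * i);
        v[i] = (uint8_t)(200 - 7 * i);
    }

    memset(out12, 0, sizeof(out12));
    memset(out6, 0, sizeof(out6));
    pack_step12(y, u, v, out12, width);
    pack_step6(y, u, v, out6, width);

    return memcmp(out12, out6, (size_t)words * sizeof(uint32_t)) == 0;
}
```

Calling `packers_agree(12)` or `packers_agree(48)` from a small `main` should return 1. This says nothing about speed; the benchmark figures quoted in the thread remain the only performance evidence here.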