Message ID | 20190901132023.28531-2-lance.lmwang@gmail.com |
---|---|
State | New |
On Sun, Sep 01, 2019 at 09:20:20PM +0800, lance.lmwang@gmail.com wrote:
> From: Limin Wang <lance.lmwang@gmail.com>
>
> I have benchmarked the performance with c code and haven't see any
> performance impact.
>
> Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
> ---
>  libavcodec/v210enc.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/libavcodec/v210enc.c b/libavcodec/v210enc.c
> index 1b840b2..69a2efe 100644
> --- a/libavcodec/v210enc.c
> +++ b/libavcodec/v210enc.c
> @@ -43,12 +43,7 @@ static void v210_planar_pack_8_c(const uint8_t *y, const uint8_t *u,
>      uint32_t val;
>      int i;
>
> -    /* unroll this to match the assembly */
> -    for (i = 0; i < width - 11; i += 12) {
> -        WRITE_PIXELS(u, y, v, 8);
> -        WRITE_PIXELS(y, u, y, 8);
> -        WRITE_PIXELS(v, y, u, 8);
> -        WRITE_PIXELS(y, v, y, 8);
> +    for (i = 0; i < width - 5; i += 6) {
>          WRITE_PIXELS(u, y, v, 8);
>          WRITE_PIXELS(y, u, y, 8);
>          WRITE_PIXELS(v, y, u, 8);

I have retested this with START/STOP_TIMER
and the more unrolled loop is consistently faster

./ffmpeg -cpuflags 0 -v 99 -i matrixbench_mpeg2.mpg -vcodec v210 -an test.avi

31620 decicycles in TEST, 2096691 runs, 461 skips
0 0 0 0 0 0 0 0 0 0 0 21 13 9 8 7 8 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31509 decicycles in TEST, 2096892 runs, 260 skips
0 0 0 0 0 0 0 0 0 0 0 21 10 9 8 6 7 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0
32069 decicycles in TEST, 2096965 runs, 187 skips
0 0 0 0 0 0 0 0 0 0 0 21 16 10 8 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31522 decicycles in TEST, 2096962 runs, 190 skips
0 0 0 0 0 0 0 0 0 0 0 21 10 9 8 6 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31537 decicycles in TEST, 2096784 runs, 368 skips
0 0 0 0 0 0 0 0 0 0 0 21 12 8 9 7 7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0

prev:
30705 decicycles in TEST, 2096875 runs, 277 skips
0 0 0 0 0 0 0 0 0 0 0 21 15 9 9 7 5 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0
30771 decicycles in TEST, 2096907 runs, 245 skips
0 0 0 0 0 0 0 0 0 0 0 21 15 9 8 6 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30560 decicycles in TEST, 2096904 runs, 248 skips
0 0 0 0 0 0 0 0 0 0 0 21 10 9 9 6 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31020 decicycles in TEST, 2096974 runs, 178 skips
0 0 0 0 0 0 0 0 0 0 0 21 16 9 8 6 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31018 decicycles in TEST, 2096980 runs, 172 skips
0 0 0 0 0 0 0 0 0 0 0 21 16 9 8 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

[...]
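To put a rough number on "consistently faster", the five readings in each group above can be averaged. The sketch below is only an illustration. It assumes the first group was measured with the patch applied and the "prev:" group with the original 12-pixel loop, which is how the mail reads but is not stated explicitly. The helper names `mean_decicycles` and `slowdown_percent` are hypothetical and not part of FFmpeg.

```c
#include <assert.h>
#include <stddef.h>

#define N_RUNS 5

/* Readings copied from the mail above, in decicycles.
 * Assumption: "patched" is the 6-pixel loop, "previous" the 12-pixel loop. */
static const int patched[N_RUNS]  = { 31620, 31509, 32069, 31522, 31537 };
static const int previous[N_RUNS] = { 30705, 30771, 30560, 31020, 31018 };

/* Arithmetic mean of n timer readings. */
static double mean_decicycles(const int *samples, size_t n)
{
    long sum = 0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return (double)sum / (double)n;
}

/* Relative cost of the patched loop versus the previous one, in percent. */
static double slowdown_percent(void)
{
    double a = mean_decicycles(patched, N_RUNS);
    double b = mean_decicycles(previous, N_RUNS);

    return 100.0 * (a - b) / b;
}
```

With these inputs the means are 31651.4 and 30814.8 decicycles, so the less unrolled loop comes out about 2.7% slower. Averaging five summary figures is a crude comparison, but no reading in the first group is lower than any reading in the "prev:" group.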
On Mon, Sep 16, 2019 at 09:06:06PM +0200, Michael Niedermayer wrote:
> On Sun, Sep 01, 2019 at 09:20:20PM +0800, lance.lmwang@gmail.com wrote:
> > [...]
>
> I have retested this with START/STOP_TIMER
> and the more unrolled loop is consistently faster

Sorry, I hadn't used START/STOP_TIMER before, so I only used -benchmark
for a quick check. Since the more unrolled loop is faster and we can't
make the two functions consistent, I'll update the patch and drop
patch #2 and patch #3.

> [...]
On Mon, Sep 16, 2019 at 09:06:06PM +0200, Michael Niedermayer wrote:
> On Sun, Sep 01, 2019 at 09:20:20PM +0800, lance.lmwang@gmail.com wrote:
> > [...]
>
> I have retested this with START/STOP_TIMER
> and the more unrolled loop is consistently faster
>
> [...]

Michael, I have updated the patch to v4 for review. The old patches #2
and #3 are discarded because of the performance difference.
diff --git a/libavcodec/v210enc.c b/libavcodec/v210enc.c
index 1b840b2..69a2efe 100644
--- a/libavcodec/v210enc.c
+++ b/libavcodec/v210enc.c
@@ -43,12 +43,7 @@ static void v210_planar_pack_8_c(const uint8_t *y, const uint8_t *u,
     uint32_t val;
     int i;
 
-    /* unroll this to match the assembly */
-    for (i = 0; i < width - 11; i += 12) {
-        WRITE_PIXELS(u, y, v, 8);
-        WRITE_PIXELS(y, u, y, 8);
-        WRITE_PIXELS(v, y, u, 8);
-        WRITE_PIXELS(y, v, y, 8);
+    for (i = 0; i < width - 5; i += 6) {
         WRITE_PIXELS(u, y, v, 8);
         WRITE_PIXELS(y, u, y, 8);
         WRITE_PIXELS(v, y, u, 8);
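
The diff only changes how many pixels each loop iteration handles, so the two variants should produce the same bytes whenever the width is a multiple of 12. The sketch below illustrates that under stated assumptions. `WRITE_PIXELS` here is a simplified stand-in that takes no depth argument, omits clipping and writes little-endian bytes by hand, so it is not the macro from libavcodec/v210enc.c. `pack_unroll12`, `pack_unroll6`, `packs_match` and `MAX_W` are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_W 96 /* arbitrary cap for this demo */

/* Simplified stand-in: pack three 8-bit samples as 10-bit fields of one
 * little-endian 32-bit word. No clipping, unlike the real encoder. */
#define WRITE_PIXELS(a, b, c)                                     \
    do {                                                          \
        val = ((uint32_t)*a++ << 2) | ((uint32_t)*b++ << 12) |    \
              ((uint32_t)*c++ << 22);                             \
        dst[0] = (uint8_t)(val);                                  \
        dst[1] = (uint8_t)(val >> 8);                             \
        dst[2] = (uint8_t)(val >> 16);                            \
        dst[3] = (uint8_t)(val >> 24);                            \
        dst += 4;                                                 \
    } while (0)

/* Loop shape before the patch: 12 pixels per iteration. */
static void pack_unroll12(const uint8_t *y, const uint8_t *u,
                          const uint8_t *v, uint8_t *dst, ptrdiff_t width)
{
    uint32_t val;
    ptrdiff_t i;

    for (i = 0; i < width - 11; i += 12) {
        WRITE_PIXELS(u, y, v);
        WRITE_PIXELS(y, u, y);
        WRITE_PIXELS(v, y, u);
        WRITE_PIXELS(y, v, y);
        WRITE_PIXELS(u, y, v);
        WRITE_PIXELS(y, u, y);
        WRITE_PIXELS(v, y, u);
        WRITE_PIXELS(y, v, y);
    }
}

/* Loop shape after the patch: 6 pixels per iteration. */
static void pack_unroll6(const uint8_t *y, const uint8_t *u,
                         const uint8_t *v, uint8_t *dst, ptrdiff_t width)
{
    uint32_t val;
    ptrdiff_t i;

    for (i = 0; i < width - 5; i += 6) {
        WRITE_PIXELS(u, y, v);
        WRITE_PIXELS(y, u, y);
        WRITE_PIXELS(v, y, u);
        WRITE_PIXELS(y, v, y);
    }
}

/* Returns 1 if both loops emit identical bytes for a 4:2:2 test line. */
static int packs_match(int width)
{
    uint8_t y[MAX_W], u[MAX_W / 2], v[MAX_W / 2];
    uint8_t out12[MAX_W / 6 * 16], out6[MAX_W / 6 * 16];

    assert(width > 0 && width <= MAX_W && width % 12 == 0);
    for (int i = 0; i < width; i++)
        y[i] = (uint8_t)(16 + i);
    for (int i = 0; i < width / 2; i++) {
        u[i] = (uint8_t)(64 + i);
        v[i] = (uint8_t)(128 + i);
    }
    memset(out12, 0, sizeof(out12));
    memset(out6, 0, sizeof(out6));
    pack_unroll12(y, u, v, out12, width);
    pack_unroll6(y, u, v, out6, width);
    return memcmp(out12, out6, sizeof(out12)) == 0;
}
```

Calling `packs_match(12)` or `packs_match(96)` returns 1, since every 6 pixels consume 6 luma and 3 + 3 chroma samples and emit 16 bytes in either loop. Widths that are not a multiple of 12 are outside this sketch. The check says nothing about speed, which is what the START/STOP_TIMER numbers in the thread measure.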