Message ID | 20170627194735.27533-5-michael@niedermayer.cc |
---|---|
State | Accepted |
Commit | 850c6db97d1f78e7607952ab8b854a93a185319e |
Headers | show |
2017-06-28 3:47 GMT+08:00 Michael Niedermayer <michael@niedermayer.cc>: > 0.5% faster loop > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> > --- > libavcodec/utvideodec.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/libavcodec/utvideodec.c b/libavcodec/utvideodec.c > index 788f4475b9..a20e28320c 100644 > --- a/libavcodec/utvideodec.c > +++ b/libavcodec/utvideodec.c > @@ -196,7 +196,8 @@ static int decode_plane10(UtvideoContext *c, int plane_no, > > prev = 0x200; > for (j = sstart; j < send; j++) { > - for (i = 0; i < width * step; i += step) { > + int ws = width * step; > + for (i = 0; i < ws; i += step) { > pix = get_vlc2(&gb, vlc.table, VLC_BITS, 3); > if (pix < 0) { > av_log(c->avctx, AV_LOG_ERROR, "Decoding error\n"); > @@ -300,7 +301,8 @@ static int decode_plane(UtvideoContext *c, int plane_no, > > prev = 0x80; > for (j = sstart; j < send; j++) { > - for (i = 0; i < width * step; i += step) { > + int ws = width * step; > + for (i = 0; i < ws; i += step) { > pix = get_vlc2(&gb, vlc.table, VLC_BITS, 3); > if (pix < 0) { > av_log(c->avctx, AV_LOG_ERROR, "Decoding error\n"); > -- > 2.13.0 > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel LGTM, move the compute before the loop, better then in the loop
Am 27.06.2017 um 21:47 schrieb Michael Niedermayer: > 0.5% faster loop > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> > --- > libavcodec/utvideodec.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/libavcodec/utvideodec.c b/libavcodec/utvideodec.c > index 788f4475b9..a20e28320c 100644 > --- a/libavcodec/utvideodec.c > +++ b/libavcodec/utvideodec.c > @@ -196,7 +196,8 @@ static int decode_plane10(UtvideoContext *c, int plane_no, > > prev = 0x200; > for (j = sstart; j < send; j++) { > - for (i = 0; i < width * step; i += step) { > + int ws = width * step; > + for (i = 0; i < ws; i += step) { > pix = get_vlc2(&gb, vlc.table, VLC_BITS, 3); This seems like a pretty obvious optimization, why doesn't gcc or any other compiler see that it can pre-compute that before the loop, as nothing in the loop changes both variables?
On 28.06.2017, at 10:52, Timo Rothenpieler <timo@rothenpieler.org> wrote: > Am 27.06.2017 um 21:47 schrieb Michael Niedermayer: >> 0.5% faster loop >> >> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> >> --- >> libavcodec/utvideodec.c | 6 ++++-- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/libavcodec/utvideodec.c b/libavcodec/utvideodec.c >> index 788f4475b9..a20e28320c 100644 >> --- a/libavcodec/utvideodec.c >> +++ b/libavcodec/utvideodec.c >> @@ -196,7 +196,8 @@ static int decode_plane10(UtvideoContext *c, int plane_no, >> >> prev = 0x200; >> for (j = sstart; j < send; j++) { >> - for (i = 0; i < width * step; i += step) { >> + int ws = width * step; >> + for (i = 0; i < ws; i += step) { >> pix = get_vlc2(&gb, vlc.table, VLC_BITS, 3); > > This seems like a pretty obvious optimization, why doesn't gcc or any > other compiler see that it can pre-compute that before the loop, as > nothing in the loop changes both variables? Because despite claims to the contrary, compilers haven't improved much and are still incredibly stupid, and optimizations that are added are usually added without proper tests so most of them just increase compile time without any effect (just look at the code generated for C++ class initialization - there are about 3 places in the compiler supposed to do write merging, yet it still happens element-wise in both gcc and clang). Reminds me that I should finally start opening bugs for my nice collection of "stupidest codegen ever" sample programs (C++ is much, much worse than C btw). And yet, there are now people claiming "oh, nowadays Itanium would work, because compilers can do that now". Just waiting for the next big failure in CPU architecture because people don't get how bad compilers are. I'll start believing in AI when there's a compiler where looking at the assembler output doesn't cause despair within minutes...
diff --git a/libavcodec/utvideodec.c b/libavcodec/utvideodec.c index 788f4475b9..a20e28320c 100644 --- a/libavcodec/utvideodec.c +++ b/libavcodec/utvideodec.c @@ -196,7 +196,8 @@ static int decode_plane10(UtvideoContext *c, int plane_no, prev = 0x200; for (j = sstart; j < send; j++) { - for (i = 0; i < width * step; i += step) { + int ws = width * step; + for (i = 0; i < ws; i += step) { pix = get_vlc2(&gb, vlc.table, VLC_BITS, 3); if (pix < 0) { av_log(c->avctx, AV_LOG_ERROR, "Decoding error\n"); @@ -300,7 +301,8 @@ static int decode_plane(UtvideoContext *c, int plane_no, prev = 0x80; for (j = sstart; j < send; j++) { - for (i = 0; i < width * step; i += step) { + int ws = width * step; + for (i = 0; i < ws; i += step) { pix = get_vlc2(&gb, vlc.table, VLC_BITS, 3); if (pix < 0) { av_log(c->avctx, AV_LOG_ERROR, "Decoding error\n");
0.5% faster loop Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> --- libavcodec/utvideodec.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)