diff mbox

[FFmpeg-devel,5/5] avcodec/utvideodec: Factor multiply out of inner loop

Message ID 20170627194735.27533-5-michael@niedermayer.cc
State Accepted
Commit 850c6db97d1f78e7607952ab8b854a93a185319e
Headers show

Commit Message

Michael Niedermayer June 27, 2017, 7:47 p.m. UTC
0.5% faster loop

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/utvideodec.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Steven Liu June 28, 2017, 3:51 a.m. UTC | #1
2017-06-28 3:47 GMT+08:00 Michael Niedermayer <michael@niedermayer.cc>:
> 0.5% faster loop
>
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  libavcodec/utvideodec.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/utvideodec.c b/libavcodec/utvideodec.c
> index 788f4475b9..a20e28320c 100644
> --- a/libavcodec/utvideodec.c
> +++ b/libavcodec/utvideodec.c
> @@ -196,7 +196,8 @@ static int decode_plane10(UtvideoContext *c, int plane_no,
>
>          prev = 0x200;
>          for (j = sstart; j < send; j++) {
> -            for (i = 0; i < width * step; i += step) {
> +            int ws = width * step;
> +            for (i = 0; i < ws; i += step) {
>                  pix = get_vlc2(&gb, vlc.table, VLC_BITS, 3);
>                  if (pix < 0) {
>                      av_log(c->avctx, AV_LOG_ERROR, "Decoding error\n");
> @@ -300,7 +301,8 @@ static int decode_plane(UtvideoContext *c, int plane_no,
>
>          prev = 0x80;
>          for (j = sstart; j < send; j++) {
> -            for (i = 0; i < width * step; i += step) {
> +            int ws = width * step;
> +            for (i = 0; i < ws; i += step) {
>                  pix = get_vlc2(&gb, vlc.table, VLC_BITS, 3);
>                  if (pix < 0) {
>                      av_log(c->avctx, AV_LOG_ERROR, "Decoding error\n");
> --
> 2.13.0
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


LGTM, move the compute before the loop, better then in the loop
Timo Rothenpieler June 28, 2017, 8:52 a.m. UTC | #2
Am 27.06.2017 um 21:47 schrieb Michael Niedermayer:
> 0.5% faster loop
> 
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  libavcodec/utvideodec.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/libavcodec/utvideodec.c b/libavcodec/utvideodec.c
> index 788f4475b9..a20e28320c 100644
> --- a/libavcodec/utvideodec.c
> +++ b/libavcodec/utvideodec.c
> @@ -196,7 +196,8 @@ static int decode_plane10(UtvideoContext *c, int plane_no,
>  
>          prev = 0x200;
>          for (j = sstart; j < send; j++) {
> -            for (i = 0; i < width * step; i += step) {
> +            int ws = width * step;
> +            for (i = 0; i < ws; i += step) {
>                  pix = get_vlc2(&gb, vlc.table, VLC_BITS, 3);

This seems like a pretty obvious optimization, why doesn't gcc or any
other compiler see that it can pre-compute that before the loop, as
nothing in the loop changes both variables?
Reimar Döffinger July 4, 2017, 7:16 a.m. UTC | #3
On 28.06.2017, at 10:52, Timo Rothenpieler <timo@rothenpieler.org> wrote:

> Am 27.06.2017 um 21:47 schrieb Michael Niedermayer:
>> 0.5% faster loop
>> 
>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
>> ---
>> libavcodec/utvideodec.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>> 
>> diff --git a/libavcodec/utvideodec.c b/libavcodec/utvideodec.c
>> index 788f4475b9..a20e28320c 100644
>> --- a/libavcodec/utvideodec.c
>> +++ b/libavcodec/utvideodec.c
>> @@ -196,7 +196,8 @@ static int decode_plane10(UtvideoContext *c, int plane_no,
>> 
>>         prev = 0x200;
>>         for (j = sstart; j < send; j++) {
>> -            for (i = 0; i < width * step; i += step) {
>> +            int ws = width * step;
>> +            for (i = 0; i < ws; i += step) {
>>                 pix = get_vlc2(&gb, vlc.table, VLC_BITS, 3);
> 
> This seems like a pretty obvious optimization, why doesn't gcc or any
> other compiler see that it can pre-compute that before the loop, as
> nothing in the loop changes both variables?

Because despite claims to the contrary, compilers haven't improved much and are still incredibly stupid, and optimizations that are added are usually added without proper tests so most of them just increase compile time without any effect (just look at the code generated for C++ class initialization - there are about 3 places in the compiler supposed to do write merging, yet it still happens element-wise in both gcc and clang). Reminds me that I should finally start opening bugs for my nice collection of "stupidest codegen ever" sample programs (C++ is much, much worse than C btw).
And yet, there are now people claiming "oh, nowadays Itanium would work, because compilers can do that now". Just waiting for the next big failure in CPU architecture because people don't get how bad compilers are. I'll start believing in AI when there's a compiler where looking at the assembler output doesn't cause despair within minutes...
diff mbox

Patch

diff --git a/libavcodec/utvideodec.c b/libavcodec/utvideodec.c
index 788f4475b9..a20e28320c 100644
--- a/libavcodec/utvideodec.c
+++ b/libavcodec/utvideodec.c
@@ -196,7 +196,8 @@  static int decode_plane10(UtvideoContext *c, int plane_no,
 
         prev = 0x200;
         for (j = sstart; j < send; j++) {
-            for (i = 0; i < width * step; i += step) {
+            int ws = width * step;
+            for (i = 0; i < ws; i += step) {
                 pix = get_vlc2(&gb, vlc.table, VLC_BITS, 3);
                 if (pix < 0) {
                     av_log(c->avctx, AV_LOG_ERROR, "Decoding error\n");
@@ -300,7 +301,8 @@  static int decode_plane(UtvideoContext *c, int plane_no,
 
         prev = 0x80;
         for (j = sstart; j < send; j++) {
-            for (i = 0; i < width * step; i += step) {
+            int ws = width * step;
+            for (i = 0; i < ws; i += step) {
                 pix = get_vlc2(&gb, vlc.table, VLC_BITS, 3);
                 if (pix < 0) {
                     av_log(c->avctx, AV_LOG_ERROR, "Decoding error\n");