[FFmpeg-devel,2/9] avcodec/gdv: Replace divisions by shifts in rescale()

Submitted by Michael Niedermayer on Aug. 5, 2018, 8:29 p.m.

Details

Message ID 20180805202937.7563-2-michael@niedermayer.cc
State New
Headers show

Commit Message

Michael Niedermayer Aug. 5, 2018, 8:29 p.m.
Divisions tend to be slower than shifts unless the compiler optimizes them out.
And some of these are in inner loops.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/gdv.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

Comments

James Almer Aug. 5, 2018, 9:19 p.m.
On 8/5/2018 5:29 PM, Michael Niedermayer wrote:
> Divisions tend to be slower than shifts unless the compiler optimizes them out.
> And some of these are in inner loops.

This patch makes the code slightly less readable, IMO. What compiler
doesn't optimize out an integer division by 2 into a right shift?
Is this part of the cause for the timeouts you mention in patch 9/9? Do
those run on builds with compiler optimizations disabled then?

The other patches in this set look good since they effectively simplify
and optimize the code for all scenarios (memcpy vs loop, reduced amount
of iterations, etc). But this one is trying to work around a decision
made by a compiler in a non real world scenario with specific runtime
constrains.
Does this timeout still happen if you apply the entire set except for
this patch?

> 
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  libavcodec/gdv.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/libavcodec/gdv.c b/libavcodec/gdv.c
> index e52a637610..79ca157dde 100644
> --- a/libavcodec/gdv.c
> +++ b/libavcodec/gdv.c
> @@ -85,14 +85,14 @@ static void rescale(GDVContext *gdv, uint8_t *dst, int w, int h, int scale_v, in
>              int y = h - j - 1;
>              for (i = 0; i < w; i++) {
>                  int x = w - i - 1;
> -                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x/2 + (y/2) * (w/2)];
> +                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + (x>>1) + (y>>1) * (w>>1)];
>              }
>          }
>      } else if (gdv->scale_h) {
>          for (j = 0; j < h; j++) {
>              int y = h - j - 1;
>              for (x = 0; x < w; x++) {
> -                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x + (y/2) * w];
> +                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x + (y>>1) * w];
>              }
>          }
>      } else if (gdv->scale_v) {
> @@ -100,26 +100,26 @@ static void rescale(GDVContext *gdv, uint8_t *dst, int w, int h, int scale_v, in
>              int y = h - j - 1;
>              for (i = 0; i < w; i++) {
>                  int x = w - i - 1;
> -                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x/2 + y * (w/2)];
> +                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + (x>>1) + y * (w>>1)];
>              }
>          }
>      }
>  
>      if (scale_h && scale_v) {
> -        for (y = 0; y < h/2; y++) {
> -            for (x = 0; x < w/2; x++) {
> -                dst[PREAMBLE_SIZE + x + y * (w/2)] = dst[PREAMBLE_SIZE + x*2 + y*2 * w];
> +        for (y = 0; y < (h>>1); y++) {
> +            for (x = 0; x < (w>>1); x++) {
> +                dst[PREAMBLE_SIZE + x + y * (w>>1)] = dst[PREAMBLE_SIZE + x*2 + y*2 * w];
>              }
>          }
>      } else if (scale_h) {
> -        for (y = 0; y < h/2; y++) {
> +        for (y = 0; y < (h>>1); y++) {
>              for (x = 0; x < w; x++) {
>                  dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x + y*2 * w];
>              }
>          }
>      } else if (scale_v) {
>          for (y = 0; y < h; y++) {
> -            for (x = 0; x < w/2; x++) {
> +            for (x = 0; x < (w>>1); x++) {
>                  dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x*2 + y * w];
>              }
>          }
>
Nicolas George Aug. 5, 2018, 9:37 p.m.
James Almer (2018-08-05):
> This patch makes the code slightly less readable, IMO. What compiler
> doesn't optimize out an integer division by 2 into a right shift?

Any compiler that cannot figure out that x is positive in this
convoluted loop. Signed divisions cannot be optimized as shifts. Make
everything unsigned and it will be fine.

But I do not understand the logic in having i going from 0 to w-1 just
to compute x from w-1 to 0.

	unsigned w, x;
	for (x = w - 1; x < w; x--)

would be better IMHO.

Regards,
Michael Niedermayer Aug. 5, 2018, 9:49 p.m.
On Sun, Aug 05, 2018 at 11:37:24PM +0200, Nicolas George wrote:
> James Almer (2018-08-05):
> > This patch makes the code slightly less readable, IMO. What compiler
> > doesn't optimize out an integer division by 2 into a right shift?
> 
> Any compiler that cannot figure out that x is positive in this
> convoluted loop. Signed divisions cannot be optimized as shifts. Make
> everything unsigned and it will be fine.
> 
> But I do not understand the logic in having i going from 0 to w-1 just
> to compute x from w-1 to 0.
> 
> 	unsigned w, x;


> 	for (x = w - 1; x < w; x--)

thats done in the next patch

[...]
Nicolas George Aug. 5, 2018, 9:55 p.m.
Michael Niedermayer (2018-08-05):
> thats done in the next patch

Indeed, except for the part where the variable are turned to unsigned to
let the compiler optimize the division. Sorry to have missed that.

Regards,
Michael Niedermayer Aug. 5, 2018, 9:58 p.m.
On Sun, Aug 05, 2018 at 06:19:01PM -0300, James Almer wrote:
> On 8/5/2018 5:29 PM, Michael Niedermayer wrote:
> > Divisions tend to be slower than shifts unless the compiler optimizes them out.
> > And some of these are in inner loops.
> 
> This patch makes the code slightly less readable, IMO. What compiler
> doesn't optimize out an integer division by 2 into a right shift?
> Is this part of the cause for the timeouts you mention in patch 9/9? Do
> those run on builds with compiler optimizations disabled then?

Iam running them with -O1
i dont know if gcc succeeded optimizing the /2 to a plain shift. (it has
to proof that the value is positive as nicolas said)
Its more a habit of not doing potentially slow operations in inner loops.

As in "always favor shift over division when possibly" vs. "spend time
proofing that the speedloss either doesnt matter or that every supported
compiler can optimize it"



> 
> The other patches in this set look good since they effectively simplify
> and optimize the code for all scenarios (memcpy vs loop, reduced amount
> of iterations, etc). But this one is trying to work around a decision
> made by a compiler in a non real world scenario with specific runtime
> constrains.
> Does this timeout still happen if you apply the entire set except for
> this patch?

Ill have to try but i dont think its wise to leave divisions in inner
loops where they are easy avoidable.
also makes greping for such suboptimal code harder, if anyone tried that
as in "divisions in a loop"

but ye, if people prefer to leave that, we can do that (assuming it
isnt slower), should i test this ?

[...]
James Almer Aug. 5, 2018, 10:31 p.m.
On 8/5/2018 6:58 PM, Michael Niedermayer wrote:
> On Sun, Aug 05, 2018 at 06:19:01PM -0300, James Almer wrote:
>> On 8/5/2018 5:29 PM, Michael Niedermayer wrote:
>>> Divisions tend to be slower than shifts unless the compiler optimizes them out.
>>> And some of these are in inner loops.
>>
>> This patch makes the code slightly less readable, IMO. What compiler
>> doesn't optimize out an integer division by 2 into a right shift?
>> Is this part of the cause for the timeouts you mention in patch 9/9? Do
>> those run on builds with compiler optimizations disabled then?
> 
> Iam running them with -O1
> i dont know if gcc succeeded optimizing the /2 to a plain shift. (it has
> to proof that the value is positive as nicolas said)
> Its more a habit of not doing potentially slow operations in inner loops.
> 
> As in "always favor shift over division when possibly" vs. "spend time
> proofing that the speedloss either doesnt matter or that every supported
> compiler can optimize it"
> 
> 
> 
>>
>> The other patches in this set look good since they effectively simplify
>> and optimize the code for all scenarios (memcpy vs loop, reduced amount
>> of iterations, etc). But this one is trying to work around a decision
>> made by a compiler in a non real world scenario with specific runtime
>> constrains.
>> Does this timeout still happen if you apply the entire set except for
>> this patch?
> 
> Ill have to try but i dont think its wise to leave divisions in inner
> loops where they are easy avoidable.
> also makes greping for such suboptimal code harder, if anyone tried that
> as in "divisions in a loop"
> 
> but ye, if people prefer to leave that, we can do that (assuming it
> isnt slower), should i test this ?

No, don't bother. If a compiler needs to proof the value is positive
before optimizing the division into a shift then it's a different situation.
Maybe make these unsigned as suggested by Nicolas instead of changing
division into shift, which will also address the readability concerns,
but otherwise this patch is ok.

Patch hide | download patch | download mbox

diff --git a/libavcodec/gdv.c b/libavcodec/gdv.c
index e52a637610..79ca157dde 100644
--- a/libavcodec/gdv.c
+++ b/libavcodec/gdv.c
@@ -85,14 +85,14 @@  static void rescale(GDVContext *gdv, uint8_t *dst, int w, int h, int scale_v, in
             int y = h - j - 1;
             for (i = 0; i < w; i++) {
                 int x = w - i - 1;
-                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x/2 + (y/2) * (w/2)];
+                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + (x>>1) + (y>>1) * (w>>1)];
             }
         }
     } else if (gdv->scale_h) {
         for (j = 0; j < h; j++) {
             int y = h - j - 1;
             for (x = 0; x < w; x++) {
-                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x + (y/2) * w];
+                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x + (y>>1) * w];
             }
         }
     } else if (gdv->scale_v) {
@@ -100,26 +100,26 @@  static void rescale(GDVContext *gdv, uint8_t *dst, int w, int h, int scale_v, in
             int y = h - j - 1;
             for (i = 0; i < w; i++) {
                 int x = w - i - 1;
-                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x/2 + y * (w/2)];
+                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + (x>>1) + y * (w>>1)];
             }
         }
     }
 
     if (scale_h && scale_v) {
-        for (y = 0; y < h/2; y++) {
-            for (x = 0; x < w/2; x++) {
-                dst[PREAMBLE_SIZE + x + y * (w/2)] = dst[PREAMBLE_SIZE + x*2 + y*2 * w];
+        for (y = 0; y < (h>>1); y++) {
+            for (x = 0; x < (w>>1); x++) {
+                dst[PREAMBLE_SIZE + x + y * (w>>1)] = dst[PREAMBLE_SIZE + x*2 + y*2 * w];
             }
         }
     } else if (scale_h) {
-        for (y = 0; y < h/2; y++) {
+        for (y = 0; y < (h>>1); y++) {
             for (x = 0; x < w; x++) {
                 dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x + y*2 * w];
             }
         }
     } else if (scale_v) {
         for (y = 0; y < h; y++) {
-            for (x = 0; x < w/2; x++) {
+            for (x = 0; x < (w>>1); x++) {
                 dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x*2 + y * w];
             }
         }