diff mbox

[FFmpeg-devel,2/9] avcodec/gdv: Replace divisions by shifts in rescale()

Message ID 20180805202937.7563-2-michael@niedermayer.cc
State Accepted
Commit b90d8cc7466386a166dd72107457498aa5a7c43d
Headers show

Commit Message

Michael Niedermayer Aug. 5, 2018, 8:29 p.m. UTC
Divisions tend to be slower than shifts unless the compiler optimizes them out.
And some of these are in inner loops.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/gdv.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

Comments

James Almer Aug. 5, 2018, 9:19 p.m. UTC | #1
On 8/5/2018 5:29 PM, Michael Niedermayer wrote:
> Divisions tend to be slower than shifts unless the compiler optimizes them out.
> And some of these are in inner loops.

This patch makes the code slightly less readable, IMO. What compiler
doesn't optimize out an integer division by 2 into a right shift?
Is this part of the cause for the timeouts you mention in patch 9/9? Do
those run on builds with compiler optimizations disabled then?

The other patches in this set look good since they effectively simplify
and optimize the code for all scenarios (memcpy vs loop, reduced amount
of iterations, etc). But this one is trying to work around a decision
made by a compiler in a non real world scenario with specific runtime
constrains.
Does this timeout still happen if you apply the entire set except for
this patch?

> 
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  libavcodec/gdv.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/libavcodec/gdv.c b/libavcodec/gdv.c
> index e52a637610..79ca157dde 100644
> --- a/libavcodec/gdv.c
> +++ b/libavcodec/gdv.c
> @@ -85,14 +85,14 @@ static void rescale(GDVContext *gdv, uint8_t *dst, int w, int h, int scale_v, in
>              int y = h - j - 1;
>              for (i = 0; i < w; i++) {
>                  int x = w - i - 1;
> -                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x/2 + (y/2) * (w/2)];
> +                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + (x>>1) + (y>>1) * (w>>1)];
>              }
>          }
>      } else if (gdv->scale_h) {
>          for (j = 0; j < h; j++) {
>              int y = h - j - 1;
>              for (x = 0; x < w; x++) {
> -                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x + (y/2) * w];
> +                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x + (y>>1) * w];
>              }
>          }
>      } else if (gdv->scale_v) {
> @@ -100,26 +100,26 @@ static void rescale(GDVContext *gdv, uint8_t *dst, int w, int h, int scale_v, in
>              int y = h - j - 1;
>              for (i = 0; i < w; i++) {
>                  int x = w - i - 1;
> -                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x/2 + y * (w/2)];
> +                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + (x>>1) + y * (w>>1)];
>              }
>          }
>      }
>  
>      if (scale_h && scale_v) {
> -        for (y = 0; y < h/2; y++) {
> -            for (x = 0; x < w/2; x++) {
> -                dst[PREAMBLE_SIZE + x + y * (w/2)] = dst[PREAMBLE_SIZE + x*2 + y*2 * w];
> +        for (y = 0; y < (h>>1); y++) {
> +            for (x = 0; x < (w>>1); x++) {
> +                dst[PREAMBLE_SIZE + x + y * (w>>1)] = dst[PREAMBLE_SIZE + x*2 + y*2 * w];
>              }
>          }
>      } else if (scale_h) {
> -        for (y = 0; y < h/2; y++) {
> +        for (y = 0; y < (h>>1); y++) {
>              for (x = 0; x < w; x++) {
>                  dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x + y*2 * w];
>              }
>          }
>      } else if (scale_v) {
>          for (y = 0; y < h; y++) {
> -            for (x = 0; x < w/2; x++) {
> +            for (x = 0; x < (w>>1); x++) {
>                  dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x*2 + y * w];
>              }
>          }
>
Nicolas George Aug. 5, 2018, 9:37 p.m. UTC | #2
James Almer (2018-08-05):
> This patch makes the code slightly less readable, IMO. What compiler
> doesn't optimize out an integer division by 2 into a right shift?

Any compiler that cannot figure out that x is positive in this
convoluted loop. Signed divisions cannot be optimized as shifts. Make
everything unsigned and it will be fine.

But I do not understand the logic in having i going from 0 to w-1 just
to compute x from w-1 to 0.

	unsigned w, x;
	for (x = w - 1; x < w; x--)

would be better IMHO.

Regards,
Michael Niedermayer Aug. 5, 2018, 9:49 p.m. UTC | #3
On Sun, Aug 05, 2018 at 11:37:24PM +0200, Nicolas George wrote:
> James Almer (2018-08-05):
> > This patch makes the code slightly less readable, IMO. What compiler
> > doesn't optimize out an integer division by 2 into a right shift?
> 
> Any compiler that cannot figure out that x is positive in this
> convoluted loop. Signed divisions cannot be optimized as shifts. Make
> everything unsigned and it will be fine.
> 
> But I do not understand the logic in having i going from 0 to w-1 just
> to compute x from w-1 to 0.
> 
> 	unsigned w, x;


> 	for (x = w - 1; x < w; x--)

thats done in the next patch

[...]
Nicolas George Aug. 5, 2018, 9:55 p.m. UTC | #4
Michael Niedermayer (2018-08-05):
> thats done in the next patch

Indeed, except for the part where the variable are turned to unsigned to
let the compiler optimize the division. Sorry to have missed that.

Regards,
Michael Niedermayer Aug. 5, 2018, 9:58 p.m. UTC | #5
On Sun, Aug 05, 2018 at 06:19:01PM -0300, James Almer wrote:
> On 8/5/2018 5:29 PM, Michael Niedermayer wrote:
> > Divisions tend to be slower than shifts unless the compiler optimizes them out.
> > And some of these are in inner loops.
> 
> This patch makes the code slightly less readable, IMO. What compiler
> doesn't optimize out an integer division by 2 into a right shift?
> Is this part of the cause for the timeouts you mention in patch 9/9? Do
> those run on builds with compiler optimizations disabled then?

Iam running them with -O1
i dont know if gcc succeeded optimizing the /2 to a plain shift. (it has
to proof that the value is positive as nicolas said)
Its more a habit of not doing potentially slow operations in inner loops.

As in "always favor shift over division when possibly" vs. "spend time
proofing that the speedloss either doesnt matter or that every supported
compiler can optimize it"



> 
> The other patches in this set look good since they effectively simplify
> and optimize the code for all scenarios (memcpy vs loop, reduced amount
> of iterations, etc). But this one is trying to work around a decision
> made by a compiler in a non real world scenario with specific runtime
> constrains.
> Does this timeout still happen if you apply the entire set except for
> this patch?

Ill have to try but i dont think its wise to leave divisions in inner
loops where they are easy avoidable.
also makes greping for such suboptimal code harder, if anyone tried that
as in "divisions in a loop"

but ye, if people prefer to leave that, we can do that (assuming it
isnt slower), should i test this ?

[...]
James Almer Aug. 5, 2018, 10:31 p.m. UTC | #6
On 8/5/2018 6:58 PM, Michael Niedermayer wrote:
> On Sun, Aug 05, 2018 at 06:19:01PM -0300, James Almer wrote:
>> On 8/5/2018 5:29 PM, Michael Niedermayer wrote:
>>> Divisions tend to be slower than shifts unless the compiler optimizes them out.
>>> And some of these are in inner loops.
>>
>> This patch makes the code slightly less readable, IMO. What compiler
>> doesn't optimize out an integer division by 2 into a right shift?
>> Is this part of the cause for the timeouts you mention in patch 9/9? Do
>> those run on builds with compiler optimizations disabled then?
> 
> Iam running them with -O1
> i dont know if gcc succeeded optimizing the /2 to a plain shift. (it has
> to proof that the value is positive as nicolas said)
> Its more a habit of not doing potentially slow operations in inner loops.
> 
> As in "always favor shift over division when possibly" vs. "spend time
> proofing that the speedloss either doesnt matter or that every supported
> compiler can optimize it"
> 
> 
> 
>>
>> The other patches in this set look good since they effectively simplify
>> and optimize the code for all scenarios (memcpy vs loop, reduced amount
>> of iterations, etc). But this one is trying to work around a decision
>> made by a compiler in a non real world scenario with specific runtime
>> constrains.
>> Does this timeout still happen if you apply the entire set except for
>> this patch?
> 
> Ill have to try but i dont think its wise to leave divisions in inner
> loops where they are easy avoidable.
> also makes greping for such suboptimal code harder, if anyone tried that
> as in "divisions in a loop"
> 
> but ye, if people prefer to leave that, we can do that (assuming it
> isnt slower), should i test this ?

No, don't bother. If a compiler needs to proof the value is positive
before optimizing the division into a shift then it's a different situation.
Maybe make these unsigned as suggested by Nicolas instead of changing
division into shift, which will also address the readability concerns,
but otherwise this patch is ok.
Michael Niedermayer Aug. 19, 2018, 10:42 p.m. UTC | #7
On Sun, Aug 05, 2018 at 11:55:06PM +0200, Nicolas George wrote:
> Michael Niedermayer (2018-08-05):
> > thats done in the next patch
> 
> Indeed, except for the part where the variable are turned to unsigned to
> let the compiler optimize the division. Sorry to have missed that.

are you ok with leaving the shifts as in the patchset or should i redo the
patchset with unsigned ?

thx

[...]
Michael Niedermayer Sept. 24, 2018, 9:43 p.m. UTC | #8
On Mon, Aug 20, 2018 at 12:42:29AM +0200, Michael Niedermayer wrote:
> On Sun, Aug 05, 2018 at 11:55:06PM +0200, Nicolas George wrote:
> > Michael Niedermayer (2018-08-05):
> > > thats done in the next patch
> > 
> > Indeed, except for the part where the variable are turned to unsigned to
> > let the compiler optimize the division. Sorry to have missed that.
> 
> are you ok with leaving the shifts as in the patchset or should i redo the
> patchset with unsigned ?

CC-ing you so you dont miss this (though its not important)

if i hear nothing from anyone i will apply the patchset as is.
IIUC james is ok with that. 
But as said i can redo it with unsigned if people want.
Just dont want to spend time redoing a patchset when maybe everyone is fine
with it after the discussions. (its hard to read peoples preferance from
silence ...)

thx

[...]
Nicolas George Sept. 26, 2018, 3:06 p.m. UTC | #9
Michael Niedermayer (2018-09-24):
> CC-ing you so you dont miss this (though its not important)
> 
> if i hear nothing from anyone i will apply the patchset as is.
> IIUC james is ok with that. 
> But as said i can redo it with unsigned if people want.
> Just dont want to spend time redoing a patchset when maybe everyone is fine
> with it after the discussions. (its hard to read peoples preferance from
> silence ...)

Sorry, I did miss it indeed. I just wanted to mention the benefit of
using unsigned; I think the person working on the file has final say in
this kind of case.

Regards,
Michael Niedermayer Sept. 27, 2018, 6:07 p.m. UTC | #10
On Wed, Sep 26, 2018 at 05:06:46PM +0200, Nicolas George wrote:
> Michael Niedermayer (2018-09-24):
> > CC-ing you so you dont miss this (though its not important)
> > 
> > if i hear nothing from anyone i will apply the patchset as is.
> > IIUC james is ok with that. 
> > But as said i can redo it with unsigned if people want.
> > Just dont want to spend time redoing a patchset when maybe everyone is fine
> > with it after the discussions. (its hard to read peoples preferance from
> > silence ...)
> 
> Sorry, I did miss it indeed. I just wanted to mention the benefit of
> using unsigned; I think the person working on the file has final say in
> this kind of case.

ok, thanks


[...]
diff mbox

Patch

diff --git a/libavcodec/gdv.c b/libavcodec/gdv.c
index e52a637610..79ca157dde 100644
--- a/libavcodec/gdv.c
+++ b/libavcodec/gdv.c
@@ -85,14 +85,14 @@  static void rescale(GDVContext *gdv, uint8_t *dst, int w, int h, int scale_v, in
             int y = h - j - 1;
             for (i = 0; i < w; i++) {
                 int x = w - i - 1;
-                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x/2 + (y/2) * (w/2)];
+                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + (x>>1) + (y>>1) * (w>>1)];
             }
         }
     } else if (gdv->scale_h) {
         for (j = 0; j < h; j++) {
             int y = h - j - 1;
             for (x = 0; x < w; x++) {
-                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x + (y/2) * w];
+                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x + (y>>1) * w];
             }
         }
     } else if (gdv->scale_v) {
@@ -100,26 +100,26 @@  static void rescale(GDVContext *gdv, uint8_t *dst, int w, int h, int scale_v, in
             int y = h - j - 1;
             for (i = 0; i < w; i++) {
                 int x = w - i - 1;
-                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x/2 + y * (w/2)];
+                dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + (x>>1) + y * (w>>1)];
             }
         }
     }
 
     if (scale_h && scale_v) {
-        for (y = 0; y < h/2; y++) {
-            for (x = 0; x < w/2; x++) {
-                dst[PREAMBLE_SIZE + x + y * (w/2)] = dst[PREAMBLE_SIZE + x*2 + y*2 * w];
+        for (y = 0; y < (h>>1); y++) {
+            for (x = 0; x < (w>>1); x++) {
+                dst[PREAMBLE_SIZE + x + y * (w>>1)] = dst[PREAMBLE_SIZE + x*2 + y*2 * w];
             }
         }
     } else if (scale_h) {
-        for (y = 0; y < h/2; y++) {
+        for (y = 0; y < (h>>1); y++) {
             for (x = 0; x < w; x++) {
                 dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x + y*2 * w];
             }
         }
     } else if (scale_v) {
         for (y = 0; y < h; y++) {
-            for (x = 0; x < w/2; x++) {
+            for (x = 0; x < (w>>1); x++) {
                 dst[PREAMBLE_SIZE + x + y * w] = dst[PREAMBLE_SIZE + x*2 + y * w];
             }
         }