[FFmpeg-devel,1/2] avcodec/fft_template: improve performance of the ff_fft_init in fft_template

Submitted by Steven Liu on Dec. 16, 2018, 1:19 p.m.

Details

Message ID 20181216131918.1562-1-lq@chinaffmpeg.org
State Accepted
Headers show

Commit Message

Steven Liu Dec. 16, 2018, 1:19 p.m.
move the two if condition out of the loop, that can less n-1 times than condition
in loop.

Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
---
 libavcodec/fft_template.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Moritz Barsnick Dec. 16, 2018, 1:47 p.m.
On Sun, Dec 16, 2018 at 21:19:17 +0800, Steven Liu wrote:
> move the two if condition out of the loop, that can less n-1 times than condition
> in loop.
[...]
>              k = -split_radix_permutation(i, n, s->inverse) & (n-1);
> -            if (s->revtab)
>                  s->revtab[k] = j;
> -            if (s->revtab32)
> -                s->revtab32[k] = j;
>          }

Does this really improve performance? You only save one if-check per
loop (as only one of s->revtab and s->revtab32 is defined). Benchmarks?

> +            if (s->revtab32) {
> +                for(i=0; i<n; i++) {
> +                    int k;
> +                    j = i;
> +                    if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)
> +                        j = (j&~3) | ((j>>1)&1) | ((j<<1)&2);
> +                        k = -split_radix_permutation(i, n, s->inverse) & (n-1);
> +                s->revtab32[k] = j;
> +            }
> +            }

On the other hand, all the code is duplicated. Is there a more elegant
way to do this?

Moritz
Steven Liu Dec. 16, 2018, 2:03 p.m.
> On Dec 16, 2018, at 21:47, Moritz Barsnick <barsnick@gmx.net> wrote:
> 
> On Sun, Dec 16, 2018 at 21:19:17 +0800, Steven Liu wrote:
>> move the two if condition out of the loop, that can less n-1 times than condition
>> in loop.
> [...]
>>             k = -split_radix_permutation(i, n, s->inverse) & (n-1);
>> -            if (s->revtab)
>>                 s->revtab[k] = j;
>> -            if (s->revtab32)
>> -                s->revtab32[k] = j;
>>         }
> 
> Does this really improve performance? You only save one if-check per
> loop (as only one of s->revtab and s->revtab32 is defined). Benchmarks?
> 
>> +            if (s->revtab32) {
>> +                for(i=0; i<n; i++) {
>> +                    int k;
>> +                    j = i;
>> +                    if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)
>> +                        j = (j&~3) | ((j>>1)&1) | ((j<<1)&2);
>> +                        k = -split_radix_permutation(i, n, s->inverse) & (n-1);
>> +                s->revtab32[k] = j;
>> +            }
>> +            }
> 
> On the other hand, all the code is duplicated. Is there a more elegant
> way to do this?

Hi Moritz,

For example:

Original logic:
int n = 32;
for (i = 0 ; i < n; i++) {
    if (check1) do_something;
    if (check2) do_something;
}

If check1 is true and check2 is false, this loop 32 times, check condition 64 times;

After this patch:
int n = 32;
if (check1)
for (i = 0 ; i < n; i++) {
    do_something;
}
if (check2)
for (i = 0 ; i < n; i++) {
    do_something;
}
If check1 is true and check2 is false, this loop 32 times, check condition  2 times

And there should only one of the two condition is true,
because the s->revtab or s->revtab32 is allocation at begin of ff_fft_init,
and if nbits <= 16 the s->revtab maybe true else s->revtab32 is true.

So i modify it here.

I think maybe have more elegant way to do this if this logic is ok.



> 
> Moritz
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Thanks
Steven
Carl Eugen Hoyos Dec. 17, 2018, 12:28 a.m.
2018-12-16 14:19 GMT+01:00, Steven Liu <lq@chinaffmpeg.org>:
> move the two if condition out of the loop, that can less
> n-1 times than condition in loop.

Please add some benchmarks to the commit message.

Carl Eugen
Steven Liu Dec. 17, 2018, 3:30 a.m.
Carl Eugen Hoyos <ceffmpeg@gmail.com> 于2018年12月17日周一 上午8:29写道:
>
> 2018-12-16 14:19 GMT+01:00, Steven Liu <lq@chinaffmpeg.org>:
> > move the two if condition out of the loop, that can less
> > n-1 times than condition in loop.
>
> Please add some benchmarks to the commit message.

Hi Folks,

I tested the performance of the modify,
before the patch:
MacBook:dash StevenLiu$ bash get_performance.sh
3921
after the patch:
MacBook:dash StevenLiu$ bash get_performance.sh
3723
MacBook:dash StevenLiu$ cat get_performance.sh

get 10 times data of the speed, and compute an average value, the
script as is bellow:

#!/usr/bin/bash

DURATION=0
for((i=0;i<10;i++)) do
./libavcodec/tests/fft -n 15 &>output
T_DURATION=`grep "duration" output | awk -F"=" '{ print $2}'`
DURATION=`expr $DURATION + $T_DURATION`
done
TOTAL=`expr $DURATION / 10`
echo $TOTAL


>
> Carl Eugen
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Patch hide | download patch | download mbox

diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c
index 762c014bc8..2d97534505 100644
--- a/libavcodec/fft_template.c
+++ b/libavcodec/fft_template.c
@@ -261,17 +261,26 @@  av_cold int ff_fft_init(FFTContext *s, int nbits, int inverse)
     if (s->fft_permutation == FF_FFT_PERM_AVX) {
         fft_perm_avx(s);
     } else {
+        if (s->revtab) {
         for(i=0; i<n; i++) {
             int k;
             j = i;
             if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)
                 j = (j&~3) | ((j>>1)&1) | ((j<<1)&2);
             k = -split_radix_permutation(i, n, s->inverse) & (n-1);
-            if (s->revtab)
                 s->revtab[k] = j;
-            if (s->revtab32)
-                s->revtab32[k] = j;
         }
+        }
+            if (s->revtab32) {
+                for(i=0; i<n; i++) {
+                    int k;
+                    j = i;
+                    if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)
+                        j = (j&~3) | ((j>>1)&1) | ((j<<1)&2);
+                        k = -split_radix_permutation(i, n, s->inverse) & (n-1);
+                s->revtab32[k] = j;
+            }
+            }
     }
 
     return 0;