Message ID | 20181216131918.1562-1-lq@chinaffmpeg.org |
---|---|
State | Accepted |
Headers | show |
On Sun, Dec 16, 2018 at 21:19:17 +0800, Steven Liu wrote: > move the two if condition out of the loop, that can less n-1 times than condition > in loop. [...] > k = -split_radix_permutation(i, n, s->inverse) & (n-1); > - if (s->revtab) > s->revtab[k] = j; > - if (s->revtab32) > - s->revtab32[k] = j; > } Does this really improve performance? You only save one if-check per loop (as only one of s->revtab and s->revtab32 is defined). Benchmarks? > + if (s->revtab32) { > + for(i=0; i<n; i++) { > + int k; > + j = i; > + if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS) > + j = (j&~3) | ((j>>1)&1) | ((j<<1)&2); > + k = -split_radix_permutation(i, n, s->inverse) & (n-1); > + s->revtab32[k] = j; > + } > + } On the other hand, all the code is duplicated. Is there a more elegant way to do this? Moritz
> On Dec 16, 2018, at 21:47, Moritz Barsnick <barsnick@gmx.net> wrote: > > On Sun, Dec 16, 2018 at 21:19:17 +0800, Steven Liu wrote: >> move the two if condition out of the loop, that can less n-1 times than condition >> in loop. > [...] >> k = -split_radix_permutation(i, n, s->inverse) & (n-1); >> - if (s->revtab) >> s->revtab[k] = j; >> - if (s->revtab32) >> - s->revtab32[k] = j; >> } > > Does this really improve performance? You only save one if-check per > loop (as only one of s->revtab and s->revtab32 is defined). Benchmarks? > >> + if (s->revtab32) { >> + for(i=0; i<n; i++) { >> + int k; >> + j = i; >> + if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS) >> + j = (j&~3) | ((j>>1)&1) | ((j<<1)&2); >> + k = -split_radix_permutation(i, n, s->inverse) & (n-1); >> + s->revtab32[k] = j; >> + } >> + } > > On the other hand, all the code is duplicated. Is there a more elegant > way to do this? Hi Moritz, For example: Original logic: int n = 32; for (i = 0 ; i < n; i++) { if (check1) do_something; if (check2) do_something; } If check1 is true and check2 is false, this loop 32 times, check condition 64 times; After this patch: int n = 32; if (check1) for (i = 0 ; i < n; i++) { do_something; } if (check2) for (i = 0 ; i < n; i++) { do_something; } If check1 is true and check2 is false, this loop 32 times, check condition 2 times And there should only one of the two condition is true, because the s->revtab or s->revtab32 is allocation at begin of ff_fft_init, and if nbits <= 16 the s->revtab maybe true else s->revtab32 is true. So i modify it here. I think maybe have more elegant way to do this if this logic is ok. > > Moritz > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel Thanks Steven
2018-12-16 14:19 GMT+01:00, Steven Liu <lq@chinaffmpeg.org>: > move the two if condition out of the loop, that can less > n-1 times than condition in loop. Please add some benchmarks to the commit message. Carl Eugen
Carl Eugen Hoyos <ceffmpeg@gmail.com> 于2018年12月17日周一 上午8:29写道: > > 2018-12-16 14:19 GMT+01:00, Steven Liu <lq@chinaffmpeg.org>: > > move the two if condition out of the loop, that can less > > n-1 times than condition in loop. > > Please add some benchmarks to the commit message. Hi Folks, I tested the performance of the modify, before the patch: MacBook:dash StevenLiu$ bash get_performance.sh 3921 after the patch: MacBook:dash StevenLiu$ bash get_performance.sh 3723 MacBook:dash StevenLiu$ cat get_performance.sh get 10 times data of the speed, and compute an average value, the script as is bellow: #!/usr/bin/bash DURATION=0 for((i=0;i<10;i++)) do ./libavcodec/tests/fft -n 15 &>output T_DURATION=`grep "duration" output | awk -F"=" '{ print $2}'` DURATION=`expr $DURATION + $T_DURATION` done TOTAL=`expr $DURATION / 10` echo $TOTAL > > Carl Eugen > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c index 762c014bc8..2d97534505 100644 --- a/libavcodec/fft_template.c +++ b/libavcodec/fft_template.c @@ -261,17 +261,26 @@ av_cold int ff_fft_init(FFTContext *s, int nbits, int inverse) if (s->fft_permutation == FF_FFT_PERM_AVX) { fft_perm_avx(s); } else { + if (s->revtab) { for(i=0; i<n; i++) { int k; j = i; if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS) j = (j&~3) | ((j>>1)&1) | ((j<<1)&2); k = -split_radix_permutation(i, n, s->inverse) & (n-1); - if (s->revtab) s->revtab[k] = j; - if (s->revtab32) - s->revtab32[k] = j; } + } + if (s->revtab32) { + for(i=0; i<n; i++) { + int k; + j = i; + if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS) + j = (j&~3) | ((j>>1)&1) | ((j<<1)&2); + k = -split_radix_permutation(i, n, s->inverse) & (n-1); + s->revtab32[k] = j; + } + } } return 0;
move the two if condition out of the loop, that can less n-1 times than condition in loop. Signed-off-by: Steven Liu <lq@chinaffmpeg.org> --- libavcodec/fft_template.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)