[FFmpeg-devel,v2] avcodec/fft_template: libavcodec/fft_template: improve performance of the ff_fft_init in fft_template

Submitted by Steven Liu on Dec. 21, 2018, 10:09 a.m.

Details

Message ID 20181221100950.62785-1-lq@chinaffmpeg.org
State New
Headers show

Commit Message

Steven Liu Dec. 21, 2018, 10:09 a.m.
Before patch:
init nbits = 17, get 10000 samples, average cost: 16105 us
After patch:
init nbits = 17, get 10000 samples, average cost: 15221 us

Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
---
 libavcodec/fft_template.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

Comments

Paul B Mahol Dec. 23, 2018, 9:08 p.m.
On 12/21/18, Steven Liu <lq@chinaffmpeg.org> wrote:
> Before patch:
> init nbits = 17, get 10000 samples, average cost: 16105 us
> After patch:
> init nbits = 17, get 10000 samples, average cost: 15221 us
>
> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
> ---
>  libavcodec/fft_template.c | 26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)

Probably OK, but maybe Michael want to add something?
Michael Niedermayer Dec. 24, 2018, 5:16 p.m.
On Fri, Dec 21, 2018 at 06:09:50PM +0800, Steven Liu wrote:
> Before patch:
> init nbits = 17, get 10000 samples, average cost: 16105 us
> After patch:
> init nbits = 17, get 10000 samples, average cost: 15221 us
> 
> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
> ---
>  libavcodec/fft_template.c | 26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c
> index 762c014bc8..2b528be882 100644
> --- a/libavcodec/fft_template.c
> +++ b/libavcodec/fft_template.c
> @@ -261,17 +261,21 @@ av_cold int ff_fft_init(FFTContext *s, int nbits, int inverse)
>      if (s->fft_permutation == FF_FFT_PERM_AVX) {
>          fft_perm_avx(s);
>      } else {
> -        for(i=0; i<n; i++) {
> -            int k;
> -            j = i;
> -            if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)
> -                j = (j&~3) | ((j>>1)&1) | ((j<<1)&2);
> -            k = -split_radix_permutation(i, n, s->inverse) & (n-1);
> -            if (s->revtab)
> -                s->revtab[k] = j;
> -            if (s->revtab32)
> -                s->revtab32[k] = j;
> -        }
> +#define SPLIT_RADIX_PERMUTATION(num) do { \
> +    for(i=0; i<n; i++) {\
> +        int k;\
> +        j = i;\

> +        if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)\

maybe this if() should be factored out too ?

the change looks good though, and iam not sure this is speed relevant
enough

thx

[...]
Paul B Mahol Dec. 24, 2018, 5:28 p.m.
On 12/24/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
> On Fri, Dec 21, 2018 at 06:09:50PM +0800, Steven Liu wrote:
>> Before patch:
>> init nbits = 17, get 10000 samples, average cost: 16105 us
>> After patch:
>> init nbits = 17, get 10000 samples, average cost: 15221 us
>>
>> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
>> ---
>>  libavcodec/fft_template.c | 26 +++++++++++++++-----------
>>  1 file changed, 15 insertions(+), 11 deletions(-)
>>
>> diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c
>> index 762c014bc8..2b528be882 100644
>> --- a/libavcodec/fft_template.c
>> +++ b/libavcodec/fft_template.c
>> @@ -261,17 +261,21 @@ av_cold int ff_fft_init(FFTContext *s, int nbits,
>> int inverse)
>>      if (s->fft_permutation == FF_FFT_PERM_AVX) {
>>          fft_perm_avx(s);
>>      } else {
>> -        for(i=0; i<n; i++) {
>> -            int k;
>> -            j = i;
>> -            if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)
>> -                j = (j&~3) | ((j>>1)&1) | ((j<<1)&2);
>> -            k = -split_radix_permutation(i, n, s->inverse) & (n-1);
>> -            if (s->revtab)
>> -                s->revtab[k] = j;
>> -            if (s->revtab32)
>> -                s->revtab32[k] = j;
>> -        }
>> +#define SPLIT_RADIX_PERMUTATION(num) do { \
>> +    for(i=0; i<n; i++) {\
>> +        int k;\
>> +        j = i;\
>
>> +        if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)\
>
> maybe this if() should be factored out too ?
>
> the change looks good though, and iam not sure this is speed relevant
> enough

Well, it helps. But this code is partially going to be rewritten
anyway in future.
Steven Liu Dec. 25, 2018, 2:47 a.m.
> 在 2018年12月25日,上午1:28,Paul B Mahol <onemda@gmail.com> 写道:
> 
> On 12/24/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
>> On Fri, Dec 21, 2018 at 06:09:50PM +0800, Steven Liu wrote:
>>> Before patch:
>>> init nbits = 17, get 10000 samples, average cost: 16105 us
>>> After patch:
>>> init nbits = 17, get 10000 samples, average cost: 15221 us
>>> 
>>> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
>>> ---
>>> libavcodec/fft_template.c | 26 +++++++++++++++-----------
>>> 1 file changed, 15 insertions(+), 11 deletions(-)
>>> 
>>> diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c
>>> index 762c014bc8..2b528be882 100644
>>> --- a/libavcodec/fft_template.c
>>> +++ b/libavcodec/fft_template.c
>>> @@ -261,17 +261,21 @@ av_cold int ff_fft_init(FFTContext *s, int nbits,
>>> int inverse)
>>>     if (s->fft_permutation == FF_FFT_PERM_AVX) {
>>>         fft_perm_avx(s);
>>>     } else {
>>> -        for(i=0; i<n; i++) {
>>> -            int k;
>>> -            j = i;
>>> -            if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)
>>> -                j = (j&~3) | ((j>>1)&1) | ((j<<1)&2);
>>> -            k = -split_radix_permutation(i, n, s->inverse) & (n-1);
>>> -            if (s->revtab)
>>> -                s->revtab[k] = j;
>>> -            if (s->revtab32)
>>> -                s->revtab32[k] = j;
>>> -        }
>>> +#define SPLIT_RADIX_PERMUTATION(num) do { \
>>> +    for(i=0; i<n; i++) {\
>>> +        int k;\
>>> +        j = i;\
>> 
>>> +        if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)\
>> 
>> maybe this if() should be factored out too ?
>> 
>> the change looks good though, and iam not sure this is speed relevant
>> enough
> 
> Well, it helps. But this code is partially going to be rewritten
> anyway in future.

Hi Paul,

    So what i should do? Go on update this patch with Michael’s comment  or waiting for your rewritten?

Thanks

Steven
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Paul B Mahol Dec. 25, 2018, 9:35 a.m.
On 12/25/18, Liu Steven <lq@chinaffmpeg.org> wrote:
>
>
>> 在 2018年12月25日,上午1:28,Paul B Mahol <onemda@gmail.com> 写道:
>>
>> On 12/24/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
>>> On Fri, Dec 21, 2018 at 06:09:50PM +0800, Steven Liu wrote:
>>>> Before patch:
>>>> init nbits = 17, get 10000 samples, average cost: 16105 us
>>>> After patch:
>>>> init nbits = 17, get 10000 samples, average cost: 15221 us
>>>>
>>>> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
>>>> ---
>>>> libavcodec/fft_template.c | 26 +++++++++++++++-----------
>>>> 1 file changed, 15 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c
>>>> index 762c014bc8..2b528be882 100644
>>>> --- a/libavcodec/fft_template.c
>>>> +++ b/libavcodec/fft_template.c
>>>> @@ -261,17 +261,21 @@ av_cold int ff_fft_init(FFTContext *s, int nbits,
>>>> int inverse)
>>>>     if (s->fft_permutation == FF_FFT_PERM_AVX) {
>>>>         fft_perm_avx(s);
>>>>     } else {
>>>> -        for(i=0; i<n; i++) {
>>>> -            int k;
>>>> -            j = i;
>>>> -            if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)
>>>> -                j = (j&~3) | ((j>>1)&1) | ((j<<1)&2);
>>>> -            k = -split_radix_permutation(i, n, s->inverse) & (n-1);
>>>> -            if (s->revtab)
>>>> -                s->revtab[k] = j;
>>>> -            if (s->revtab32)
>>>> -                s->revtab32[k] = j;
>>>> -        }
>>>> +#define SPLIT_RADIX_PERMUTATION(num) do { \
>>>> +    for(i=0; i<n; i++) {\
>>>> +        int k;\
>>>> +        j = i;\
>>>
>>>> +        if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)\
>>>
>>> maybe this if() should be factored out too ?
>>>
>>> the change looks good though, and iam not sure this is speed relevant
>>> enough
>>
>> Well, it helps. But this code is partially going to be rewritten
>> anyway in future.
>
> Hi Paul,
>
>     So what i should do? Go on update this patch with Michael’s comment  or
> waiting for your rewritten?

Update patch.

Patch hide | download patch | download mbox

diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c
index 762c014bc8..2b528be882 100644
--- a/libavcodec/fft_template.c
+++ b/libavcodec/fft_template.c
@@ -261,17 +261,21 @@  av_cold int ff_fft_init(FFTContext *s, int nbits, int inverse)
     if (s->fft_permutation == FF_FFT_PERM_AVX) {
         fft_perm_avx(s);
     } else {
-        for(i=0; i<n; i++) {
-            int k;
-            j = i;
-            if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)
-                j = (j&~3) | ((j>>1)&1) | ((j<<1)&2);
-            k = -split_radix_permutation(i, n, s->inverse) & (n-1);
-            if (s->revtab)
-                s->revtab[k] = j;
-            if (s->revtab32)
-                s->revtab32[k] = j;
-        }
+#define SPLIT_RADIX_PERMUTATION(num) do { \
+    for(i=0; i<n; i++) {\
+        int k;\
+        j = i;\
+        if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)\
+        j = (j&~3) | ((j>>1)&1) | ((j<<1)&2);\
+        k = -split_radix_permutation(i, n, s->inverse) & (n-1);\
+        s->revtab##num[k] = j;\
+    }\
+} while(0);
+    if (s->revtab)
+        SPLIT_RADIX_PERMUTATION()
+    if (s->revtab32)
+        SPLIT_RADIX_PERMUTATION(32)
+#undef SPLIT_RADIX_PERMUTATION
     }
 
     return 0;