Message ID | 20181221100950.62785-1-lq@chinaffmpeg.org |
---|---|
State | New |
Headers | show |
On 12/21/18, Steven Liu <lq@chinaffmpeg.org> wrote: > Before patch: > init nbits = 17, get 10000 samples, average cost: 16105 us > After patch: > init nbits = 17, get 10000 samples, average cost: 15221 us > > Signed-off-by: Steven Liu <lq@chinaffmpeg.org> > --- > libavcodec/fft_template.c | 26 +++++++++++++++----------- > 1 file changed, 15 insertions(+), 11 deletions(-) Probably OK, but maybe Michael want to add something?
On Fri, Dec 21, 2018 at 06:09:50PM +0800, Steven Liu wrote: > Before patch: > init nbits = 17, get 10000 samples, average cost: 16105 us > After patch: > init nbits = 17, get 10000 samples, average cost: 15221 us > > Signed-off-by: Steven Liu <lq@chinaffmpeg.org> > --- > libavcodec/fft_template.c | 26 +++++++++++++++----------- > 1 file changed, 15 insertions(+), 11 deletions(-) > > diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c > index 762c014bc8..2b528be882 100644 > --- a/libavcodec/fft_template.c > +++ b/libavcodec/fft_template.c > @@ -261,17 +261,21 @@ av_cold int ff_fft_init(FFTContext *s, int nbits, int inverse) > if (s->fft_permutation == FF_FFT_PERM_AVX) { > fft_perm_avx(s); > } else { > - for(i=0; i<n; i++) { > - int k; > - j = i; > - if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS) > - j = (j&~3) | ((j>>1)&1) | ((j<<1)&2); > - k = -split_radix_permutation(i, n, s->inverse) & (n-1); > - if (s->revtab) > - s->revtab[k] = j; > - if (s->revtab32) > - s->revtab32[k] = j; > - } > +#define SPLIT_RADIX_PERMUTATION(num) do { \ > + for(i=0; i<n; i++) {\ > + int k;\ > + j = i;\ > + if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)\ maybe this if() should be factored out too ? the change looks good though, and iam not sure this is speed relevant enough thx [...]
On 12/24/18, Michael Niedermayer <michael@niedermayer.cc> wrote: > On Fri, Dec 21, 2018 at 06:09:50PM +0800, Steven Liu wrote: >> Before patch: >> init nbits = 17, get 10000 samples, average cost: 16105 us >> After patch: >> init nbits = 17, get 10000 samples, average cost: 15221 us >> >> Signed-off-by: Steven Liu <lq@chinaffmpeg.org> >> --- >> libavcodec/fft_template.c | 26 +++++++++++++++----------- >> 1 file changed, 15 insertions(+), 11 deletions(-) >> >> diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c >> index 762c014bc8..2b528be882 100644 >> --- a/libavcodec/fft_template.c >> +++ b/libavcodec/fft_template.c >> @@ -261,17 +261,21 @@ av_cold int ff_fft_init(FFTContext *s, int nbits, >> int inverse) >> if (s->fft_permutation == FF_FFT_PERM_AVX) { >> fft_perm_avx(s); >> } else { >> - for(i=0; i<n; i++) { >> - int k; >> - j = i; >> - if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS) >> - j = (j&~3) | ((j>>1)&1) | ((j<<1)&2); >> - k = -split_radix_permutation(i, n, s->inverse) & (n-1); >> - if (s->revtab) >> - s->revtab[k] = j; >> - if (s->revtab32) >> - s->revtab32[k] = j; >> - } >> +#define SPLIT_RADIX_PERMUTATION(num) do { \ >> + for(i=0; i<n; i++) {\ >> + int k;\ >> + j = i;\ > >> + if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)\ > > maybe this if() should be factored out too ? > > the change looks good though, and iam not sure this is speed relevant > enough Well, it helps. But this code is partially going to be rewritten anyway in future.
> 在 2018年12月25日,上午1:28,Paul B Mahol <onemda@gmail.com> 写道: > > On 12/24/18, Michael Niedermayer <michael@niedermayer.cc> wrote: >> On Fri, Dec 21, 2018 at 06:09:50PM +0800, Steven Liu wrote: >>> Before patch: >>> init nbits = 17, get 10000 samples, average cost: 16105 us >>> After patch: >>> init nbits = 17, get 10000 samples, average cost: 15221 us >>> >>> Signed-off-by: Steven Liu <lq@chinaffmpeg.org> >>> --- >>> libavcodec/fft_template.c | 26 +++++++++++++++----------- >>> 1 file changed, 15 insertions(+), 11 deletions(-) >>> >>> diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c >>> index 762c014bc8..2b528be882 100644 >>> --- a/libavcodec/fft_template.c >>> +++ b/libavcodec/fft_template.c >>> @@ -261,17 +261,21 @@ av_cold int ff_fft_init(FFTContext *s, int nbits, >>> int inverse) >>> if (s->fft_permutation == FF_FFT_PERM_AVX) { >>> fft_perm_avx(s); >>> } else { >>> - for(i=0; i<n; i++) { >>> - int k; >>> - j = i; >>> - if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS) >>> - j = (j&~3) | ((j>>1)&1) | ((j<<1)&2); >>> - k = -split_radix_permutation(i, n, s->inverse) & (n-1); >>> - if (s->revtab) >>> - s->revtab[k] = j; >>> - if (s->revtab32) >>> - s->revtab32[k] = j; >>> - } >>> +#define SPLIT_RADIX_PERMUTATION(num) do { \ >>> + for(i=0; i<n; i++) {\ >>> + int k;\ >>> + j = i;\ >> >>> + if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)\ >> >> maybe this if() should be factored out too ? >> >> the change looks good though, and iam not sure this is speed relevant >> enough > > Well, it helps. But this code is partially going to be rewritten > anyway in future. Hi Paul, So what i should do? Go on update this patch with Michael’s comment or waiting for your rewritten? Thanks Steven > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
On 12/25/18, Liu Steven <lq@chinaffmpeg.org> wrote: > > >> 在 2018年12月25日,上午1:28,Paul B Mahol <onemda@gmail.com> 写道: >> >> On 12/24/18, Michael Niedermayer <michael@niedermayer.cc> wrote: >>> On Fri, Dec 21, 2018 at 06:09:50PM +0800, Steven Liu wrote: >>>> Before patch: >>>> init nbits = 17, get 10000 samples, average cost: 16105 us >>>> After patch: >>>> init nbits = 17, get 10000 samples, average cost: 15221 us >>>> >>>> Signed-off-by: Steven Liu <lq@chinaffmpeg.org> >>>> --- >>>> libavcodec/fft_template.c | 26 +++++++++++++++----------- >>>> 1 file changed, 15 insertions(+), 11 deletions(-) >>>> >>>> diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c >>>> index 762c014bc8..2b528be882 100644 >>>> --- a/libavcodec/fft_template.c >>>> +++ b/libavcodec/fft_template.c >>>> @@ -261,17 +261,21 @@ av_cold int ff_fft_init(FFTContext *s, int nbits, >>>> int inverse) >>>> if (s->fft_permutation == FF_FFT_PERM_AVX) { >>>> fft_perm_avx(s); >>>> } else { >>>> - for(i=0; i<n; i++) { >>>> - int k; >>>> - j = i; >>>> - if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS) >>>> - j = (j&~3) | ((j>>1)&1) | ((j<<1)&2); >>>> - k = -split_radix_permutation(i, n, s->inverse) & (n-1); >>>> - if (s->revtab) >>>> - s->revtab[k] = j; >>>> - if (s->revtab32) >>>> - s->revtab32[k] = j; >>>> - } >>>> +#define SPLIT_RADIX_PERMUTATION(num) do { \ >>>> + for(i=0; i<n; i++) {\ >>>> + int k;\ >>>> + j = i;\ >>> >>>> + if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)\ >>> >>> maybe this if() should be factored out too ? >>> >>> the change looks good though, and iam not sure this is speed relevant >>> enough >> >> Well, it helps. But this code is partially going to be rewritten >> anyway in future. > > Hi Paul, > > So what i should do? Go on update this patch with Michael’s comment or > waiting for your rewritten? Update patch.
diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c index 762c014bc8..2b528be882 100644 --- a/libavcodec/fft_template.c +++ b/libavcodec/fft_template.c @@ -261,17 +261,21 @@ av_cold int ff_fft_init(FFTContext *s, int nbits, int inverse) if (s->fft_permutation == FF_FFT_PERM_AVX) { fft_perm_avx(s); } else { - for(i=0; i<n; i++) { - int k; - j = i; - if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS) - j = (j&~3) | ((j>>1)&1) | ((j<<1)&2); - k = -split_radix_permutation(i, n, s->inverse) & (n-1); - if (s->revtab) - s->revtab[k] = j; - if (s->revtab32) - s->revtab32[k] = j; - } +#define SPLIT_RADIX_PERMUTATION(num) do { \ + for(i=0; i<n; i++) {\ + int k;\ + j = i;\ + if (s->fft_permutation == FF_FFT_PERM_SWAP_LSBS)\ + j = (j&~3) | ((j>>1)&1) | ((j<<1)&2);\ + k = -split_radix_permutation(i, n, s->inverse) & (n-1);\ + s->revtab##num[k] = j;\ + }\ +} while(0); + if (s->revtab) + SPLIT_RADIX_PERMUTATION() + if (s->revtab32) + SPLIT_RADIX_PERMUTATION(32) +#undef SPLIT_RADIX_PERMUTATION } return 0;
Before patch: init nbits = 17, get 10000 samples, average cost: 16105 us After patch: init nbits = 17, get 10000 samples, average cost: 15221 us Signed-off-by: Steven Liu <lq@chinaffmpeg.org> --- libavcodec/fft_template.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-)