diff mbox series

[FFmpeg-devel,3/5] avcodec/fft_fixed: Hardcode cosine tables to save space

Message ID 20210106231308.2952217-3-andreas.rheinhardt@gmail.com
State Superseded
Headers show
Series [FFmpeg-devel,1/5] avcodec/tableprint: Don't include mem_internal.h
Related show

Checks

Context Check Description
andriy/x86_make success Make finished
andriy/x86_make_fate success Make fate finished
andriy/PPC64_make success Make finished
andriy/PPC64_make_fate success Make fate finished

Commit Message

Andreas Rheinhardt Jan. 6, 2021, 11:13 p.m. UTC
The tables that are used take 256B; the code to initialize them uses
281B here (GCC 9.3, x64, -O3, but in av_cold functions). On top of that,
removing this code also allows to remove the array of AVOnce used to
guard the cosine tables against multiple initializations; this also
removes relocations.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
---
 libavcodec/Makefile       |  2 +-
 libavcodec/fft.h          | 19 +++++++++---------
 libavcodec/fft_fixed.c    | 42 +++++++++++++++++++++++++++++++++++++++
 libavcodec/fft_template.c | 27 +++++++++----------------
 4 files changed, 62 insertions(+), 28 deletions(-)

Comments

Lynne Jan. 7, 2021, 3:42 p.m. UTC | #1
Jan 7, 2021, 00:13 by andreas.rheinhardt@gmail.com:

> The tables that are used take 256B; the code to initialize them uses
> 281B here (GCC 9.3, x64, -O3, but in av_cold functions). On top of that,
> removing this code also allows to remove the array of AVOnce used to
> guard the cosine tables against multiple initializations; this also
> removes relocations.
>
> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
> ---
>  libavcodec/Makefile       |  2 +-
>  libavcodec/fft.h          | 19 +++++++++---------
>  libavcodec/fft_fixed.c    | 42 +++++++++++++++++++++++++++++++++++++++
>  libavcodec/fft_template.c | 27 +++++++++----------------
>  4 files changed, 62 insertions(+), 28 deletions(-)
>

I'm not a big fan of this one. It saves minor amounts of space, introduces
more hard/soft table init splits, and is for code due to be replaced by
libavutil/tx anyway.
Andreas Rheinhardt Jan. 7, 2021, 4:05 p.m. UTC | #2
Lynne:
> Jan 7, 2021, 00:13 by andreas.rheinhardt@gmail.com:
> 
>> The tables that are used take 256B; the code to initialize them uses
>> 281B here (GCC 9.3, x64, -O3, but in av_cold functions). On top of that,
>> removing this code also allows to remove the array of AVOnce used to
>> guard the cosine tables against multiple initializations; this also
>> removes relocations.
>>
>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
>> ---
>>  libavcodec/Makefile       |  2 +-
>>  libavcodec/fft.h          | 19 +++++++++---------
>>  libavcodec/fft_fixed.c    | 42 +++++++++++++++++++++++++++++++++++++++
>>  libavcodec/fft_template.c | 27 +++++++++----------------
>>  4 files changed, 62 insertions(+), 28 deletions(-)
>>
> 
> I'm not a big fan of this one. It saves minor amounts of space, introduces
> more hard/soft table init splits, and is for code due to be replaced by
> libavutil/tx anyway.

It actually removes a hard/soft table init split: Before this patch the
cos tables for the 16-bit fixed-point FFT were sometimes hardcoded and
sometimes not; now the latter option doesn't exist any more, reducing
complexity.
(I actually pondered removing the big #if at the beginning of
fft_template.c by moving the code for FFT_FIXED_32 to fft_fixed_32.c and
the code for FFT_FLOAT to fft_float.c.)

- Andreas
Lynne Jan. 7, 2021, 4:32 p.m. UTC | #3
Jan 7, 2021, 17:05 by andreas.rheinhardt@gmail.com:

> Lynne:
>
>> Jan 7, 2021, 00:13 by andreas.rheinhardt@gmail.com:
>>
>>> The tables that are used take 256B; the code to initialize them uses
>>> 281B here (GCC 9.3, x64, -O3, but in av_cold functions). On top of that,
>>> removing this code also allows to remove the array of AVOnce used to
>>> guard the cosine tables against multiple initializations; this also
>>> removes relocations.
>>>
>>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
>>> ---
>>>  libavcodec/Makefile       |  2 +-
>>>  libavcodec/fft.h          | 19 +++++++++---------
>>>  libavcodec/fft_fixed.c    | 42 +++++++++++++++++++++++++++++++++++++++
>>>  libavcodec/fft_template.c | 27 +++++++++----------------
>>>  4 files changed, 62 insertions(+), 28 deletions(-)
>>>
>>
>> I'm not a big fan of this one. It saves minor amounts of space, introduces
>> more hard/soft table init splits, and is for code due to be replaced by
>> libavutil/tx anyway.
>>
>
> It actually removes a hard/soft table init split: Before this patch the
> cos tables for the 16-bit fixed-point FFT were sometimes hardcoded and
> sometimes not; now the latter option doesn't exist any more, reducing
> complexity.
> (I actually pondered removing the big #if at the beginning of
> fft_template.c by moving the code for FFT_FIXED_32 to fft_fixed_32.c and
> the code for FFT_FLOAT to fft_float.c.)
>

My opinion still stands for this patch. This does introduce ifdeffery.
And the size gains are marginal at best, 256 bytes aren't worth it,
and can be likely saved in an easier and cleaner way elsewhere.
Hendrik Leppkes Jan. 7, 2021, 8:26 p.m. UTC | #4
On Thu, Jan 7, 2021 at 5:32 PM Lynne <dev@lynne.ee> wrote:
>
> Jan 7, 2021, 17:05 by andreas.rheinhardt@gmail.com:
>
> > Lynne:
> >
> >> Jan 7, 2021, 00:13 by andreas.rheinhardt@gmail.com:
> >>
> >>> The tables that are used take 256B; the code to initialize them uses
> >>> 281B here (GCC 9.3, x64, -O3, but in av_cold functions). On top of that,
> >>> removing this code also allows to remove the array of AVOnce used to
> >>> guard the cosine tables against multiple initializations; this also
> >>> removes relocations.
> >>>
> >>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
> >>> ---
> >>>  libavcodec/Makefile       |  2 +-
> >>>  libavcodec/fft.h          | 19 +++++++++---------
> >>>  libavcodec/fft_fixed.c    | 42 +++++++++++++++++++++++++++++++++++++++
> >>>  libavcodec/fft_template.c | 27 +++++++++----------------
> >>>  4 files changed, 62 insertions(+), 28 deletions(-)
> >>>
> >>
> >> I'm not a big fan of this one. It saves minor amounts of space, introduces
> >> more hard/soft table init splits, and is for code due to be replaced by
> >> libavutil/tx anyway.
> >>
> >
> > It actually removes a hard/soft table init split: Before this patch the
> > cos tables for the 16-bit fixed-point FFT were sometimes hardcoded and
> > sometimes not; now the latter option doesn't exist any more, reducing
> > complexity.
> > (I actually pondered removing the big #if at the beginning of
> > fft_template.c by moving the code for FFT_FIXED_32 to fft_fixed_32.c and
> > the code for FFT_FLOAT to fft_float.c.)
> >
>
> My opinion still stands for this patch. This does introduce ifdeffery.
> And the size gains are marginal at best, 256 bytes aren't worth it,
> and can be likely saved in an easier and cleaner way elsewhere.

The patch seems to remove more ifdeffery then it adds, at worst it
moves a bit of it due to changing the previous hardcoded behavior.

- Hendrik
Lynne Jan. 7, 2021, 9:47 p.m. UTC | #5
Jan 7, 2021, 21:26 by h.leppkes@gmail.com:

> On Thu, Jan 7, 2021 at 5:32 PM Lynne <dev@lynne.ee> wrote:
>
>>
>> Jan 7, 2021, 17:05 by andreas.rheinhardt@gmail.com:
>>
>> > Lynne:
>> >
>> >> Jan 7, 2021, 00:13 by andreas.rheinhardt@gmail.com:
>> >>
>> >>> The tables that are used take 256B; the code to initialize them uses
>> >>> 281B here (GCC 9.3, x64, -O3, but in av_cold functions). On top of that,
>> >>> removing this code also allows to remove the array of AVOnce used to
>> >>> guard the cosine tables against multiple initializations; this also
>> >>> removes relocations.
>> >>>
>> >>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
>> >>> ---
>> >>>  libavcodec/Makefile       |  2 +-
>> >>>  libavcodec/fft.h          | 19 +++++++++---------
>> >>>  libavcodec/fft_fixed.c    | 42 +++++++++++++++++++++++++++++++++++++++
>> >>>  libavcodec/fft_template.c | 27 +++++++++----------------
>> >>>  4 files changed, 62 insertions(+), 28 deletions(-)
>> >>>
>> >>
>> >> I'm not a big fan of this one. It saves minor amounts of space, introduces
>> >> more hard/soft table init splits, and is for code due to be replaced by
>> >> libavutil/tx anyway.
>> >>
>> >
>> > It actually removes a hard/soft table init split: Before this patch the
>> > cos tables for the 16-bit fixed-point FFT were sometimes hardcoded and
>> > sometimes not; now the latter option doesn't exist any more, reducing
>> > complexity.
>> > (I actually pondered removing the big #if at the beginning of
>> > fft_template.c by moving the code for FFT_FIXED_32 to fft_fixed_32.c and
>> > the code for FFT_FLOAT to fft_float.c.)
>> >
>>
>> My opinion still stands for this patch. This does introduce ifdeffery.
>> And the size gains are marginal at best, 256 bytes aren't worth it,
>> and can be likely saved in an easier and cleaner way elsewhere.
>>
>
> The patch seems to remove more ifdeffery then it adds, at worst it
> moves a bit of it due to changing the previous hardcoded behavior.
>

To an extent.
I'm still objecting to hardcoding those tables though.
diff mbox series

Patch

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index fea37ef3c9..8e03feb7d1 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -83,7 +83,7 @@  OBJS-$(CONFIG_EXIF)                    += exif.o tiff_common.o
 OBJS-$(CONFIG_FAANDCT)                 += faandct.o
 OBJS-$(CONFIG_FAANIDCT)                += faanidct.o
 OBJS-$(CONFIG_FDCTDSP)                 += fdctdsp.o jfdctfst.o jfdctint.o
-FFT-OBJS-$(CONFIG_HARDCODED_TABLES)    += cos_tables.o cos_fixed_tables.o
+FFT-OBJS-$(CONFIG_HARDCODED_TABLES)    += cos_tables.o
 OBJS-$(CONFIG_FFT)                     += avfft.o fft_fixed.o fft_float.o \
                                           fft_fixed_32.o fft_init_table.o \
                                           $(FFT-OBJS-yes)
diff --git a/libavcodec/fft.h b/libavcodec/fft.h
index fedc0c5ef0..4b54265b7f 100644
--- a/libavcodec/fft.h
+++ b/libavcodec/fft.h
@@ -114,10 +114,18 @@  struct FFTContext {
     uint32_t *revtab32;
 };
 
-#if CONFIG_HARDCODED_TABLES
+#if CONFIG_HARDCODED_TABLES || !FFT_FLOAT
 #define COSTABLE_CONST const
+#define ff_init_ff_cos_tabs(...)
 #else
 #define COSTABLE_CONST
+#define ff_init_ff_cos_tabs FFT_NAME(ff_init_ff_cos_tabs)
+
+/**
+ * Initialize the cosine table in ff_cos_tabs[index]
+ * @param index index in ff_cos_tabs array of the table to initialize
+ */
+void ff_init_ff_cos_tabs(int index);
 #endif
 
 #define COSTABLE(size) \
@@ -138,16 +146,9 @@  extern COSTABLE(16384);
 extern COSTABLE(32768);
 extern COSTABLE(65536);
 extern COSTABLE(131072);
-#endif /* FFT_FLOAT */
 extern COSTABLE_CONST FFTSample* const FFT_NAME(ff_cos_tabs)[];
 
-#define ff_init_ff_cos_tabs FFT_NAME(ff_init_ff_cos_tabs)
-
-/**
- * Initialize the cosine table in ff_cos_tabs[index]
- * @param index index in ff_cos_tabs array of the table to initialize
- */
-void ff_init_ff_cos_tabs(int index);
+#endif /* FFT_FLOAT */
 
 #define ff_fft_init FFT_NAME(ff_fft_init)
 #define ff_fft_end  FFT_NAME(ff_fft_end)
diff --git a/libavcodec/fft_fixed.c b/libavcodec/fft_fixed.c
index 52d225ee09..ce52dec7fd 100644
--- a/libavcodec/fft_fixed.c
+++ b/libavcodec/fft_fixed.c
@@ -19,4 +19,46 @@ 
 #define FFT_FLOAT 0
 #define FFT_FIXED_32 0
 #define MAX_BITS 7
+
+#include "fft.h"
+
+COSTABLE(16) = {
+     32767,  30274,  23170,  12540,
+         0,  12540,  23170,  30274,
+};
+COSTABLE(32) = {
+     32767,  32138,  30274,  27246,
+     23170,  18205,  12540,   6393,
+         0,   6393,  12540,  18205,
+     23170,  27246,  30274,  32138,
+};
+COSTABLE(64) = {
+     32767,  32610,  32138,  31357,
+     30274,  28899,  27246,  25330,
+     23170,  20788,  18205,  15447,
+     12540,   9512,   6393,   3212,
+         0,   3212,   6393,   9512,
+     12540,  15447,  18205,  20788,
+     23170,  25330,  27246,  28899,
+     30274,  31357,  32138,  32610,
+};
+COSTABLE(128) = {
+     32767,  32729,  32610,  32413,
+     32138,  31786,  31357,  30853,
+     30274,  29622,  28899,  28106,
+     27246,  26320,  25330,  24279,
+     23170,  22006,  20788,  19520,
+     18205,  16846,  15447,  14010,
+     12540,  11039,   9512,   7962,
+      6393,   4808,   3212,   1608,
+         0,   1608,   3212,   4808,
+      6393,   7962,   9512,  11039,
+     12540,  14010,  15447,  16846,
+     18205,  19520,  20788,  22006,
+     23170,  24279,  25330,  26320,
+     27246,  28106,  28899,  29622,
+     30274,  30853,  31357,  31786,
+     32138,  32413,  32610,  32729,
+};
+
 #include "fft_template.c"
diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c
index 7a7d51a6b4..9d125de073 100644
--- a/libavcodec/fft_template.c
+++ b/libavcodec/fft_template.c
@@ -42,12 +42,12 @@ 
 #else /* FFT_FIXED_32 */
 
 /* cos(2*pi*x/n) for 0<=x<=n/4, followed by its reverse */
+#if FFT_FLOAT
 #if !CONFIG_HARDCODED_TABLES
 COSTABLE(16);
 COSTABLE(32);
 COSTABLE(64);
 COSTABLE(128);
-#if FFT_FLOAT
 COSTABLE(256);
 COSTABLE(512);
 COSTABLE(1024);
@@ -58,7 +58,6 @@  COSTABLE(16384);
 COSTABLE(32768);
 COSTABLE(65536);
 COSTABLE(131072);
-#endif /* FFT_FLOAT */
 
 static av_cold void init_ff_cos_tabs(int index)
 {
@@ -87,7 +86,6 @@  INIT_FF_COS_TABS_FUNC(4, 16)
 INIT_FF_COS_TABS_FUNC(5, 32)
 INIT_FF_COS_TABS_FUNC(6, 64)
 INIT_FF_COS_TABS_FUNC(7, 128)
-#if FFT_FLOAT
 INIT_FF_COS_TABS_FUNC(8, 256)
 INIT_FF_COS_TABS_FUNC(9, 512)
 INIT_FF_COS_TABS_FUNC(10, 1024)
@@ -98,7 +96,6 @@  INIT_FF_COS_TABS_FUNC(14, 16384)
 INIT_FF_COS_TABS_FUNC(15, 32768)
 INIT_FF_COS_TABS_FUNC(16, 65536)
 INIT_FF_COS_TABS_FUNC(17, 131072)
-#endif /* FFT_FLOAT */
 
 static CosTabsInitOnce cos_tabs_init_once[] = {
     { NULL },
@@ -109,7 +106,6 @@  static CosTabsInitOnce cos_tabs_init_once[] = {
     { init_ff_cos_tabs_32, AV_ONCE_INIT },
     { init_ff_cos_tabs_64, AV_ONCE_INIT },
     { init_ff_cos_tabs_128, AV_ONCE_INIT },
-#if FFT_FLOAT
     { init_ff_cos_tabs_256, AV_ONCE_INIT },
     { init_ff_cos_tabs_512, AV_ONCE_INIT },
     { init_ff_cos_tabs_1024, AV_ONCE_INIT },
@@ -120,17 +116,20 @@  static CosTabsInitOnce cos_tabs_init_once[] = {
     { init_ff_cos_tabs_32768, AV_ONCE_INIT },
     { init_ff_cos_tabs_65536, AV_ONCE_INIT },
     { init_ff_cos_tabs_131072, AV_ONCE_INIT },
-#endif /* FFT_FLOAT */
 };
 
+av_cold void ff_init_ff_cos_tabs(int index)
+{
+    ff_thread_once(&cos_tabs_init_once[index].control, cos_tabs_init_once[index].func);
+}
 #endif
+
 COSTABLE_CONST FFTSample * const FFT_NAME(ff_cos_tabs)[] = {
     NULL, NULL, NULL, NULL,
     FFT_NAME(ff_cos_16),
     FFT_NAME(ff_cos_32),
     FFT_NAME(ff_cos_64),
     FFT_NAME(ff_cos_128),
-#if FFT_FLOAT
     FFT_NAME(ff_cos_256),
     FFT_NAME(ff_cos_512),
     FFT_NAME(ff_cos_1024),
@@ -141,8 +140,8 @@  COSTABLE_CONST FFTSample * const FFT_NAME(ff_cos_tabs)[] = {
     FFT_NAME(ff_cos_32768),
     FFT_NAME(ff_cos_65536),
     FFT_NAME(ff_cos_131072),
-#endif /* FFT_FLOAT */
 };
+#endif /* FFT_FLOAT */
 
 #endif /* FFT_FIXED_32 */
 
@@ -160,13 +159,6 @@  static int split_radix_permutation(int i, int n, int inverse)
     else                  return split_radix_permutation(i, m, inverse)*4 - 1;
 }
 
-av_cold void ff_init_ff_cos_tabs(int index)
-{
-#if (!CONFIG_HARDCODED_TABLES) && (!FFT_FIXED_32)
-    ff_thread_once(&cos_tabs_init_once[index].control, cos_tabs_init_once[index].func);
-#endif
-}
-
 static const int avx_tab[] = {
     0, 4, 1, 5, 8, 12, 9, 13, 2, 6, 3, 7, 10, 14, 11, 15
 };
@@ -250,13 +242,12 @@  av_cold int ff_fft_init(FFTContext *s, int nbits, int inverse)
     if (ARCH_X86)     ff_fft_init_x86(s);
     if (CONFIG_MDCT)  s->mdct_calcw = s->mdct_calc;
     if (HAVE_MIPSFPU) ff_fft_init_mips(s);
+    for (j = 4; !CONFIG_HARDCODED_TABLES && j <= nbits; j++)
+        ff_init_ff_cos_tabs(j);
 #else
     if (CONFIG_MDCT)  s->mdct_calcw = ff_mdct_calcw_c;
     if (ARCH_ARM)     ff_fft_fixed_init_arm(s);
 #endif
-    for(j=4; j<=nbits; j++) {
-        ff_init_ff_cos_tabs(j);
-    }
 #endif /* FFT_FIXED_32 */