Message ID | 20240326164739.153011-1-asoulier@google.com |
---|---|
State | New |
Series | [FFmpeg-devel,1/5] configure: Add option for enabling LC3/LC3plus wrapper |
Context | Check | Description |
---|---|---|
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
Isn't this using sub-optimal MDCT implementation?
What do you mean by sub-optimal?
It's stacked by prime factors, and unrolled for FFT3 and FFT5.
The butterfly implementations of FFT3 and FFT5 give me slightly slower computation. FFT5 is done first, so it takes advantage of sin()/cos() values of 0 or 1.
There are also no reordering steps (this stage is completely removed), but it cannot run in-place.
Benchmarks I made show that it runs slightly faster.

On Tue, Mar 26, 2024 at 9:59 AM Paul B Mahol <onemda@gmail.com> wrote:

> Isn't this using sub-optimal MDCT implementation?
On date Tuesday 2024-03-26 16:47:35 +0000, ffmpeg-devel Mailing List wrote:

> Signed-off-by: Antoine Soulier <asoulier@google.com>
> Signed-off-by: Antoine SOULIER <asoulier@google.com>

Why the double sign-off?

[...]

LGTM.
Arf, sorry for that.
I used `git send-email -s`, which is perhaps the source of the double sign-off.

On Tue, Mar 26, 2024 at 10:32 AM Stefano Sabatini <stefasab@gmail.com> wrote:

> On date Tuesday 2024-03-26 16:47:35 +0000, ffmpeg-devel Mailing List wrote:
>
> > Signed-off-by: Antoine Soulier <asoulier@google.com>
> > Signed-off-by: Antoine SOULIER <asoulier@google.com>
>
> Why the double sign-off?
>
> [...]
>
> LGTM.
On Tue, Mar 26, 2024 at 6:07 PM Antoine Soulier <asoulier@google.com> wrote:

> What do you mean by sub-optimal?
> It's stacked by prime factors, and unrolled for FFT3 and FFT5.
> The butterfly implementations of FFT3 and FFT5 give me slightly slower computation. FFT5 is done first, so it takes advantage of sin()/cos() values of 0 or 1.
> There are also no reordering steps (this stage is completely removed), but it cannot run in-place.
> Benchmarks I made show that it runs slightly faster.

Compared with what?
Where is at least x86 SIMD for that MDCT?

> On Tue, Mar 26, 2024 at 9:59 AM Paul B Mahol <onemda@gmail.com> wrote:
>
> > Isn't this using sub-optimal MDCT implementation?
Compared with the C implementation of KissFFT (it's the only one I tested on ARM M4).
Yes, there is no SIMD on x86. This was not the main target. It was mainly made for ARM M4 (for BLE devices, Nordic Semi / Zephyr) and ARM Neon (Android).
By the way, this does not change a lot: the FFT/MDCT on powerful CPUs is marginal compared to the read/write of the arithmetically coded bitstream.

We can perhaps connect the FFmpeg implementation, but it will probably miss 2 things:
- Some transformations are not a multiple of 15, but only 5 * 2^n. I guess FFmpeg only has a base 15 implementation.
- It uses asymmetric windowing, to reduce algorithmic delay. Some coefficients are zeroed. Not important, but without a specific implementation it will need a larger coefficient table and a bunch of multiplications by 0.

So I think it will need some work.

On Tue, Mar 26, 2024 at 10:45 AM Paul B Mahol <onemda@gmail.com> wrote:

> On Tue, Mar 26, 2024 at 6:07 PM Antoine Soulier <asoulier@google.com> wrote:
>
> > What do you mean by sub-optimal?
> > It's stacked by prime factors, and unrolled for FFT3 and FFT5.
> > The butterfly implementations of FFT3 and FFT5 give me slightly slower computation. FFT5 is done first, so it takes advantage of sin()/cos() values of 0 or 1.
> > There are also no reordering steps (this stage is completely removed), but it cannot run in-place.
> > Benchmarks I made show that it runs slightly faster.
>
> Compared with what?
> Where is at least x86 SIMD for that MDCT?
>
> > On Tue, Mar 26, 2024 at 9:59 AM Paul B Mahol <onemda@gmail.com> wrote:
> >
> > > Isn't this using sub-optimal MDCT implementation?
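
The point above about transform sizes can be illustrated with a small factorization check. This is only a sketch: the helper names `factor_235` and `fits_base15` are hypothetical, the sizes are the 10 ms frame lengths at 8, 16, 24, 32 and 48 kHz chosen for illustration (not necessarily the FFT lengths liblc3 uses internally), and the "15 * 2^k" test encodes the guess made in the reply about FFmpeg's implementation, not a confirmed property of it.

```python
def factor_235(n):
    """Divide out the primes 2, 3 and 5; return (exponents, leftover)."""
    exponents = {}
    for p in (2, 3, 5):
        exponents[p] = 0
        while n % p == 0:
            n //= p
            exponents[p] += 1
    return exponents, n


def fits_base15(n):
    """True if n == 15 * 2**k, i.e. exactly one factor 3 and one factor 5.

    Assumption: this is the shape a 'base 15' MDCT would require.
    """
    exponents, leftover = factor_235(n)
    return leftover == 1 and exponents[3] == 1 and exponents[5] == 1


# Illustrative sizes only: samples per 10 ms frame at 8/16/24/32/48 kHz.
SIZES = [80, 160, 240, 320, 480]

if __name__ == "__main__":
    for n in SIZES:
        exponents, _ = factor_235(n)
        shape = "15 * 2^k" if fits_base15(n) else "not a multiple of 15"
        print(n, exponents, shape)
```

Under these assumptions, 240 and 480 have the 15 * 2^k shape, while 80, 160 and 320 are of the form 5 * 2^n and would need a radix-5 stage without a radix-3 one.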
diff --git a/configure b/configure
index 343edb38ab..eb8ff81a11 100755
--- a/configure
+++ b/configure
@@ -244,6 +244,7 @@ External library support:
 --enable-libjxl enable JPEG XL de/encoding via libjxl [no]
 --enable-libklvanc enable Kernel Labs VANC processing [no]
 --enable-libkvazaar enable HEVC encoding via libkvazaar [no]
+ --enable-liblc3 enable LC3 de/encoding via liblc3 [no]
 --enable-liblensfun enable lensfun lens correction [no]
 --enable-libmodplug enable ModPlug via libmodplug [no]
 --enable-libmp3lame enable MP3 encoding via libmp3lame [no]
@@ -1926,6 +1927,7 @@ EXTERNAL_LIBRARY_LIST="
 libjxl
 libklvanc
 libkvazaar
+ liblc3
 libmodplug
 libmp3lame
 libmysofa
@@ -3501,6 +3503,10 @@ libilbc_encoder_deps="libilbc"
 libjxl_decoder_deps="libjxl libjxl_threads"
 libjxl_encoder_deps="libjxl libjxl_threads"
 libkvazaar_encoder_deps="libkvazaar"
+liblc3_lc3_decoder_deps="liblc3"
+liblc3_lc3plus_decoder_deps="liblc3"
+liblc3_encoder_deps="liblc3"
+liblc3_encoder_select="audio_frame_queue"
 libmodplug_demuxer_deps="libmodplug"
 libmp3lame_encoder_deps="libmp3lame"
 libmp3lame_encoder_select="audio_frame_queue mpegaudioheader"
@@ -6858,6 +6864,7 @@ enabled libjxl && require_pkg_config libjxl "libjxl >= 0.7.0" jxl/dec
 require_pkg_config libjxl_threads "libjxl_threads >= 0.7.0" jxl/thread_parallel_runner.h JxlThreadParallelRunner
 enabled libklvanc && require libklvanc libklvanc/vanc.h klvanc_context_create -lklvanc
 enabled libkvazaar && require_pkg_config libkvazaar "kvazaar >= 2.0.0" kvazaar.h kvz_api_get
+enabled liblc3 && require_pkg_config liblc3 "lc3 >= 1.1.0" lc3.h lc3_hr_setup_encoder
 enabled liblensfun && require_pkg_config liblensfun lensfun lensfun.h lf_db_create
 if enabled libmfx && enabled libvpl; then