
[FFmpeg-devel,v3,0/5] avcodec/ac3: Add aarch64 NEON DSP

Message ID 79644c46-4f7b-4713-901e-f9c0e5d46f09@geoffhill.org
Series avcodec/ac3: Add aarch64 NEON DSP

Message

Geoff Hill April 3, 2024, 6:43 a.m. UTC
Here's v3 to push the AC-3 ARMv8 NEON experiment a step further.

This version implements five of the AC-3 encoder DSP functions
and adds checkasm tests where they were missing.

I've tested that the checkasm tests pass on aarch64 and x86.

On AWS Graviton2 (t4g.medium), GCC 12.3:

$ tests/checkasm/checkasm --bench --verbose --test=ac3dsp
...
NEON:
 - ac3dsp.ac3_exponent_min               [OK]
 - ac3dsp.ac3_extract_exponents          [OK]
 - ac3dsp.float_to_fixed24               [OK]
 - ac3dsp.ac3_sum_square_butterfly_int32 [OK]
 - ac3dsp.ac3_sum_square_butterfly_float [OK]
checkasm: all 20 tests passed
ac3_exponent_min_reuse0_c: 9.0
ac3_exponent_min_reuse0_neon: 9.7
ac3_exponent_min_reuse1_c: 1037.5
ac3_exponent_min_reuse1_neon: 54.0
ac3_exponent_min_reuse2_c: 1820.7
ac3_exponent_min_reuse2_neon: 135.2
ac3_exponent_min_reuse3_c: 2080.5
ac3_exponent_min_reuse3_neon: 167.7
ac3_exponent_min_reuse4_c: 2493.2
ac3_exponent_min_reuse4_neon: 200.0
ac3_exponent_min_reuse5_c: 2970.0
ac3_exponent_min_reuse5_neon: 231.7
ac3_extract_exponents_n512_c: 1717.5
ac3_extract_exponents_n512_neon: 506.7
ac3_extract_exponents_n768_c: 2562.7
ac3_extract_exponents_n768_neon: 769.7
ac3_extract_exponents_n1024_c: 3389.2
ac3_extract_exponents_n1024_neon: 1019.0
ac3_extract_exponents_n1280_c: 4210.7
ac3_extract_exponents_n1280_neon: 1267.5
ac3_extract_exponents_n1536_c: 5071.5
ac3_extract_exponents_n1536_neon: 1522.0
ac3_extract_exponents_n1792_c: 5896.5
ac3_extract_exponents_n1792_neon: 1784.0
ac3_extract_exponents_n2048_c: 6779.2
ac3_extract_exponents_n2048_neon: 2051.0
ac3_extract_exponents_n2304_c: 7559.5
ac3_extract_exponents_n2304_neon: 2290.0
ac3_extract_exponents_n2560_c: 8397.2
ac3_extract_exponents_n2560_neon: 2552.5
ac3_extract_exponents_n2816_c: 9224.2
ac3_extract_exponents_n2816_neon: 2797.7
ac3_extract_exponents_n3072_c: 10026.2
ac3_extract_exponents_n3072_neon: 3047.7
ac3_sum_square_bufferfly_float_c: 1605.7
ac3_sum_square_bufferfly_float_neon: 365.7
ac3_sum_square_bufferfly_int32_c: 965.5
ac3_sum_square_bufferfly_int32_neon: 486.2
float_to_fixed24_c: 2453.7
float_to_fixed24_neon: 516.2

Geoff Hill (5):
  avcodec/ac3: Implement float_to_fixed24 for aarch64 NEON
  avcodec/ac3: Implement ac3_exponent_min for aarch64 NEON
  avcodec/ac3: Implement ac3_extract_exponents for aarch64 NEON
  avcodec/ac3: Implement sum_square_butterfly_int32 for aarch64 NEON
  avcodec/ac3: Implement sum_square_butterfly_float for aarch64 NEON

 libavcodec/aarch64/Makefile              |   2 +
 libavcodec/aarch64/ac3dsp_init_aarch64.c |  50 +++++++++
 libavcodec/aarch64/ac3dsp_neon.S         | 125 ++++++++++++++++++++++
 libavcodec/ac3dsp.c                      |   4 +-
 libavcodec/ac3dsp.h                      |   3 +-
 tests/checkasm/ac3dsp.c                  | 130 +++++++++++++++++++++++
 6 files changed, 312 insertions(+), 2 deletions(-)
 create mode 100644 libavcodec/aarch64/ac3dsp_init_aarch64.c
 create mode 100644 libavcodec/aarch64/ac3dsp_neon.S

Comments

Martin Storsjö April 4, 2024, 12:57 p.m. UTC | #1
On Tue, 2 Apr 2024, Geoff Hill wrote:

> Here's v3 to push the AC-3 ARMv8 NEON experiment a step further.
>
> This version implements 5 of the AC-3 encoder DSP functions,
> and adds checkasm tests where missing.
>
> I've tested that the checkasm tests pass on aarch64 and x86.

Thanks, I've tested that checkasm also passes on 32-bit arm (where we also 
do have an ac3dsp implementation).

Overall the patches look mostly fine.

Are these implementations based on the existing 32-bit arm ones? The code 
is quite similar (although there aren't very many different ways to 
implement things, so this could be a coincidence). If they are based on the 
existing code, it would be good to retain the copyright statement from 
that file.

These functions are indented differently from essentially all the rest of 
our aarch64 assembly (and the code you're adding is itself aligned in two 
different ways); please check other files (e.g. vp8dsp_neon.S) for 
reference. The instructions should be aligned to 8 leading spaces, and the 
operands to 24 leading characters.

Other than those generic points, I have two comments on the patches 
themselves.

// Martin