Message ID | 20240408105705.99898-1-martin@martin.st |
---|---|
State | Accepted |
Commit | 359b6a7f8aec3736451f5179deb98f173fbff0dd |
Series | [FFmpeg-devel] aarch64: ac3dsp: Simplify the end of ff_ac3_sum_square_butterfly_float_neon |
Context | Check | Description |
---|---|---|
yinshiyou/make_loongarch64 | success | Make finished |
yinshiyou/make_fate_loongarch64 | success | Make fate finished |
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
Martin Storsjö <martin@martin.st> writes:

> Before:                               Cortex A53    A72    A78
> ac3_sum_square_bufferfly_float_neon:      1005.7  516.5  224.5
> After:
> ac3_sum_square_bufferfly_float_neon:       981.7  504.5  223.2
> ---
>  libavcodec/aarch64/ac3dsp_neon.S | 16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)
>
> diff --git a/libavcodec/aarch64/ac3dsp_neon.S b/libavcodec/aarch64/ac3dsp_neon.S
> index 20beb6cc50..7e97cc39f7 100644
> --- a/libavcodec/aarch64/ac3dsp_neon.S
> +++ b/libavcodec/aarch64/ac3dsp_neon.S
> @@ -103,17 +103,9 @@ function ff_ac3_sum_square_butterfly_float_neon, export=1
>          fmla            v3.4s, v17.4s, v17.4s
>          subs            w3, w3, #4
>          b.gt            1b
> -        faddp           v0.4s, v0.4s, v0.4s
> -        faddp           v0.2s, v0.2s, v0.2s
> -        st1             {v0.s}[0], [x0], #4
> -        faddp           v1.4s, v1.4s, v1.4s
> -        faddp           v1.2s, v1.2s, v1.2s
> -        st1             {v1.s}[0], [x0], #4
> -        faddp           v2.4s, v2.4s, v2.4s
> -        faddp           v2.2s, v2.2s, v2.2s
> -        st1             {v2.s}[0], [x0], #4
> -        faddp           v3.4s, v3.4s, v3.4s
> -        faddp           v3.2s, v3.2s, v3.2s
> -        st1             {v3.s}[0], [x0]
> +        faddp           v0.4s, v0.4s, v1.4s
> +        faddp           v2.4s, v2.4s, v3.4s
> +        faddp           v0.4s, v0.4s, v2.4s
> +        st1             {v0.4s}, [x0]
>          ret
>  endfunc

Thanks, LGTM. Pushed with M1 benchmark on Linux.
diff --git a/libavcodec/aarch64/ac3dsp_neon.S b/libavcodec/aarch64/ac3dsp_neon.S
index 20beb6cc50..7e97cc39f7 100644
--- a/libavcodec/aarch64/ac3dsp_neon.S
+++ b/libavcodec/aarch64/ac3dsp_neon.S
@@ -103,17 +103,9 @@ function ff_ac3_sum_square_butterfly_float_neon, export=1
         fmla            v3.4s, v17.4s, v17.4s
         subs            w3, w3, #4
         b.gt            1b
-        faddp           v0.4s, v0.4s, v0.4s
-        faddp           v0.2s, v0.2s, v0.2s
-        st1             {v0.s}[0], [x0], #4
-        faddp           v1.4s, v1.4s, v1.4s
-        faddp           v1.2s, v1.2s, v1.2s
-        st1             {v1.s}[0], [x0], #4
-        faddp           v2.4s, v2.4s, v2.4s
-        faddp           v2.2s, v2.2s, v2.2s
-        st1             {v2.s}[0], [x0], #4
-        faddp           v3.4s, v3.4s, v3.4s
-        faddp           v3.2s, v3.2s, v3.2s
-        st1             {v3.s}[0], [x0]
+        faddp           v0.4s, v0.4s, v1.4s
+        faddp           v2.4s, v2.4s, v3.4s
+        faddp           v0.4s, v0.4s, v2.4s
+        st1             {v0.4s}, [x0]
         ret
 endfunc
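
As a rough illustration of why the shorter epilogue should store the same four values as the old one, the sketch below models the pairwise-add behaviour of `faddp` in Python. The helper names (`faddp`, `reduce_old`, `reduce_new`) are hypothetical and exist only for this example; they are not part of FFmpeg. Python floats are double precision, so this models only the lane layout and the order of additions, not single-precision rounding.

```python
def faddp(n, m):
    """Model of FADDP Vd, Vn, Vm: concatenate the lanes of n and m,
    then add each adjacent pair (assumed lane order: n first, then m)."""
    lanes = list(n) + list(m)
    return [lanes[i] + lanes[i + 1] for i in range(0, len(lanes), 2)]


def reduce_old(acc):
    """Old epilogue: reduce each accumulator on its own, store one lane each."""
    out = []
    for v in acc:
        v = faddp(v, v)          # faddp vN.4s, vN.4s, vN.4s
        v = faddp(v[:2], v[:2])  # faddp vN.2s, vN.2s, vN.2s
        out.append(v[0])         # st1   {vN.s}[0], [x0], #4
    return out


def reduce_new(acc):
    """New epilogue: combine the four accumulators in a pairwise tree."""
    v0, v1, v2, v3 = acc
    v0 = faddp(v0, v1)           # faddp v0.4s, v0.4s, v1.4s
    v2 = faddp(v2, v3)           # faddp v2.4s, v2.4s, v3.4s
    v0 = faddp(v0, v2)           # faddp v0.4s, v0.4s, v2.4s
    return v0                    # st1   {v0.4s}, [x0]


if __name__ == "__main__":
    # Four hypothetical 4-lane accumulators standing in for v0..v3.
    acc = [[1.0, 2.0, 3.0, 4.0],
           [0.5, 0.25, 0.125, 0.0625],
           [10.0, 20.0, 30.0, 40.0],
           [0.0, 0.0, 0.0, 1.0]]
    print(reduce_old(acc))
    print(reduce_new(acc))
```

In this model both sequences compute `(x0 + x1) + (x2 + x3)` for each accumulator, so the results agree exactly rather than only approximately. The new sequence gets there with three `faddp` instructions and one four-lane store instead of eight `faddp` instructions and four single-lane stores, which matches the 4 insertions and 12 deletions in the diffstat.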