[FFmpeg-devel,0/3] Provide neon implementations

Message ID 20220920110158.15384-1-hum@semihalf.com

Message

Hubert Mazur Sept. 20, 2022, 11:01 a.m. UTC
This fixes the issues raised in review of the previous patchset:
 - move sub instruction in vsad8_intra,
 - remove unnecessary mov instructions,
 - remove single lane extraction in loop and place it at the end.

Removing mov instructions from pix_median_abs functions significantly
increased performance for both.

Hubert Mazur (3):
  lavc/aarch64: Add neon implementation for pix_median_abs16
  lavc/aarch64: Add neon implementation for vsad8_intra
  lavc/aarch64: Add neon implementation for pix_median_abs8

 libavcodec/aarch64/me_cmp_init_aarch64.c |  10 ++
 libavcodec/aarch64/me_cmp_neon.S         | 182 +++++++++++++++++++++++
 libavcodec/me_cmp.c                      |   5 +-
 3 files changed, 195 insertions(+), 2 deletions(-)

Comments

Martin Storsjö Sept. 21, 2022, 10:11 a.m. UTC | #1
On Tue, 20 Sep 2022, Hubert Mazur wrote:

> This fixes the issues raised in review of the previous patchset:
> - move sub instruction in vsad8_intra,
> - remove unnecessary mov instructions,
> - remove single lane extraction in loop and place it at the end.
>
> Removing mov instructions from pix_median_abs functions significantly
> increased performance for both.

I'm quite sure that it wasn't the removed mov instructions that improved 
performance (those instructions should be essentially free, they're just 
misleading), but the fact that you got rid of the extra single-element 
handling within the loop.
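
To illustrate the difference, here is a rough scalar C model of the two loop
shapes. It is only a sketch, not the code from the patches: the function
names and the LANES constant are made up for illustration, and the loop is
loosely modeled on a vertical SAD over an 8 pixel wide block. In real NEON
code the per-row reduction corresponds to an across-lanes add plus a move to
a general purpose register in every iteration, while the second form keeps
the running sums in a vector register and reduces once after the loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LANES 8 /* hypothetical: models one 8-lane vector register */

/* Pattern to avoid: collapse the lanes to a scalar in every iteration. */
static int vsad8_reduce_in_loop(const uint8_t *pix, ptrdiff_t stride, int h)
{
    int sum = 0;
    for (int y = 1; y < h; y++) {
        int row = 0;
        for (int x = 0; x < LANES; x++)
            row += abs(pix[x] - pix[x + stride]);
        sum += row;               /* single-element handling per row */
        pix += stride;
    }
    return sum;
}

/* Preferred pattern: accumulate per lane, reduce once after the loop. */
static int vsad8_reduce_at_end(const uint8_t *pix, ptrdiff_t stride, int h)
{
    /* 16-bit lanes are enough here: at most 15 rows * 255 for h <= 16. */
    uint16_t acc[LANES] = { 0 };
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < LANES; x++)
            acc[x] += (uint16_t)abs(pix[x] - pix[x + stride]);
        pix += stride;
    }
    int sum = 0;
    for (int x = 0; x < LANES; x++)  /* one horizontal add at the end */
        sum += acc[x];
    return sum;
}
```

Both functions return the same value; a scalar compiler will likely treat
them alike, so the model only shows where the reduction sits, not the actual
speedup seen on NEON.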

Anyway, the patches seem fine to me now, so I'll push them. Thanks!

// Martin