[FFmpeg-devel,0/5] Provide optimized neon implementation

Message ID 20220906102722.53266-1-hum@semihalf.com

Message

Hubert Mazur Sept. 6, 2022, 10:27 a.m. UTC
Provide optimized implementations for me_cmp functions.
This set of patches fixes all issues raised in the previous review.
Major changes:
- Remove redundant loads since the data can be reused.
- Improve style.
- Fix issues with unrecognized symbols.

Hubert Mazur (5):
  lavc/aarch64: Add neon implementation for vsad16
  lavc/aarch64: Add neon implementation of vsse16
  lavc/aarch64: Add neon implementation for vsad_intra16
  lavc/aarch64: Add neon implementation for vsse_intra16
  lavc/aarch64: Provide neon implementation of nsse16

 libavcodec/aarch64/me_cmp_init_aarch64.c |  30 ++
 libavcodec/aarch64/me_cmp_neon.S         | 387 +++++++++++++++++++++++
 2 files changed, 417 insertions(+)
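
For readers unfamiliar with the "remove redundant loads" change, here is a rough scalar C sketch of the idea. It is not the NEON code from the patches. It assumes vsad16 sums the absolute vertical differences of the per-pixel row differences between two 16-pixel-wide blocks; the names vsad16_naive and vsad16_reuse are hypothetical, and the codec context argument of the real me_cmp functions is left out. Both versions expect h >= 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Straightforward form: each iteration reads rows y-1 and y of both
 * blocks, so every interior row is loaded twice over the whole loop. */
static int vsad16_naive(const uint8_t *s1, const uint8_t *s2,
                        ptrdiff_t stride, int h)
{
    int score = 0;
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 16; x++)
            score += abs(s1[x] - s2[x] - s1[x + stride] + s2[x + stride]);
        s1 += stride;
        s2 += stride;
    }
    return score;
}

/* Reuse form: keep the previous row's differences around, so each row
 * of s1 and s2 is loaded exactly once. */
static int vsad16_reuse(const uint8_t *s1, const uint8_t *s2,
                        ptrdiff_t stride, int h)
{
    int prev[16], score = 0;

    /* Differences for the first row, computed once up front. */
    for (int x = 0; x < 16; x++)
        prev[x] = s1[x] - s2[x];

    for (int y = 1; y < h; y++) {
        s1 += stride;
        s2 += stride;
        for (int x = 0; x < 16; x++) {
            int cur = s1[x] - s2[x];   /* only the new row is read */
            score += abs(prev[x] - cur);
            prev[x] = cur;             /* becomes "previous" next time */
        }
    }
    return score;
}
```

In a SIMD version the prev array would presumably correspond to vector registers carried across loop iterations, which is where the saved loads would come from.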

Comments

Martin Storsjö Sept. 7, 2022, 8:55 a.m. UTC | #1
On Tue, 6 Sep 2022, Hubert Mazur wrote:

> Provide optimized implementations for me_cmp functions.
> This set of patches fixes all issues raised in the previous review.
> Major changes:
> - Remove redundant loads since the data can be reused.
> - Improve style.
> - Fix issues with unrecognized symbols.

Thanks! This looks quite good to me now. I have a minor comment on vsse16 
and a slightly bigger one on nsse16, but other than that, this is almost 
good to go.

I noticed that you haven't updated the checkasm benchmark numbers in the 
commit messages since the previous round, while the code has seen some 
quite major changes. Please do rerun the numbers since I think they have 
changed notably.

// Martin