[FFmpeg-devel,0/5] Provide neon implementation for me_cmp functions

Message ID 20220816122016.64929-1-hum@semihalf.com
Series Provide neon implementation for me_cmp functions

Message

Hubert Mazur Aug. 16, 2022, 12:20 p.m. UTC
Add an arm64 neon implementation for functions from the motion estimation
family. All of them were tested and benchmarked using the checkasm tool.
The rare code paths, e.g. when filter_size % 4 != 0, were also tested.
Instructions were manually deinterleaved to reach best performance.

Hubert Mazur (5):
  lavc/aarch64: Add neon implementation for sse16
  lavc/aarch64: Add neon implementation for sse4
  lavc/aarch64: Add neon implementation for pix_abs16_y2
  lavc/aarch64: Add neon implementation for sse8
  lavc/aarch64: Add neon implementation for pix_abs8

 libavcodec/aarch64/me_cmp_init_aarch64.c |  18 ++
 libavcodec/aarch64/me_cmp_neon.S         | 324 +++++++++++++++++++++++
 2 files changed, 342 insertions(+)
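
For readers unfamiliar with these functions, here is a rough scalar sketch of
what an sse16-style comparison computes and why a "height not a multiple of 4"
path needs separate testing. It is not taken from the patches: the function
names, the simplified signature, and the reading of "filter_size" as the block
height h are all assumptions made for illustration. The main loop handles four
rows per iteration, as an unrolled vector loop typically would, and a tail
loop handles the leftover rows.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum of squared differences over one 16-pixel row. */
static int sse16_row_sketch(const uint8_t *a, const uint8_t *b)
{
    int sum = 0;
    for (int x = 0; x < 16; x++) {
        int d = a[x] - b[x];
        sum += d * d;
    }
    return sum;
}

/* Hypothetical scalar model of an sse16-style function (simplified
 * signature, not the one used in libavcodec): sum of squared errors over
 * a block that is 16 pixels wide and h rows high. */
static int sse16_sketch(const uint8_t *pix1, const uint8_t *pix2,
                        ptrdiff_t stride, int h)
{
    int sum = 0;
    int y = 0;

    /* Main path: four rows per iteration. */
    for (; y + 4 <= h; y += 4) {
        for (int r = 0; r < 4; r++)
            sum += sse16_row_sketch(pix1 + (y + r) * stride,
                                    pix2 + (y + r) * stride);
    }

    /* Rare path: the remaining h % 4 rows, one at a time. */
    for (; y < h; y++)
        sum += sse16_row_sketch(pix1 + y * stride, pix2 + y * stride);

    return sum;
}
```

A test that only uses heights such as 8 or 16 would never reach the tail loop,
which is why the rare path has to be exercised explicitly.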

Comments

Martin Storsjö Aug. 18, 2022, 9:07 a.m. UTC | #1
On Tue, 16 Aug 2022, Hubert Mazur wrote:

> Add arm64 neon implementation for functions from motion estimation
> family. All of them were tested and benchmarked using checkasm tool.
> The rare code paths, e.g. when filter_size % 4 != 0 were also tested.

> Instructions were manually deinterleaved to reach best performance.

You probably mean "interleaved", as deinterleaved would be how it was 
initially, which is detrimental for performance.

Overall I think this patchset is close enough now. There were a bunch of 
minor details left on the patches, but I'll fix that up locally and push 
them, instead of doing yet another round of these. I'll comment and point 
out the details I changed - please pay attention to them for future 
patches though!

// Martin
Hubert Mazur Aug. 18, 2022, 9:24 a.m. UTC | #2
Thanks for the review and pointing out the issues. I will check out the
other patches for such things and fix them if needed.

Regards
