[FFmpeg-devel,0/3] sw_scale: Provide neon implementation for hscale

Message ID 20221028113439.30279-1-hum@semihalf.com

Message

Hubert Mazur Oct. 28, 2022, 11:34 a.m. UTC
This patchset contains an arm64 NEON implementation of the hscale functions.
Fixed minor style issues and declared the C function wrappers as static.
This patchset does not contain the patch for the checkasm tool, as the
previous one did. The reason is that its tests failed on x86 but not on
aarch64 or loongarch. Probably the hscale functions on x86 have some
bugs. Currently the checkasm tool does not check the validity of the
hscale functions on x86 at all. The x86 implementation of hscale should
be fixed anyway. On aarch64 the tests were passing. Attaching a link to
the FATE result in patchwork. After fixing x86, the patch for checkasm
could be merged.

https://patchwork.ffmpeg.org/project/ffmpeg/patch/20221017130715.30896-3-hum@semihalf.com/

Hubert Mazur (3):
  sw_scale: Add specializations for hscale 8 to 19
  sw_scale: Add specializations for hscale 16 to 15
  sw_scale: Add specializations for hscale 16 to 19

 libswscale/aarch64/hscale.S  | 1100 ++++++++++++++++++++++++++++++++++
 libswscale/aarch64/swscale.c |  140 ++++-
 libswscale/swscale.c         |    1 -
 3 files changed, 1236 insertions(+), 5 deletions(-)
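
For context on the naming in the patch titles: "hscale 8 to 19" is understood here to mean a horizontal scaling pass that reads 8-bit samples and writes 19-bit intermediate values. The following is only a minimal scalar sketch of that idea, not the code from this series. The function name, the signature, and the assumption of 14-bit fixed-point coefficients are illustrative.

```c
#include <stdint.h>

/* Hypothetical scalar reference for an "8 to 19" horizontal scale.
 * Assumptions (not taken from the patches): filter coefficients are
 * 14-bit fixed point, so each row of filter_size taps sums to about
 * 1 << 14, and filter_pos[i] is the first source sample for output i. */
static void hscale_8_to_19_ref(int32_t *dst, int dst_w,
                               const uint8_t *src,
                               const int16_t *filter,
                               const int32_t *filter_pos,
                               int filter_size)
{
    const int32_t max19 = (1 << 19) - 1;

    for (int i = 0; i < dst_w; i++) {
        int32_t acc = 0;

        /* Weighted sum of filter_size neighbouring input samples. */
        for (int j = 0; j < filter_size; j++)
            acc += (int32_t)src[filter_pos[i] + j] *
                   filter[filter_size * i + j];

        /* 8-bit sample * 14-bit coefficient gives 22 bits;
         * shifting right by 3 leaves 19 bits. */
        acc >>= 3;

        /* Clip the upper end to the 19-bit range. */
        dst[i] = acc > max19 ? max19 : acc;
    }
}
```

A vectorized version would compute several outputs, or several taps of one output, per iteration, which is the kind of work a NEON specialization takes over from a scalar loop like this one.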

Comments

Martin Storsjö Nov. 1, 2022, 1:26 p.m. UTC | #1
On Fri, 28 Oct 2022, Hubert Mazur wrote:

> This patchset contains an arm64 NEON implementation of the hscale functions.
> Fixed minor style issues and declared the C function wrappers as static.
> This patchset does not contain the patch for the checkasm tool, as the
> previous one did. The reason is that its tests failed on x86 but not on
> aarch64 or loongarch. Probably the hscale functions on x86 have some
> bugs. Currently the checkasm tool does not check the validity of the
> hscale functions on x86 at all. The x86 implementation of hscale should
> be fixed anyway. On aarch64 the tests were passing. Attaching a link to
> the FATE result in patchwork. After fixing x86, the patch for checkasm
> could be merged.

It's too bad that we don't have that test mergeable at this point, but I 
guess we can't hold back the new aarch64 assembly due to that...

The patches seem fine as noted before; I found some more cases of missing 
trailing newlines at the end of the file, and some other minor stray 
changes in the patches, which I fixed. I also amended the commit messages 
saying that some of the benchmarks are from the non-merged checkasm test.

With that, I pushed these patches.

// Martin