[FFmpeg-devel,00/15] avfilter/vf_bwdif: Add aarch64 neon functions

Message ID 20230629175729.224383-1-jc@kynesim.co.uk

Message

John Cox June 29, 2023, 5:57 p.m. UTC
Also adds a filter_line3 method which on aarch64 neon yields approx 30%
speedup over 2xfilter_line and a memcpy

John Cox (15):
  avfilter/vf_bwdif: Add outline for aarch neon functions
  avfilter/vf_bwdif: Add common macros and consts for aarch64 neon
  avfilter/vf_bwdif: Export C filter_intra
  avfilter/vf_bwdif: Add neon for filter_intra
  tests/checkasm: Add test for vf_bwdif filter_intra
  avfilter/vf_bwdif: Add clip and spatial macros for aarch64 neon
  avfilter/vf_bwdif: Export C filter_edge
  avfilter/vf_bwdif: Add neon for filter_edge
  tests/checkasm: Add test for vf_bwdif filter_edge
  avfilter/vf_bwdif: Export C filter_line
  avfilter/vf_bwdif: Add neon for filter_line
  avfilter/vf_bwdif: Add a filter_line3 method for optimisation
  avfilter/vf_bwdif: Add neon for filter_line3
  tests/checkasm: Add test for vf_bwdif filter_line3
  avfilter/vf_bwdif: Block filter slices into a multiple of 4 lines

 libavfilter/aarch64/Makefile                |   2 +
 libavfilter/aarch64/vf_bwdif_init_aarch64.c | 125 ++++
 libavfilter/aarch64/vf_bwdif_neon.S         | 780 ++++++++++++++++++++
 libavfilter/bwdif.h                         |  20 +
 libavfilter/vf_bwdif.c                      |  70 +-
 tests/checkasm/vf_bwdif.c                   | 172 +++++
 6 files changed, 1154 insertions(+), 15 deletions(-)
 create mode 100644 libavfilter/aarch64/vf_bwdif_init_aarch64.c
 create mode 100644 libavfilter/aarch64/vf_bwdif_neon.S
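
As an illustration of the filter_line3 idea in the cover letter, here is a minimal C sketch. It is not the code from this series: toy_filter_line is a hypothetical stand-in (a rounded average of the rows above and below) for the real bwdif filter, and every function name and signature below is an assumption made for the example. It only shows that one fused call producing three output rows (filtered, copied, filtered) can give the same result as two filter calls plus a memcpy. It does not reproduce or explain the measured speedup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for filter_line, NOT the real bwdif filter:
 * interpolate one missing row as the rounded average of its neighbours. */
static void toy_filter_line(uint8_t *dst, const uint8_t *cur, int w,
                            ptrdiff_t stride)
{
    assert(w > 0);
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)((cur[x - stride] + cur[x + stride] + 1) >> 1);
}

/* Baseline: filter a row, copy the next row, filter the row after that. */
static void three_rows_separate(uint8_t *dst, ptrdiff_t d_stride,
                                const uint8_t *cur, ptrdiff_t s_stride, int w)
{
    toy_filter_line(dst, cur, w, s_stride);
    memcpy(dst + d_stride, cur + s_stride, (size_t)w);
    toy_filter_line(dst + 2 * d_stride, cur + 2 * s_stride, w, s_stride);
}

/* Fused variant: one pass over the columns writes all three output rows,
 * so the shared middle source row is read once per column. */
static void toy_filter_line3(uint8_t *dst, ptrdiff_t d_stride,
                             const uint8_t *cur, ptrdiff_t s_stride, int w)
{
    assert(w > 0);
    for (int x = 0; x < w; x++) {
        int above = cur[x - s_stride];      /* known row above row 0     */
        int mid   = cur[x + s_stride];      /* known row, copied as is   */
        int below = cur[x + 3 * s_stride];  /* known row below row 2     */

        dst[x]                = (uint8_t)((above + mid + 1) >> 1);
        dst[x + d_stride]     = (uint8_t)mid;
        dst[x + 2 * d_stride] = (uint8_t)((mid + below + 1) >> 1);
    }
}

/* Self-check on sample data: returns 1 if both paths write identical rows. */
static int toy_outputs_match(void)
{
    enum { W = 8, STRIDE = 8 };
    uint8_t src[5 * STRIDE];
    uint8_t a[3 * STRIDE] = {0}, b[3 * STRIDE] = {0};

    for (int i = 0; i < 5 * STRIDE; i++)
        src[i] = (uint8_t)(i * 37 + 11);

    const uint8_t *cur = src + STRIDE;  /* row 0 of the three output rows */
    three_rows_separate(a, STRIDE, cur, STRIDE, W);
    toy_filter_line3(b, STRIDE, cur, STRIDE, W);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

toy_outputs_match() compares the two paths on a small synthetic buffer. The real functions take previous, current and next fields plus parity and clip arguments, which this sketch leaves out.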

Comments

Martin Storsjö July 1, 2023, 9:33 p.m. UTC | #1
On Thu, 29 Jun 2023, John Cox wrote:

> Also adds a filter_line3 method which on aarch64 neon yields approx 30%
> speedup over 2xfilter_line and a memcpy
>
> John Cox (15):
>  avfilter/vf_bwdif: Add outline for aarch neon functions
>  avfilter/vf_bwdif: Add common macros and consts for aarch64 neon
>  avfilter/vf_bwdif: Export C filter_intra
>  avfilter/vf_bwdif: Add neon for filter_intra
>  tests/checkasm: Add test for vf_bwdif filter_intra
>  avfilter/vf_bwdif: Add clip and spatial macros for aarch64 neon
>  avfilter/vf_bwdif: Export C filter_edge
>  avfilter/vf_bwdif: Add neon for filter_edge
>  tests/checkasm: Add test for vf_bwdif filter_edge
>  avfilter/vf_bwdif: Export C filter_line
>  avfilter/vf_bwdif: Add neon for filter_line
>  avfilter/vf_bwdif: Add a filter_line3 method for optimisation
>  avfilter/vf_bwdif: Add neon for filter_line3
>  tests/checkasm: Add test for vf_bwdif filter_line3
>  avfilter/vf_bwdif: Block filter slices into a multiple of 4 lines

It's nice to have this split up into small, easily checkable patches, but
this is perhaps a bit more fine-grained than usual. But I guess
that's ok...

I'll comment on the patches that need commenting on.

// Martin
John Cox July 2, 2023, 10:18 a.m. UTC | #2
Hi

>On Thu, 29 Jun 2023, John Cox wrote:
>
>> Also adds a filter_line3 method which on aarch64 neon yields approx 30%
>> speedup over 2xfilter_line and a memcpy
>>
>> John Cox (15):
>>  avfilter/vf_bwdif: Add outline for aarch neon functions
>>  avfilter/vf_bwdif: Add common macros and consts for aarch64 neon
>>  avfilter/vf_bwdif: Export C filter_intra
>>  avfilter/vf_bwdif: Add neon for filter_intra
>>  tests/checkasm: Add test for vf_bwdif filter_intra
>>  avfilter/vf_bwdif: Add clip and spatial macros for aarch64 neon
>>  avfilter/vf_bwdif: Export C filter_edge
>>  avfilter/vf_bwdif: Add neon for filter_edge
>>  tests/checkasm: Add test for vf_bwdif filter_edge
>>  avfilter/vf_bwdif: Export C filter_line
>>  avfilter/vf_bwdif: Add neon for filter_line
>>  avfilter/vf_bwdif: Add a filter_line3 method for optimisation
>>  avfilter/vf_bwdif: Add neon for filter_line3
>>  tests/checkasm: Add test for vf_bwdif filter_line3
>>  avfilter/vf_bwdif: Block filter slices into a multiple of 4 lines
>
>It's nice to have this split up into small, easily checkable patches, but
>this is perhaps a bit more fine-grained than usual. But I guess
>that's ok...

I normally find that people ask me to split patches, so I thought I'd cut
stuff down to the minimum plausible unit.

I'm more than happy to coalesce stuff if wanted.

JC

>I'll comment on the patches that need commenting on.
>
>// Martin