[FFmpeg-devel,v2,0/7] arm64 neon implementation for 8bits functions

Series

arm64 neon implementation for 8bits functions
  [FFmpeg-devel,v2,0/7] arm64 neon implementation for 8bits functions
  [FFmpeg-devel,v2,1/7] lavc/aarch64: Add neon implementation for pix_abs8 functions.
  [FFmpeg-devel,v2,2/7] aarch64: me_cmp: Improve scheduling in ff_pix_abs8_y2_neon
  [FFmpeg-devel,v2,3/7] aarch64: me_cmp: Fix up the prologue of ff_pix_abs8_xy2_neon
  [FFmpeg-devel,v2,4/7] lavc/aarch64: Provide neon implementation of nsse8
  [FFmpeg-devel,v2,5/7] lavc/aarch64: Provide optimized implementation of vsse8 for arm64.
  [FFmpeg-devel,v2,6/7] lavc/aarch64: Add neon implementation for vsse_intra8
  [FFmpeg-devel,v2,7/7] aarch64: me_cmp: Improve scheduling in vsse_intra8

Message ID

20221003141020.3564715-1-gjb@semihalf.com

Headers

Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
 designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
From: Grzegorz Bernacki <gjb@semihalf.com>
To: ffmpeg-devel@ffmpeg.org
Date: Mon,  3 Oct 2022 16:10:13 +0200
Message-Id: <20221003141020.3564715-1-gjb@semihalf.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v2 0/7] arm64 neon implementation for 8bits
 functions
Precedence: list
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com,
 hum@semihalf.com, martin@martin.st, mw@semihalf.com, spop@amazon.com
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

Message

Grzegorz Bernacki Oct. 3, 2022, 2:10 p.m. UTC

Changes since v1:

- changed tabs to spaces
- modified branch instruction in vsse8
- applied Martin's patches with improved instruction scheduling

Grzegorz Bernacki (4):
  lavc/aarch64: Add neon implementation for pix_abs8 functions.
  lavc/aarch64: Provide neon implementation of nsse8
  lavc/aarch64: Provide optimized implementation of vsse8 for arm64.
  lavc/aarch64: Add neon implementation for vsse_intra8

Martin Storsjö (3):
  aarch64: me_cmp: Improve scheduling in ff_pix_abs8_y2_neon
  aarch64: me_cmp: Fix up the prologue of ff_pix_abs8_xy2_neon
  aarch64: me_cmp: Improve scheduling in vsse_intra8

 libavcodec/aarch64/me_cmp_init_aarch64.c |  33 ++
 libavcodec/aarch64/me_cmp_neon.S         | 414 +++++++++++++++++++++++
 2 files changed, 447 insertions(+)

Comments

Martin Storsjö Oct. 4, 2022, 10:56 a.m. UTC | #1

Thanks! This mostly looked good to me.

I had actually meant that you would squash my fixes into your patches, 
instead of keeping them as separate ones.

After squashing such changes, it might have been interesting to get 
updated benchmarks in those commit messages (the ones that you have from 
Graviton 3). However, in this case these changes didn't really make much 
difference on out-of-order cores, only on in-order cores, so I guess 
there's not that much value in getting updated benchmarks from Graviton 3 
in this case.

So I went ahead and squashed those patches (and added Co-authored-by lines 
where relevant), and pushed them now. Thanks for your contribution!

// Martin

Grzegorz Bernacki Oct. 4, 2022, 11:34 a.m. UTC | #2

Great!! Thanks a lot for your help and your review.
thanks,
greg
