[FFmpeg-devel,PATCHv2,0/10] RISC-V V floating point DSP


Message ID

3372981.QJadu78ljV@basile.remlab.net

Headers

Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
 designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
From: =?iso-8859-1?q?R=E9mi?= Denis-Courmont <remi@remlab.net>
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 04 Sep 2022 16:54:34 +0300
Message-ID: <3372981.QJadu78ljV@basile.remlab.net>
Organization: Remlab
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP
Precedence: list
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

Series

RISC-V V floating point DSP

Message

Rémi Denis-Courmont Sept. 4, 2022, 1:54 p.m. UTC

The following changes since commit b6e8fc1c201d58672639134a737137e1ba7b55fe:

  avcodec/speexdec: improve support for speex in non-ogg (2022-09-04 11:31:57 +0200)

are awaiting thorough bashing at your express convenience up to:

  riscv: float vector dot product with RVV (2022-09-04 16:45:38 +0300)

Changes since v1:

- Removed stray define.
- Fixed mismatch between byte and element size in mul-scalar.
- Added fmul, fac, dmul, dmac, fmul-add, fmul-reverse, fmul-window.
- Added float butterfly and dot product.
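
The byte/element mismatch mentioned for mul-scalar is easy to reproduce in any strip-mined loop, because the vector length is counted in elements while pointers advance in bytes. The following is only a minimal C sketch of that loop shape, not the code from the patch: `VLMAX_ELEMS`, `emulated_vsetvl` and `mul_scalar_stripmined` are hypothetical names, and the 16-element limit is an arbitrary assumption.

```c
#include <stddef.h>
#include <string.h>

/* Assumed number of 32-bit elements per vector group (hypothetical value). */
#define VLMAX_ELEMS 16

/* Emulates the role of vsetvli: how many elements this iteration handles. */
static size_t emulated_vsetvl(size_t remaining)
{
    return remaining < VLMAX_ELEMS ? remaining : VLMAX_ELEMS;
}

/* dst[i] = src[i] * mul, processed one emulated "vector" at a time. */
static void mul_scalar_stripmined(float *dst, const float *src, float mul,
                                  size_t len)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    while (len > 0) {
        size_t vl = emulated_vsetvl(len);   /* count in elements */
        size_t bytes = vl * sizeof(float);  /* same count in bytes */

        for (size_t i = 0; i < vl; i++) {
            float v;
            memcpy(&v, s + i * sizeof(float), sizeof(v));
            v *= mul;
            memcpy(d + i * sizeof(float), &v, sizeof(v));
        }
        len -= vl;   /* the remaining length is tracked in elements... */
        s += bytes;  /* ...but the pointers must move by bytes */
        d += bytes;
    }
}
```

Advancing `s` and `d` by `vl` instead of `bytes` would process overlapping data, which is the kind of mismatch the changelog entry describes.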

All operations are unrolled to the maximum group size (8), with the
exception of overlap/add. The latter seems to require a minimum of 6
vectors (maybe 5 with extremely careful ordering), so its group size is
only 4.
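
The group-size trade-off can be made concrete. RVV has 32 vector registers, so a group size (LMUL) of 8 leaves only 4 register groups, while a group size of 4 leaves 8. The C sketch below is an illustration under assumptions: the helper names are hypothetical, and the scalar loop is written from memory in the usual windowed overlap/add shape rather than copied from the patch. Counting two sources, two window values and two results is one plausible way to arrive at six live vectors; the assembler may count differently.

```c
#include <stddef.h>

/* RVV provides 32 architectural vector registers. */
enum { RVV_NUM_VREGS = 32 };

/* Number of register groups available at a given group size (LMUL). */
static int rvv_groups_available(int lmul)
{
    return RVV_NUM_VREGS / lmul;
}

/* Largest power-of-two group size (8, 4, 2, 1) that still leaves at least
 * `needed` register groups; returns 0 if even LMUL=1 is not enough. */
static int rvv_max_lmul(int needed)
{
    for (int lmul = 8; lmul >= 1; lmul /= 2)
        if (rvv_groups_available(lmul) >= needed)
            return lmul;
    return 0;
}

/* Illustrative scalar windowed overlap/add (hypothetical helper).
 * dst and win hold 2 * len floats; src0 and src1 hold len floats. */
static void fmul_window_ref(float *dst, const float *src0, const float *src1,
                            const float *win, int len)
{
    dst  += len;
    win  += len;
    src0 += len;
    for (int i = -len, j = len - 1; i < 0; i++, j--) {
        float s0 = src0[i]; /* live value 1 */
        float s1 = src1[j]; /* live value 2 */
        float wi = win[i];  /* live value 3 */
        float wj = win[j];  /* live value 4 */
        dst[i] = s0 * wj - s1 * wi; /* live value 5 */
        dst[j] = s0 * wi + s1 * wj; /* live value 6 */
    }
}
```

With these helpers, `rvv_max_lmul(6)` and `rvv_max_lmul(5)` both yield 4, whereas an operation needing at most 4 groups can use 8.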

The pointer arithmetic could be slightly optimised with the SH2ADD and
SH3ADD instructions from the Zba extension. However, this would require
either more conditional code or making Zba support mandatory, for
probably negligible performance gains.
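
For reference, SH2ADD and SH3ADD (address-generation instructions from the Zba bit-manipulation extension) fuse a left shift by 2 or 3 with an addition, which is exactly the "element count to byte offset, then add to pointer" step of a float or double loop. The C sketch below models the arithmetic only; the function names are hypothetical and it says nothing about actual timing.

```c
#include <stdint.h>

/* sh2add rd, rs1, rs2  computes  rd = rs2 + (rs1 << 2) */
static uint64_t sh2add(uint64_t rs1, uint64_t rs2)
{
    return rs2 + (rs1 << 2);
}

/* sh3add rd, rs1, rs2  computes  rd = rs2 + (rs1 << 3) */
static uint64_t sh3add(uint64_t rs1, uint64_t rs2)
{
    return rs2 + (rs1 << 3);
}

/* Base ISA: advancing a float pointer by vl elements takes two
 * instructions (slli, then add). */
static uint64_t advance_floats_base(uint64_t ptr, uint64_t vl)
{
    uint64_t bytes = vl << 2; /* slli */
    return ptr + bytes;       /* add */
}

/* With Zba: the same update is a single sh2add. */
static uint64_t advance_floats_zba(uint64_t ptr, uint64_t vl)
{
    return sh2add(vl, ptr);
}
```

The saving is one integer instruction per pointer update, which is consistent with the expectation above that the gain would be small.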

----------------------------------------------------------------
Rémi Denis-Courmont (10):
      riscv: add CPU flags for the RISC-V Vector extension
      riscv: initial common header for assembler macros
      riscv: float vector-scalar multiplication with RVV
      riscv: float vector-vector multiplication with RVV
      riscv: float vector multiply-accumulate with RVV
      riscv: float vector multiplication-addition with RVV
      riscv: float vector sum-and-difference with RVV
      riscv: float reversed vector multiplication with RVV
      riscv: float vector windowed overlap/add with RVV
      riscv: float vector dot product with RVV

 libavutil/cpu.c                  |  14 +++
 libavutil/cpu.h                  |   6 +
 libavutil/cpu_internal.h         |   1 +
 libavutil/float_dsp.c            |   2 +
 libavutil/float_dsp.h            |   1 +
 libavutil/riscv/Makefile         |   3 +
 libavutil/riscv/asm.S            |  33 +++++
 libavutil/riscv/cpu.c            |  57 +++++++++
 libavutil/riscv/float_dsp_init.c |  67 ++++++++++
 libavutil/riscv/float_dsp_rvv.S  | 255 +++++++++++++++++++++++++++++++++++++++
 10 files changed, 439 insertions(+)
 create mode 100644 libavutil/riscv/Makefile
 create mode 100644 libavutil/riscv/asm.S
 create mode 100644 libavutil/riscv/cpu.c
 create mode 100644 libavutil/riscv/float_dsp_init.c
 create mode 100644 libavutil/riscv/float_dsp_rvv.S

Comments

Lynne Sept. 4, 2022, 5:48 p.m. UTC | #1

Sep 4, 2022, 15:54 by remi@remlab.net:

> The following changes since commit b6e8fc1c201d58672639134a737137e1ba7b55fe:
>
>  avcodec/speexdec: improve support for speex in non-ogg (2022-09-04 11:31:57 +0200)
>
> are awaiting thorough bashing at your express convenience up to:
>
>  riscv: float vector dot product with RVV (2022-09-04 16:45:38 +0300)
>
> Changes since v1:
>
> - Removed stray define.
> - Fixed mismatch between byte and element size in mul-scalar.
> - Added fmul, fac, dmul, dmac, fmul-add, fmul-reverse, fmul-window.
> - Added float butterfly and dot product.
>
> All operations are unrolled to the maximum group size (8), with the
> exception of overlap/add. The latter seems to require a minimum of 6
> vectors (maybe 5 with extremely careful ordering), so its group size is
> only 4.
>
> The pointer arithmetic could be slightly optimised with the SH2ADD and
> SH3ADD instructions from the Zba extension. However, this would require
> either more conditional code or making Zba support mandatory, for
> probably negligible performance gains.
>

Did you test on real hardware or a VM?
If the former, what does checkasm --bench report?

Rémi Denis-Courmont Sept. 4, 2022, 7:01 p.m. UTC | #2

On Sunday, 4 September 2022, 20:48:26 EEST, Lynne wrote:
> > The pointer arithmetic could be slightly optimised with the SH2ADD and
> > SH3ADD instructions from the Zba extension. However, this would require
> > either more conditional code or making Zba support mandatory, for
> > probably negligible performance gains.
> 
> Did you test on real hardware or a VM?

I don't think we will see real and conforming RV64GV hardware this year, 
unless you count FPGAs. I hope we can get affordable stuff in 2023H1.

This is running on a simulator. But since the code is already (AFAICT) using the
largest possible grouping, there won't be much room for further
optimisations, other than maybe adding prefetching hints. In that sense, RVV
is really nice in how it makes unrolling almost unnecessary/effortless.
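
To illustrate why register grouping stands in for manual unrolling: the number of elements handled per loop iteration scales with the group size, so the same loop simply runs fewer times. The C sketch below uses hypothetical helper names and assumes that each iteration processes min(remaining, VLMAX) elements; the 128-bit vector length in the usage note is an arbitrary assumption, not a property of any particular hardware.

```c
#include <stddef.h>

/* Maximum elements per register group: VLMAX = VLEN / SEW * LMUL. */
static size_t rvv_vlmax(size_t vlen_bits, size_t elem_bits, size_t lmul)
{
    return vlen_bits / elem_bits * lmul;
}

/* Iterations of a strip-mined loop over `len` elements, assuming every
 * iteration handles min(remaining, vlmax) elements. */
static size_t rvv_loop_iterations(size_t len, size_t vlmax)
{
    return (len + vlmax - 1) / vlmax;
}
```

Under those assumptions, 1024 floats with 128-bit vectors take 256 iterations at a group size of 1 and 32 iterations at a group size of 8, without any change to the loop body.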

The code is also already laid out to leverage multiple issue if available.
RISC-V does not have post-index addressing modes, so interleaving a fair
amount of pointer arithmetic is unavoidable.

> If the former, what does checkasm --bench report?

...