Message ID | 3372981.QJadu78ljV@basile.remlab.net |
---|---|
Series | RISC-V V floating point DSP |
Sep 4, 2022, 15:54 by remi@remlab.net:

> The following changes since commit b6e8fc1c201d58672639134a737137e1ba7b55fe:
>
>   avcodec/speexdec: improve support for speex in non-ogg (2022-09-04 11:31:57 +0200)
>
> are awaiting thorough bashing at your express convenience up to:
>
>   riscv: float vector dot product with RVV (2022-09-04 16:45:38 +0300)
>
> Changes since v1:
>
> - Removed stray define.
> - Fixed mismatch between byte and element size in mul-scalar.
> - Added fmul, fac, dmul, dmac, fmul-add, fmul-reverse, fmul-window.
> - Added float butterfly and dot product.
>
> All operations are unrolled to the maximum group size (8), with the
> exception of overlap/add. The latter seems to require a minimum of 6
> vectors (maybe 5 by extremely careful ordering), so the group size is
> only 4.
>
> The pointer arithmetic could be slightly optimised with SH2ADD and
> SH3ADD instructions from the Zba extension. This would require more
> conditional code, or requiring support for Zba, for probably negligible
> performance gains though.

Did you test on real hardware or a VM? If the former, what does checkasm --bench report?
On Sunday, 4 September 2022, 20:48:26 EEST, Lynne wrote:

> > The pointer arithmetic could be slightly optimised with SH2ADD and
> > SH3ADD instructions from the Zba extension. This would require more
> > conditional code, or requiring support for Zba, for probably negligible
> > performance gains though.
>
> Did you test on real hardware or a VM?

I don't think we will see real and conforming RV64GV hardware this year, unless you count FPGAs. I hope we can get affordable stuff in 2023H1.

This is running on a simulator. But since the code is already (AFAICT) using the largest possible grouping, there won't be that much room for further optimisations, other than perhaps adding prefetching hints. In that sense, RVV is really nice in how it makes unrolling almost unnecessary/effortless.

The code is also already laid out to leverage multiple issue if available. RISC-V does not have post-index addressing modes, so interleaving a fair amount of pointer arithmetic is unavoidable.

> If the former, what does checkasm --bench report?

...
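For readers unfamiliar with RVV, the loop shape discussed in this thread can be modelled in plain C. This is only a sketch under stated assumptions, not code from the patch series: the names `model_vsetvl` and `fmul_scalar_model` are hypothetical, the vector register width (128 bits) is an arbitrary choice, and `model_vsetvl` simplifies `vsetvli` to a plain minimum, whereas real hardware is allowed to pick a smaller length in some cases. The point is that the granted length scales with the group size, so no manual unrolling is needed, while the pointers still have to be advanced with separate arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed model parameters (hypothetical, not from the patch series):
 * 128-bit vector registers, 32-bit elements, register group size 8. */
enum { VLEN_BITS = 128, LMUL = 8, VLMAX_F32 = VLEN_BITS / 32 * LMUL };

/* Simplified stand-in for `vsetvli vl, len, e32, m8, ...`:
 * grants min(len, VLMAX) elements for this pass. */
static size_t model_vsetvl(size_t len)
{
    return len < VLMAX_F32 ? len : VLMAX_F32;
}

/* Scalar model of a strip-mined "multiply by scalar" loop.
 * Returns the number of loop passes, purely for illustration. */
static size_t fmul_scalar_model(float *dst, const float *src,
                                float mul, size_t len)
{
    size_t passes = 0;

    while (len > 0) {
        size_t vl = model_vsetvl(len);
        assert(vl > 0 && vl <= len);

        /* Stands in for one vector load, multiply and store
         * over a whole register group. */
        for (size_t i = 0; i < vl; i++)
            dst[i] = src[i] * mul;

        /* No post-index addressing: each pointer is advanced by
         * vl * 4 bytes explicitly. In assembly this is a shift plus
         * an add per pointer, or a single SH2ADD where Zba exists. */
        src += vl;
        dst += vl;
        len -= vl;
        passes++;
    }
    return passes;
}
```

With these assumed parameters a pass covers up to 32 floats, so a 70-element buffer takes three passes (32, 32 and 6 elements), with the short tail handled by the same loop body.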