Message ID | 20220925142619.67917-9-remi@remlab.net |
---|---|
State | New |
Series | [FFmpeg-devel,01/31] lavu/cpu: detect RISC-V base extensions |
Context | Check | Description |
---|---|---|
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
Sep 25, 2022, 16:25 by remi@remlab.net:

> From: Rémi Denis-Courmont <remi@remlab.net>
>
> ---
>  libavutil/riscv/float_dsp_init.c |  9 ++++++++-
>  libavutil/riscv/float_dsp_rvv.S  | 17 +++++++++++++++++
>  2 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
> index de567c50d2..b829c0f736 100644
> --- a/libavutil/riscv/float_dsp_init.c
> +++ b/libavutil/riscv/float_dsp_init.c
> @@ -28,12 +28,19 @@
>  void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
>                                  int len);
>
> +void ff_vector_dmul_scalar_rvv(double *dst, const double *src, double mul,
> +                                int len);
> +
>  av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
>  {
>  #if HAVE_RVV
>      int flags = av_get_cpu_flags();
>
> -    if (flags & AV_CPU_FLAG_RV_ZVE32F)
> +    if (flags & AV_CPU_FLAG_RV_ZVE32F) {
>          fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
> +
> +        if (flags & AV_CPU_FLAG_RV_ZVE64D)
> +            fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_rvv;
> +    }
>

You don't need to put doubles in the same branch as floats,
it's just extra indentation as one implies the other anyway.
On 26 September 2022 09:53:19 GMT+03:00, Lynne <dev@lynne.ee> wrote:
>Sep 25, 2022, 16:25 by remi@remlab.net:
>
>> From: Rémi Denis-Courmont <remi@remlab.net>
>>
>> ---
>>  libavutil/riscv/float_dsp_init.c |  9 ++++++++-
>>  libavutil/riscv/float_dsp_rvv.S  | 17 +++++++++++++++++
>>  2 files changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
>> index de567c50d2..b829c0f736 100644
>> --- a/libavutil/riscv/float_dsp_init.c
>> +++ b/libavutil/riscv/float_dsp_init.c
>> @@ -28,12 +28,19 @@
>>  void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
>>                                  int len);
>>
>> +void ff_vector_dmul_scalar_rvv(double *dst, const double *src, double mul,
>> +                                int len);
>> +
>>  av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
>>  {
>>  #if HAVE_RVV
>>      int flags = av_get_cpu_flags();
>>
>> -    if (flags & AV_CPU_FLAG_RV_ZVE32F)
>> +    if (flags & AV_CPU_FLAG_RV_ZVE32F) {
>>          fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
>> +
>> +        if (flags & AV_CPU_FLAG_RV_ZVE64D)
>> +            fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_rvv;
>> +    }
>>
>
>You don't need to put doubles in the same branch as floats,
>it's just extra indentation as one implies the other anyway.

Well, the idea was to skip a useless check if Zve32f is unsupported. As this is a cold path, I don't really mind either way though.

Note the same construct is used elsewhere. Off the top of my head, audiodsp and aacpsdsp.

Thanks for the review.
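To make the two options in this exchange concrete, here is a minimal, self-contained sketch that puts the nested form from the patch next to the flat form suggested in the review. Everything prefixed with `Sketch` or `SKETCH_`, the two `init_*` functions, and the portable stub routines are hypothetical stand-ins for illustration; the flag values are not FFmpeg's real `AV_CPU_FLAG_RV_*` values, and the struct is not the real `AVFloatDSPContext`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for illustration only; the real
 * AV_CPU_FLAG_RV_* values in libavutil may differ. */
#define SKETCH_FLAG_ZVE32F (1 << 0)
#define SKETCH_FLAG_ZVE64D (1 << 1)

/* Hypothetical stand-in for the two relevant AVFloatDSPContext members. */
typedef struct SketchDSP {
    void (*vector_fmul_scalar)(float *dst, const float *src, float mul, int len);
    void (*vector_dmul_scalar)(double *dst, const double *src, double mul, int len);
} SketchDSP;

/* Portable stubs standing in for the RVV assembly routines. */
static void fmul_stub(float *dst, const float *src, float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * mul;
}

static void dmul_stub(double *dst, const double *src, double mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * mul;
}

/* Nested form, as in the patch: the Zve64d test is skipped
 * entirely when Zve32f is absent. */
static void init_nested(SketchDSP *dsp, int flags)
{
    assert(dsp != NULL);
    if (flags & SKETCH_FLAG_ZVE32F) {
        dsp->vector_fmul_scalar = fmul_stub;

        if (flags & SKETCH_FLAG_ZVE64D)
            dsp->vector_dmul_scalar = dmul_stub;
    }
}

/* Flat form, as suggested in the review: two independent tests,
 * one level of indentation less. */
static void init_flat(SketchDSP *dsp, int flags)
{
    assert(dsp != NULL);
    if (flags & SKETCH_FLAG_ZVE32F)
        dsp->vector_fmul_scalar = fmul_stub;
    if (flags & SKETCH_FLAG_ZVE64D)
        dsp->vector_dmul_scalar = dmul_stub;
}
```

The two forms produce the same result for every flag combination in which the double-precision flag is only ever set together with the single-precision one, which is the implication the review relies on. They would differ only for an inconsistent flag set with Zve64d but without Zve32f.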
diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index de567c50d2..b829c0f736 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -28,12 +28,19 @@
 void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
                                 int len);
 
+void ff_vector_dmul_scalar_rvv(double *dst, const double *src, double mul,
+                                int len);
+
 av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
 {
 #if HAVE_RVV
     int flags = av_get_cpu_flags();
 
-    if (flags & AV_CPU_FLAG_RV_ZVE32F)
+    if (flags & AV_CPU_FLAG_RV_ZVE32F) {
         fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
+
+        if (flags & AV_CPU_FLAG_RV_ZVE64D)
+            fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_rvv;
+    }
 #endif
 }
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 50cb1fa90f..17dda471b4 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -37,3 +37,20 @@ NOHWF   mv       a2, a3
 
         ret
 endfunc
+
+// (a0) = (a1) * fa0 [0..a2-1]
+func ff_vector_dmul_scalar_rvv, zve64d
+NOHWD   fmv.d.x  fa0, a2
+NOHWD   mv       a2, a3
+1:
+        vsetvli  t0, a2, e64, m1, ta, ma
+        vle64.v  v16, (a1)
+        sub      a2, a2, t0
+        vfmul.vf v16, v16, fa0
+        sh3add   a1, t0, a1
+        vse64.v  v16, (a0)
+        sh3add   a0, t0, a0
+        bnez     a2, 1b
+
+        ret
+endfunc
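For readers less familiar with RVV, the strip-mined loop in the assembly above can be modelled in portable C. This is only a simplified sketch, not part of the patch: the function name `vector_dmul_scalar_model` and the constant `MODEL_VLMAX` are hypothetical, and real hardware derives the per-iteration element count from its vector length and may choose it differently from the plain minimum used here.

```c
#include <assert.h>

/* Hypothetical stand-in for the maximum number of 64-bit elements
 * one vector register holds (e64, m1); hardware-dependent in reality. */
#define MODEL_VLMAX 4

/* Simplified scalar model of the loop: dst[i] = src[i] * mul. */
static void vector_dmul_scalar_model(double *dst, const double *src,
                                     double mul, int len)
{
    assert(len >= 0);

    while (len > 0) {
        /* vsetvli t0, a2, e64, m1, ta, ma: elements handled this pass */
        int vl = len < MODEL_VLMAX ? len : MODEL_VLMAX;

        /* vle64.v, vfmul.vf, vse64.v: load, multiply by scalar, store */
        for (int i = 0; i < vl; i++)
            dst[i] = src[i] * mul;

        len -= vl;  /* sub    a2, a2, t0 */
        src += vl;  /* sh3add a1, t0, a1: advance by vl * 8 bytes */
        dst += vl;  /* sh3add a0, t0, a0: advance by vl * 8 bytes */
    }
}
```

The point of the pattern is that no separate tail loop is needed: the last pass simply processes however many elements remain.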
From: Rémi Denis-Courmont <remi@remlab.net>

---
 libavutil/riscv/float_dsp_init.c |  9 ++++++++-
 libavutil/riscv/float_dsp_rvv.S  | 17 +++++++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)