
[FFmpeg-devel,2/4] lavc/rv34dsp: R-V V rv34_inv_transform_dc

Message ID CAEa-L+tMzOpUg0GizA4c4F0VLEfJRMUAK=s3LO2L3Fh1ML50Lw@mail.gmail.com
State New
Series [FFmpeg-devel,1/4] checkasm/rv34dsp: add rv34_inv_transform_dc test

Checks

Context                          Check    Description
yinshiyou/configure_loongarch64  warning  Failed to apply patch
andriy/configure_x86             warning  Failed to apply patch

Commit Message

flow gg Jan. 31, 2024, noon UTC

Comments

Rémi Denis-Courmont Jan. 31, 2024, 4:31 p.m. UTC | #1
Hi,

I think this breaks the build for RV32, and it lacks checks for the vector
length.

Also, the fractional multiplier should never be smaller than the ratio of the
specified element size to the largest element size used in the function. Here
it is largely inconsequential, but, for instance, "e32, mf4" and "e64, mf2" are
invalid.
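This rule follows from the RVV 1.0 requirement that, for a fractional LMUL, an implementation only has to support SEW values up to LMUL × ELEN. If the widest element size a function uses is taken as the smallest ELEN it may run on (32 for zve32x code), a vtype is only safe when SEW ≤ LMUL × that width. Below is a minimal C sketch of the check. `vtype_is_portable` is a hypothetical helper, not part of FFmpeg, and it takes LMUL as a numerator/denominator pair:

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if (sew, LMUL = lmul_num / lmul_den)
 * is guaranteed to be a legal vtype when max_sew (the widest element size
 * the function uses) is treated as the minimum ELEN. The RVV spec only
 * requires SEW <= LMUL * ELEN for fractional LMUL. */
static bool vtype_is_portable(unsigned sew, unsigned lmul_num,
                              unsigned lmul_den, unsigned max_sew)
{
    /* SEW <= (num / den) * ELEN  <=>  SEW * den <= num * ELEN */
    return sew * lmul_den <= lmul_num * max_sew;
}
```

For example, `vtype_is_portable(32, 1, 4, 32)` and `vtype_is_portable(64, 1, 2, 64)` both return false, matching the two invalid settings named above. `vtype_is_portable(8, 1, 4, 32)` returns true.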
flow gg Jan. 31, 2024, 5:57 p.m. UTC | #2
> Also, the fractional multiplier should never be smaller than the ratio of the
> specified element size to the largest element size used in the function. Here
> it is largely inconsequential, but, for instance, "e32, mf4" and "e64, mf2" are
> invalid.

Thanks, I had almost forgotten about this part.

> I think this breaks the build for RV32

Okay, I've fixed that in the reply.

> it lacks checks for the vector length.

rv34dsp_init.c already checks ff_get_rv_vlenb() >= 16.
Doesn't that already cover the vector length?
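For reference, one vsetivli can cover at most VLMAX = LMUL × VLEN / SEW elements, where VLEN = 8 × vlenb bits. The sketch below uses `rvv_vlmax`, a hypothetical helper that takes LMUL as a fraction. It shows why the vlenb test matters for `vsetivli zero, 16, e16, m2`: all 16 coefficients of the 4×4 block fit in one register group only when VLEN ≥ 128, that is, when vlenb ≥ 16.

```c
/* Hypothetical helper: VLMAX = LMUL * VLEN / SEW, with VLEN = 8 * vlenb
 * bits and LMUL = lmul_num / lmul_den (m2 -> 2/1, mf4 -> 1/4). */
static unsigned rvv_vlmax(unsigned vlenb, unsigned sew,
                          unsigned lmul_num, unsigned lmul_den)
{
    return (8u * vlenb * lmul_num) / (sew * lmul_den);
}
```

With vlenb = 16, `rvv_vlmax(16, 16, 2, 1)` is 16. With vlenb = 8 it drops to 8, so an AVL of 16 would give vl = 8 and only half of the block would be written. Zve32x only guarantees VLEN ≥ 32 (vlenb = 4).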
Rémi Denis-Courmont Feb. 6, 2024, 5:26 p.m. UTC | #3
Hi,

I'm not sure why you're mixing element sizes this way, but the code should not 
even compile due to mismatched extensions.
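In the patch as posted, the store runs under `vsetivli zero, 4, e8, mf4` but uses `vse64.v`. For a unit-stride load or store, the register group size is EMUL = (EEW / SEW) × LMUL, and the access moves vl × EEW / 8 bytes. The sketch below uses two hypothetical helpers to check the arithmetic. Both configurations move the same 32 bytes. However, EEW = 64 requires ELEN = 64, which a function declared for `zve32x` does not provide. That is presumably the extension mismatch referred to here.

```c
/* Hypothetical helper: EMUL = (EEW / SEW) * LMUL for a unit-stride vector
 * load/store, returned in eighths so fractional values stay integral
 * (8 means EMUL = 1, 16 means EMUL = 2). */
static unsigned rvv_emul_x8(unsigned eew, unsigned sew,
                            unsigned lmul_num, unsigned lmul_den)
{
    return (8u * eew * lmul_num) / (sew * lmul_den);
}

/* Hypothetical helper: bytes moved by a unit-stride access of vl elements
 * of eew bits each. */
static unsigned rvv_bytes_moved(unsigned vl, unsigned eew)
{
    return vl * eew / 8u;
}
```

By this arithmetic, `vse64.v` with vl = 4 under e8/mf4 uses EMUL = 2 and writes 32 bytes. A plain `vse16.v v8, (a0)` under the first `vsetivli zero, 16, e16, m2` would also use EMUL = 2 and write the same 32 bytes, with no second vsetivli.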
flow gg Feb. 7, 2024, 12:12 a.m. UTC | #4
My mistake; fixed in the reply.
Rémi Denis-Courmont Feb. 9, 2024, 9:13 p.m. UTC | #5
On Wednesday 7 February 2024, 2:12:22 EET, flow gg wrote:
> My carelessness.. fixed it in the reply.

I know I said to avoid scalar multiplications, but this may be taking it a 
little too far. Either this works:
   slli t1, t0, 9
   sh2add t0, t0, t0
   sub t0, t1, t0
or just:
   li t1, 13 * 13 * 3
   mul t0, t0, t1

Also the second vsetvl seems pointless, unless you specifically meant that the 
pointer was aligned to 32 bits?
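Both suggestions compute 507 × x, because 13 × 13 × 3 = 507 = 512 − 5:

* `slli t1, t0, 9` gives 512x.
* `sh2add t0, t0, t0` (Zba, (t0 << 2) + t0) gives 5x.
* `sub` takes the difference.

The patch's own sequence reaches the same value through ((x << 7) − x) × 4 − x = 508x − x. The small C sketch below checks that the three forms agree over the whole int16_t input range. The function names are hypothetical. Shifts are written as multiplications by powers of two, which avoids undefined behaviour when shifting negative values in C.

```c
#include <stdint.h>

/* li t1, 13 * 13 * 3; mul t0, t0, t1 */
static int32_t mul_direct(int32_t x)
{
    return x * (13 * 13 * 3);
}

/* slli t1, t0, 9; sh2add t0, t0, t0; sub t0, t1, t0 */
static int32_t mul_sh2add(int32_t x)
{
    int32_t t1 = x * 512;      /* slli   t1, t0, 9          */
    int32_t t0 = x * 4 + x;    /* sh2add t0, t0, t0 (5 * x) */
    return t1 - t0;            /* sub    t0, t1, t0         */
}

/* Sequence in the posted patch: slliw 7; subw; slliw 2; subw */
static int32_t mul_patch(int32_t x)
{
    int32_t t2 = x * 128 - x;  /* 127 * x */
    t2 = t2 * 4;               /* 508 * x */
    return t2 - x;             /* 507 * x */
}

/* Returns 1 if all three sequences agree for every int16_t input. */
static int all_sequences_agree(void)
{
    for (int32_t x = INT16_MIN; x <= INT16_MAX; x++)
        if (mul_direct(x) != mul_sh2add(x) || mul_patch(x) != mul_direct(x))
            return 0;
    return 1;
}
```

The shift-and-add form needs three instructions, and the `li` + `mul` form needs two. Which is faster depends on the core's multiplier latency.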
flow gg Feb. 10, 2024, 1:57 a.m. UTC | #6
Okay, I have updated both in the reply.
Rémi Denis-Courmont Feb. 10, 2024, 9:18 a.m. UTC | #7
Happy new year,

The gains are, unsurprisingly, modest here. Did you try reordering
instructions to improve scheduling?
flow gg Feb. 10, 2024, 3:29 p.m. UTC | #8
Happy new year ~

Yes, I've tried reordering.

Patch

From 7e1c8d6b73afad9885222c0c9012543aface5397 Mon Sep 17 00:00:00 2001
From: sunyuechi <sunyuechi@iscas.ac.cn>
Date: Wed, 31 Jan 2024 19:03:20 +0800
Subject: [PATCH 2/4] lavc/rv34dsp: R-V V rv34_inv_transform_dc

C908:
rv34_inv_transform_dc_c: 35.5
rv34_inv_transform_dc_rvv_i32: 27.0
---
 libavcodec/riscv/Makefile       |  2 ++
 libavcodec/riscv/rv34dsp_init.c | 39 +++++++++++++++++++++++++++++++++
 libavcodec/riscv/rv34dsp_rvv.S  | 38 ++++++++++++++++++++++++++++++++
 libavcodec/rv34dsp.c            |  2 ++
 libavcodec/rv34dsp.h            |  1 +
 5 files changed, 82 insertions(+)
 create mode 100644 libavcodec/riscv/rv34dsp_init.c
 create mode 100644 libavcodec/riscv/rv34dsp_rvv.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index e15aba58f4..ffe6631cf2 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -44,6 +44,8 @@  RVV-OBJS-$(CONFIG_OPUS_DECODER) += riscv/opusdsp_rvv.o
 OBJS-$(CONFIG_PIXBLOCKDSP) += riscv/pixblockdsp_init.o
 RV-OBJS-$(CONFIG_PIXBLOCKDSP) += riscv/pixblockdsp_rvi.o
 RVV-OBJS-$(CONFIG_PIXBLOCKDSP) += riscv/pixblockdsp_rvv.o
+OBJS-$(CONFIG_RV34DSP) += riscv/rv34dsp_init.o
+RVV-OBJS-$(CONFIG_RV34DSP) += riscv/rv34dsp_rvv.o
 OBJS-$(CONFIG_SVQ1_ENCODER) += riscv/svqenc_init.o
 RVV-OBJS-$(CONFIG_SVQ1_ENCODER) += riscv/svqenc_rvv.o
 OBJS-$(CONFIG_TAK_DECODER) += riscv/takdsp_init.o
diff --git a/libavcodec/riscv/rv34dsp_init.c b/libavcodec/riscv/rv34dsp_init.c
new file mode 100644
index 0000000000..852c8ad9a8
--- /dev/null
+++ b/libavcodec/riscv/rv34dsp_init.c
@@ -0,0 +1,39 @@ 
+/*
+ * Copyright (c) 2024 Institute of Software Chinese Academy of Sciences (ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavutil/riscv/cpu.h"
+#include "libavcodec/rv34dsp.h"
+
+void ff_rv34_inv_transform_dc_rvv(int16_t *block);
+
+av_cold void ff_rv34dsp_init_riscv(RV34DSPContext *c)
+{
+#if HAVE_RVV
+    int flags = av_get_cpu_flags();
+
+    if (flags & AV_CPU_FLAG_RVV_I32 && ff_get_rv_vlenb() >= 16) {
+        c->rv34_inv_transform_dc = ff_rv34_inv_transform_dc_rvv;
+    }
+#endif
+}
diff --git a/libavcodec/riscv/rv34dsp_rvv.S b/libavcodec/riscv/rv34dsp_rvv.S
new file mode 100644
index 0000000000..acf5b0c3e8
--- /dev/null
+++ b/libavcodec/riscv/rv34dsp_rvv.S
@@ -0,0 +1,38 @@ 
+/*
+ * Copyright (c) 2024 Institute of Software Chinese Academy of Sciences (ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/riscv/asm.S"
+
+func ff_rv34_inv_transform_dc_rvv, zve32x
+        lh            t1, 0(a0)
+        slliw         t2, t1, 7
+        subw          t2, t2, t1
+        slliw         t2, t2, 2
+        subw          t2, t2, t1
+        sraiw         t2, t2, 11
+        slliw         t2, t2, 16
+        sraiw         t2, t2, 16
+        vsetivli      zero, 16, e16, m2, ta, ma
+        vmv.v.x       v8, t2
+        vsetivli      zero, 4, e8, mf4, ta, ma
+        vse64.v       v8, (a0)
+
+        ret
+endfunc
diff --git a/libavcodec/rv34dsp.c b/libavcodec/rv34dsp.c
index 8f9d88396c..44486f8edd 100644
--- a/libavcodec/rv34dsp.c
+++ b/libavcodec/rv34dsp.c
@@ -138,6 +138,8 @@  av_cold void ff_rv34dsp_init(RV34DSPContext *c)
 
 #if ARCH_ARM
     ff_rv34dsp_init_arm(c);
+#elif ARCH_RISCV
+    ff_rv34dsp_init_riscv(c);
 #elif ARCH_X86
     ff_rv34dsp_init_x86(c);
 #endif
diff --git a/libavcodec/rv34dsp.h b/libavcodec/rv34dsp.h
index 2e9ec4eee4..b15424d4ae 100644
--- a/libavcodec/rv34dsp.h
+++ b/libavcodec/rv34dsp.h
@@ -79,6 +79,7 @@  void ff_rv34dsp_init(RV34DSPContext *c);
 void ff_rv40dsp_init(RV34DSPContext *c);
 
 void ff_rv34dsp_init_arm(RV34DSPContext *c);
+void ff_rv34dsp_init_riscv(RV34DSPContext *c);
 void ff_rv34dsp_init_x86(RV34DSPContext *c);
 
 void ff_rv40dsp_init_aarch64(RV34DSPContext *c);
-- 
2.43.0
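The scalar C model below reproduces what `ff_rv34_inv_transform_dc_rvv` does as posted. It is derived from the instructions themselves, not taken from FFmpeg's C reference:

1. Load block[0] (`lh`).
2. Scale it by 507 (the `slliw`/`subw` chain).
3. Shift it right arithmetically by 11 (`sraiw`).
4. Sign-extend the result from 16 bits (`slliw`/`sraiw` by 16).
5. Broadcast the value to all 16 coefficients (`vmv.v.x` plus the store).

The function name is hypothetical. The model assumes `>>` on a negative int is an arithmetic shift, as GCC and Clang implement it. Unlike the W-suffixed instructions, which exist only on RV64, this C does not depend on register width.

```c
#include <stdint.h>

/* Hypothetical scalar model of the posted RVV routine: all 16 coefficients
 * of the 4x4 block are set to (507 * block[0]) >> 11. */
static void rv34_inv_transform_dc_model(int16_t *block)
{
    int32_t dc = (13 * 13 * 3 * (int32_t)block[0]) >> 11; /* lh, x507, sraiw 11 */
    int16_t v  = (int16_t)dc;     /* slliw/sraiw 16: low 16 bits, sign-extended */

    for (int i = 0; i < 16; i++)  /* vmv.v.x + 32-byte store */
        block[i] = v;
}
```

By this arithmetic, the 16-bit sign extension never changes the value. For block[0] in [-32768, 32767], the result lies in [-8112, 8111], so it already fits in an int16_t.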