
[FFmpeg-devel] lavc/exrdsp: unroll predictor

Message ID 20231111152723.41888-1-remi@remlab.net
State Accepted
Commit ce467421dc9e2061b8af22973ba4ba6248f16de9
Series [FFmpeg-devel] lavc/exrdsp: unroll predictor

Checks

Context                            Check    Description
yinshiyou/make_loongarch64         success  Make finished
yinshiyou/make_fate_loongarch64    success  Make fate finished
andriy/make_x86                    success  Make finished
andriy/make_fate_x86               success  Make fate finished

Commit Message

Rémi Denis-Courmont Nov. 11, 2023, 3:27 p.m. UTC
With explicit unrolling, we can skip half of the sign bit flips, and
the compiler is then better able to optimise the scalar loop:

predictor_c: 31376.0 (before)
predictor_c: 23703.0 (after)
---
 libavcodec/exrdsp.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)
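
Why half of the flips can go: in 8-bit wraparound arithmetic, subtracting 128, adding 128 and XORing with 0x80 all produce the same byte, and two flips cancel. In the original loop, src[i + 1] is computed from the already-flipped src[i] and then shifted by -128, which is itself a flip. The unrolled loop can therefore add the unflipped sum `a` to src[i + 1] and flip only src[i]. The standalone C sketch below is not part of the patch, and `flip_identities_hold` is a hypothetical helper. It checks both identities exhaustively over all byte values:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: exhaustively checks the two byte identities the
 * unrolled predictor relies on. Returns true if both hold. */
static bool flip_identities_hold(void)
{
    for (unsigned a = 0; a < 256; a++) {
        /* 1. Modulo 256, subtracting 128 is the same as flipping bit 7. */
        if ((uint8_t)(a - 128) != (uint8_t)(a ^ 0x80))
            return false;

        for (unsigned b = 0; b < 256; b++) {
            /* 2. Adding the flipped predecessor and then flipping again
             *    (the original loop's "- 128") equals adding the
             *    unflipped predecessor, as the unrolled loop does. */
            uint8_t twice = (uint8_t)((uint8_t)(b + (a ^ 0x80)) ^ 0x80);
            if (twice != (uint8_t)(b + a))
                return false;
        }
    }
    return true;
}
```

Calling `assert(flip_identities_hold());` from a test's `main` should pass. The inner loop covers all 65,536 (a, b) pairs.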

Patch

diff --git a/libavcodec/exrdsp.c b/libavcodec/exrdsp.c
index 752e1eb553..248cb93c5a 100644
--- a/libavcodec/exrdsp.c
+++ b/libavcodec/exrdsp.c
@@ -40,10 +40,20 @@  static void reorder_pixels_scalar(uint8_t *dst, const uint8_t *src, ptrdiff_t si
 
 static void predictor_scalar(uint8_t *src, ptrdiff_t size)
 {
-    ptrdiff_t i;
+    /* Unrolled: `src[i + 1] += src[i] - 128;` */
+    if ((size & 1) == 0) {
+        src[1] += src[0] ^ 0x80;
+        src++;
+        size--;
+    }
+
+    for (ptrdiff_t i = 1; i < size; i += 2) {
+        uint8_t a = src[i] + src[i - 1];
 
-    for (i = 1; i < size; i++)
-        src[i] += src[i-1] - 128;
+        src[i] = a;
+        src[i + 1] += a;
+        src[i] ^= 0x80;
+    }
 }
 
 av_cold void ff_exrdsp_init(ExrDSPContext *c)
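
To check the rewrite, the removed loop and the new one can be run side by side. The sketch below copies both into a standalone file and compares them on pseudo-random buffers of every length up to 256. The names `predictor_ref`, `predictor_unrolled` and `predictors_agree` are hypothetical and are not FFmpeg functions. The sketch adds one thing the patch does not have: an early return for `size < 2`. Without it, the even-size branch reads and writes `src[0]` and `src[1]` when `size` is 0, whereas the original loop does nothing in that case. The sketch makes no assumption about whether the EXR decoder can ever pass 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* The loop removed by the patch: src[i] += src[i - 1] - 128 (mod 256). */
static void predictor_ref(uint8_t *src, ptrdiff_t size)
{
    for (ptrdiff_t i = 1; i < size; i++)
        src[i] += src[i - 1] - 128;
}

/* The unrolled loop from the patch, plus a size < 2 guard (an addition
 * for this sketch) so that size == 0 leaves the buffer untouched. */
static void predictor_unrolled(uint8_t *src, ptrdiff_t size)
{
    if (size < 2)
        return;
    if ((size & 1) == 0) {       /* even: handle one element first */
        src[1] += src[0] ^ 0x80;
        src++;
        size--;                  /* remaining size is odd */
    }
    for (ptrdiff_t i = 1; i < size; i += 2) {
        uint8_t a = src[i] + src[i - 1]; /* unflipped sum */

        src[i] = a;
        src[i + 1] += a;         /* flip skipped: the two flips cancel */
        src[i] ^= 0x80;          /* only src[i] needs the flip */
    }
}

/* Runs both versions on pseudo-random data for every size in
 * [0, max_size] and returns true if the outputs always match. */
static bool predictors_agree(ptrdiff_t max_size, unsigned seed)
{
    uint8_t ref[256], out[256];

    if (max_size > 256)
        return false;
    srand(seed);
    for (ptrdiff_t size = 0; size <= max_size; size++) {
        for (ptrdiff_t i = 0; i < size; i++)
            ref[i] = out[i] = (uint8_t)(rand() & 0xff);
        predictor_ref(ref, size);
        predictor_unrolled(out, size);
        if (memcmp(ref, out, (size_t)size) != 0)
            return false;
    }
    return true;
}
```

A call such as `assert(predictors_agree(256, 1u));` exercises both parities of `size`. That matters because an even size takes the one-element pre-step before the paired loop.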