[FFmpeg-devel,3/3] avcodec/dfa: Optimize output reshuffle loop

Message ID 20220311202025.28139-3-michael@niedermayer.cc
State Accepted
Commit 18bc612f2fd33b6ac943bf1a0fdaa55b1f4c9d25
Series [FFmpeg-devel,1/3] avcodec/mjpegbdec: Set buf_size

Checks

Context                            Check    Description
andriy/make_x86                    success  Make finished
andriy/make_fate_x86               success  Make fate finished
yinshiyou/make_loongarch64         success  Make finished
yinshiyou/make_fate_loongarch64    success  Make fate finished
andriy/make_aarch64_jetson         success  Make finished
andriy/make_fate_aarch64_jetson    success  Make fate finished
andriy/make_armv7_RPi4             success  Make finished
andriy/make_fate_armv7_RPi4        success  Make fate finished

Commit Message

Michael Niedermayer March 11, 2022, 8:20 p.m. UTC
18035 -> 4018 dezicycles (Tested with LOGOS.DFA, gcc 7, 3950X)

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/dfa.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)
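
Editorial note, not part of the submitted patch: the speedup relies on the
patched loop reading exactly the same source bytes as the per-pixel formula
it replaces. The terms that depend only on the row index i are hoisted into
a base pointer, and the inner loop is unrolled by four with a scalar tail
for widths that are not a multiple of 4. The standalone sketch below checks
that equivalence outside FFmpeg. The names reshuffle_row_old,
reshuffle_row_new and rows_match are hypothetical helpers, and width and
height stand in for avctx->width and avctx->height; it assumes a source
buffer of width*height bytes, which is enough to cover every index either
formula can produce.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Per-pixel formula from the lines the patch removes (hypothetical wrapper). */
static void reshuffle_row_old(uint8_t *dst, const uint8_t *buf,
                              int width, int height, int i)
{
    for (int j = 0; j < width; j++)
        dst[j] = buf[(i & 3) * (width / 4) + (j / 4) +
                     ((j & 3) * (height / 4) + (i / 4)) * width];
}

/* Formula from the lines the patch adds (hypothetical wrapper). */
static void reshuffle_row_new(uint8_t *dst, const uint8_t *buf,
                              int width, int height, int i)
{
    /* Everything that depends only on i is computed once per row. */
    const uint8_t *buf1 = buf + (i & 3) * (width / 4) + (i / 4) * width;
    int stride = (height / 4) * width;
    int j;

    /* Main loop: four output pixels per iteration, one from each plane. */
    for (j = 0; j < width / 4; j++) {
        dst[4 * j + 0] = buf1[j + 0 * stride];
        dst[4 * j + 1] = buf1[j + 1 * stride];
        dst[4 * j + 2] = buf1[j + 2 * stride];
        dst[4 * j + 3] = buf1[j + 3 * stride];
    }
    /* Tail: remaining 0..3 pixels when width is not a multiple of 4. */
    j *= 4;
    for (; j < width; j++)
        dst[j] = buf1[(j / 4) + (j & 3) * stride];
}

/* Returns 1 if both variants produce identical rows for every i, else 0.
 * Assumes width > 0 and height > 0. */
static int rows_match(int width, int height)
{
    size_t n = (size_t)width * (size_t)height;
    uint8_t *buf = malloc(n);
    uint8_t *a   = malloc((size_t)width);
    uint8_t *b   = malloc((size_t)width);
    int ok = buf && a && b;

    if (ok) {
        /* Arbitrary non-constant test pattern. */
        for (size_t k = 0; k < n; k++)
            buf[k] = (uint8_t)(k * 31 + 7);
        for (int i = 0; ok && i < height; i++) {
            reshuffle_row_old(a, buf, width, height, i);
            reshuffle_row_new(b, buf, width, height, i);
            ok = memcmp(a, b, (size_t)width) == 0;
        }
    }
    free(buf);
    free(a);
    free(b);
    return ok;
}
```

Calling rows_match(640, 480) exercises only the unrolled loop, while a size
such as rows_match(10, 6) also exercises the scalar tail. The sketch says
nothing about timing; the cycle counts quoted above are the author's own
measurement.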

Comments

Michael Niedermayer March 17, 2022, 7:23 p.m. UTC | #1
On Fri, Mar 11, 2022 at 09:20:25PM +0100, Michael Niedermayer wrote:
> 18035 -> 4018 dezicycles (Tested with LOGOS.DFA, gcc 7, 3950X)
> 
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  libavcodec/dfa.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)

will apply

[...]
Patch

diff --git a/libavcodec/dfa.c b/libavcodec/dfa.c
index 0cf3641a38..ab78d66763 100644
--- a/libavcodec/dfa.c
+++ b/libavcodec/dfa.c
@@ -388,9 +388,17 @@  static int dfa_decode_frame(AVCodecContext *avctx,
     for (i = 0; i < avctx->height; i++) {
         if(version == 0x100) {
             int j;
-            for(j = 0; j < avctx->width; j++) {
-                dst[j] = buf[ (i&3)*(avctx->width /4) + (j/4) +
-                             ((j&3)*(avctx->height/4) + (i/4))*avctx->width];
+            const uint8_t *buf1 = buf + (i&3)*(avctx->width/4) + (i/4)*avctx->width;
+            int stride = (avctx->height/4)*avctx->width;
+            for(j = 0; j < avctx->width/4; j++) {
+                dst[4*j+0] = buf1[j + 0*stride];
+                dst[4*j+1] = buf1[j + 1*stride];
+                dst[4*j+2] = buf1[j + 2*stride];
+                dst[4*j+3] = buf1[j + 3*stride];
+            }
+            j *= 4;
+            for(; j < avctx->width; j++) {
+                dst[j] = buf1[(j/4) + (j&3)*stride];
             }
         } else {
             memcpy(dst, buf, avctx->width);