[FFmpeg-devel] avcodec/scpr: optimize shift loop.

Message ID	20170909182740.11094-1-michael@niedermayer.cc
State	Accepted
Commit	981f04b2ae2d6e0355386aaff39840eb5d390a36
Headers	show Delivered-To: ffmpegpatchwork@gmail.com Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; From: Michael Niedermayer <michael@niedermayer.cc> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> Date: Sat, 9 Sep 2017 20:27:40 +0200 Message-Id: <20170909182740.11094-1-michael@niedermayer.cc> Subject: [FFmpeg-devel] [PATCH] avcodec/scpr: optimize shift loop. Precedence: list Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

Message ID

20170909182740.11094-1-michael@niedermayer.cc

State

Accepted

Commit

981f04b2ae2d6e0355386aaff39840eb5d390a36

Headers

Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	client-ip=79.124.17.100; 
From: Michael Niedermayer <michael@niedermayer.cc>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Date: Sat,  9 Sep 2017 20:27:40 +0200
Message-Id: <20170909182740.11094-1-michael@niedermayer.cc>
Subject: [FFmpeg-devel] [PATCH] avcodec/scpr: optimize shift loop.
Precedence: list
Reply-To: FFmpeg development discussions and patches
	<ffmpeg-devel@ffmpeg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

Commit Message

Michael Niedermayer Sept. 9, 2017, 6:27 p.m. UTC

Speeds code up from 50sec to 15sec

Fixes Timeout
Fixes: 3242/clusterfuzz-testcase-5811951672229888

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/scpr.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Brian Matherly Sept. 9, 2017, 9:37 p.m. UTC | #1

On 9/9/2017 1:27 PM, Michael Niedermayer wrote:
> +            // If the image is sufficiently aligned, compute 8 samples at once
> +            if (!(((uintptr_t)dst) & 7)) {
> +                uint64_t *dst64 = (uint64_t *)dst;
> +                int w = avctx->width>>1;
> +                for (x = 0; x < w; x++) {
> +                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
> +                }
> +                x *= 8;
> +            } else
> +                x = 0;
> +            for (; x < avctx->width * 4; x++) {
>                   dst[x] = dst[x] << 3;
>               }

Forgive me if I'm not understanding the code correctly, but couldn't you 
always apply the optimization if you align the first (up to) 7 samples?

Pseudocode:

uint64_t *dst64 = (uint64_t *)dst;
int w = avctx->width>>1;
x=0
// compute un-aligned beginning samples
for (; x < (avctx->width * 4) && (((uintptr_t)dst) & 7); x++) {
     dst[x] = dst[x] << 3;
}
// compute aligned samples
for (; x < w; x+=8) {
     dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
}
x -= 8;
// compute un-aligned ending samples
for (; x < avctx->width * 4; x++) {
     dst[x] = dst[x] << 3;
}

Michael Niedermayer Sept. 9, 2017, 10:47 p.m. UTC | #2

On Sat, Sep 09, 2017 at 04:37:52PM -0500, Brian Matherly wrote:
> 
> On 9/9/2017 1:27 PM, Michael Niedermayer wrote:
> >+            // If the image is sufficiently aligned, compute 8 samples at once
> >+            if (!(((uintptr_t)dst) & 7)) {
> >+                uint64_t *dst64 = (uint64_t *)dst;
> >+                int w = avctx->width>>1;
> >+                for (x = 0; x < w; x++) {
> >+                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
> >+                }
> >+                x *= 8;
> >+            } else
> >+                x = 0;
> >+            for (; x < avctx->width * 4; x++) {
> >                  dst[x] = dst[x] << 3;
> >              }
> 
> Forgive me if I'm not understanding the code correctly, but couldn't
> you always apply the optimization if you align the first (up to) 7
> samples?

yes, thats possible, it would be optimizing a case that probably
never occurs in practice though.

If people want, i can add code to handle misaligned cases ?

thx

[...]

James Almer Sept. 9, 2017, 11:02 p.m. UTC | #3

On 9/9/2017 7:47 PM, Michael Niedermayer wrote:
> On Sat, Sep 09, 2017 at 04:37:52PM -0500, Brian Matherly wrote:
>>
>> On 9/9/2017 1:27 PM, Michael Niedermayer wrote:
>>> +            // If the image is sufficiently aligned, compute 8 samples at once
>>> +            if (!(((uintptr_t)dst) & 7)) {
>>> +                uint64_t *dst64 = (uint64_t *)dst;
>>> +                int w = avctx->width>>1;
>>> +                for (x = 0; x < w; x++) {
>>> +                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
>>> +                }
>>> +                x *= 8;
>>> +            } else
>>> +                x = 0;
>>> +            for (; x < avctx->width * 4; x++) {
>>>                  dst[x] = dst[x] << 3;
>>>              }
>>
>> Forgive me if I'm not understanding the code correctly, but couldn't
>> you always apply the optimization if you align the first (up to) 7
>> samples?
> 
> yes, thats possible, it would be optimizing a case that probably
> never occurs in practice though.
> 
> If people want, i can add code to handle misaligned cases ?

No, frame->data[0] should always be sufficiently aligned.
I hadn't even looked what dst pointed to, hence my original comment
about this change potentially not have an effect in all cases.

It's ok as is, no need to make it anymore complex.

> 
> thx
> 
> [...]
> 
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>

Michael Niedermayer Sept. 10, 2017, 2:28 p.m. UTC | #4

On Sat, Sep 09, 2017 at 08:02:04PM -0300, James Almer wrote:
> On 9/9/2017 7:47 PM, Michael Niedermayer wrote:
> > On Sat, Sep 09, 2017 at 04:37:52PM -0500, Brian Matherly wrote:
> >>
> >> On 9/9/2017 1:27 PM, Michael Niedermayer wrote:
> >>> +            // If the image is sufficiently aligned, compute 8 samples at once
> >>> +            if (!(((uintptr_t)dst) & 7)) {
> >>> +                uint64_t *dst64 = (uint64_t *)dst;
> >>> +                int w = avctx->width>>1;
> >>> +                for (x = 0; x < w; x++) {
> >>> +                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
> >>> +                }
> >>> +                x *= 8;
> >>> +            } else
> >>> +                x = 0;
> >>> +            for (; x < avctx->width * 4; x++) {
> >>>                  dst[x] = dst[x] << 3;
> >>>              }
> >>
> >> Forgive me if I'm not understanding the code correctly, but couldn't
> >> you always apply the optimization if you align the first (up to) 7
> >> samples?
> > 
> > yes, thats possible, it would be optimizing a case that probably
> > never occurs in practice though.
> > 
> > If people want, i can add code to handle misaligned cases ?
> 
> No, frame->data[0] should always be sufficiently aligned.
> I hadn't even looked what dst pointed to, hence my original comment
> about this change potentially not have an effect in all cases.
> 
> It's ok as is, no need to make it anymore complex.

will apply

thanks

[...]

diff --git a/libavcodec/scpr.c b/libavcodec/scpr.c
index 37fbe7a106..cbe1bc40d9 100644
--- a/libavcodec/scpr.c
+++ b/libavcodec/scpr.c
@@ -826,8 +826,19 @@  static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
         if (ret < 0)
             return ret;
 
+        // scale up each sample by 8
         for (y = 0; y < avctx->height; y++) {
-            for (x = 0; x < avctx->width * 4; x++) {
+            // If the image is sufficiently aligned, compute 8 samples at once
+            if (!(((uintptr_t)dst) & 7)) {
+                uint64_t *dst64 = (uint64_t *)dst;
+                int w = avctx->width>>1;
+                for (x = 0; x < w; x++) {
+                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
+                }
+                x *= 8;
+            } else
+                x = 0;
+            for (; x < avctx->width * 4; x++) {
                 dst[x] = dst[x] << 3;
             }
             dst += frame->linesize[0];

[FFmpeg-devel] avcodec/scpr: optimize shift loop.

Commit Message

Comments

Patch