diff mbox

[FFmpeg-devel] avcodec/scpr: optimize shift loop.

Message ID 20170909182740.11094-1-michael@niedermayer.cc
State Accepted
Commit 981f04b2ae2d6e0355386aaff39840eb5d390a36
Headers show

Commit Message

Michael Niedermayer Sept. 9, 2017, 6:27 p.m. UTC
Speeds code up from 50sec to 15sec

Fixes Timeout
Fixes: 3242/clusterfuzz-testcase-5811951672229888

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/scpr.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Brian Matherly Sept. 9, 2017, 9:37 p.m. UTC | #1
On 9/9/2017 1:27 PM, Michael Niedermayer wrote:
> +            // If the image is sufficiently aligned, compute 8 samples at once
> +            if (!(((uintptr_t)dst) & 7)) {
> +                uint64_t *dst64 = (uint64_t *)dst;
> +                int w = avctx->width>>1;
> +                for (x = 0; x < w; x++) {
> +                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
> +                }
> +                x *= 8;
> +            } else
> +                x = 0;
> +            for (; x < avctx->width * 4; x++) {
>                   dst[x] = dst[x] << 3;
>               }

Forgive me if I'm not understanding the code correctly, but couldn't you 
always apply the optimization if you align the first (up to) 7 samples?

Pseudocode:

uint64_t *dst64 = (uint64_t *)dst;
int w = avctx->width>>1;
x=0
// compute un-aligned beginning samples
for (; x < (avctx->width * 4) && (((uintptr_t)dst) & 7); x++) {
     dst[x] = dst[x] << 3;
}
// compute aligned samples
for (; x < w; x+=8) {
     dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
}
x -= 8;
// compute un-aligned ending samples
for (; x < avctx->width * 4; x++) {
     dst[x] = dst[x] << 3;
}
Michael Niedermayer Sept. 9, 2017, 10:47 p.m. UTC | #2
On Sat, Sep 09, 2017 at 04:37:52PM -0500, Brian Matherly wrote:
> 
> On 9/9/2017 1:27 PM, Michael Niedermayer wrote:
> >+            // If the image is sufficiently aligned, compute 8 samples at once
> >+            if (!(((uintptr_t)dst) & 7)) {
> >+                uint64_t *dst64 = (uint64_t *)dst;
> >+                int w = avctx->width>>1;
> >+                for (x = 0; x < w; x++) {
> >+                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
> >+                }
> >+                x *= 8;
> >+            } else
> >+                x = 0;
> >+            for (; x < avctx->width * 4; x++) {
> >                  dst[x] = dst[x] << 3;
> >              }
> 
> Forgive me if I'm not understanding the code correctly, but couldn't
> you always apply the optimization if you align the first (up to) 7
> samples?

yes, thats possible, it would be optimizing a case that probably
never occurs in practice though.

If people want, i can add code to handle misaligned cases ?

thx

[...]
James Almer Sept. 9, 2017, 11:02 p.m. UTC | #3
On 9/9/2017 7:47 PM, Michael Niedermayer wrote:
> On Sat, Sep 09, 2017 at 04:37:52PM -0500, Brian Matherly wrote:
>>
>> On 9/9/2017 1:27 PM, Michael Niedermayer wrote:
>>> +            // If the image is sufficiently aligned, compute 8 samples at once
>>> +            if (!(((uintptr_t)dst) & 7)) {
>>> +                uint64_t *dst64 = (uint64_t *)dst;
>>> +                int w = avctx->width>>1;
>>> +                for (x = 0; x < w; x++) {
>>> +                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
>>> +                }
>>> +                x *= 8;
>>> +            } else
>>> +                x = 0;
>>> +            for (; x < avctx->width * 4; x++) {
>>>                  dst[x] = dst[x] << 3;
>>>              }
>>
>> Forgive me if I'm not understanding the code correctly, but couldn't
>> you always apply the optimization if you align the first (up to) 7
>> samples?
> 
> yes, thats possible, it would be optimizing a case that probably
> never occurs in practice though.
> 
> If people want, i can add code to handle misaligned cases ?

No, frame->data[0] should always be sufficiently aligned.
I hadn't even looked what dst pointed to, hence my original comment
about this change potentially not have an effect in all cases.

It's ok as is, no need to make it anymore complex.

> 
> thx
> 
> [...]
> 
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
Michael Niedermayer Sept. 10, 2017, 2:28 p.m. UTC | #4
On Sat, Sep 09, 2017 at 08:02:04PM -0300, James Almer wrote:
> On 9/9/2017 7:47 PM, Michael Niedermayer wrote:
> > On Sat, Sep 09, 2017 at 04:37:52PM -0500, Brian Matherly wrote:
> >>
> >> On 9/9/2017 1:27 PM, Michael Niedermayer wrote:
> >>> +            // If the image is sufficiently aligned, compute 8 samples at once
> >>> +            if (!(((uintptr_t)dst) & 7)) {
> >>> +                uint64_t *dst64 = (uint64_t *)dst;
> >>> +                int w = avctx->width>>1;
> >>> +                for (x = 0; x < w; x++) {
> >>> +                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
> >>> +                }
> >>> +                x *= 8;
> >>> +            } else
> >>> +                x = 0;
> >>> +            for (; x < avctx->width * 4; x++) {
> >>>                  dst[x] = dst[x] << 3;
> >>>              }
> >>
> >> Forgive me if I'm not understanding the code correctly, but couldn't
> >> you always apply the optimization if you align the first (up to) 7
> >> samples?
> > 
> > yes, thats possible, it would be optimizing a case that probably
> > never occurs in practice though.
> > 
> > If people want, i can add code to handle misaligned cases ?
> 
> No, frame->data[0] should always be sufficiently aligned.
> I hadn't even looked what dst pointed to, hence my original comment
> about this change potentially not have an effect in all cases.
> 
> It's ok as is, no need to make it anymore complex.

will apply

thanks

[...]
diff mbox

Patch

diff --git a/libavcodec/scpr.c b/libavcodec/scpr.c
index 37fbe7a106..cbe1bc40d9 100644
--- a/libavcodec/scpr.c
+++ b/libavcodec/scpr.c
@@ -826,8 +826,19 @@  static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
         if (ret < 0)
             return ret;
 
+        // scale up each sample by 8
         for (y = 0; y < avctx->height; y++) {
-            for (x = 0; x < avctx->width * 4; x++) {
+            // If the image is sufficiently aligned, compute 8 samples at once
+            if (!(((uintptr_t)dst) & 7)) {
+                uint64_t *dst64 = (uint64_t *)dst;
+                int w = avctx->width>>1;
+                for (x = 0; x < w; x++) {
+                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
+                }
+                x *= 8;
+            } else
+                x = 0;
+            for (; x < avctx->width * 4; x++) {
                 dst[x] = dst[x] << 3;
             }
             dst += frame->linesize[0];