Message ID | 20170909182740.11094-1-michael@niedermayer.cc |
---|---|
State | Accepted |
Commit | 981f04b2ae2d6e0355386aaff39840eb5d390a36 |
Headers | show |
On 9/9/2017 1:27 PM, Michael Niedermayer wrote: > + // If the image is sufficiently aligned, compute 8 samples at once > + if (!(((uintptr_t)dst) & 7)) { > + uint64_t *dst64 = (uint64_t *)dst; > + int w = avctx->width>>1; > + for (x = 0; x < w; x++) { > + dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL; > + } > + x *= 8; > + } else > + x = 0; > + for (; x < avctx->width * 4; x++) { > dst[x] = dst[x] << 3; > } Forgive me if I'm not understanding the code correctly, but couldn't you always apply the optimization if you align the first (up to) 7 samples? Pseudocode: uint64_t *dst64 = (uint64_t *)dst; int w = avctx->width>>1; x=0 // compute un-aligned beginning samples for (; x < (avctx->width * 4) && (((uintptr_t)dst) & 7); x++) { dst[x] = dst[x] << 3; } // compute aligned samples for (; x < w; x+=8) { dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL; } x -= 8; // compute un-aligned ending samples for (; x < avctx->width * 4; x++) { dst[x] = dst[x] << 3; }
On Sat, Sep 09, 2017 at 04:37:52PM -0500, Brian Matherly wrote: > > On 9/9/2017 1:27 PM, Michael Niedermayer wrote: > >+ // If the image is sufficiently aligned, compute 8 samples at once > >+ if (!(((uintptr_t)dst) & 7)) { > >+ uint64_t *dst64 = (uint64_t *)dst; > >+ int w = avctx->width>>1; > >+ for (x = 0; x < w; x++) { > >+ dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL; > >+ } > >+ x *= 8; > >+ } else > >+ x = 0; > >+ for (; x < avctx->width * 4; x++) { > > dst[x] = dst[x] << 3; > > } > > Forgive me if I'm not understanding the code correctly, but couldn't > you always apply the optimization if you align the first (up to) 7 > samples? yes, thats possible, it would be optimizing a case that probably never occurs in practice though. If people want, i can add code to handle misaligned cases ? thx [...]
On 9/9/2017 7:47 PM, Michael Niedermayer wrote: > On Sat, Sep 09, 2017 at 04:37:52PM -0500, Brian Matherly wrote: >> >> On 9/9/2017 1:27 PM, Michael Niedermayer wrote: >>> + // If the image is sufficiently aligned, compute 8 samples at once >>> + if (!(((uintptr_t)dst) & 7)) { >>> + uint64_t *dst64 = (uint64_t *)dst; >>> + int w = avctx->width>>1; >>> + for (x = 0; x < w; x++) { >>> + dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL; >>> + } >>> + x *= 8; >>> + } else >>> + x = 0; >>> + for (; x < avctx->width * 4; x++) { >>> dst[x] = dst[x] << 3; >>> } >> >> Forgive me if I'm not understanding the code correctly, but couldn't >> you always apply the optimization if you align the first (up to) 7 >> samples? > > yes, thats possible, it would be optimizing a case that probably > never occurs in practice though. > > If people want, i can add code to handle misaligned cases ? No, frame->data[0] should always be sufficiently aligned. I hadn't even looked what dst pointed to, hence my original comment about this change potentially not have an effect in all cases. It's ok as is, no need to make it anymore complex. > > thx > > [...] > > > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel >
On Sat, Sep 09, 2017 at 08:02:04PM -0300, James Almer wrote: > On 9/9/2017 7:47 PM, Michael Niedermayer wrote: > > On Sat, Sep 09, 2017 at 04:37:52PM -0500, Brian Matherly wrote: > >> > >> On 9/9/2017 1:27 PM, Michael Niedermayer wrote: > >>> + // If the image is sufficiently aligned, compute 8 samples at once > >>> + if (!(((uintptr_t)dst) & 7)) { > >>> + uint64_t *dst64 = (uint64_t *)dst; > >>> + int w = avctx->width>>1; > >>> + for (x = 0; x < w; x++) { > >>> + dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL; > >>> + } > >>> + x *= 8; > >>> + } else > >>> + x = 0; > >>> + for (; x < avctx->width * 4; x++) { > >>> dst[x] = dst[x] << 3; > >>> } > >> > >> Forgive me if I'm not understanding the code correctly, but couldn't > >> you always apply the optimization if you align the first (up to) 7 > >> samples? > > > > yes, thats possible, it would be optimizing a case that probably > > never occurs in practice though. > > > > If people want, i can add code to handle misaligned cases ? > > No, frame->data[0] should always be sufficiently aligned. > I hadn't even looked what dst pointed to, hence my original comment > about this change potentially not have an effect in all cases. > > It's ok as is, no need to make it anymore complex. will apply thanks [...]
diff --git a/libavcodec/scpr.c b/libavcodec/scpr.c index 37fbe7a106..cbe1bc40d9 100644 --- a/libavcodec/scpr.c +++ b/libavcodec/scpr.c @@ -826,8 +826,19 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame, if (ret < 0) return ret; + // scale up each sample by 8 for (y = 0; y < avctx->height; y++) { - for (x = 0; x < avctx->width * 4; x++) { + // If the image is sufficiently aligned, compute 8 samples at once + if (!(((uintptr_t)dst) & 7)) { + uint64_t *dst64 = (uint64_t *)dst; + int w = avctx->width>>1; + for (x = 0; x < w; x++) { + dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL; + } + x *= 8; + } else + x = 0; + for (; x < avctx->width * 4; x++) { dst[x] = dst[x] << 3; } dst += frame->linesize[0];
Speeds code up from 50sec to 15sec Fixes Timeout Fixes: 3242/clusterfuzz-testcase-5811951672229888 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> --- libavcodec/scpr.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)