[FFmpeg-devel,1/2] avcodec/lcldec: Optimize YUV422 case

Submitted by Michael Niedermayer on July 27, 2019, 10:31 p.m.

Details

Message ID 20190727223122.28243-1-michael@niedermayer.cc
State New

Commit Message

Michael Niedermayer July 27, 2019, 10:31 p.m.
This merges the per-byte +128 chroma adjustments into 16-bit XORs with 0x8080 and avoids some shifts inside the loop.

Improves: Timeout (330sec -> 134sec)
Improves: 15599/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MSZH_fuzzer-5658127116009472

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/lcldec.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
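The merge relies on a simple identity: for a byte b, (b + 128) mod 256 equals b ^ 0x80. Adding 128 only flips bit 7, and any carry out of the byte is discarded when the result is stored into an 8-bit plane. Because 0x8080 has the same value in both bytes, one 16-bit XOR adjusts two chroma samples at once regardless of endianness. The sketch below checks this exhaustively over all 65536 two-byte inputs; `memcpy` is used as a portable stand-in for libavutil's `AV_RN16`/`AV_WN16`, and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Per-byte form, as in the original loop: add 128 and let the result
 * wrap modulo 256 when it is stored into a uint8_t. */
static void add128_bytewise(uint8_t dst[2], const uint8_t src[2])
{
    dst[0] = (uint8_t)(src[0] + 128);
    dst[1] = (uint8_t)(src[1] + 128);
}

/* Merged form: load two bytes as one 16-bit word, flip bit 7 of each
 * byte with a single XOR, store both bytes back. memcpy stands in for
 * the unaligned AV_RN16/AV_WN16 accessors. 0x8080 is byte-symmetric,
 * so the result is the same on little- and big-endian hosts. */
static void add128_merged(uint8_t dst[2], const uint8_t src[2])
{
    uint16_t w;
    memcpy(&w, src, sizeof(w));
    w ^= 0x8080;
    memcpy(dst, &w, sizeof(w));
}

/* Compares both forms over every possible two-byte input and
 * returns the number of inputs on which they disagree. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (unsigned v = 0; v < 65536u; v++) {
        uint8_t src[2] = { (uint8_t)(v & 0xFF), (uint8_t)(v >> 8) };
        uint8_t a[2], b[2];
        add128_bytewise(a, src);
        add128_merged(b, src);
        if (memcmp(a, b, sizeof(a)) != 0)
            mismatches++;
    }
    return mismatches;
}
```

`count_mismatches()` returns 0, i.e. the XOR form is a drop-in replacement for the two separate additions.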

Comments

Reimar Döffinger July 27, 2019, 10:45 p.m.
On 28.07.2019, at 00:31, Michael Niedermayer <michael@niedermayer.cc> wrote:

> This merges several byte operations and avoids some shifts inside the loop
> 
> Improves: Timeout (330sec -> 134sec)
> Improves: 15599/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MSZH_fuzzer-5658127116009472
> 
> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
> libavcodec/lcldec.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/libavcodec/lcldec.c b/libavcodec/lcldec.c
> index 104defa5f5..c3787b3cbe 100644
> --- a/libavcodec/lcldec.c
> +++ b/libavcodec/lcldec.c
> @@ -391,13 +391,13 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame, AVPac
>         break;
>     case IMGTYPE_YUV422:
>         for (row = 0; row < height; row++) {
> -            for (col = 0; col < width - 3; col += 4) {
> +            for (col = 0; col < (width - 2)>>1; col += 2) {
>                 memcpy(y_out + col, encoded, 4);
>                 encoded += 4;
> -                u_out[ col >> 1     ] = *encoded++ + 128;
> -                u_out[(col >> 1) + 1] = *encoded++ + 128;
> -                v_out[ col >> 1     ] = *encoded++ + 128;
> -                v_out[(col >> 1) + 1] = *encoded++ + 128;
> +                AV_WN16(u_out + col, AV_RN16(encoded) ^ 0x8080);
> +                encoded += 2;
> +                AV_WN16(v_out + col, AV_RN16(encoded) ^ 0x8080);
> +                encoded += 2;

Huh? Surely the pixel stride used for y_out still needs to be double the u/v one?
I suspect doing only the AV_RN16/xor optimization might be best; the one shift saved seems not worth the risk/complexity...
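Reimar's objection can be checked with a small standalone model of one YUV422 row. Each 8-byte group of `encoded` holds 4 Y, 2 U and 2 V samples, so `y_out` must advance by 4 per iteration while the chroma planes advance by 2. In the posted patch `col` steps by 2 and is used for both, so consecutive 4-byte copies into `y_out + col` overlap and the right part of each luma row is never written. The hypothetical sketch below applies only the AV_RN16/XOR part, as Reimar suggests: `col` stays in luma units and chroma is indexed with `col >> 1`. It is compared against the pre-patch loop. `rd16`/`wr16` are local stand-ins for `AV_RN16`/`AV_WN16`, and this is not necessarily what was eventually committed.

```c
#include <stdint.h>
#include <string.h>

/* Local stand-ins for libavutil's unaligned AV_RN16/AV_WN16
 * (assumption: native-endian access is all the XOR trick needs). */
static uint16_t rd16(const uint8_t *p) { uint16_t v; memcpy(&v, p, 2); return v; }
static void wr16(uint8_t *p, uint16_t v) { memcpy(p, &v, 2); }

/* Pre-patch loop: 4 Y, then 2 U, then 2 V per iteration.
 * Returns the advanced input pointer. */
static const uint8_t *unpack_row_ref(const uint8_t *encoded, int width,
                                     uint8_t *y_out, uint8_t *u_out, uint8_t *v_out)
{
    for (int col = 0; col < width - 3; col += 4) {
        memcpy(y_out + col, encoded, 4);
        encoded += 4;
        u_out[ col >> 1     ] = *encoded++ + 128;
        u_out[(col >> 1) + 1] = *encoded++ + 128;
        v_out[ col >> 1     ] = *encoded++ + 128;
        v_out[(col >> 1) + 1] = *encoded++ + 128;
    }
    return encoded;
}

/* XOR-only variant: same bounds and luma stride as the original,
 * chroma written two bytes at a time at col >> 1. */
static const uint8_t *unpack_row_xor(const uint8_t *encoded, int width,
                                     uint8_t *y_out, uint8_t *u_out, uint8_t *v_out)
{
    for (int col = 0; col < width - 3; col += 4) {
        memcpy(y_out + col, encoded, 4);
        encoded += 4;
        wr16(u_out + (col >> 1), rd16(encoded) ^ 0x8080);
        encoded += 2;
        wr16(v_out + (col >> 1), rd16(encoded) ^ 0x8080);
        encoded += 2;
    }
    return encoded;
}

/* Runs both loops on the same pseudo-random input for one width and
 * returns 1 if all three planes and the consumed input length match. */
static int rows_match(int width)
{
    enum { MAXW = 64 };
    uint8_t in[MAXW * 2];
    uint8_t ya[MAXW] = {0}, ua[MAXW / 2] = {0}, va[MAXW / 2] = {0};
    uint8_t yb[MAXW] = {0}, ub[MAXW / 2] = {0}, vb[MAXW / 2] = {0};
    uint32_t seed = 12345u + (uint32_t)width;

    if (width < 0 || width > MAXW)
        return 0;
    for (int i = 0; i < MAXW * 2; i++) {
        seed = seed * 1103515245u + 12345u; /* simple LCG, test data only */
        in[i] = (uint8_t)(seed >> 16);
    }
    const uint8_t *ea = unpack_row_ref(in, width, ya, ua, va);
    const uint8_t *eb = unpack_row_xor(in, width, yb, ub, vb);
    return ea == eb &&
           memcmp(ya, yb, sizeof(ya)) == 0 &&
           memcmp(ua, ub, sizeof(ua)) == 0 &&
           memcmp(va, vb, sizeof(va)) == 0;
}
```

`rows_match(w)` returns 1 for any width from 0 to 64, including widths that are not multiples of 4, where both loops leave the trailing pixels untouched.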
Michael Niedermayer July 28, 2019, 11:22 a.m.
On Sun, Jul 28, 2019 at 12:45:36AM +0200, Reimar Döffinger wrote:
> [...]
> Huh? Surely the pixel stride used for y_out still needs to be double the u/v one?
> I suspect doing only the AV_RN16/xor optimization might be best; the one shift saved seems not worth the risk/complexity...

Will fix and retest, and make sure this code path is actually exercised by the tests.

thanks

[...]

Patch

diff --git a/libavcodec/lcldec.c b/libavcodec/lcldec.c
index 104defa5f5..c3787b3cbe 100644
--- a/libavcodec/lcldec.c
+++ b/libavcodec/lcldec.c
@@ -391,13 +391,13 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame, AVPac
         break;
     case IMGTYPE_YUV422:
         for (row = 0; row < height; row++) {
-            for (col = 0; col < width - 3; col += 4) {
+            for (col = 0; col < (width - 2)>>1; col += 2) {
                 memcpy(y_out + col, encoded, 4);
                 encoded += 4;
-                u_out[ col >> 1     ] = *encoded++ + 128;
-                u_out[(col >> 1) + 1] = *encoded++ + 128;
-                v_out[ col >> 1     ] = *encoded++ + 128;
-                v_out[(col >> 1) + 1] = *encoded++ + 128;
+                AV_WN16(u_out + col, AV_RN16(encoded) ^ 0x8080);
+                encoded += 2;
+                AV_WN16(v_out + col, AV_RN16(encoded) ^ 0x8080);
+                encoded += 2;
             }
             y_out -= frame->linesize[0];
             u_out -= frame->linesize[1];