From patchwork Sun Jul 28 11:56:39 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Michael Niedermayer <michael@niedermayer.cc>
X-Patchwork-Id: 14106
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
X-Original-To: patchwork@ffaux-bg.ffmpeg.org
Delivered-To: patchwork@ffaux-bg.ffmpeg.org
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by ffaux.localdomain (Postfix) with ESMTP id 1018344A01C
	for <patchwork@ffaux-bg.ffmpeg.org>;
	Sun, 28 Jul 2019 14:56:48 +0300 (EEST)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id DBCC368AAF4;
	Sun, 28 Jul 2019 14:56:47 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from relay12.mail.gandi.net (relay12.mail.gandi.net
	[217.70.178.232])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id AA9EC68A0FA
	for <ffmpeg-devel@ffmpeg.org>; Sun, 28 Jul 2019 14:56:41 +0300 (EEST)
Received: from localhost (213-47-41-20.cable.dynamic.surfer.at
	[213.47.41.20]) (Authenticated sender: michael@niedermayer.cc)
	by relay12.mail.gandi.net (Postfix) with ESMTPSA id D7B63200006
	for <ffmpeg-devel@ffmpeg.org>; Sun, 28 Jul 2019 11:56:40 +0000 (UTC)
Date: Sun, 28 Jul 2019 13:56:39 +0200
From: Michael Niedermayer <michael@niedermayer.cc>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Message-ID: <20190728115639.GQ3219@michaelspb>
References: <20190727223122.28243-1-michael@niedermayer.cc>
	<95676D76-6B5F-4FD5-84E6-CD772DF1D68A@gmx.de>
In-Reply-To: <95676D76-6B5F-4FD5-84E6-CD772DF1D68A@gmx.de>
User-Agent: Mutt/1.5.24 (2015-08-30)
Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/lcldec: Optimize YUV422 case
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <http://ffmpeg.org/mailman/options/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <http://ffmpeg.org/pipermail/ffmpeg-devel/>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <http://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches
	<ffmpeg-devel@ffmpeg.org>
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

On Sun, Jul 28, 2019 at 12:45:36AM +0200, Reimar Döffinger wrote:
> 
> 
> On 28.07.2019, at 00:31, Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> > This merges several byte operations and avoids some shifts inside the loop
> > 
> > Improves: Timeout (330sec -> 134sec)
> > Improves: 15599/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MSZH_fuzzer-5658127116009472
> > 
> > Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > ---
> > libavcodec/lcldec.c | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/libavcodec/lcldec.c b/libavcodec/lcldec.c
> > index 104defa5f5..c3787b3cbe 100644
> > --- a/libavcodec/lcldec.c
> > +++ b/libavcodec/lcldec.c
> > @@ -391,13 +391,13 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame, AVPac
> >         break;
> >     case IMGTYPE_YUV422:
> >         for (row = 0; row < height; row++) {
> > -            for (col = 0; col < width - 3; col += 4) {
> > +            for (col = 0; col < (width - 2)>>1; col += 2) {
> >                 memcpy(y_out + col, encoded, 4);
> >                 encoded += 4;
> > -                u_out[ col >> 1     ] = *encoded++ + 128;
> > -                u_out[(col >> 1) + 1] = *encoded++ + 128;
> > -                v_out[ col >> 1     ] = *encoded++ + 128;
> > -                v_out[(col >> 1) + 1] = *encoded++ + 128;
> > +                AV_WN16(u_out + col, AV_RN16(encoded) ^ 0x8080);
> > +                encoded += 2;
> > +                AV_WN16(v_out + col, AV_RN16(encoded) ^ 0x8080);
> > +                encoded += 2;
> 
> Huh? Surely the pixel stride used for y_out still needs to be double of the u/v one?

> I suspect doing only the AV_RN16/xor optimization might be best, the one shift saved seems not worth the risk/complexity...
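
For reference, the AV_RN16/xor part relies on the fact that adding 128 to a byte modulo 256 only flips its top bit, so two "+ 128" byte stores can be merged into one 16-bit XOR with 0x8080. Below is a minimal standalone sketch of that equivalence. It is not lcldec code: memcpy() stands in for AV_RN16/AV_WN16 as an assumption, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise form: bias two chroma bytes by +128 each, wrapping modulo 256. */
static void bias_bytewise(uint8_t *dst, const uint8_t *src)
{
    dst[0] = (uint8_t)(src[0] + 128);
    dst[1] = (uint8_t)(src[1] + 128);
}

/* Merged form: one 16-bit load, XOR and store.
 * memcpy() is a stand-in for the native-endian unaligned accessors
 * (assumption for this sketch, not the real macros). */
static void bias_xor16(uint8_t *dst, const uint8_t *src)
{
    uint16_t v;

    memcpy(&v, src, sizeof(v));
    v ^= 0x8080; /* same constant in both bytes, so byte order does not matter */
    memcpy(dst, &v, sizeof(v));
}

/* Hypothetical helper: compares both forms for every byte pair and
 * returns the number of pairs checked. */
static unsigned check_bias_equivalence(void)
{
    unsigned a, b, checked = 0;

    for (a = 0; a < 256; a++) {
        for (b = 0; b < 256; b++) {
            uint8_t src[2] = { (uint8_t)a, (uint8_t)b };
            uint8_t x[2], y[2];

            bias_bytewise(x, src);
            bias_xor16(y, src);
            assert(memcmp(x, y, 2) == 0);
            checked++;
        }
    }
    return checked;
}
```

Because 0x8080 carries the same value in both bytes, the result is the same on little- and big-endian hosts.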

If you want, I can remove the shift change.
With the fixed shift change it takes 155 sec; if I remove the shift optimization it takes 170 sec.
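
To illustrate the fixed shift change: col now counts chroma samples instead of luma samples, so the luma offset becomes 2 * col and the loop bound is halved. The sketch below is a standalone comparison of the two loop shapes on a synthetic row. It assumes plain arrays instead of the frame planes, uses byte-wise XOR in place of AV_RN16/AV_WN16 to stay self-contained, and row_old(), row_new() and check_all_widths() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_W 64 /* largest row width exercised by the sketch */

/* Loop shape before the change: col counts luma samples. */
static size_t row_old(uint8_t *y, uint8_t *u, uint8_t *v,
                      const uint8_t *enc, int width)
{
    const uint8_t *start = enc;
    int col;

    for (col = 0; col < width - 3; col += 4) {
        memcpy(y + col, enc, 4);
        enc += 4;
        u[ col >> 1     ] = (uint8_t)(*enc++ + 128);
        u[(col >> 1) + 1] = (uint8_t)(*enc++ + 128);
        v[ col >> 1     ] = (uint8_t)(*enc++ + 128);
        v[(col >> 1) + 1] = (uint8_t)(*enc++ + 128);
    }
    return (size_t)(enc - start); /* bytes consumed */
}

/* Loop shape after the change: col counts chroma samples,
 * so the luma write goes to y + 2 * col. */
static size_t row_new(uint8_t *y, uint8_t *u, uint8_t *v,
                      const uint8_t *enc, int width)
{
    const uint8_t *start = enc;
    int col;

    for (col = 0; col < (width - 2) >> 1; col += 2) {
        memcpy(y + 2 * col, enc, 4);
        enc += 4;
        u[col]     = enc[0] ^ 0x80;
        u[col + 1] = enc[1] ^ 0x80;
        enc += 2;
        v[col]     = enc[0] ^ 0x80;
        v[col + 1] = enc[1] ^ 0x80;
        enc += 2;
    }
    return (size_t)(enc - start); /* bytes consumed */
}

/* Runs both loops for widths 2..MAX_W and returns the number of widths
 * checked. Widths below 2 are skipped because right-shifting a negative
 * int is implementation-defined in C. */
static int check_all_widths(void)
{
    uint8_t enc[2 * MAX_W];
    int width, checked = 0;
    size_t i;

    for (i = 0; i < sizeof(enc); i++)
        enc[i] = (uint8_t)(i * 37 + 11); /* arbitrary test pattern */

    for (width = 2; width <= MAX_W; width++) {
        uint8_t y0[MAX_W] = {0}, u0[MAX_W] = {0}, v0[MAX_W] = {0};
        uint8_t y1[MAX_W] = {0}, u1[MAX_W] = {0}, v1[MAX_W] = {0};
        size_t n0 = row_old(y0, u0, v0, enc, width);
        size_t n1 = row_new(y1, u1, v1, enc, width);

        assert(n0 == n1);
        assert(memcmp(y0, y1, MAX_W) == 0);
        assert(memcmp(u0, u1, MAX_W) == 0);
        assert(memcmp(v0, v1, MAX_W) == 0);
        checked++;
    }
    return checked;
}
```

Without the 2 * col on the luma write, the two loops would disagree on every iteration after the first, which is the stride problem pointed out above.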

Patch for the 155 sec case below:

commit 56998b7d57a2cd0ed7f53981c50e76fd419cd86f (HEAD)
Author: Michael Niedermayer <michael@niedermayer.cc>
Date:   Sat Jul 27 22:46:34 2019 +0200

    avcodec/lcldec: Optimize YUV422 case
    
    This merges several byte operations and avoids some shifts inside the loop
    
    Improves: Timeout (330sec -> 155sec)
    Improves: 15599/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MSZH_fuzzer-5658127116009472
    
    Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
    Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

[...]

diff --git a/libavcodec/lcldec.c b/libavcodec/lcldec.c
index 104defa5f5..9e018ff5a9 100644
--- a/libavcodec/lcldec.c
+++ b/libavcodec/lcldec.c
@@ -391,13 +391,13 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame, AVPac
         break;
     case IMGTYPE_YUV422:
         for (row = 0; row < height; row++) {
-            for (col = 0; col < width - 3; col += 4) {
-                memcpy(y_out + col, encoded, 4);
+            for (col = 0; col < (width - 2)>>1; col += 2) {
+                memcpy(y_out + 2 * col, encoded, 4);
                 encoded += 4;
-                u_out[ col >> 1     ] = *encoded++ + 128;
-                u_out[(col >> 1) + 1] = *encoded++ + 128;
-                v_out[ col >> 1     ] = *encoded++ + 128;
-                v_out[(col >> 1) + 1] = *encoded++ + 128;
+                AV_WN16(u_out + col, AV_RN16(encoded) ^ 0x8080);
+                encoded += 2;
+                AV_WN16(v_out + col, AV_RN16(encoded) ^ 0x8080);
+                encoded += 2;
             }
             y_out -= frame->linesize[0];
             u_out -= frame->linesize[1];