From: Michael Niedermayer
To: FFmpeg development discussions and patches
Date: Sun, 28 Jul 2019 13:56:39 +0200
Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/lcldec: Optimize YUV422 case
Message-ID: <20190728115639.GQ3219@michaelspb>
References: <20190727223122.28243-1-michael@niedermayer.cc> <95676D76-6B5F-4FD5-84E6-CD772DF1D68A@gmx.de>
In-Reply-To: <95676D76-6B5F-4FD5-84E6-CD772DF1D68A@gmx.de>

On Sun, Jul 28, 2019 at 12:45:36AM +0200, Reimar Döffinger wrote:
> 
> 
> On 28.07.2019, at 00:31, Michael Niedermayer wrote:
> 
> > This merges several byte operations and avoids some shifts inside the loop
> > 
> > Improves: Timeout (330sec -> 134sec)
> > Improves: 15599/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MSZH_fuzzer-5658127116009472
> > 
> > Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> > Signed-off-by: Michael Niedermayer
> > ---
> >  libavcodec/lcldec.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/libavcodec/lcldec.c b/libavcodec/lcldec.c
> > index 104defa5f5..c3787b3cbe 100644
> > --- a/libavcodec/lcldec.c
> > +++ b/libavcodec/lcldec.c
> > @@ -391,13 +391,13 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame, AVPac
> >          break;
> >      case IMGTYPE_YUV422:
> >          for (row = 0; row < height; row++) {
> > -            for (col = 0; col < width - 3; col += 4) {
> > +            for (col = 0; col < (width - 2)>>1; col += 2) {
> >                  memcpy(y_out + col, encoded, 4);
> >                  encoded += 4;
> > -                u_out[ col >> 1     ] = *encoded++ + 128;
> > -                u_out[(col >> 1) + 1] = *encoded++ + 128;
> > -                v_out[ col >> 1     ] = *encoded++ + 128;
> > -                v_out[(col >> 1) + 1] = *encoded++ + 128;
> > +                AV_WN16(u_out + col, AV_RN16(encoded) ^ 0x8080);
> > +                encoded += 2;
> > +                AV_WN16(v_out + col, AV_RN16(encoded) ^ 0x8080);
> > +                encoded += 2;
> 
> Huh? Surely the pixel stride used for y_out still needs to be double of the u/v one?
> I suspect doing only the AV_RN16/xor optimization might be best, the one shift saved seems not worth the risk/complexity...

If you want, I can remove the shift change.
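A minimal sketch of why the AV_RN16/xor part is safe on its own: for an 8-bit value, adding 128 modulo 256 only flips bit 7 (the carry out of bit 7 is discarded), so XORing a 16-bit load with 0x8080 applies the old "+ 128" to both chroma bytes at once. The helpers rn16() and wn16() are hypothetical memcpy-based stand-ins for FFmpeg's AV_RN16/AV_WN16 from libavutil/intreadwrite.h, used only so the check compiles without FFmpeg.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical memcpy-based stand-ins for FFmpeg's AV_RN16/AV_WN16
 * (unaligned, native-endian 16-bit load/store). */
static uint16_t rn16(const uint8_t *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

static void wn16(uint8_t *p, uint16_t v)
{
    memcpy(p, &v, sizeof(v));
}

/* Returns 1 if one 16-bit XOR with 0x8080 yields the same two bytes as
 * two separate "+ 128" byte operations, for all 65536 input pairs. */
static int xor_matches_add(void)
{
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            const uint8_t in[2] = { (uint8_t)a, (uint8_t)b };
            uint8_t by_add[2], by_xor[2];

            by_add[0] = (uint8_t)(in[0] + 128);  /* old code: wraps modulo 256 */
            by_add[1] = (uint8_t)(in[1] + 128);
            wn16(by_xor, rn16(in) ^ 0x8080);     /* new code: flips bit 7 of both bytes */

            if (memcmp(by_add, by_xor, 2) != 0)
                return 0;
        }
    }
    return 1;
}
```

Called from a small main(), xor_matches_add() should return 1. Because the same mask is applied to both bytes and the load and store use the same byte order, the result does not depend on whether the host is little- or big-endian.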
With the fixed shift change it's 155 sec; if I remove the shift optimization it's 170 sec.
Patch for the 155 sec case below:

commit 56998b7d57a2cd0ed7f53981c50e76fd419cd86f (HEAD)
Author: Michael Niedermayer
Date:   Sat Jul 27 22:46:34 2019 +0200

    avcodec/lcldec: Optimize YUV422 case

    This merges several byte operations and avoids some shifts inside the loop

    Improves: Timeout (330sec -> 155sec)
    Improves: 15599/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MSZH_fuzzer-5658127116009472

    Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
    Signed-off-by: Michael Niedermayer

[...]
diff --git a/libavcodec/lcldec.c b/libavcodec/lcldec.c
index 104defa5f5..9e018ff5a9 100644
--- a/libavcodec/lcldec.c
+++ b/libavcodec/lcldec.c
@@ -391,13 +391,13 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame, AVPac
         break;
     case IMGTYPE_YUV422:
         for (row = 0; row < height; row++) {
-            for (col = 0; col < width - 3; col += 4) {
-                memcpy(y_out + col, encoded, 4);
+            for (col = 0; col < (width - 2)>>1; col += 2) {
+                memcpy(y_out + 2 * col, encoded, 4);
                 encoded += 4;
-                u_out[ col >> 1     ] = *encoded++ + 128;
-                u_out[(col >> 1) + 1] = *encoded++ + 128;
-                v_out[ col >> 1     ] = *encoded++ + 128;
-                v_out[(col >> 1) + 1] = *encoded++ + 128;
+                AV_WN16(u_out + col, AV_RN16(encoded) ^ 0x8080);
+                encoded += 2;
+                AV_WN16(v_out + col, AV_RN16(encoded) ^ 0x8080);
+                encoded += 2;
             }
             y_out -= frame->linesize[0];
             u_out -= frame->linesize[1];
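A minimal sketch, assuming a single row and buffers sized for widths up to 64, that checks the fixed loop against the original one. In the new loop col counts chroma samples, so the luma offset must be 2 * col, and the bound (width - 2) >> 1 with step 2 gives the same iteration count as the old col < width - 3 with step 4. The functions row_old(), row_new() and matches_old() and the luma_scale parameter are hypothetical wrappers around the loop bodies from the two diffs; rn16()/wn16() again stand in for AV_RN16/AV_WN16.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical memcpy-based stand-ins for FFmpeg's AV_RN16/AV_WN16. */
static uint16_t rn16(const uint8_t *p) { uint16_t v; memcpy(&v, p, 2); return v; }
static void wn16(uint8_t *p, uint16_t v) { memcpy(p, &v, 2); }

/* Original loop: col indexes luma, chroma index is col >> 1.
 * Returns the number of encoded bytes consumed for the row. */
static long row_old(uint8_t *y_out, uint8_t *u_out, uint8_t *v_out,
                    const uint8_t *encoded, int width)
{
    const uint8_t *start = encoded;
    for (int col = 0; col < width - 3; col += 4) {
        memcpy(y_out + col, encoded, 4);
        encoded += 4;
        u_out[ col >> 1     ] = *encoded++ + 128;
        u_out[(col >> 1) + 1] = *encoded++ + 128;
        v_out[ col >> 1     ] = *encoded++ + 128;
        v_out[(col >> 1) + 1] = *encoded++ + 128;
    }
    return encoded - start;
}

/* New loop: col indexes chroma. luma_scale is 2 for the fixed patch and
 * 1 for the first posted version, which wrote luma at y_out + col. */
static long row_new(uint8_t *y_out, uint8_t *u_out, uint8_t *v_out,
                    const uint8_t *encoded, int width, int luma_scale)
{
    const uint8_t *start = encoded;
    for (int col = 0; col < (width - 2) >> 1; col += 2) {
        memcpy(y_out + luma_scale * col, encoded, 4);
        encoded += 4;
        wn16(u_out + col, rn16(encoded) ^ 0x8080);
        encoded += 2;
        wn16(v_out + col, rn16(encoded) ^ 0x8080);
        encoded += 2;
    }
    return encoded - start;
}

enum { MAXW = 64 };

/* Returns 1 if row_new() with the given luma_scale reproduces row_old()
 * (same Y/U/V bytes, same input bytes consumed) for every width 2..MAXW. */
static int matches_old(int luma_scale)
{
    uint8_t in[2 * MAXW];
    uint8_t y_ref[MAXW], u_ref[MAXW], v_ref[MAXW];
    uint8_t y_new[MAXW], u_new[MAXW], v_new[MAXW];

    for (int i = 0; i < 2 * MAXW; i++)
        in[i] = (uint8_t)(i * 37 + 11);  /* distinct, arbitrary test bytes */

    for (int w = 2; w <= MAXW; w++) {   /* w >= 2 avoids shifting a negative value */
        memset(y_ref, 0, MAXW); memset(u_ref, 0, MAXW); memset(v_ref, 0, MAXW);
        memset(y_new, 0, MAXW); memset(u_new, 0, MAXW); memset(v_new, 0, MAXW);
        long n_ref = row_old(y_ref, u_ref, v_ref, in, w);
        long n_new = row_new(y_new, u_new, v_new, in, w, luma_scale);
        if (n_ref != n_new || memcmp(y_ref, y_new, MAXW) ||
            memcmp(u_ref, u_new, MAXW) || memcmp(v_ref, v_new, MAXW))
            return 0;
    }
    return 1;
}
```

matches_old(2) should return 1. With luma_scale set to 1 (the first posted version), matches_old() returns 0 as soon as a row holds two 4-pixel groups (width 8), because the second group's luma overwrites y_out[2..5] instead of landing at y_out[4..7], which is the stride problem raised in the review.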