From patchwork Thu Jan 17 22:28:02 2019
X-Patchwork-Submitter: Michael Niedermayer
X-Patchwork-Id: 11785
Date: Thu, 17 Jan 2019 23:28:02 +0100
From: Michael Niedermayer
To: FFmpeg development discussions and patches
Message-ID: <20190117222802.GP3501@michaelspb>
References: <20181225221522.18064-1-michael@niedermayer.cc>
 <20181226193700.GB3501@michaelspb>
 <20181228165141.GI3501@michaelspb>
 <20190115223303.GH3501@michaelspb>
Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()

On Wed, Jan 16, 2019 at
08:00:22PM +0100, Marton Balint wrote:
>
>
> On Tue, 15 Jan 2019, Michael Niedermayer wrote:
>
> >On Sun, Dec 30, 2018 at 07:15:49PM +0100, Marton Balint wrote:
> >>
> >>
> >>On Fri, 28 Dec 2018, Michael Niedermayer wrote:
> >>
> >>>On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote:
> >>>>
> >>>>
> >>>>On Wed, 26 Dec 2018, Paul B Mahol wrote:
> >>>>
> >>>>>On 12/26/18, Michael Niedermayer wrote:
> >>>>>>On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote:
> >>>>>>>On 12/25/18, Michael Niedermayer wrote:
> >>>>>>>>Fixes: Timeout
> >>>>>>>>Fixes:
> >>>>>>>>11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> >>>>>>>>Before: Executed
> >>>>>>>>clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> >>>>>>>>in 11294 ms
> >>>>>>>>After : Executed
> >>>>>>>>clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> >>>>>>>>in 4249 ms
> >>>>>>>>
> >>>>>>>>Found-by: continuous fuzzing process
> >>>>>>>>https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> >>>>>>>>Signed-off-by: Michael Niedermayer
> >>>>>>>>---
> >>>>>>>>libavutil/imgutils.c | 6 ++++++
> >>>>>>>>1 file changed, 6 insertions(+)
> >>>>>>>>
> >>>>>>>>diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
> >>>>>>>>index 4938a7ef67..cc38f1e878 100644
> >>>>>>>>--- a/libavutil/imgutils.c
> >>>>>>>>+++ b/libavutil/imgutils.c
> >>>>>>>>@@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t
> >>>>>>>>dst_size,
> >>>>>>>>uint8_t *clear,
> >>>>>>>>         }
> >>>>>>>>     } else if (clear_size == 4) {
> >>>>>>>>         uint32_t val = AV_RN32(clear);
> >>>>>>>>+        uint64_t val8 = val * 0x100000001ULL;
> >>>>>>>>+        for (; dst_size >= 32; dst_size -= 32) {
> >>>>>>>>+            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
> >>>>>>>>+            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
> >>>>>>>>+            dst += 32;
> >>>>>>>>+        }
> >>>>>>>>         for (; dst_size >= 4; dst_size -= 4) {
> >>>>>>>>             AV_WN32(dst, val);
> >>>>>>>>             dst += 4;
> >>>>>>>>--
> >>>>>>>>2.20.1
> >>>>>>>>
> >>>>>>>
> >>>>>>>NAK, implement special memset function instead.
> >>>>>>
> >>>>>>I can move the added loop into a separate function, if that's what you
> >>>>>>suggest?
> >>>>>
> >>>>>No, don't do that.
> >>>>>
> >>>>>>All the code is already in a "special" memset though, this is
> >>>>>>memset_bytes()
> >>>>>>
> >>>>>
> >>>>>I guess the function is less useful if it's static. So any duplicate should
> >>>>>be avoided in the codebase.
> >>>>
> >>>>Doesn't av_memcpy_backptr do almost exactly what is needed here? That can
> >>>>also be optimized further if needed.
> >>>
> >>>av_memcpy_backptr() copies data with overlap, it's more like a recursive
> >>>memmove().
> >>
> >>So? As far as I see the memset_bytes function in imgutils.c can be replaced
> >>with this:
> >>
> >>    if (clear_size > dst_size)
> >>        clear_size = dst_size;
> >>    memcpy(dst, clear, clear_size);
> >>    av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size);
> >>
> >>I am not against an av_memset_bytes API addition, but I believe it should
> >>share code with av_memcpy_backptr to avoid duplication.
> >
> >I've implemented this, it does not seem to be really faster in the testcase
>
> I guess it is not faster because you have not applied your original
> optimization to fill32 in libavutil/mem.c. Could you compare speed after
> optimizing that the same way your original patch did it with imgutils
> memset_bytes?

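For reference, the two ideas in the quoted discussion can be sketched in
standalone C. This is only a minimal sketch under stated assumptions, not
FFmpeg code: replicate32(), copy_backptr() and pattern_fill() are hypothetical
names, and copy_backptr() is a plain byte-wise stand-in for what
av_memcpy_backptr() is described as doing above (a forward copy that re-reads
bytes it has just written).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Duplicate a 32-bit value into both halves of a 64-bit word.
 * v + ((uint64_t)v << 32) gives the same result as v * 0x100000001ULL. */
static uint64_t replicate32(uint32_t v)
{
    return v + ((uint64_t)v << 32);
}

/* Hypothetical stand-in for av_memcpy_backptr(): copy cnt bytes to dst,
 * reading from dst - back. The copy runs forward one byte at a time, so
 * bytes written earlier are read again and the last `back` bytes repeat. */
static void copy_backptr(uint8_t *dst, size_t back, size_t cnt)
{
    const uint8_t *src = dst - back;

    while (cnt--)
        *dst++ = *src++;
}

/* Fill dst with repetitions of the clear_size-byte pattern in clear:
 * seed one copy of the pattern, then let the overlapping copy repeat it. */
static void pattern_fill(uint8_t *dst, size_t dst_size,
                         const uint8_t *clear, size_t clear_size)
{
    if (!dst_size || !clear_size)
        return;
    if (clear_size > dst_size)
        clear_size = dst_size;
    memcpy(dst, clear, clear_size);
    copy_backptr(dst + clear_size, clear_size, dst_size - clear_size);
}
```

Calling pattern_fill() on an 11-byte buffer with the 4-byte pattern {1, 2, 3, 4}
leaves 1 2 3 4 1 2 3 4 1 2 3 in the buffer, including the partial repetition at
the end that the old trailing byte loop in memset_bytes() handled.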
sure, that makes it faster:

From f5660e4025bb8161ebdb55cda03b656cbf685b1a Mon Sep 17 00:00:00 2001
From: Michael Niedermayer
Date: Thu, 17 Jan 2019 22:35:10 +0100
Subject: [PATCH 1/2] avutil/mem: Optimize fill32() by unrolling and using 64bit

Signed-off-by: Michael Niedermayer
---
 libavutil/mem.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/libavutil/mem.c b/libavutil/mem.c
index 6149755a6b..88fe09b179 100644
--- a/libavutil/mem.c
+++ b/libavutil/mem.c
@@ -399,6 +399,18 @@ static void fill32(uint8_t *dst, int len)
 {
     uint32_t v = AV_RN32(dst - 4);
 
+#if HAVE_FAST_64BIT
+    uint64_t v2= v + ((uint64_t)v<<32);
+    while (len >= 32) {
+        AV_WN64(dst   , v2);
+        AV_WN64(dst+ 8, v2);
+        AV_WN64(dst+16, v2);
+        AV_WN64(dst+24, v2);
+        dst += 32;
+        len -= 32;
+    }
+#endif
+
     while (len >= 4) {
         AV_WN32(dst, v);
         dst += 4;
-- 
2.20.1

From 9b5573f91a043a818fe1fd6b93d0d36c4830cd9c Mon Sep 17 00:00:00 2001
From: Michael Niedermayer
Date: Tue, 25 Dec 2018 23:15:20 +0100
Subject: [PATCH 2/2] avutil/imgutils: Optimize memset_bytes() by using av_memcpy_backptr()

This is strongly based on code by Marton Balint

Fixes: Timeout
Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11209 ms
After: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 4104 ms

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer
---
 libavutil/imgutils.c | 26 +++++---------------------
 1 file changed, 5 insertions(+), 21 deletions(-)

diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
index 4938a7ef67..cf06afde3f 100644
--- a/libavutil/imgutils.c
+++ b/libavutil/imgutils.c
@@ -521,28 +521,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear,
     if (clear_size == 1) {
         memset(dst, clear[0], dst_size);
         dst_size = 0;
-    } else if (clear_size == 2) {
-        uint16_t val = AV_RN16(clear);
-        for (; dst_size >= 2; dst_size -= 2) {
-            AV_WN16(dst, val);
-            dst += 2;
-        }
-    } else if (clear_size == 4) {
-        uint32_t val = AV_RN32(clear);
-        for (; dst_size >= 4; dst_size -= 4) {
-            AV_WN32(dst, val);
-            dst += 4;
-        }
-    } else if (clear_size == 8) {
-        uint32_t val = AV_RN64(clear);
-        for (; dst_size >= 8; dst_size -= 8) {
-            AV_WN64(dst, val);
-            dst += 8;
-        }
+    } else {
+        if (clear_size > dst_size)
+            clear_size = dst_size;
+        memcpy(dst, clear, clear_size);
+        av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size);
     }
-
-    for (; dst_size; dst_size--)
-        *dst++ = clear[pos++ % clear_size];
 }
 
 // Maximum size in bytes of a plane element (usually a pixel, or multiple pixels