Message ID | 20190117222802.GP3501@michaelspb |
---|---|
State | New |
On Thu, 17 Jan 2019, Michael Niedermayer wrote:

> On Wed, Jan 16, 2019 at 08:00:22PM +0100, Marton Balint wrote:
>>
>> On Tue, 15 Jan 2019, Michael Niedermayer wrote:
>>
>>> On Sun, Dec 30, 2018 at 07:15:49PM +0100, Marton Balint wrote:
>>>>
>>>> On Fri, 28 Dec 2018, Michael Niedermayer wrote:
>>>>
>>>>> On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote:
>>>>>>
>>>>>> On Wed, 26 Dec 2018, Paul B Mahol wrote:
>>>>>>
>>>>>>> On 12/26/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
>>>>>>>> On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote:
>>>>>>>>> On 12/25/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
>>>>>>>>>> Fixes: Timeout
>>>>>>>>>> Fixes:
>>>>>>>>>> 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>>>>>>>>>> Before: Executed
>>>>>>>>>> clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>>>>>>>>>> in 11294 ms
>>>>>>>>>> After : Executed
>>>>>>>>>> clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>>>>>>>>>> in 4249 ms
>>>>>>>>>>
>>>>>>>>>> Found-by: continuous fuzzing process
>>>>>>>>>> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
>>>>>>>>>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
>>>>>>>>>> ---
>>>>>>>>>>  libavutil/imgutils.c | 6 ++++++
>>>>>>>>>>  1 file changed, 6 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
>>>>>>>>>> index 4938a7ef67..cc38f1e878 100644
>>>>>>>>>> --- a/libavutil/imgutils.c
>>>>>>>>>> +++ b/libavutil/imgutils.c
>>>>>>>>>> @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t
>>>>>>>>>> dst_size,
>>>>>>>>>> uint8_t *clear,
>>>>>>>>>>          }
>>>>>>>>>>      } else if (clear_size == 4) {
>>>>>>>>>>          uint32_t val = AV_RN32(clear);
>>>>>>>>>> +        uint64_t val8 = val * 0x100000001ULL;
>>>>>>>>>> +        for (; dst_size >= 32; dst_size -= 32) {
>>>>>>>>>> +            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
>>>>>>>>>> +            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
>>>>>>>>>> +            dst += 32;
>>>>>>>>>> +        }
>>>>>>>>>>          for (; dst_size >= 4; dst_size -= 4) {
>>>>>>>>>>              AV_WN32(dst, val);
>>>>>>>>>>              dst += 4;
>>>>>>>>>> --
>>>>>>>>>> 2.20.1
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> NAK, implement special memset function instead.
>>>>>>>>
>>>>>>>> I can move the added loop into a separate function, if that's what you
>>>>>>>> suggest?
>>>>>>>
>>>>>>> No, don't do that.
>>>>>>>
>>>>>>>> All the code is already in a "special" memset though, this is
>>>>>>>> memset_bytes()
>>>>>>>>
>>>>>>>
>>>>>>> I guess the function is less useful if it's static. So any duplicate
>>>>>>> should be avoided in the codebase.
>>>>>>
>>>>>> Doesn't av_memcpy_backptr do almost exactly what is needed here? That can
>>>>>> also be optimized further if needed.
>>>>>
>>>>> av_memcpy_backptr() copies data with overlap, it's more like a recursive
>>>>> memmove().
>>>>
>>>> So? As far as I see the memset_bytes function in imgutils.c can be replaced
>>>> with this:
>>>>
>>>>     if (clear_size > dst_size)
>>>>         clear_size = dst_size;
>>>>     memcpy(dst, clear, clear_size);
>>>>     av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size);
>>>>
>>>> I am not against an av_memset_bytes API addition, but I believe it should
>>>> share code with av_memcpy_backptr to avoid duplication.
>>>
>>> I've implemented this, it does not seem to be really faster in the testcase
>>
>> I guess it is not faster because you have not applied your original
>> optimization to fill32 in libavutil/mem.c. Could you compare speed after
>> optimizing that the same way your original patch did it with imgutils
>> memset_bytes?
>
> sure, that makes it faster:

Thanks, both patches LGTM.

Regards,
Marton
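The replacement suggested in the thread relies on a forward copy whose source overlaps its destination: once one copy of the pattern is in place, copying from clear_size bytes back repeats it for the rest of the buffer. A minimal sketch of that idea in plain C follows. The name pattern_fill() is hypothetical, and the byte loop only stands in for av_memcpy_backptr(); it is not FFmpeg's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not FFmpeg code: fill dst[0..dst_size) with the
 * clear_size-byte pattern at `clear`, repeated as often as needed. */
static void pattern_fill(uint8_t *dst, size_t dst_size,
                         const uint8_t *clear, size_t clear_size)
{
    if (!dst_size || !clear_size)
        return;
    if (clear_size > dst_size)
        clear_size = dst_size;

    /* Seed the destination with one copy of the pattern. */
    memcpy(dst, clear, clear_size);

    /* Forward byte copy from clear_size bytes back. Every byte written
     * becomes the source of a later byte, so the pattern repeats.
     * This loop plays the role av_memcpy_backptr() has in the thread. */
    for (size_t i = clear_size; i < dst_size; i++)
        dst[i] = dst[i - clear_size];
}
```

With an 11-byte buffer and the pattern {1, 2, 3}, the buffer ends up as 1 2 3 1 2 3 1 2 3 1 2, so a trailing partial pattern needs no separate tail loop.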
On Sat, Jan 19, 2019 at 12:28:25AM +0100, Marton Balint wrote:
> On Thu, 17 Jan 2019, Michael Niedermayer wrote:
> > On Wed, Jan 16, 2019 at 08:00:22PM +0100, Marton Balint wrote:
> [...]
> >> I guess it is not faster because you have not applied your original
> >> optimalization to fill32 in libavutil/mem.c. Could you compare speed after
> >> optimizing that the same way your original patch did it with imgutils
> >> memset_bytes?
> >
> > sure, that makes it faster:
>
> Thanks, both patches LGTM.

will apply

thanks

[...]
diff --git a/libavutil/mem.c b/libavutil/mem.c
index 6149755a6b..88fe09b179 100644
--- a/libavutil/mem.c
+++ b/libavutil/mem.c
@@ -399,6 +399,18 @@ static void fill32(uint8_t *dst, int len)
 {
     uint32_t v = AV_RN32(dst - 4);
 
+#if HAVE_FAST_64BIT
+    uint64_t v2= v + ((uint64_t)v<<32);
+    while (len >= 32) {
+        AV_WN64(dst   , v2);
+        AV_WN64(dst+ 8, v2);
+        AV_WN64(dst+16, v2);
+        AV_WN64(dst+24, v2);
+        dst += 32;
+        len -= 32;
+    }
+#endif
+
     while (len >= 4) {
         AV_WN32(dst, v);
         dst += 4;
-- 
2.20.1

From 9b5573f91a043a818fe1fd6b93d0d36c4830cd9c Mon Sep 17 00:00:00 2001
From: Michael Niedermayer <michael@niedermayer.cc>
Date: Tue, 25 Dec 2018 23:15:20 +0100
Subject: [PATCH 2/2] avutil/imgutils: Optimize memset_bytes() by using
 av_memcpy_backptr()

This is strongly based on code by Marton Balint

Fixes: Timeout
Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11209 ms
After:  Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 4104 ms

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavutil/imgutils.c | 26 +++++---------------------
 1 file changed, 5 insertions(+), 21 deletions(-)

diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
index 4938a7ef67..cf06afde3f 100644
--- a/libavutil/imgutils.c
+++ b/libavutil/imgutils.c
@@ -521,28 +521,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear,
     if (clear_size == 1) {
         memset(dst, clear[0], dst_size);
         dst_size = 0;
-    } else if (clear_size == 2) {
-        uint16_t val = AV_RN16(clear);
-        for (; dst_size >= 2; dst_size -= 2) {
-            AV_WN16(dst, val);
-            dst += 2;
-        }
-    } else if (clear_size == 4) {
-        uint32_t val = AV_RN32(clear);
-        for (; dst_size >= 4; dst_size -= 4) {
-            AV_WN32(dst, val);
-            dst += 4;
-        }
-    } else if (clear_size == 8) {
-        uint32_t val = AV_RN64(clear);
-        for (; dst_size >= 8; dst_size -= 8) {
-            AV_WN64(dst, val);
-            dst += 8;
-        }
+    } else {
+        if (clear_size > dst_size)
+            clear_size = dst_size;
+        memcpy(dst, clear, clear_size);
+        av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size);
     }
-
-    for (; dst_size; dst_size--)
-        *dst++ = clear[pos++ % clear_size];
 }
 
 // Maximum size in bytes of a plane element (usually a pixel, or multiple pixels
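Both versions of the speedup widen a 32-bit value to 64 bits and store it four times per iteration; `val * 0x100000001ULL` in the original patch and `v + ((uint64_t)v<<32)` in the fill32() patch compute the same word. A standalone sketch of that loop, assuming nothing from FFmpeg: dup32() and fill32_wide() are hypothetical names, and memcpy() stands in for the AV_RN32/AV_WN32/AV_WN64 unaligned access macros.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Duplicate a 32-bit value into both halves of a 64-bit word.
 * v * 0x100000001ULL gives the same result because v < 2^32. */
static uint64_t dup32(uint32_t v)
{
    return v + ((uint64_t)v << 32);
}

/* Hypothetical stand-in for the patched fill32(): repeat the 4 bytes at
 * `pattern` across dst[0..len). memcpy() is used for the unaligned
 * native-endian loads and stores. */
static void fill32_wide(uint8_t *dst, size_t len, const uint8_t pattern[4])
{
    uint32_t v;
    uint64_t v2;

    memcpy(&v, pattern, 4);
    v2 = dup32(v);

    /* Unrolled main loop: four 8-byte stores cover 32 bytes. */
    while (len >= 32) {
        memcpy(dst,      &v2, 8);
        memcpy(dst +  8, &v2, 8);
        memcpy(dst + 16, &v2, 8);
        memcpy(dst + 24, &v2, 8);
        dst += 32;
        len -= 32;
    }

    /* Remaining whole 4-byte groups. */
    while (len >= 4) {
        memcpy(dst, &v, 4);
        dst += 4;
        len -= 4;
    }

    /* Up to 3 trailing bytes take the start of the pattern. */
    for (size_t i = 0; i < len; i++)
        dst[i] = pattern[i];
}
```

Because the value is both loaded and stored in native byte order, the byte sequence in memory matches the pattern on little-endian and big-endian hosts alike.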