Message ID | 20190115223303.GH3501@michaelspb |
---|---|
State | Superseded |
On Tue, 15 Jan 2019, Michael Niedermayer wrote:

> On Sun, Dec 30, 2018 at 07:15:49PM +0100, Marton Balint wrote:
>>
>>
>> On Fri, 28 Dec 2018, Michael Niedermayer wrote:
>>
>>> On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote:
>>>>
>>>>
>>>> On Wed, 26 Dec 2018, Paul B Mahol wrote:
>>>>
>>>>> On 12/26/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
>>>>>> On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote:
>>>>>>> On 12/25/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
>>>>>>>> Fixes: Timeout
>>>>>>>> Fixes:
>>>>>>>> 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>>>>>>>> Before: Executed
>>>>>>>> clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>>>>>>>> in 11294 ms
>>>>>>>> After : Executed
>>>>>>>> clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>>>>>>>> in 4249 ms
>>>>>>>>
>>>>>>>> Found-by: continuous fuzzing process
>>>>>>>> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
>>>>>>>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
>>>>>>>> ---
>>>>>>>>  libavutil/imgutils.c | 6 ++++++
>>>>>>>>  1 file changed, 6 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
>>>>>>>> index 4938a7ef67..cc38f1e878 100644
>>>>>>>> --- a/libavutil/imgutils.c
>>>>>>>> +++ b/libavutil/imgutils.c
>>>>>>>> @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t
>>>>>>>> dst_size,
>>>>>>>> uint8_t *clear,
>>>>>>>>          }
>>>>>>>>      } else if (clear_size == 4) {
>>>>>>>>          uint32_t val = AV_RN32(clear);
>>>>>>>> +        uint64_t val8 = val * 0x100000001ULL;
>>>>>>>> +        for (; dst_size >= 32; dst_size -= 32) {
>>>>>>>> +            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
>>>>>>>> +            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
>>>>>>>> +            dst += 32;
>>>>>>>> +        }
>>>>>>>>          for (; dst_size >= 4; dst_size -= 4) {
>>>>>>>>              AV_WN32(dst, val);
>>>>>>>>              dst += 4;
>>>>>>>> --
>>>>>>>> 2.20.1
>>>>>>>>
>>>>>>>
>>>>>>> NAK, implement special memset function instead.
>>>>>>
>>>>>> I can move the added loop into a seperate function, if thats what you
>>>>>> suggest ?
>>>>>
>>>>> No, don't do that.
>>>>>
>>>>>> All the code is already in a "special" memset though, this is
>>>>>> memset_bytes()
>>>>>>
>>>>>
>>>>> I guess function is less useful if its static. So any duplicate should
>>>>> be avoided in codebase.
>>>>
>>>> Isn't av_memcpy_backptr does almost exactly what is needed here? That can
>>>> also be optimized further if needed.
>>>
>>> av_memcpy_backptr() copies data with overlap, its more like a recursive
>>> memmove().
>>
>> So? As far as I see the memset_bytes function in imgutils.c can be replaced
>> with this:
>>
>>     if (clear_size > dst_size)
>>         clear_size = dst_size;
>>     memcpy(dst, clear, clear_size);
>>     av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size);
>>
>> I am not against an av_memset_bytes API addition, but I believe it should
>> share code with av_memcpy_backptr to avoid duplication.
>
> ive implemented this, it does not seem to be really faster in the testcase

I guess it is not faster because you have not applied your original
optimization to fill32 in libavutil/mem.c. Could you compare the speed
after optimizing fill32 the same way your original patch optimized
memset_bytes in imgutils?

Thanks,
Marton
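For reference, the widening trick from the original patch can be sketched in
isolation as below. This is only an illustration, not code from libavutil:
fill_pattern32() is a hypothetical name, and plain memcpy() stands in for the
AV_RN32/AV_WN32/AV_WN64 unaligned access macros so that the sketch builds
without FFmpeg headers. The 32-bit value is duplicated into both halves of a
64-bit word, so the main loop needs four stores per 32 bytes instead of eight.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an FFmpeg function): fill dst with a repeating
 * 4-byte pattern. memcpy() replaces the AV_RN32/AV_WN32/AV_WN64 macros. */
static void fill_pattern32(uint8_t *dst, size_t dst_size,
                           const uint8_t pattern[4])
{
    uint32_t val;
    uint64_t val8;
    size_t i;

    memcpy(&val, pattern, 4);

    /* val * (2^32 + 1) places val in both 32-bit halves of the word.
     * The halves are identical, so byte order does not matter. */
    val8 = val * 0x100000001ULL;

    /* Main loop: four 8-byte stores cover 32 bytes per iteration. */
    for (; dst_size >= 32; dst_size -= 32) {
        memcpy(dst,      &val8, 8);
        memcpy(dst +  8, &val8, 8);
        memcpy(dst + 16, &val8, 8);
        memcpy(dst + 24, &val8, 8);
        dst += 32;
    }

    /* Remaining whole 4-byte units. */
    for (; dst_size >= 4; dst_size -= 4) {
        memcpy(dst, &val, 4);
        dst += 4;
    }

    /* Tail of 0 to 3 bytes: continue the pattern from its start. */
    for (i = 0; i < dst_size; i++)
        dst[i] = pattern[i];
}
```

Whether the wider stores actually help depends on the target, which is
presumably why a guard such as HAVE_FAST_64BIT is reasonable around the
64-bit loop.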
diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
index 4938a7ef67..6c0d3950de 100644
--- a/libavutil/imgutils.c
+++ b/libavutil/imgutils.c
@@ -529,6 +529,14 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear,
         }
     } else if (clear_size == 4) {
         uint32_t val = AV_RN32(clear);
+#if HAVE_FAST_64BIT
+        uint64_t val8 = val * 0x100000001ULL;
+        for (; dst_size >= 32; dst_size -= 32) {
+            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
+            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
+            dst += 32;
+        }
+#endif
         for (; dst_size >= 4; dst_size -= 4) {
             AV_WN32(dst, val);
             dst += 4;
--
2.20.1

From 8e5140bf92d7e41090bfca1c6163f9c428402904 Mon Sep 17 00:00:00 2001
From: Michael Niedermayer <michael@niedermayer.cc>
Date: Tue, 25 Dec 2018 23:15:20 +0100
Subject: [PATCH] avutil/imgutils: Optimize memset_bytes() by using av_memcpy_backptr()

This is strongly based on code by Marton Balint

Fixes: Timeout
Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11294 ms
After: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 10948 ms

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavutil/imgutils.c | 26 +++++---------------------
 1 file changed, 5 insertions(+), 21 deletions(-)

diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
index 4938a7ef67..cf06afde3f 100644
--- a/libavutil/imgutils.c
+++ b/libavutil/imgutils.c
@@ -521,28 +521,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear,
     if (clear_size == 1) {
         memset(dst, clear[0], dst_size);
         dst_size = 0;
-    } else if (clear_size == 2) {
-        uint16_t val = AV_RN16(clear);
-        for (; dst_size >= 2; dst_size -= 2) {
-            AV_WN16(dst, val);
-            dst += 2;
-        }
-    } else if (clear_size == 4) {
-        uint32_t val = AV_RN32(clear);
-        for (; dst_size >= 4; dst_size -= 4) {
-            AV_WN32(dst, val);
-            dst += 4;
-        }
-    } else if (clear_size == 8) {
-        uint32_t val = AV_RN64(clear);
-        for (; dst_size >= 8; dst_size -= 8) {
-            AV_WN64(dst, val);
-            dst += 8;
-        }
+    } else {
+        if (clear_size > dst_size)
+            clear_size = dst_size;
+        memcpy(dst, clear, clear_size);
+        av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size);
     }
-
-    for (; dst_size; dst_size--)
-        *dst++ = clear[pos++ % clear_size];
 }
 
 // Maximum size in bytes of a plane element (usually a pixel, or multiple pixels
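To illustrate why seeding one copy of the pattern and then copying forward
with overlap fills the whole buffer, here is a minimal standalone sketch. The
names copy_backptr() and fill_pattern() are hypothetical; copy_backptr() is a
byte-wise stand-in for the behaviour relied on from av_memcpy_backptr(), not
its actual implementation (which has specialised paths such as fill32). As far
as I remember, av_memcpy_backptr() takes int arguments, so the sketch's size_t
parameters are an assumption made for simplicity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for av_memcpy_backptr(): copy cnt bytes from
 * (dst - back) to dst, strictly front to back. Because source and
 * destination overlap, bytes written earlier are read again later,
 * which repeats the last `back` bytes over and over. */
static void copy_backptr(uint8_t *dst, size_t back, size_t cnt)
{
    const uint8_t *src = dst - back;

    while (cnt--)
        *dst++ = *src++;
}

/* Hypothetical helper: fill dst with repetitions of clear[0..clear_size),
 * truncating the last repetition if dst_size is not a multiple. */
static void fill_pattern(uint8_t *dst, size_t dst_size,
                         const uint8_t *clear, size_t clear_size)
{
    if (!dst_size || !clear_size)
        return;

    /* A pattern longer than the buffer is simply cut short. */
    if (clear_size > dst_size)
        clear_size = dst_size;

    /* Seed one copy of the pattern, then let the overlapping
     * forward copy replicate it across the rest of the buffer. */
    memcpy(dst, clear, clear_size);
    copy_backptr(dst + clear_size, clear_size, dst_size - clear_size);
}
```

With a 3-byte pattern {7, 8, 9} and a 10-byte buffer this yields
7 8 9 7 8 9 7 8 9 7, matching what the removed byte-wise tail loop produced.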