[FFmpeg-devel,1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()

Submitted by Michael Niedermayer on Dec. 25, 2018, 10:15 p.m.

Details

Message ID 20181225221522.18064-1-michael@niedermayer.cc
State Superseded
Headers show

Commit Message

Michael Niedermayer Dec. 25, 2018, 10:15 p.m.
Fixes: Timeout
Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11294 ms
After : Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 4249 ms

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavutil/imgutils.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Kieran Kunhya Dec. 26, 2018, 12:30 a.m.
On Tue, 25 Dec 2018 at 22:17 Michael Niedermayer <michael@niedermayer.cc>
wrote:

> Fixes: Timeout
> Fixes:
> 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> Before: Executed
> clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> in 11294 ms
> After : Executed
> clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> in 4249 ms
>

So basically you go from one arbitrary timeout to another. I assume your
fuzzer timeout is 5 or 10 seconds and so you manage to make it pass?

Kieran
James Almer Dec. 26, 2018, 1:12 a.m.
On 12/25/2018 7:15 PM, Michael Niedermayer wrote:
> Fixes: Timeout
> Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11294 ms
> After : Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 4249 ms
> 
> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  libavutil/imgutils.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
> index 4938a7ef67..cc38f1e878 100644
> --- a/libavutil/imgutils.c
> +++ b/libavutil/imgutils.c
> @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear,
>          }
>      } else if (clear_size == 4) {
>          uint32_t val = AV_RN32(clear);
> +        uint64_t val8 = val * 0x100000001ULL;
> +        for (; dst_size >= 32; dst_size -= 32) {
> +            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
> +            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
> +            dst += 32;
> +        }

This should be wrapped with a HAVE_FAST_64BIT preprocessor check.

Also, is it much slower if you also write one per loop like everywhere
else in the function? I'd prefer if things are consistent.
Similarly, you could add four and eight bytes loops to the clear_size ==
2 case above.

>          for (; dst_size >= 4; dst_size -= 4) {
>              AV_WN32(dst, val);
>              dst += 4;
>
Paul B Mahol Dec. 26, 2018, 3:32 p.m.
On 12/25/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
> Fixes: Timeout
> Fixes:
> 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> Before: Executed
> clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> in 11294 ms
> After : Executed
> clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> in 4249 ms
>
> Found-by: continuous fuzzing process
> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  libavutil/imgutils.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
> index 4938a7ef67..cc38f1e878 100644
> --- a/libavutil/imgutils.c
> +++ b/libavutil/imgutils.c
> @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size,
> uint8_t *clear,
>          }
>      } else if (clear_size == 4) {
>          uint32_t val = AV_RN32(clear);
> +        uint64_t val8 = val * 0x100000001ULL;
> +        for (; dst_size >= 32; dst_size -= 32) {
> +            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
> +            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
> +            dst += 32;
> +        }
>          for (; dst_size >= 4; dst_size -= 4) {
>              AV_WN32(dst, val);
>              dst += 4;
> --
> 2.20.1
>

NAK, implement special memset function instead.
Michael Niedermayer Dec. 26, 2018, 7:37 p.m.
On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote:
> On 12/25/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
> > Fixes: Timeout
> > Fixes:
> > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> > Before: Executed
> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> > in 11294 ms
> > After : Executed
> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> > in 4249 ms
> >
> > Found-by: continuous fuzzing process
> > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > ---
> >  libavutil/imgutils.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
> > index 4938a7ef67..cc38f1e878 100644
> > --- a/libavutil/imgutils.c
> > +++ b/libavutil/imgutils.c
> > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size,
> > uint8_t *clear,
> >          }
> >      } else if (clear_size == 4) {
> >          uint32_t val = AV_RN32(clear);
> > +        uint64_t val8 = val * 0x100000001ULL;
> > +        for (; dst_size >= 32; dst_size -= 32) {
> > +            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
> > +            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
> > +            dst += 32;
> > +        }
> >          for (; dst_size >= 4; dst_size -= 4) {
> >              AV_WN32(dst, val);
> >              dst += 4;
> > --
> > 2.20.1
> >
> 
> NAK, implement special memset function instead.

I can move the added loop into a seperate function, if thats what you
suggest ?
All the code is already in a "special" memset though, this is
memset_bytes()

thx

[...]
Michael Niedermayer Dec. 26, 2018, 7:45 p.m.
On Tue, Dec 25, 2018 at 10:12:13PM -0300, James Almer wrote:
> On 12/25/2018 7:15 PM, Michael Niedermayer wrote:
> > Fixes: Timeout
> > Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> > Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11294 ms
> > After : Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 4249 ms
> > 
> > Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > ---
> >  libavutil/imgutils.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
> > index 4938a7ef67..cc38f1e878 100644
> > --- a/libavutil/imgutils.c
> > +++ b/libavutil/imgutils.c
> > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear,
> >          }
> >      } else if (clear_size == 4) {
> >          uint32_t val = AV_RN32(clear);
> > +        uint64_t val8 = val * 0x100000001ULL;
> > +        for (; dst_size >= 32; dst_size -= 32) {
> > +            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
> > +            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
> > +            dst += 32;
> > +        }
> 
> This should be wrapped with a HAVE_FAST_64BIT preprocessor check.

will do so


> 
> Also, is it much slower if you also write one per loop like everywhere
> else in the function? I'd prefer if things are consistent.

as in the patch:
 3955 ms  3954 ms  3954 ms
 
with one write per iteration:
 5705 ms  5635 ms  5629 ms

 
> Similarly, you could add four and eight bytes loops to the clear_size ==
> 2 case above.

yes i can if you want me to?, but i have no testcase for that so it would be untested

thx


[...]
Paul B Mahol Dec. 26, 2018, 9:02 p.m.
On 12/26/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
> On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote:
>> On 12/25/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
>> > Fixes: Timeout
>> > Fixes:
>> > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>> > Before: Executed
>> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>> > in 11294 ms
>> > After : Executed
>> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>> > in 4249 ms
>> >
>> > Found-by: continuous fuzzing process
>> > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
>> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
>> > ---
>> >  libavutil/imgutils.c | 6 ++++++
>> >  1 file changed, 6 insertions(+)
>> >
>> > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
>> > index 4938a7ef67..cc38f1e878 100644
>> > --- a/libavutil/imgutils.c
>> > +++ b/libavutil/imgutils.c
>> > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t
>> > dst_size,
>> > uint8_t *clear,
>> >          }
>> >      } else if (clear_size == 4) {
>> >          uint32_t val = AV_RN32(clear);
>> > +        uint64_t val8 = val * 0x100000001ULL;
>> > +        for (; dst_size >= 32; dst_size -= 32) {
>> > +            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
>> > +            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
>> > +            dst += 32;
>> > +        }
>> >          for (; dst_size >= 4; dst_size -= 4) {
>> >              AV_WN32(dst, val);
>> >              dst += 4;
>> > --
>> > 2.20.1
>> >
>>
>> NAK, implement special memset function instead.
>
> I can move the added loop into a seperate function, if thats what you
> suggest ?

No, don't do that.

> All the code is already in a "special" memset though, this is
> memset_bytes()
>

I guess function is less useful if its static. So any duplicate should
be avoided in codebase.
Michael Niedermayer Dec. 26, 2018, 9:15 p.m.
On Wed, Dec 26, 2018 at 10:02:15PM +0100, Paul B Mahol wrote:
> On 12/26/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
> > On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote:
> >> On 12/25/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
> >> > Fixes: Timeout
> >> > Fixes:
> >> > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> >> > Before: Executed
> >> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> >> > in 11294 ms
> >> > After : Executed
> >> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> >> > in 4249 ms
> >> >
> >> > Found-by: continuous fuzzing process
> >> > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> >> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> >> > ---
> >> >  libavutil/imgutils.c | 6 ++++++
> >> >  1 file changed, 6 insertions(+)
> >> >
> >> > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
> >> > index 4938a7ef67..cc38f1e878 100644
> >> > --- a/libavutil/imgutils.c
> >> > +++ b/libavutil/imgutils.c
> >> > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t
> >> > dst_size,
> >> > uint8_t *clear,
> >> >          }
> >> >      } else if (clear_size == 4) {
> >> >          uint32_t val = AV_RN32(clear);
> >> > +        uint64_t val8 = val * 0x100000001ULL;
> >> > +        for (; dst_size >= 32; dst_size -= 32) {
> >> > +            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
> >> > +            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
> >> > +            dst += 32;
> >> > +        }
> >> >          for (; dst_size >= 4; dst_size -= 4) {
> >> >              AV_WN32(dst, val);
> >> >              dst += 4;
> >> > --
> >> > 2.20.1
> >> >
> >>
> >> NAK, implement special memset function instead.
> >
> > I can move the added loop into a seperate function, if thats what you
> > suggest ?
> 
> No, don't do that.
> 
> > All the code is already in a "special" memset though, this is
> > memset_bytes()
> >
> 
> I guess function is less useful if its static. So any duplicate should
> be avoided in codebase.

i can make it non static and export it if you want ?

[...]
Marton Balint Dec. 26, 2018, 9:16 p.m.
On Wed, 26 Dec 2018, Paul B Mahol wrote:

> On 12/26/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
>> On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote:
>>> On 12/25/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
>>> > Fixes: Timeout
>>> > Fixes:
>>> > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>>> > Before: Executed
>>> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>>> > in 11294 ms
>>> > After : Executed
>>> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>>> > in 4249 ms
>>> >
>>> > Found-by: continuous fuzzing process
>>> > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
>>> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
>>> > ---
>>> >  libavutil/imgutils.c | 6 ++++++
>>> >  1 file changed, 6 insertions(+)
>>> >
>>> > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
>>> > index 4938a7ef67..cc38f1e878 100644
>>> > --- a/libavutil/imgutils.c
>>> > +++ b/libavutil/imgutils.c
>>> > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t
>>> > dst_size,
>>> > uint8_t *clear,
>>> >          }
>>> >      } else if (clear_size == 4) {
>>> >          uint32_t val = AV_RN32(clear);
>>> > +        uint64_t val8 = val * 0x100000001ULL;
>>> > +        for (; dst_size >= 32; dst_size -= 32) {
>>> > +            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
>>> > +            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
>>> > +            dst += 32;
>>> > +        }
>>> >          for (; dst_size >= 4; dst_size -= 4) {
>>> >              AV_WN32(dst, val);
>>> >              dst += 4;
>>> > --
>>> > 2.20.1
>>> >
>>>
>>> NAK, implement special memset function instead.
>>
>> I can move the added loop into a seperate function, if thats what you
>> suggest ?
>
> No, don't do that.
>
>> All the code is already in a "special" memset though, this is
>> memset_bytes()
>>
>
> I guess function is less useful if its static. So any duplicate should
> be avoided in codebase.

Isn't av_memcpy_backptr does almost exactly what is needed here? That can 
also be optimized further if needed.

Thanks,
Marton
Paul B Mahol Dec. 26, 2018, 10:11 p.m.
On 12/26/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
> On Wed, Dec 26, 2018 at 10:02:15PM +0100, Paul B Mahol wrote:
>> On 12/26/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
>> > On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote:
>> >> On 12/25/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
>> >> > Fixes: Timeout
>> >> > Fixes:
>> >> > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>> >> > Before: Executed
>> >> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>> >> > in 11294 ms
>> >> > After : Executed
>> >> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>> >> > in 4249 ms
>> >> >
>> >> > Found-by: continuous fuzzing process
>> >> > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
>> >> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
>> >> > ---
>> >> >  libavutil/imgutils.c | 6 ++++++
>> >> >  1 file changed, 6 insertions(+)
>> >> >
>> >> > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
>> >> > index 4938a7ef67..cc38f1e878 100644
>> >> > --- a/libavutil/imgutils.c
>> >> > +++ b/libavutil/imgutils.c
>> >> > @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t
>> >> > dst_size,
>> >> > uint8_t *clear,
>> >> >          }
>> >> >      } else if (clear_size == 4) {
>> >> >          uint32_t val = AV_RN32(clear);
>> >> > +        uint64_t val8 = val * 0x100000001ULL;
>> >> > +        for (; dst_size >= 32; dst_size -= 32) {
>> >> > +            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
>> >> > +            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
>> >> > +            dst += 32;
>> >> > +        }
>> >> >          for (; dst_size >= 4; dst_size -= 4) {
>> >> >              AV_WN32(dst, val);
>> >> >              dst += 4;
>> >> > --
>> >> > 2.20.1
>> >> >
>> >>
>> >> NAK, implement special memset function instead.
>> >
>> > I can move the added loop into a seperate function, if thats what you
>> > suggest ?
>>
>> No, don't do that.
>>
>> > All the code is already in a "special" memset though, this is
>> > memset_bytes()
>> >
>>
>> I guess function is less useful if its static. So any duplicate should
>> be avoided in codebase.
>
> i can make it non static and export it if you want ?
>

Yes. Also make it used where there is already similar code doing same.

> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> If you fake or manipulate statistics in a paper in physics you will never
> get a job again.
> If you fake or manipulate statistics in a paper in medicin you will get
> a job for life at the pharma industry.
>
Michael Niedermayer Dec. 27, 2018, 10:54 a.m.
On Wed, Dec 26, 2018 at 12:30:05AM +0000, Kieran Kunhya wrote:
> On Tue, 25 Dec 2018 at 22:17 Michael Niedermayer <michael@niedermayer.cc>
> wrote:
> 
> > Fixes: Timeout
> > Fixes:
> > 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> > Before: Executed
> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> > in 11294 ms
> > After : Executed
> > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> > in 4249 ms
> >
> 
> So basically you go from one arbitrary timeout to another. I assume your
> fuzzer timeout is 5 or 10 seconds and so you manage to make it pass?

I think the timeout used is 25sec. I assume the server that detected it
was maybe slower than the machine i used. Its even quite possible that
variation between runs, due to hardware, load or other issues to cause
timeouts to disappear.

But none of this really should matter in the context of this patchset.
Which simply makes the code faster.

Thanks

[...]
Michael Niedermayer Dec. 28, 2018, 4:51 p.m.
On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote:
> 
> 
> On Wed, 26 Dec 2018, Paul B Mahol wrote:
> 
> >On 12/26/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
> >>On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote:
> >>>On 12/25/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
> >>>> Fixes: Timeout
> >>>> Fixes:
> >>>> 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> >>>> Before: Executed
> >>>> clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> >>>> in 11294 ms
> >>>> After : Executed
> >>>> clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> >>>> in 4249 ms
> >>>>
> >>>> Found-by: continuous fuzzing process
> >>>> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> >>>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> >>>> ---
> >>>>  libavutil/imgutils.c | 6 ++++++
> >>>>  1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
> >>>> index 4938a7ef67..cc38f1e878 100644
> >>>> --- a/libavutil/imgutils.c
> >>>> +++ b/libavutil/imgutils.c
> >>>> @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t
> >>>> dst_size,
> >>>> uint8_t *clear,
> >>>>          }
> >>>>      } else if (clear_size == 4) {
> >>>>          uint32_t val = AV_RN32(clear);
> >>>> +        uint64_t val8 = val * 0x100000001ULL;
> >>>> +        for (; dst_size >= 32; dst_size -= 32) {
> >>>> +            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
> >>>> +            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
> >>>> +            dst += 32;
> >>>> +        }
> >>>>          for (; dst_size >= 4; dst_size -= 4) {
> >>>>              AV_WN32(dst, val);
> >>>>              dst += 4;
> >>>> --
> >>>> 2.20.1
> >>>>
> >>>
> >>>NAK, implement special memset function instead.
> >>
> >>I can move the added loop into a seperate function, if thats what you
> >>suggest ?
> >
> >No, don't do that.
> >
> >>All the code is already in a "special" memset though, this is
> >>memset_bytes()
> >>
> >
> >I guess function is less useful if its static. So any duplicate should
> >be avoided in codebase.
> 
> Isn't av_memcpy_backptr does almost exactly what is needed here? That can
> also be optimized further if needed.

av_memcpy_backptr() copies data with overlap, its more like a recursive
memmove(). 

thx
[...]
Marton Balint Dec. 30, 2018, 6:15 p.m.
On Fri, 28 Dec 2018, Michael Niedermayer wrote:

> On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote:
>>
>>
>> On Wed, 26 Dec 2018, Paul B Mahol wrote:
>>
>>> On 12/26/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
>>>> On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote:
>>>>> On 12/25/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
>>>>>> Fixes: Timeout
>>>>>> Fixes:
>>>>>> 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>>>>>> Before: Executed
>>>>>> clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>>>>>> in 11294 ms
>>>>>> After : Executed
>>>>>> clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>>>>>> in 4249 ms
>>>>>>
>>>>>> Found-by: continuous fuzzing process
>>>>>> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
>>>>>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
>>>>>> ---
>>>>>>  libavutil/imgutils.c | 6 ++++++
>>>>>>  1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
>>>>>> index 4938a7ef67..cc38f1e878 100644
>>>>>> --- a/libavutil/imgutils.c
>>>>>> +++ b/libavutil/imgutils.c
>>>>>> @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t
>>>>>> dst_size,
>>>>>> uint8_t *clear,
>>>>>>          }
>>>>>>      } else if (clear_size == 4) {
>>>>>>          uint32_t val = AV_RN32(clear);
>>>>>> +        uint64_t val8 = val * 0x100000001ULL;
>>>>>> +        for (; dst_size >= 32; dst_size -= 32) {
>>>>>> +            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
>>>>>> +            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
>>>>>> +            dst += 32;
>>>>>> +        }
>>>>>>          for (; dst_size >= 4; dst_size -= 4) {
>>>>>>              AV_WN32(dst, val);
>>>>>>              dst += 4;
>>>>>> --
>>>>>> 2.20.1
>>>>>>
>>>>>
>>>>> NAK, implement special memset function instead.
>>>>
>>>> I can move the added loop into a seperate function, if thats what you
>>>> suggest ?
>>>
>>> No, don't do that.
>>>
>>>> All the code is already in a "special" memset though, this is
>>>> memset_bytes()
>>>>
>>>
>>> I guess function is less useful if its static. So any duplicate should
>>> be avoided in codebase.
>>
>> Isn't av_memcpy_backptr does almost exactly what is needed here? That can
>> also be optimized further if needed.
>
> av_memcpy_backptr() copies data with overlap, its more like a recursive
> memmove().

So? As far as I see the memset_bytes function in imgutils.c can be 
replaced with this:

     if (clear_size > dst_size)
         clear_size = dst_size;
     memcpy(dst, clear, clear_size);
     av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size);

I am not against an av_memset_bytes API addition, but I believe it should 
share code with av_memcpy_backptr to avoid duplication.

Regards,
Marton
Michael Niedermayer Dec. 30, 2018, 6:56 p.m.
On Sun, Dec 30, 2018 at 07:15:49PM +0100, Marton Balint wrote:
> 
> 
> On Fri, 28 Dec 2018, Michael Niedermayer wrote:
> 
> >On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote:
> >>
> >>
> >>On Wed, 26 Dec 2018, Paul B Mahol wrote:
> >>
> >>>On 12/26/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
> >>>>On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote:
> >>>>>On 12/25/18, Michael Niedermayer <michael@niedermayer.cc> wrote:
> >>>>>>Fixes: Timeout
> >>>>>>Fixes:
> >>>>>>11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> >>>>>>Before: Executed
> >>>>>>clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> >>>>>>in 11294 ms
> >>>>>>After : Executed
> >>>>>>clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> >>>>>>in 4249 ms
> >>>>>>
> >>>>>>Found-by: continuous fuzzing process
> >>>>>>https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> >>>>>>Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> >>>>>>---
> >>>>>> libavutil/imgutils.c | 6 ++++++
> >>>>>> 1 file changed, 6 insertions(+)
> >>>>>>
> >>>>>>diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
> >>>>>>index 4938a7ef67..cc38f1e878 100644
> >>>>>>--- a/libavutil/imgutils.c
> >>>>>>+++ b/libavutil/imgutils.c
> >>>>>>@@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t
> >>>>>>dst_size,
> >>>>>>uint8_t *clear,
> >>>>>>         }
> >>>>>>     } else if (clear_size == 4) {
> >>>>>>         uint32_t val = AV_RN32(clear);
> >>>>>>+        uint64_t val8 = val * 0x100000001ULL;
> >>>>>>+        for (; dst_size >= 32; dst_size -= 32) {
> >>>>>>+            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
> >>>>>>+            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
> >>>>>>+            dst += 32;
> >>>>>>+        }
> >>>>>>         for (; dst_size >= 4; dst_size -= 4) {
> >>>>>>             AV_WN32(dst, val);
> >>>>>>             dst += 4;
> >>>>>>--
> >>>>>>2.20.1
> >>>>>>
> >>>>>
> >>>>>NAK, implement special memset function instead.
> >>>>
> >>>>I can move the added loop into a seperate function, if thats what you
> >>>>suggest ?
> >>>
> >>>No, don't do that.
> >>>
> >>>>All the code is already in a "special" memset though, this is
> >>>>memset_bytes()
> >>>>
> >>>
> >>>I guess function is less useful if its static. So any duplicate should
> >>>be avoided in codebase.
> >>
> >>Isn't av_memcpy_backptr does almost exactly what is needed here? That can
> >>also be optimized further if needed.
> >
> >av_memcpy_backptr() copies data with overlap, its more like a recursive
> >memmove().
> 
> So? As far as I see the memset_bytes function in imgutils.c can be replaced
> with this:
> 
>     if (clear_size > dst_size)
>         clear_size = dst_size;
>     memcpy(dst, clear, clear_size);
>     av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size);
> 
> I am not against an av_memset_bytes API addition, but I believe it should
> share code with av_memcpy_backptr to avoid duplication.

Iam a bit concerned that combining them could result in lower speed 
but i can surely combine them if people prefer?

thx

[...]

Patch hide | download patch | download mbox

diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
index 4938a7ef67..cc38f1e878 100644
--- a/libavutil/imgutils.c
+++ b/libavutil/imgutils.c
@@ -529,6 +529,12 @@  static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear,
         }
     } else if (clear_size == 4) {
         uint32_t val = AV_RN32(clear);
+        uint64_t val8 = val * 0x100000001ULL;
+        for (; dst_size >= 32; dst_size -= 32) {
+            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
+            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
+            dst += 32;
+        }
         for (; dst_size >= 4; dst_size -= 4) {
             AV_WN32(dst, val);
             dst += 4;