diff mbox

[FFmpeg-devel] avutil: add av_memcpy() to avoid undefined behavior with NULL, NULL, 0

Message ID 20190702205630.25749-1-michael@niedermayer.cc
State New
Headers show

Commit Message

Michael Niedermayer July 2, 2019, 8:56 p.m. UTC
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 doc/APIchanges      |  3 +++
 libavutil/mem.h     | 13 +++++++++++++
 libavutil/version.h |  2 +-
 3 files changed, 17 insertions(+), 1 deletion(-)

Comments

James Almer July 2, 2019, 11:42 p.m. UTC | #1
On 7/2/2019 5:56 PM, Michael Niedermayer wrote:
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  doc/APIchanges      |  3 +++
>  libavutil/mem.h     | 13 +++++++++++++
>  libavutil/version.h |  2 +-
>  3 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/APIchanges b/doc/APIchanges
> index b5fadc2a48..65b8ed74ad 100644
> --- a/doc/APIchanges
> +++ b/doc/APIchanges
> @@ -15,6 +15,9 @@ libavutil:     2017-10-21
>  
>  API changes, most recent first:
>  
> +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h
> +  Add av_memcpy()
> +
>  2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h
>    Add FF_DECODE_ERROR_DECODE_SLICES
>  
> diff --git a/libavutil/mem.h b/libavutil/mem.h
> index 5fb1a02dd9..a35230360d 100644
> --- a/libavutil/mem.h
> +++ b/libavutil/mem.h
> @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size);
>   */
>  void av_memcpy_backptr(uint8_t *dst, int back, int cnt);
>  
> +/**
> + * memcpy() implementation without a NULL pointer special case
> + *
> + * @param dst  Destination buffer
> + * @param src  Source buffer
> + * @param cnt  Number of bytes to copy; must be >= 0
> + */
> +static inline void av_memcpy(void *dst, const void *src, size_t cnt)

How many cases are there in the codebase where cnt can be 0, and dst or
src NULL, without it having been checked before calling memcpy? And from
those, which would not be from situations where the code should have
instead aborted and returned ENOMEM, or EINVAL if either of them are
function arguments?

> +{
> +    if (cnt)
> +        memcpy(dst, src, cnt);
> +}
> +
>  /**
>   * @}
>   */
> diff --git a/libavutil/version.h b/libavutil/version.h
> index e16b93e877..24ca8ab7db 100644
> --- a/libavutil/version.h
> +++ b/libavutil/version.h
> @@ -79,7 +79,7 @@
>   */
>  
>  #define LIBAVUTIL_VERSION_MAJOR  56
> -#define LIBAVUTIL_VERSION_MINOR  30
> +#define LIBAVUTIL_VERSION_MINOR  31
>  #define LIBAVUTIL_VERSION_MICRO 100
>  
>  #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \
>
Michael Niedermayer July 3, 2019, 6:29 a.m. UTC | #2
On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote:
> On 7/2/2019 5:56 PM, Michael Niedermayer wrote:
> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > ---
> >  doc/APIchanges      |  3 +++
> >  libavutil/mem.h     | 13 +++++++++++++
> >  libavutil/version.h |  2 +-
> >  3 files changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/doc/APIchanges b/doc/APIchanges
> > index b5fadc2a48..65b8ed74ad 100644
> > --- a/doc/APIchanges
> > +++ b/doc/APIchanges
> > @@ -15,6 +15,9 @@ libavutil:     2017-10-21
> >  
> >  API changes, most recent first:
> >  
> > +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h
> > +  Add av_memcpy()
> > +
> >  2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h
> >    Add FF_DECODE_ERROR_DECODE_SLICES
> >  
> > diff --git a/libavutil/mem.h b/libavutil/mem.h
> > index 5fb1a02dd9..a35230360d 100644
> > --- a/libavutil/mem.h
> > +++ b/libavutil/mem.h
> > @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size);
> >   */
> >  void av_memcpy_backptr(uint8_t *dst, int back, int cnt);
> >  
> > +/**
> > + * memcpy() implementation without a NULL pointer special case
> > + *
> > + * @param dst  Destination buffer
> > + * @param src  Source buffer
> > + * @param cnt  Number of bytes to copy; must be >= 0
> > + */
> > +static inline void av_memcpy(void *dst, const void *src, size_t cnt)
> 
> How many cases are there in the codebase where cnt can be 0, and dst or
> src NULL, without it having been checked before calling memcpy? And from
> those, which would not be from situations where the code should have
> instead aborted and returned ENOMEM, or EINVAL if either of them are
> function arguments?

There are around 2500 occurances of memcpy in the codebase
To awnser your question it would be needed to review all of them and in many
cases their surrounding code.
So that is unlikely to be awnsered by anyone accuratly

Also iam not sure i understand why you ask or why this would matter
the suggested function allows to simplify cases where the NULL can
occur, not where it cannot or should not. That is this is intended for
the cases where we already have or are adding explicit checks to
avoid the NULL case.

i could rename this to av_memcpy_nullsafe which would make it clearer but
also its more to write and read

thx

[...]
Reimar Döffinger July 3, 2019, 7:41 a.m. UTC | #3
On 03.07.2019, at 08:29, Michael Niedermayer <michael@niedermayer.cc> wrote:

> On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote:
>> On 7/2/2019 5:56 PM, Michael Niedermayer wrote:
>>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
>>> ---
>>> doc/APIchanges      |  3 +++
>>> libavutil/mem.h     | 13 +++++++++++++
>>> libavutil/version.h |  2 +-
>>> 3 files changed, 17 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/doc/APIchanges b/doc/APIchanges
>>> index b5fadc2a48..65b8ed74ad 100644
>>> --- a/doc/APIchanges
>>> +++ b/doc/APIchanges
>>> @@ -15,6 +15,9 @@ libavutil:     2017-10-21
>>> 
>>> API changes, most recent first:
>>> 
>>> +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h
>>> +  Add av_memcpy()
>>> +
>>> 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h
>>>   Add FF_DECODE_ERROR_DECODE_SLICES
>>> 
>>> diff --git a/libavutil/mem.h b/libavutil/mem.h
>>> index 5fb1a02dd9..a35230360d 100644
>>> --- a/libavutil/mem.h
>>> +++ b/libavutil/mem.h
>>> @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size);
>>>  */
>>> void av_memcpy_backptr(uint8_t *dst, int back, int cnt);
>>> 
>>> +/**
>>> + * memcpy() implementation without a NULL pointer special case
>>> + *
>>> + * @param dst  Destination buffer
>>> + * @param src  Source buffer
>>> + * @param cnt  Number of bytes to copy; must be >= 0
>>> + */
>>> +static inline void av_memcpy(void *dst, const void *src, size_t cnt)
>> 
>> How many cases are there in the codebase where cnt can be 0, and dst or
>> src NULL, without it having been checked before calling memcpy? And from
>> those, which would not be from situations where the code should have
>> instead aborted and returned ENOMEM, or EINVAL if either of them are
>> function arguments?
> 
> There are around 2500 occurances of memcpy in the codebase
> To awnser your question it would be needed to review all of them and in many
> cases their surrounding code.
> So that is unlikely to be awnsered by anyone accuratly
> 
> Also iam not sure i understand why you ask or why this would matter
> the suggested function allows to simplify cases where the NULL can
> occur, not where it cannot or should not. That is this is intended for
> the cases where we already have or are adding explicit checks to
> avoid the NULL case.
> 
> i could rename this to av_memcpy_nullsafe which would make it clearer but
> also its more to write and read

I admit I thought that a worthwhile idea originally,
but I have to think back to a long time ago that every function added to our "API" has a cost of people having to know about it and how to use it.
And if it's currently only 2 places that would benefit I think James is right to ask if it makes sense.
Of course another question might be if it might make sense to replace all memcpy uses with it.
I mean, isn't it naturally expected behaviour that the pointers would be ignored if the copy amount is 0? There might be a lot of code assuming that we do not know about...
Michael Niedermayer July 3, 2019, 8:46 a.m. UTC | #4
On Wed, Jul 03, 2019 at 09:41:41AM +0200, Reimar Döffinger wrote:
> 
> 
> On 03.07.2019, at 08:29, Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> > On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote:
> >> On 7/2/2019 5:56 PM, Michael Niedermayer wrote:
> >>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> >>> ---
> >>> doc/APIchanges      |  3 +++
> >>> libavutil/mem.h     | 13 +++++++++++++
> >>> libavutil/version.h |  2 +-
> >>> 3 files changed, 17 insertions(+), 1 deletion(-)
> >>> 
> >>> diff --git a/doc/APIchanges b/doc/APIchanges
> >>> index b5fadc2a48..65b8ed74ad 100644
> >>> --- a/doc/APIchanges
> >>> +++ b/doc/APIchanges
> >>> @@ -15,6 +15,9 @@ libavutil:     2017-10-21
> >>> 
> >>> API changes, most recent first:
> >>> 
> >>> +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h
> >>> +  Add av_memcpy()
> >>> +
> >>> 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h
> >>>   Add FF_DECODE_ERROR_DECODE_SLICES
> >>> 
> >>> diff --git a/libavutil/mem.h b/libavutil/mem.h
> >>> index 5fb1a02dd9..a35230360d 100644
> >>> --- a/libavutil/mem.h
> >>> +++ b/libavutil/mem.h
> >>> @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size);
> >>>  */
> >>> void av_memcpy_backptr(uint8_t *dst, int back, int cnt);
> >>> 
> >>> +/**
> >>> + * memcpy() implementation without a NULL pointer special case
> >>> + *
> >>> + * @param dst  Destination buffer
> >>> + * @param src  Source buffer
> >>> + * @param cnt  Number of bytes to copy; must be >= 0
> >>> + */
> >>> +static inline void av_memcpy(void *dst, const void *src, size_t cnt)
> >> 
> >> How many cases are there in the codebase where cnt can be 0, and dst or
> >> src NULL, without it having been checked before calling memcpy? And from
> >> those, which would not be from situations where the code should have
> >> instead aborted and returned ENOMEM, or EINVAL if either of them are
> >> function arguments?
> > 
> > There are around 2500 occurances of memcpy in the codebase
> > To awnser your question it would be needed to review all of them and in many
> > cases their surrounding code.
> > So that is unlikely to be awnsered by anyone accuratly
> > 
> > Also iam not sure i understand why you ask or why this would matter
> > the suggested function allows to simplify cases where the NULL can
> > occur, not where it cannot or should not. That is this is intended for
> > the cases where we already have or are adding explicit checks to
> > avoid the NULL case.
> > 
> > i could rename this to av_memcpy_nullsafe which would make it clearer but
> > also its more to write and read
> 
> I admit I thought that a worthwhile idea originally,
> but I have to think back to a long time ago that every function added to our "API" has a cost of people having to know about it and how to use it.
> And if it's currently only 2 places that would benefit I think James is right to ask if it makes sense.
> Of course another question might be if it might make sense to replace all memcpy uses with it.
> I mean, isn't it naturally expected behaviour that the pointers would be ignored if the copy amount is 0? There might be a lot of code assuming that we do not know about...

in addition to the 2 there are these related commits found by very dumb git log greps
In further addition there would be cases that deal with src == dst, something we
could add a check for in av_memcpy() too

commit c6aaf0840cf9b2b8cb139ed7110d3d47c2bf3d12
Author: Carl Eugen Hoyos <cehoyos@ag.or.at>
Date:   Tue Apr 18 10:56:31 2017 +0200

    lavf/mov: Only copy extradata if it exists.
    
    Avoids undefined call of memcpy(ptr, NULL, 0);

commit fde9013ab42411ee2015811c28e8921828a81702
Author: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Date:   Thu Jul 6 13:23:06 2017 -0400

    mpegtsenc: Don't pass NULL to memcpy
    
    Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>

commit 7bab631f7df55b361368296f125b95a1814bc18c
Author: Michael Niedermayer <michaelni@gmx.at>
Date:   Wed Mar 6 01:37:49 2013 +0100

    mss2dsp/upsample_plane: fix 0x0 handling
    
    Fixes invalid memcpy and out of array accesses
    
    Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

commit c54eef46f990722ed65fd1ad1da3d0fc50806eb5
Author: Carl Eugen Hoyos <cehoyos@ag.or.at>
Date:   Thu Sep 22 01:03:55 2016 +0200

    lavc/avpacket: Fix undefined behaviour, do not pass a null pointer to memcpy().
    
    Fixes ticket #5857.

commit f077ad69c682c13ab75a72aec11a61cac53f0c91
Author: Carl Eugen Hoyos <cehoyos@ag.or.at>
Date:   Sun Sep 4 21:11:02 2016 +0200

    lavc/avpacket: Fix undefined behaviour, do not pass a null pointer to memcpy().
    
    Fixes ticket #5128.
Hendrik Leppkes July 3, 2019, 10:50 a.m. UTC | #5
On Wed, Jul 3, 2019 at 10:46 AM Michael Niedermayer
<michael@niedermayer.cc> wrote:
>
> On Wed, Jul 03, 2019 at 09:41:41AM +0200, Reimar Döffinger wrote:
> >
> >
> > On 03.07.2019, at 08:29, Michael Niedermayer <michael@niedermayer.cc> wrote:
> >
> > > On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote:
> > >> On 7/2/2019 5:56 PM, Michael Niedermayer wrote:
> > >>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > >>> ---
> > >>> doc/APIchanges      |  3 +++
> > >>> libavutil/mem.h     | 13 +++++++++++++
> > >>> libavutil/version.h |  2 +-
> > >>> 3 files changed, 17 insertions(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/doc/APIchanges b/doc/APIchanges
> > >>> index b5fadc2a48..65b8ed74ad 100644
> > >>> --- a/doc/APIchanges
> > >>> +++ b/doc/APIchanges
> > >>> @@ -15,6 +15,9 @@ libavutil:     2017-10-21
> > >>>
> > >>> API changes, most recent first:
> > >>>
> > >>> +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h
> > >>> +  Add av_memcpy()
> > >>> +
> > >>> 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h
> > >>>   Add FF_DECODE_ERROR_DECODE_SLICES
> > >>>
> > >>> diff --git a/libavutil/mem.h b/libavutil/mem.h
> > >>> index 5fb1a02dd9..a35230360d 100644
> > >>> --- a/libavutil/mem.h
> > >>> +++ b/libavutil/mem.h
> > >>> @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size);
> > >>>  */
> > >>> void av_memcpy_backptr(uint8_t *dst, int back, int cnt);
> > >>>
> > >>> +/**
> > >>> + * memcpy() implementation without a NULL pointer special case
> > >>> + *
> > >>> + * @param dst  Destination buffer
> > >>> + * @param src  Source buffer
> > >>> + * @param cnt  Number of bytes to copy; must be >= 0
> > >>> + */
> > >>> +static inline void av_memcpy(void *dst, const void *src, size_t cnt)
> > >>
> > >> How many cases are there in the codebase where cnt can be 0, and dst or
> > >> src NULL, without it having been checked before calling memcpy? And from
> > >> those, which would not be from situations where the code should have
> > >> instead aborted and returned ENOMEM, or EINVAL if either of them are
> > >> function arguments?
> > >
> > > There are around 2500 occurances of memcpy in the codebase
> > > To awnser your question it would be needed to review all of them and in many
> > > cases their surrounding code.
> > > So that is unlikely to be awnsered by anyone accuratly
> > >
> > > Also iam not sure i understand why you ask or why this would matter
> > > the suggested function allows to simplify cases where the NULL can
> > > occur, not where it cannot or should not. That is this is intended for
> > > the cases where we already have or are adding explicit checks to
> > > avoid the NULL case.
> > >
> > > i could rename this to av_memcpy_nullsafe which would make it clearer but
> > > also its more to write and read
> >
> > I admit I thought that a worthwhile idea originally,
> > but I have to think back to a long time ago that every function added to our "API" has a cost of people having to know about it and how to use it.
> > And if it's currently only 2 places that would benefit I think James is right to ask if it makes sense.
> > Of course another question might be if it might make sense to replace all memcpy uses with it.
> > I mean, isn't it naturally expected behaviour that the pointers would be ignored if the copy amount is 0? There might be a lot of code assuming that we do not know about...
>
> in addition to the 2 there are these related commits found by very dumb git log greps
> In further addition there would be cases that deal with src == dst, something we
> could add a check for in av_memcpy() too
>

Personally I really don't like hiding too much magic in a function
like this. It can easily lead to brittle code, someone may think the
pointer is always valid since its memcpy'ed to, and uses it
afterwards, and there you have a disaster.
Either the function name should make perfectly clear what it does, or
preferably it should just not exist and code should just validate its
stuff.

- Hendrik
Derek Buitenhuis July 3, 2019, 11:40 a.m. UTC | #6
On 03/07/2019 08:41, Reimar Döffinger wrote:
> Of course another question might be if it might make sense to replace all memcpy uses with it.

I would expect this may break some compiler optimizations around
memcpy (like __builtin_memcpy, or direct inlining), no?

- Derek
Michael Niedermayer July 4, 2019, 6:04 p.m. UTC | #7
On Wed, Jul 03, 2019 at 12:50:59PM +0200, Hendrik Leppkes wrote:
> On Wed, Jul 3, 2019 at 10:46 AM Michael Niedermayer
> <michael@niedermayer.cc> wrote:
> >
> > On Wed, Jul 03, 2019 at 09:41:41AM +0200, Reimar Döffinger wrote:
> > >
> > >
> > > On 03.07.2019, at 08:29, Michael Niedermayer <michael@niedermayer.cc> wrote:
> > >
> > > > On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote:
> > > >> On 7/2/2019 5:56 PM, Michael Niedermayer wrote:
> > > >>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > > >>> ---
> > > >>> doc/APIchanges      |  3 +++
> > > >>> libavutil/mem.h     | 13 +++++++++++++
> > > >>> libavutil/version.h |  2 +-
> > > >>> 3 files changed, 17 insertions(+), 1 deletion(-)
> > > >>>
> > > >>> diff --git a/doc/APIchanges b/doc/APIchanges
> > > >>> index b5fadc2a48..65b8ed74ad 100644
> > > >>> --- a/doc/APIchanges
> > > >>> +++ b/doc/APIchanges
> > > >>> @@ -15,6 +15,9 @@ libavutil:     2017-10-21
> > > >>>
> > > >>> API changes, most recent first:
> > > >>>
> > > >>> +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h
> > > >>> +  Add av_memcpy()
> > > >>> +
> > > >>> 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h
> > > >>>   Add FF_DECODE_ERROR_DECODE_SLICES
> > > >>>
> > > >>> diff --git a/libavutil/mem.h b/libavutil/mem.h
> > > >>> index 5fb1a02dd9..a35230360d 100644
> > > >>> --- a/libavutil/mem.h
> > > >>> +++ b/libavutil/mem.h
> > > >>> @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size);
> > > >>>  */
> > > >>> void av_memcpy_backptr(uint8_t *dst, int back, int cnt);
> > > >>>
> > > >>> +/**
> > > >>> + * memcpy() implementation without a NULL pointer special case
> > > >>> + *
> > > >>> + * @param dst  Destination buffer
> > > >>> + * @param src  Source buffer
> > > >>> + * @param cnt  Number of bytes to copy; must be >= 0
> > > >>> + */
> > > >>> +static inline void av_memcpy(void *dst, const void *src, size_t cnt)
> > > >>
> > > >> How many cases are there in the codebase where cnt can be 0, and dst or
> > > >> src NULL, without it having been checked before calling memcpy? And from
> > > >> those, which would not be from situations where the code should have
> > > >> instead aborted and returned ENOMEM, or EINVAL if either of them are
> > > >> function arguments?
> > > >
> > > > There are around 2500 occurances of memcpy in the codebase
> > > > To awnser your question it would be needed to review all of them and in many
> > > > cases their surrounding code.
> > > > So that is unlikely to be awnsered by anyone accuratly
> > > >
> > > > Also iam not sure i understand why you ask or why this would matter
> > > > the suggested function allows to simplify cases where the NULL can
> > > > occur, not where it cannot or should not. That is this is intended for
> > > > the cases where we already have or are adding explicit checks to
> > > > avoid the NULL case.
> > > >
> > > > i could rename this to av_memcpy_nullsafe which would make it clearer but
> > > > also its more to write and read
> > >
> > > I admit I thought that a worthwhile idea originally,
> > > but I have to think back to a long time ago that every function added to our "API" has a cost of people having to know about it and how to use it.
> > > And if it's currently only 2 places that would benefit I think James is right to ask if it makes sense.
> > > Of course another question might be if it might make sense to replace all memcpy uses with it.
> > > I mean, isn't it naturally expected behaviour that the pointers would be ignored if the copy amount is 0? There might be a lot of code assuming that we do not know about...
> >
> > in addition to the 2 there are these related commits found by very dumb git log greps
> > In further addition there would be cases that deal with src == dst, something we
> > could add a check for in av_memcpy() too
> >
> 
> Personally I really don't like hiding too much magic in a function
> like this. It can easily lead to brittle code, someone may think the
> pointer is always valid since its memcpy'ed to, and uses it
> afterwards, and there you have a disaster.

Why would someone think that a pointer to an array of length 0 is valid ?
malloc(0) for example does not gurantee that


> Either the function name should make perfectly clear what it does, or
> preferably it should just not exist and code should just validate its
> stuff.

what name would you suggest ?

but we can just drop this and add the checks to memcpy() where needed
i was just trying to make all the undefined behaviour fixes a bit less
ugly as multiple people complained ...

thanks

[...]
Hendrik Leppkes July 4, 2019, 6:56 p.m. UTC | #8
On Thu, Jul 4, 2019 at 8:45 PM Michael Niedermayer
<michael@niedermayer.cc> wrote:
>
> On Wed, Jul 03, 2019 at 12:50:59PM +0200, Hendrik Leppkes wrote:
> > On Wed, Jul 3, 2019 at 10:46 AM Michael Niedermayer
> > <michael@niedermayer.cc> wrote:
> > >
> > > On Wed, Jul 03, 2019 at 09:41:41AM +0200, Reimar Döffinger wrote:
> > > >
> > > >
> > > > On 03.07.2019, at 08:29, Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > >
> > > > > On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote:
> > > > >> On 7/2/2019 5:56 PM, Michael Niedermayer wrote:
> > > > >>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > > > >>> ---
> > > > >>> doc/APIchanges      |  3 +++
> > > > >>> libavutil/mem.h     | 13 +++++++++++++
> > > > >>> libavutil/version.h |  2 +-
> > > > >>> 3 files changed, 17 insertions(+), 1 deletion(-)
> > > > >>>
> > > > >>> diff --git a/doc/APIchanges b/doc/APIchanges
> > > > >>> index b5fadc2a48..65b8ed74ad 100644
> > > > >>> --- a/doc/APIchanges
> > > > >>> +++ b/doc/APIchanges
> > > > >>> @@ -15,6 +15,9 @@ libavutil:     2017-10-21
> > > > >>>
> > > > >>> API changes, most recent first:
> > > > >>>
> > > > >>> +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h
> > > > >>> +  Add av_memcpy()
> > > > >>> +
> > > > >>> 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h
> > > > >>>   Add FF_DECODE_ERROR_DECODE_SLICES
> > > > >>>
> > > > >>> diff --git a/libavutil/mem.h b/libavutil/mem.h
> > > > >>> index 5fb1a02dd9..a35230360d 100644
> > > > >>> --- a/libavutil/mem.h
> > > > >>> +++ b/libavutil/mem.h
> > > > >>> @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size);
> > > > >>>  */
> > > > >>> void av_memcpy_backptr(uint8_t *dst, int back, int cnt);
> > > > >>>
> > > > >>> +/**
> > > > >>> + * memcpy() implementation without a NULL pointer special case
> > > > >>> + *
> > > > >>> + * @param dst  Destination buffer
> > > > >>> + * @param src  Source buffer
> > > > >>> + * @param cnt  Number of bytes to copy; must be >= 0
> > > > >>> + */
> > > > >>> +static inline void av_memcpy(void *dst, const void *src, size_t cnt)
> > > > >>
> > > > >> How many cases are there in the codebase where cnt can be 0, and dst or
> > > > >> src NULL, without it having been checked before calling memcpy? And from
> > > > >> those, which would not be from situations where the code should have
> > > > >> instead aborted and returned ENOMEM, or EINVAL if either of them are
> > > > >> function arguments?
> > > > >
> > > > > There are around 2500 occurances of memcpy in the codebase
> > > > > To awnser your question it would be needed to review all of them and in many
> > > > > cases their surrounding code.
> > > > > So that is unlikely to be awnsered by anyone accuratly
> > > > >
> > > > > Also iam not sure i understand why you ask or why this would matter
> > > > > the suggested function allows to simplify cases where the NULL can
> > > > > occur, not where it cannot or should not. That is this is intended for
> > > > > the cases where we already have or are adding explicit checks to
> > > > > avoid the NULL case.
> > > > >
> > > > > i could rename this to av_memcpy_nullsafe which would make it clearer but
> > > > > also its more to write and read
> > > >
> > > > I admit I thought that a worthwhile idea originally,
> > > > but I have to think back to a long time ago that every function added to our "API" has a cost of people having to know about it and how to use it.
> > > > And if it's currently only 2 places that would benefit I think James is right to ask if it makes sense.
> > > > Of course another question might be if it might make sense to replace all memcpy uses with it.
> > > > I mean, isn't it naturally expected behaviour that the pointers would be ignored if the copy amount is 0? There might be a lot of code assuming that we do not know about...
> > >
> > > in addition to the 2 there are these related commits found by very dumb git log greps
> > > In further addition there would be cases that deal with src == dst, something we
> > > could add a check for in av_memcpy() too
> > >
> >
> > Personally I really don't like hiding too much magic in a function
> > like this. It can easily lead to brittle code, someone may think the
> > pointer is always valid since its memcpy'ed to, and uses it
> > afterwards, and there you have a disaster.
>
> Why would someone think that a pointer to an array of length 0 is valid ?
> malloc(0) for example does not gurantee that
>

Why would someone memcpy a array of length 0?
Because they didn't realize that it may be zero.

You seem to view this as intentional code with the full knowledge that
it may be zero, but I would rather argue that most people just didn't
consider this special case, instead of willingly running into
errorneous usage of memcpy.

- Hendrik
Tomas Härdin July 5, 2019, 3:01 p.m. UTC | #9
ons 2019-07-03 klockan 10:46 +0200 skrev Michael Niedermayer:
> On Wed, Jul 03, 2019 at 09:41:41AM +0200, Reimar Döffinger wrote:
> > 
> > 
> > On 03.07.2019, at 08:29, Michael Niedermayer <michael@niedermayer.cc> wrote:
> > 
> > > On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote:
> > > > 
> > > > How many cases are there in the codebase where cnt can be 0, and dst or
> > > > src NULL, without it having been checked before calling memcpy? And from
> > > > those, which would not be from situations where the code should have
> > > > instead aborted and returned ENOMEM, or EINVAL if either of them are
> > > > function arguments?
> > > 
> > > There are around 2500 occurances of memcpy in the codebase
> > > To awnser your question it would be needed to review all of them and in many
> > > cases their surrounding code.
> > > So that is unlikely to be awnsered by anyone accuratly
> > > 
> > > Also iam not sure i understand why you ask or why this would matter
> > > the suggested function allows to simplify cases where the NULL can
> > > occur, not where it cannot or should not. That is this is intended for
> > > the cases where we already have or are adding explicit checks to
> > > avoid the NULL case.
> > > 
> > > i could rename this to av_memcpy_nullsafe which would make it clearer but
> > > also its more to write and read
> > 
> > I admit I thought that a worthwhile idea originally,
> > but I have to think back to a long time ago that every function
> > added to our "API" has a cost of people having to know about it and
> > how to use it.
> > And if it's currently only 2 places that would benefit I think
> > James is right to ask if it makes sense.
> > Of course another question might be if it might make sense to
> > replace all memcpy uses with it.
> > I mean, isn't it naturally expected behaviour that the pointers
> > would be ignored if the copy amount is 0? There might be a lot of
> > code assuming that we do not know about...
> 
> in addition to the 2 there are these related commits found by very dumb git log greps
> In further addition there would be cases that deal with src == dst, something we
> could add a check for in av_memcpy() too

All of these proposed "solutions" are equally horrible

I'm going to raise the issue of formal verification again (Frama-C and
friends), because crap like this should be checked at compile time, not
with flakey runtime checks.

I gave adding ACSL markup to parts of lavu a stab a while back, and it
looked somewhat promising. Bit hacks present a bit of a problem
however..

/Tomas
Michael Niedermayer July 5, 2019, 10:08 p.m. UTC | #10
On Fri, Jul 05, 2019 at 05:01:33PM +0200, Tomas Härdin wrote:
> ons 2019-07-03 klockan 10:46 +0200 skrev Michael Niedermayer:
> > On Wed, Jul 03, 2019 at 09:41:41AM +0200, Reimar Döffinger wrote:
> > > 
> > > 
> > > On 03.07.2019, at 08:29, Michael Niedermayer <michael@niedermayer.cc> wrote:
> > > 
> > > > On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote:
> > > > > 
> > > > > How many cases are there in the codebase where cnt can be 0, and dst or
> > > > > src NULL, without it having been checked before calling memcpy? And from
> > > > > those, which would not be from situations where the code should have
> > > > > instead aborted and returned ENOMEM, or EINVAL if either of them are
> > > > > function arguments?
> > > > 
> > > > There are around 2500 occurances of memcpy in the codebase
> > > > To awnser your question it would be needed to review all of them and in many
> > > > cases their surrounding code.
> > > > So that is unlikely to be awnsered by anyone accuratly
> > > > 
> > > > Also iam not sure i understand why you ask or why this would matter
> > > > the suggested function allows to simplify cases where the NULL can
> > > > occur, not where it cannot or should not. That is this is intended for
> > > > the cases where we already have or are adding explicit checks to
> > > > avoid the NULL case.
> > > > 
> > > > i could rename this to av_memcpy_nullsafe which would make it clearer but
> > > > also its more to write and read
> > > 
> > > I admit I thought that a worthwhile idea originally,
> > > but I have to think back to a long time ago that every function
> > > added to our "API" has a cost of people having to know about it and
> > > how to use it.
> > > And if it's currently only 2 places that would benefit I think
> > > James is right to ask if it makes sense.
> > > Of course another question might be if it might make sense to
> > > replace all memcpy uses with it.
> > > I mean, isn't it naturally expected behaviour that the pointers
> > > would be ignored if the copy amount is 0? There might be a lot of
> > > code assuming that we do not know about...
> > 
> > in addition to the 2 there are these related commits found by very dumb git log greps
> > In further addition there would be cases that deal with src == dst, something we
> > could add a check for in av_memcpy() too
> 
> All of these proposed "solutions" are equally horrible

> 
> I'm going to raise the issue of formal verification again (Frama-C and
> friends), because crap like this should be checked at compile time, not
> with flakey runtime checks.
> 
> I gave adding ACSL markup to parts of lavu a stab a while back, and it
> looked somewhat promising. Bit hacks present a bit of a problem
> however..

As we are already off topic, heres an example to test static
analysis, does this trigger undefined behavior by executing the memcpy
for some user input ?

void f(unsigned bigint a) {
    bigint i;
    for (i = 2; (((bigint)1 << a) + 1) % i; i++)
        ;
    if (a > 20 && i > ((bigint)1 << a))
        memcpy(NULL, NULL, 0);
}

i know its a lame example but just to remind that static analysis has
limitations. (your mail sounds a bit like static analysis could replace
everything ...)

Thanks

[...]
Tomas Härdin July 6, 2019, 12:34 p.m. UTC | #11
lör 2019-07-06 klockan 00:08 +0200 skrev Michael Niedermayer:
> As we are already off topic, heres an example to test static
> analysis, does this trigger undefined behavior by executing the memcpy
> for some user input ?
> 
> void f(unsigned bigint a) {
>     bigint i;
>     for (i = 2; (((bigint)1 << a) + 1) % i; i++)
>         ;
>     if (a > 20 && i > ((bigint)1 << a))
>         memcpy(NULL, NULL, 0);
> }
> 
> i know its a lame example but just to remind that static analysis has
> limitations. (your mail sounds a bit like static analysis could replace
> everything ...)

That is acually perfectly legal since the intersection between
[NULL,NULL) and [NULL,NULL) is empty and thus they do not overlap.
Here's an example that validates:

#include <stdint.h>
#include <string.h>

/*@ axiomatic Foo {
    axiom Bar: \forall integer a;
      0 <= a <= 62 ==>
        1 <= (1<<a) <= (1<<62);
    }
 */

/*@ requires 0 <= a <= 62;
    assigns ((char*)NULL)[0..-1];
 */
void f(uint64_t a) {
    int64_t i = 2;
    int64_t a2 = (1LL << a) + 1;
    /*@ loop invariant 2 <= i <= a2;
        loop assigns i;
     */
    for (; a2 % i; i++)
        ;
    //@ assert a2 % i == 0;
    if (a > 20 && i > ((int64_t)1 << a))
        memcpy(NULL, NULL, 0);
}

I got a bit lazy with the axiom. I think newer versions of Frama-C are
better able to reason about shifts. I'm on version Silicon-20161101. To
validate, put in a file called mini.c and run:

  frama-c -wp -wp-rte -wp-prover z3,cvc4,alt-ergo,qed mini.c

You'll need the frama-c package installed, and all the provers listed.

Curiously, replacing the assignment in the contract with assigns
\nothing doesn't work (should eq. assigning the empty set). Maybe a
newer version of Frama-C fixes this.

If you change the memcpy() to copy n=1 bytes then the verification
fails, as expected. It also fails if let src and dst equal to the same
valid memory area, because of overlap.

The point of all this is that it's possible to specify and verify
whether a function has side effects, that it doesn't write out of
bounds, that it terminates etc. This removes the need for unit tests
(if FFmpeg were to have any)


/Tomas
Michael Niedermayer July 6, 2019, 4:34 p.m. UTC | #12
On Sat, Jul 06, 2019 at 02:34:34PM +0200, Tomas Härdin wrote:
> lör 2019-07-06 klockan 00:08 +0200 skrev Michael Niedermayer:
> > As we are already off topic, heres an example to test static
> > analysis, does this trigger undefined behavior by executing the memcpy
> > for some user input ?
> > 
> > void f(unsigned bigint a) {
> >     bigint i;
> >     for (i = 2; (((bigint)1 << a) + 1) % i; i++)
> >         ;
> >     if (a > 20 && i > ((bigint)1 << a))
> >         memcpy(NULL, NULL, 0);
> > }
> > 
> > i know its a lame example but just to remind that static analysis has
> > limitations. (your mail sounds a bit like static analysis could replace
> > everything ...)
> 
> That is acually perfectly legal since the intersection between
> [NULL,NULL) and [NULL,NULL) is empty and thus they do not overlap.
> Here's an example that validates:
> 
> #include <stdint.h>
> #include <string.h>
> 
> /*@ axiomatic Foo {
>     axiom Bar: \forall integer a;
>       0 <= a <= 62 ==>
>         1 <= (1<<a) <= (1<<62);
>     }
>  */
> 
> /*@ requires 0 <= a <= 62;
>     assigns ((char*)NULL)[0..-1];
>  */
> void f(uint64_t a) {
>     int64_t i = 2;
>     int64_t a2 = (1LL << a) + 1;
>     /*@ loop invariant 2 <= i <= a2;
>         loop assigns i;
>      */
>     for (; a2 % i; i++)
>         ;
>     //@ assert a2 % i == 0;
>     if (a > 20 && i > ((int64_t)1 << a))
>         memcpy(NULL, NULL, 0);
> }

this code is wrong.
Imagine this was real code, and to make it fit in the static
analyzer one changes it like this

why is it worng ?
the range should not have a upper bound of 62 in fact there is no
reason to run this function with input below 1LL<<33 that is not
33 that is 1LL<<33

also you can restrict a to powers of 2 if you like
thats
for (b = 33; ; b++)
    f((bigint)1<<b);
    
can the analyzer help check this ?
I am pretty sure its not UB below, and i suspect its not UB for most values 
above but iam really curious and like to know for sure.


[...]

> If you change the memcpy() to copy n=1 bytes then the verification
> fails, as expected. It also fails if let src and dst equal to the same
> valid memory area, because of overlap.

are you saying that code which never executes causes failure of the
analyzer because if it did execute it would be wrong ?

[...]

Thanks
Tomas Härdin July 6, 2019, 9:22 p.m. UTC | #13
lör 2019-07-06 klockan 18:34 +0200 skrev Michael Niedermayer:
> On Sat, Jul 06, 2019 at 02:34:34PM +0200, Tomas Härdin wrote:
> > lör 2019-07-06 klockan 00:08 +0200 skrev Michael Niedermayer:
> > > As we are already off topic, heres an example to test static
> > > analysis, does this trigger undefined behavior by executing the
> > > memcpy
> > > for some user input ?
> > > 
> > > void f(unsigned bigint a) {
> > >     bigint i;
> > >     for (i = 2; (((bigint)1 << a) + 1) % i; i++)
> > >         ;
> > >     if (a > 20 && i > ((bigint)1 << a))
> > >         memcpy(NULL, NULL, 0);
> > > }
> > > 
> > > i know its a lame example but just to remind that static analysis
> > > has
> > > limitations. (your mail sounds a bit like static analysis could
> > > replace
> > > everything ...)
> > 
> > That is acually perfectly legal since the intersection between
> > [NULL,NULL) and [NULL,NULL) is empty and thus they do not overlap.
> > Here's an example that validates:
> > 
> > #include <stdint.h>
> > #include <string.h>
> > 
> > /*@ axiomatic Foo {
> >     axiom Bar: \forall integer a;
> >       0 <= a <= 62 ==>
> >         1 <= (1<<a) <= (1<<62);
> >     }
> >  */
> > 
> > /*@ requires 0 <= a <= 62;
> >     assigns ((char*)NULL)[0..-1];
> >  */
> > void f(uint64_t a) {
> >     int64_t i = 2;
> >     int64_t a2 = (1LL << a) + 1;
> >     /*@ loop invariant 2 <= i <= a2;
> >         loop assigns i;
> >      */
> >     for (; a2 % i; i++)
> >         ;
> >     //@ assert a2 % i == 0;
> >     if (a > 20 && i > ((int64_t)1 << a))
> >         memcpy(NULL, NULL, 0);
> > }
> 
> this code is wrong.
> Imagine this was real code, and to make it fit in the static
> analyzer one changes it like this
> 
> why is it worng ?
> the range should not have a upper bound of 62 in fact there is no
> reason to run this function with input below 1LL<<33 that is not
> 33 that is 1LL<<33

All bignum implementations I've seen have upper bounds. Maybe you have
an infinite tape laying around somewhere?

It is possible to define custom data types and reason about them just
as well as integers. In fact, I've seen elliptic curve implementations
that do just that. It's just that it's a bit of work

> 
> also you can restrict a to powers of 2 if you like
> thats
> for (b = 33; ; b++)
>     f((bigint)1<<b);
>     
> can the analyzer help check this ?

If you have a bigint_shift() function with an appropriate contract,
sure. b will overflow at some point howeveer, which the prover will
catch by default if b is signed. It can also be told to catch unsigned
overflow.

> I am pretty sure its not UB below, and i suspect its not UB for most
> values 
> above but iam really curious and like to know for sure.
> 
> 
> [...]
> 
> > If you change the memcpy() to copy n=1 bytes then the verification
> > fails, as expected. It also fails if let src and dst equal to the
> > same
> > valid memory area, because of overlap.
> 
> are you saying that code which never executes causes failure of the
> analyzer because if it did execute it would be wrong ?

This is equivalent to asking the prover to show that there are no
Fermat primes above (1<<20), which is a tall order. If your code
depends on tricky-to-prove theorems then obviously the prover is not
going to do that job for you.

Whether dead code is removed is a good question nonetheless, let me
check:

  void f(uint64_t a) {
    [...]
    if (a > 20 && i > ((int64_t)1 << a)) {
        char aa[1];
        memcpy(aa, aa, 1);
    }
  }

fails

  void f(uint64_t a) {
    [...]
    if (a > 62 && i > ((int64_t)1 << a)) {
        char aa[1];
        memcpy(aa, aa, 1);
    }
  }

succeeds (because 0 <= a <= 62)

/Tomas
Michael Niedermayer July 7, 2019, 4:19 p.m. UTC | #14
On Sat, Jul 06, 2019 at 11:22:59PM +0200, Tomas Härdin wrote:
> lör 2019-07-06 klockan 18:34 +0200 skrev Michael Niedermayer:
> > On Sat, Jul 06, 2019 at 02:34:34PM +0200, Tomas Härdin wrote:
> > > lör 2019-07-06 klockan 00:08 +0200 skrev Michael Niedermayer:
> > > > As we are already off topic, heres an example to test static
> > > > analysis, does this trigger undefined behavior by executing the
> > > > memcpy
> > > > for some user input ?
> > > > 
> > > > void f(unsigned bigint a) {
> > > >     bigint i;
> > > >     for (i = 2; (((bigint)1 << a) + 1) % i; i++)
> > > >         ;
> > > >     if (a > 20 && i > ((bigint)1 << a))
> > > >         memcpy(NULL, NULL, 0);
> > > > }
> > > > 
> > > > i know its a lame example but just to remind that static analysis
> > > > has
> > > > limitations. (your mail sounds a bit like static analysis could
> > > > replace
> > > > everything ...)
> > > 
> > > That is acually perfectly legal since the intersection between
> > > [NULL,NULL) and [NULL,NULL) is empty and thus they do not overlap.
> > > Here's an example that validates:
> > > 
> > > #include <stdint.h>
> > > #include <string.h>
> > > 
> > > /*@ axiomatic Foo {
> > >     axiom Bar: \forall integer a;
> > >       0 <= a <= 62 ==>
> > >         1 <= (1<<a) <= (1<<62);
> > >     }
> > >  */
> > > 
> > > /*@ requires 0 <= a <= 62;
> > >     assigns ((char*)NULL)[0..-1];
> > >  */
> > > void f(uint64_t a) {
> > >     int64_t i = 2;
> > >     int64_t a2 = (1LL << a) + 1;
> > >     /*@ loop invariant 2 <= i <= a2;
> > >         loop assigns i;
> > >      */
> > >     for (; a2 % i; i++)
> > >         ;
> > >     //@ assert a2 % i == 0;
> > >     if (a > 20 && i > ((int64_t)1 << a))
> > >         memcpy(NULL, NULL, 0);
> > > }
> > 
> > this code is wrong.
> > Imagine this was real code, and to make it fit in the static
> > analyzer one changes it like this
> > 
> > why is it worng ?
> > the range should not have a upper bound of 62 in fact there is no
> > reason to run this function with input below 1LL<<33 that is not
> > 33 that is 1LL<<33
> 
> All bignum implementations I've seen have upper bounds. 

How can this help ?

assume we proof avutil is free of UB, then we update some external lib
or change to a platform with a larger INT_MAX and boom, the proof is
no longer valid 
(this could happen if such implementation limits are used in a proof)

Thanks

[...]
Tomas Härdin July 7, 2019, 5:13 p.m. UTC | #15
sön 2019-07-07 klockan 18:19 +0200 skrev Michael Niedermayer:
> On Sat, Jul 06, 2019 at 11:22:59PM +0200, Tomas Härdin wrote:
> > lör 2019-07-06 klockan 18:34 +0200 skrev Michael Niedermayer:
> > > On Sat, Jul 06, 2019 at 02:34:34PM +0200, Tomas Härdin wrote:
> > > > lör 2019-07-06 klockan 00:08 +0200 skrev Michael Niedermayer:
> > > > > As we are already off topic, heres an example to test static
> > > > > analysis, does this trigger undefined behavior by executing the
> > > > > memcpy
> > > > > for some user input ?
> > > > > 
> > > > > void f(unsigned bigint a) {
> > > > >     bigint i;
> > > > >     for (i = 2; (((bigint)1 << a) + 1) % i; i++)
> > > > >         ;
> > > > >     if (a > 20 && i > ((bigint)1 << a))
> > > > >         memcpy(NULL, NULL, 0);
> > > > > }
> > > > > 
> > > > > i know its a lame example but just to remind that static analysis
> > > > > has
> > > > > limitations. (your mail sounds a bit like static analysis could
> > > > > replace
> > > > > everything ...)
> > > > 
> > > > That is acually perfectly legal since the intersection between
> > > > [NULL,NULL) and [NULL,NULL) is empty and thus they do not overlap.
> > > > Here's an example that validates:
> > > > 
> > > > #include <stdint.h>
> > > > #include <string.h>
> > > > 
> > > > /*@ axiomatic Foo {
> > > >     axiom Bar: \forall integer a;
> > > >       0 <= a <= 62 ==>
> > > >         1 <= (1<<a) <= (1<<62);
> > > >     }
> > > >  */
> > > > 
> > > > /*@ requires 0 <= a <= 62;
> > > >     assigns ((char*)NULL)[0..-1];
> > > >  */
> > > > void f(uint64_t a) {
> > > >     int64_t i = 2;
> > > >     int64_t a2 = (1LL << a) + 1;
> > > >     /*@ loop invariant 2 <= i <= a2;
> > > >         loop assigns i;
> > > >      */
> > > >     for (; a2 % i; i++)
> > > >         ;
> > > >     //@ assert a2 % i == 0;
> > > >     if (a > 20 && i > ((int64_t)1 << a))
> > > >         memcpy(NULL, NULL, 0);
> > > > }
> > > 
> > > this code is wrong.
> > > Imagine this was real code, and to make it fit in the static
> > > analyzer one changes it like this
> > > 
> > > why is it worng ?
> > > the range should not have a upper bound of 62 in fact there is no
> > > reason to run this function with input below 1LL<<33 that is not
> > > 33 that is 1LL<<33
> > 
> > All bignum implementations I've seen have upper bounds. 
> 
> How can this help ?
> 
> assume we proof avutil is free of UB, then we update some external lib
> or change to a platform with a larger INT_MAX and boom, the proof is
> no longer valid 
> (this could happen if such implementation limits are used in a proof)

Yes, you have to target the prover to every architecture you intend to
support. I ran into this exact issue when verifying some embedded code
where int is 16-bit. Adding the option -machdep x86_16 was enough to
fix that. I've seen posts on the Frama-C ML about adding support for
other platforms, such as the 128-bit mode in RISC-V.

You also can't verify external libraries of course. But FFmpeg is
notoriously NIH:y, so that wouldn't be an issue for a very long  time.
But, you can also augment function prototypes with guesstimated
contracts. This is analogous to reading headers and forming your own
mental model about how a given library works.

/Tomas
diff mbox

Patch

diff --git a/doc/APIchanges b/doc/APIchanges
index b5fadc2a48..65b8ed74ad 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -15,6 +15,9 @@  libavutil:     2017-10-21
 
 API changes, most recent first:
 
+2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h
+  Add av_memcpy()
+
 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h
   Add FF_DECODE_ERROR_DECODE_SLICES
 
diff --git a/libavutil/mem.h b/libavutil/mem.h
index 5fb1a02dd9..a35230360d 100644
--- a/libavutil/mem.h
+++ b/libavutil/mem.h
@@ -506,6 +506,19 @@  void *av_memdup(const void *p, size_t size);
  */
 void av_memcpy_backptr(uint8_t *dst, int back, int cnt);
 
+/**
+ * memcpy() implementation without a NULL pointer special case
+ *
+ * @param dst  Destination buffer
+ * @param src  Source buffer
+ * @param cnt  Number of bytes to copy; must be >= 0
+ */
+static inline void av_memcpy(void *dst, const void *src, size_t cnt)
+{
+    if (cnt)
+        memcpy(dst, src, cnt);
+}
+
 /**
  * @}
  */
diff --git a/libavutil/version.h b/libavutil/version.h
index e16b93e877..24ca8ab7db 100644
--- a/libavutil/version.h
+++ b/libavutil/version.h
@@ -79,7 +79,7 @@ 
  */
 
 #define LIBAVUTIL_VERSION_MAJOR  56
-#define LIBAVUTIL_VERSION_MINOR  30
+#define LIBAVUTIL_VERSION_MINOR  31
 #define LIBAVUTIL_VERSION_MICRO 100
 
 #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \