Message ID | 20190702205630.25749-1-michael@niedermayer.cc |
---|---|
State | New |
Headers | show |
On 7/2/2019 5:56 PM, Michael Niedermayer wrote: > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> > --- > doc/APIchanges | 3 +++ > libavutil/mem.h | 13 +++++++++++++ > libavutil/version.h | 2 +- > 3 files changed, 17 insertions(+), 1 deletion(-) > > diff --git a/doc/APIchanges b/doc/APIchanges > index b5fadc2a48..65b8ed74ad 100644 > --- a/doc/APIchanges > +++ b/doc/APIchanges > @@ -15,6 +15,9 @@ libavutil: 2017-10-21 > > API changes, most recent first: > > +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h > + Add av_memcpy() > + > 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h > Add FF_DECODE_ERROR_DECODE_SLICES > > diff --git a/libavutil/mem.h b/libavutil/mem.h > index 5fb1a02dd9..a35230360d 100644 > --- a/libavutil/mem.h > +++ b/libavutil/mem.h > @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size); > */ > void av_memcpy_backptr(uint8_t *dst, int back, int cnt); > > +/** > + * memcpy() implementation without a NULL pointer special case > + * > + * @param dst Destination buffer > + * @param src Source buffer > + * @param cnt Number of bytes to copy; must be >= 0 > + */ > +static inline void av_memcpy(void *dst, const void *src, size_t cnt) How many cases are there in the codebase where cnt can be 0, and dst or src NULL, without it having been checked before calling memcpy? And from those, which would not be from situations where the code should have instead aborted and returned ENOMEM, or EINVAL if either of them are function arguments? > +{ > + if (cnt) > + memcpy(dst, src, cnt); > +} > + > /** > * @} > */ > diff --git a/libavutil/version.h b/libavutil/version.h > index e16b93e877..24ca8ab7db 100644 > --- a/libavutil/version.h > +++ b/libavutil/version.h > @@ -79,7 +79,7 @@ > */ > > #define LIBAVUTIL_VERSION_MAJOR 56 > -#define LIBAVUTIL_VERSION_MINOR 30 > +#define LIBAVUTIL_VERSION_MINOR 31 > #define LIBAVUTIL_VERSION_MICRO 100 > > #define LIBAVUTIL_VERSION_INT AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \ >
On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote: > On 7/2/2019 5:56 PM, Michael Niedermayer wrote: > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> > > --- > > doc/APIchanges | 3 +++ > > libavutil/mem.h | 13 +++++++++++++ > > libavutil/version.h | 2 +- > > 3 files changed, 17 insertions(+), 1 deletion(-) > > > > diff --git a/doc/APIchanges b/doc/APIchanges > > index b5fadc2a48..65b8ed74ad 100644 > > --- a/doc/APIchanges > > +++ b/doc/APIchanges > > @@ -15,6 +15,9 @@ libavutil: 2017-10-21 > > > > API changes, most recent first: > > > > +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h > > + Add av_memcpy() > > + > > 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h > > Add FF_DECODE_ERROR_DECODE_SLICES > > > > diff --git a/libavutil/mem.h b/libavutil/mem.h > > index 5fb1a02dd9..a35230360d 100644 > > --- a/libavutil/mem.h > > +++ b/libavutil/mem.h > > @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size); > > */ > > void av_memcpy_backptr(uint8_t *dst, int back, int cnt); > > > > +/** > > + * memcpy() implementation without a NULL pointer special case > > + * > > + * @param dst Destination buffer > > + * @param src Source buffer > > + * @param cnt Number of bytes to copy; must be >= 0 > > + */ > > +static inline void av_memcpy(void *dst, const void *src, size_t cnt) > > How many cases are there in the codebase where cnt can be 0, and dst or > src NULL, without it having been checked before calling memcpy? And from > those, which would not be from situations where the code should have > instead aborted and returned ENOMEM, or EINVAL if either of them are > function arguments? There are around 2500 occurances of memcpy in the codebase To awnser your question it would be needed to review all of them and in many cases their surrounding code. So that is unlikely to be awnsered by anyone accuratly Also iam not sure i understand why you ask or why this would matter the suggested function allows to simplify cases where the NULL can occur, not where it cannot or should not. That is this is intended for the cases where we already have or are adding explicit checks to avoid the NULL case. i could rename this to av_memcpy_nullsafe which would make it clearer but also its more to write and read thx [...]
On 03.07.2019, at 08:29, Michael Niedermayer <michael@niedermayer.cc> wrote: > On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote: >> On 7/2/2019 5:56 PM, Michael Niedermayer wrote: >>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> >>> --- >>> doc/APIchanges | 3 +++ >>> libavutil/mem.h | 13 +++++++++++++ >>> libavutil/version.h | 2 +- >>> 3 files changed, 17 insertions(+), 1 deletion(-) >>> >>> diff --git a/doc/APIchanges b/doc/APIchanges >>> index b5fadc2a48..65b8ed74ad 100644 >>> --- a/doc/APIchanges >>> +++ b/doc/APIchanges >>> @@ -15,6 +15,9 @@ libavutil: 2017-10-21 >>> >>> API changes, most recent first: >>> >>> +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h >>> + Add av_memcpy() >>> + >>> 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h >>> Add FF_DECODE_ERROR_DECODE_SLICES >>> >>> diff --git a/libavutil/mem.h b/libavutil/mem.h >>> index 5fb1a02dd9..a35230360d 100644 >>> --- a/libavutil/mem.h >>> +++ b/libavutil/mem.h >>> @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size); >>> */ >>> void av_memcpy_backptr(uint8_t *dst, int back, int cnt); >>> >>> +/** >>> + * memcpy() implementation without a NULL pointer special case >>> + * >>> + * @param dst Destination buffer >>> + * @param src Source buffer >>> + * @param cnt Number of bytes to copy; must be >= 0 >>> + */ >>> +static inline void av_memcpy(void *dst, const void *src, size_t cnt) >> >> How many cases are there in the codebase where cnt can be 0, and dst or >> src NULL, without it having been checked before calling memcpy? And from >> those, which would not be from situations where the code should have >> instead aborted and returned ENOMEM, or EINVAL if either of them are >> function arguments? > > There are around 2500 occurances of memcpy in the codebase > To awnser your question it would be needed to review all of them and in many > cases their surrounding code. > So that is unlikely to be awnsered by anyone accuratly > > Also iam not sure i understand why you ask or why this would matter > the suggested function allows to simplify cases where the NULL can > occur, not where it cannot or should not. That is this is intended for > the cases where we already have or are adding explicit checks to > avoid the NULL case. > > i could rename this to av_memcpy_nullsafe which would make it clearer but > also its more to write and read I admit I thought that a worthwhile idea originally, but I have to think back to a long time ago that every function added to our "API" has a cost of people having to know about it and how to use it. And if it's currently only 2 places that would benefit I think James is right to ask if it makes sense. Of course another question might be if it might make sense to replace all memcpy uses with it. I mean, isn't it naturally expected behaviour that the pointers would be ignored if the copy amount is 0? There might be a lot of code assuming that we do not know about...
On Wed, Jul 03, 2019 at 09:41:41AM +0200, Reimar Döffinger wrote: > > > On 03.07.2019, at 08:29, Michael Niedermayer <michael@niedermayer.cc> wrote: > > > On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote: > >> On 7/2/2019 5:56 PM, Michael Niedermayer wrote: > >>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> > >>> --- > >>> doc/APIchanges | 3 +++ > >>> libavutil/mem.h | 13 +++++++++++++ > >>> libavutil/version.h | 2 +- > >>> 3 files changed, 17 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/doc/APIchanges b/doc/APIchanges > >>> index b5fadc2a48..65b8ed74ad 100644 > >>> --- a/doc/APIchanges > >>> +++ b/doc/APIchanges > >>> @@ -15,6 +15,9 @@ libavutil: 2017-10-21 > >>> > >>> API changes, most recent first: > >>> > >>> +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h > >>> + Add av_memcpy() > >>> + > >>> 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h > >>> Add FF_DECODE_ERROR_DECODE_SLICES > >>> > >>> diff --git a/libavutil/mem.h b/libavutil/mem.h > >>> index 5fb1a02dd9..a35230360d 100644 > >>> --- a/libavutil/mem.h > >>> +++ b/libavutil/mem.h > >>> @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size); > >>> */ > >>> void av_memcpy_backptr(uint8_t *dst, int back, int cnt); > >>> > >>> +/** > >>> + * memcpy() implementation without a NULL pointer special case > >>> + * > >>> + * @param dst Destination buffer > >>> + * @param src Source buffer > >>> + * @param cnt Number of bytes to copy; must be >= 0 > >>> + */ > >>> +static inline void av_memcpy(void *dst, const void *src, size_t cnt) > >> > >> How many cases are there in the codebase where cnt can be 0, and dst or > >> src NULL, without it having been checked before calling memcpy? And from > >> those, which would not be from situations where the code should have > >> instead aborted and returned ENOMEM, or EINVAL if either of them are > >> function arguments? > > > > There are around 2500 occurances of memcpy in the codebase > > To awnser your question it would be needed to review all of them and in many > > cases their surrounding code. > > So that is unlikely to be awnsered by anyone accuratly > > > > Also iam not sure i understand why you ask or why this would matter > > the suggested function allows to simplify cases where the NULL can > > occur, not where it cannot or should not. That is this is intended for > > the cases where we already have or are adding explicit checks to > > avoid the NULL case. > > > > i could rename this to av_memcpy_nullsafe which would make it clearer but > > also its more to write and read > > I admit I thought that a worthwhile idea originally, > but I have to think back to a long time ago that every function added to our "API" has a cost of people having to know about it and how to use it. > And if it's currently only 2 places that would benefit I think James is right to ask if it makes sense. > Of course another question might be if it might make sense to replace all memcpy uses with it. > I mean, isn't it naturally expected behaviour that the pointers would be ignored if the copy amount is 0? There might be a lot of code assuming that we do not know about... in addition to the 2 there are these related commits found by very dumb git log greps In further addition there would be cases that deal with src == dst, something we could add a check for in av_memcpy() too commit c6aaf0840cf9b2b8cb139ed7110d3d47c2bf3d12 Author: Carl Eugen Hoyos <cehoyos@ag.or.at> Date: Tue Apr 18 10:56:31 2017 +0200 lavf/mov: Only copy extradata if it exists. Avoids undefined call of memcpy(ptr, NULL, 0); commit fde9013ab42411ee2015811c28e8921828a81702 Author: Derek Buitenhuis <derek.buitenhuis@gmail.com> Date: Thu Jul 6 13:23:06 2017 -0400 mpegtsenc: Don't pass NULL to memcpy Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com> commit 7bab631f7df55b361368296f125b95a1814bc18c Author: Michael Niedermayer <michaelni@gmx.at> Date: Wed Mar 6 01:37:49 2013 +0100 mss2dsp/upsample_plane: fix 0x0 handling Fixes invalid memcpy and out of array accesses Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind Signed-off-by: Michael Niedermayer <michaelni@gmx.at> commit c54eef46f990722ed65fd1ad1da3d0fc50806eb5 Author: Carl Eugen Hoyos <cehoyos@ag.or.at> Date: Thu Sep 22 01:03:55 2016 +0200 lavc/avpacket: Fix undefined behaviour, do not pass a null pointer to memcpy(). Fixes ticket #5857. commit f077ad69c682c13ab75a72aec11a61cac53f0c91 Author: Carl Eugen Hoyos <cehoyos@ag.or.at> Date: Sun Sep 4 21:11:02 2016 +0200 lavc/avpacket: Fix undefined behaviour, do not pass a null pointer to memcpy(). Fixes ticket #5128.
On Wed, Jul 3, 2019 at 10:46 AM Michael Niedermayer <michael@niedermayer.cc> wrote: > > On Wed, Jul 03, 2019 at 09:41:41AM +0200, Reimar Döffinger wrote: > > > > > > On 03.07.2019, at 08:29, Michael Niedermayer <michael@niedermayer.cc> wrote: > > > > > On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote: > > >> On 7/2/2019 5:56 PM, Michael Niedermayer wrote: > > >>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> > > >>> --- > > >>> doc/APIchanges | 3 +++ > > >>> libavutil/mem.h | 13 +++++++++++++ > > >>> libavutil/version.h | 2 +- > > >>> 3 files changed, 17 insertions(+), 1 deletion(-) > > >>> > > >>> diff --git a/doc/APIchanges b/doc/APIchanges > > >>> index b5fadc2a48..65b8ed74ad 100644 > > >>> --- a/doc/APIchanges > > >>> +++ b/doc/APIchanges > > >>> @@ -15,6 +15,9 @@ libavutil: 2017-10-21 > > >>> > > >>> API changes, most recent first: > > >>> > > >>> +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h > > >>> + Add av_memcpy() > > >>> + > > >>> 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h > > >>> Add FF_DECODE_ERROR_DECODE_SLICES > > >>> > > >>> diff --git a/libavutil/mem.h b/libavutil/mem.h > > >>> index 5fb1a02dd9..a35230360d 100644 > > >>> --- a/libavutil/mem.h > > >>> +++ b/libavutil/mem.h > > >>> @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size); > > >>> */ > > >>> void av_memcpy_backptr(uint8_t *dst, int back, int cnt); > > >>> > > >>> +/** > > >>> + * memcpy() implementation without a NULL pointer special case > > >>> + * > > >>> + * @param dst Destination buffer > > >>> + * @param src Source buffer > > >>> + * @param cnt Number of bytes to copy; must be >= 0 > > >>> + */ > > >>> +static inline void av_memcpy(void *dst, const void *src, size_t cnt) > > >> > > >> How many cases are there in the codebase where cnt can be 0, and dst or > > >> src NULL, without it having been checked before calling memcpy? And from > > >> those, which would not be from situations where the code should have > > >> instead aborted and returned ENOMEM, or EINVAL if either of them are > > >> function arguments? > > > > > > There are around 2500 occurances of memcpy in the codebase > > > To awnser your question it would be needed to review all of them and in many > > > cases their surrounding code. > > > So that is unlikely to be awnsered by anyone accuratly > > > > > > Also iam not sure i understand why you ask or why this would matter > > > the suggested function allows to simplify cases where the NULL can > > > occur, not where it cannot or should not. That is this is intended for > > > the cases where we already have or are adding explicit checks to > > > avoid the NULL case. > > > > > > i could rename this to av_memcpy_nullsafe which would make it clearer but > > > also its more to write and read > > > > I admit I thought that a worthwhile idea originally, > > but I have to think back to a long time ago that every function added to our "API" has a cost of people having to know about it and how to use it. > > And if it's currently only 2 places that would benefit I think James is right to ask if it makes sense. > > Of course another question might be if it might make sense to replace all memcpy uses with it. > > I mean, isn't it naturally expected behaviour that the pointers would be ignored if the copy amount is 0? There might be a lot of code assuming that we do not know about... > > in addition to the 2 there are these related commits found by very dumb git log greps > In further addition there would be cases that deal with src == dst, something we > could add a check for in av_memcpy() too > Personally I really don't like hiding too much magic in a function like this. It can easily lead to brittle code, someone may think the pointer is always valid since its memcpy'ed to, and uses it afterwards, and there you have a disaster. Either the function name should make perfectly clear what it does, or preferably it should just not exist and code should just validate its stuff. - Hendrik
On 03/07/2019 08:41, Reimar Döffinger wrote:
> Of course another question might be if it might make sense to replace all memcpy uses with it.
I would expect this may break some compiler optimizations around
memcpy (like __builtin_memcpy, or direct inlining), no?
- Derek
On Wed, Jul 03, 2019 at 12:50:59PM +0200, Hendrik Leppkes wrote: > On Wed, Jul 3, 2019 at 10:46 AM Michael Niedermayer > <michael@niedermayer.cc> wrote: > > > > On Wed, Jul 03, 2019 at 09:41:41AM +0200, Reimar Döffinger wrote: > > > > > > > > > On 03.07.2019, at 08:29, Michael Niedermayer <michael@niedermayer.cc> wrote: > > > > > > > On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote: > > > >> On 7/2/2019 5:56 PM, Michael Niedermayer wrote: > > > >>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> > > > >>> --- > > > >>> doc/APIchanges | 3 +++ > > > >>> libavutil/mem.h | 13 +++++++++++++ > > > >>> libavutil/version.h | 2 +- > > > >>> 3 files changed, 17 insertions(+), 1 deletion(-) > > > >>> > > > >>> diff --git a/doc/APIchanges b/doc/APIchanges > > > >>> index b5fadc2a48..65b8ed74ad 100644 > > > >>> --- a/doc/APIchanges > > > >>> +++ b/doc/APIchanges > > > >>> @@ -15,6 +15,9 @@ libavutil: 2017-10-21 > > > >>> > > > >>> API changes, most recent first: > > > >>> > > > >>> +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h > > > >>> + Add av_memcpy() > > > >>> + > > > >>> 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h > > > >>> Add FF_DECODE_ERROR_DECODE_SLICES > > > >>> > > > >>> diff --git a/libavutil/mem.h b/libavutil/mem.h > > > >>> index 5fb1a02dd9..a35230360d 100644 > > > >>> --- a/libavutil/mem.h > > > >>> +++ b/libavutil/mem.h > > > >>> @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size); > > > >>> */ > > > >>> void av_memcpy_backptr(uint8_t *dst, int back, int cnt); > > > >>> > > > >>> +/** > > > >>> + * memcpy() implementation without a NULL pointer special case > > > >>> + * > > > >>> + * @param dst Destination buffer > > > >>> + * @param src Source buffer > > > >>> + * @param cnt Number of bytes to copy; must be >= 0 > > > >>> + */ > > > >>> +static inline void av_memcpy(void *dst, const void *src, size_t cnt) > > > >> > > > >> How many cases are there in the codebase where cnt can be 0, and dst or > > > >> src NULL, without it having been checked before calling memcpy? And from > > > >> those, which would not be from situations where the code should have > > > >> instead aborted and returned ENOMEM, or EINVAL if either of them are > > > >> function arguments? > > > > > > > > There are around 2500 occurances of memcpy in the codebase > > > > To awnser your question it would be needed to review all of them and in many > > > > cases their surrounding code. > > > > So that is unlikely to be awnsered by anyone accuratly > > > > > > > > Also iam not sure i understand why you ask or why this would matter > > > > the suggested function allows to simplify cases where the NULL can > > > > occur, not where it cannot or should not. That is this is intended for > > > > the cases where we already have or are adding explicit checks to > > > > avoid the NULL case. > > > > > > > > i could rename this to av_memcpy_nullsafe which would make it clearer but > > > > also its more to write and read > > > > > > I admit I thought that a worthwhile idea originally, > > > but I have to think back to a long time ago that every function added to our "API" has a cost of people having to know about it and how to use it. > > > And if it's currently only 2 places that would benefit I think James is right to ask if it makes sense. > > > Of course another question might be if it might make sense to replace all memcpy uses with it. > > > I mean, isn't it naturally expected behaviour that the pointers would be ignored if the copy amount is 0? There might be a lot of code assuming that we do not know about... > > > > in addition to the 2 there are these related commits found by very dumb git log greps > > In further addition there would be cases that deal with src == dst, something we > > could add a check for in av_memcpy() too > > > > Personally I really don't like hiding too much magic in a function > like this. It can easily lead to brittle code, someone may think the > pointer is always valid since its memcpy'ed to, and uses it > afterwards, and there you have a disaster. Why would someone think that a pointer to an array of length 0 is valid ? malloc(0) for example does not gurantee that > Either the function name should make perfectly clear what it does, or > preferably it should just not exist and code should just validate its > stuff. what name would you suggest ? but we can just drop this and add the checks to memcpy() where needed i was just trying to make all the undefined behaviour fixes a bit less ugly as multiple people complained ... thanks [...]
On Thu, Jul 4, 2019 at 8:45 PM Michael Niedermayer <michael@niedermayer.cc> wrote: > > On Wed, Jul 03, 2019 at 12:50:59PM +0200, Hendrik Leppkes wrote: > > On Wed, Jul 3, 2019 at 10:46 AM Michael Niedermayer > > <michael@niedermayer.cc> wrote: > > > > > > On Wed, Jul 03, 2019 at 09:41:41AM +0200, Reimar Döffinger wrote: > > > > > > > > > > > > On 03.07.2019, at 08:29, Michael Niedermayer <michael@niedermayer.cc> wrote: > > > > > > > > > On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote: > > > > >> On 7/2/2019 5:56 PM, Michael Niedermayer wrote: > > > > >>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> > > > > >>> --- > > > > >>> doc/APIchanges | 3 +++ > > > > >>> libavutil/mem.h | 13 +++++++++++++ > > > > >>> libavutil/version.h | 2 +- > > > > >>> 3 files changed, 17 insertions(+), 1 deletion(-) > > > > >>> > > > > >>> diff --git a/doc/APIchanges b/doc/APIchanges > > > > >>> index b5fadc2a48..65b8ed74ad 100644 > > > > >>> --- a/doc/APIchanges > > > > >>> +++ b/doc/APIchanges > > > > >>> @@ -15,6 +15,9 @@ libavutil: 2017-10-21 > > > > >>> > > > > >>> API changes, most recent first: > > > > >>> > > > > >>> +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h > > > > >>> + Add av_memcpy() > > > > >>> + > > > > >>> 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h > > > > >>> Add FF_DECODE_ERROR_DECODE_SLICES > > > > >>> > > > > >>> diff --git a/libavutil/mem.h b/libavutil/mem.h > > > > >>> index 5fb1a02dd9..a35230360d 100644 > > > > >>> --- a/libavutil/mem.h > > > > >>> +++ b/libavutil/mem.h > > > > >>> @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size); > > > > >>> */ > > > > >>> void av_memcpy_backptr(uint8_t *dst, int back, int cnt); > > > > >>> > > > > >>> +/** > > > > >>> + * memcpy() implementation without a NULL pointer special case > > > > >>> + * > > > > >>> + * @param dst Destination buffer > > > > >>> + * @param src Source buffer > > > > >>> + * @param cnt Number of bytes to copy; must be >= 0 > > > > >>> + */ > > > > >>> +static inline void av_memcpy(void *dst, const void *src, size_t cnt) > > > > >> > > > > >> How many cases are there in the codebase where cnt can be 0, and dst or > > > > >> src NULL, without it having been checked before calling memcpy? And from > > > > >> those, which would not be from situations where the code should have > > > > >> instead aborted and returned ENOMEM, or EINVAL if either of them are > > > > >> function arguments? > > > > > > > > > > There are around 2500 occurances of memcpy in the codebase > > > > > To awnser your question it would be needed to review all of them and in many > > > > > cases their surrounding code. > > > > > So that is unlikely to be awnsered by anyone accuratly > > > > > > > > > > Also iam not sure i understand why you ask or why this would matter > > > > > the suggested function allows to simplify cases where the NULL can > > > > > occur, not where it cannot or should not. That is this is intended for > > > > > the cases where we already have or are adding explicit checks to > > > > > avoid the NULL case. > > > > > > > > > > i could rename this to av_memcpy_nullsafe which would make it clearer but > > > > > also its more to write and read > > > > > > > > I admit I thought that a worthwhile idea originally, > > > > but I have to think back to a long time ago that every function added to our "API" has a cost of people having to know about it and how to use it. > > > > And if it's currently only 2 places that would benefit I think James is right to ask if it makes sense. > > > > Of course another question might be if it might make sense to replace all memcpy uses with it. > > > > I mean, isn't it naturally expected behaviour that the pointers would be ignored if the copy amount is 0? There might be a lot of code assuming that we do not know about... > > > > > > in addition to the 2 there are these related commits found by very dumb git log greps > > > In further addition there would be cases that deal with src == dst, something we > > > could add a check for in av_memcpy() too > > > > > > > Personally I really don't like hiding too much magic in a function > > like this. It can easily lead to brittle code, someone may think the > > pointer is always valid since its memcpy'ed to, and uses it > > afterwards, and there you have a disaster. > > Why would someone think that a pointer to an array of length 0 is valid ? > malloc(0) for example does not gurantee that > Why would someone memcpy a array of length 0? Because they didn't realize that it may be zero. You seem to view this as intentional code with the full knowledge that it may be zero, but I would rather argue that most people just didn't consider this special case, instead of willingly running into errorneous usage of memcpy. - Hendrik
ons 2019-07-03 klockan 10:46 +0200 skrev Michael Niedermayer: > On Wed, Jul 03, 2019 at 09:41:41AM +0200, Reimar Döffinger wrote: > > > > > > On 03.07.2019, at 08:29, Michael Niedermayer <michael@niedermayer.cc> wrote: > > > > > On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote: > > > > > > > > How many cases are there in the codebase where cnt can be 0, and dst or > > > > src NULL, without it having been checked before calling memcpy? And from > > > > those, which would not be from situations where the code should have > > > > instead aborted and returned ENOMEM, or EINVAL if either of them are > > > > function arguments? > > > > > > There are around 2500 occurances of memcpy in the codebase > > > To awnser your question it would be needed to review all of them and in many > > > cases their surrounding code. > > > So that is unlikely to be awnsered by anyone accuratly > > > > > > Also iam not sure i understand why you ask or why this would matter > > > the suggested function allows to simplify cases where the NULL can > > > occur, not where it cannot or should not. That is this is intended for > > > the cases where we already have or are adding explicit checks to > > > avoid the NULL case. > > > > > > i could rename this to av_memcpy_nullsafe which would make it clearer but > > > also its more to write and read > > > > I admit I thought that a worthwhile idea originally, > > but I have to think back to a long time ago that every function > > added to our "API" has a cost of people having to know about it and > > how to use it. > > And if it's currently only 2 places that would benefit I think > > James is right to ask if it makes sense. > > Of course another question might be if it might make sense to > > replace all memcpy uses with it. > > I mean, isn't it naturally expected behaviour that the pointers > > would be ignored if the copy amount is 0? There might be a lot of > > code assuming that we do not know about... > > in addition to the 2 there are these related commits found by very dumb git log greps > In further addition there would be cases that deal with src == dst, something we > could add a check for in av_memcpy() too All of these proposed "solutions" are equally horrible I'm going to raise the issue of formal verification again (Frama-C and friends), because crap like this should be checked at compile time, not with flakey runtime checks. I gave adding ACSL markup to parts of lavu a stab a while back, and it looked somewhat promising. Bit hacks present a bit of a problem however.. /Tomas
On Fri, Jul 05, 2019 at 05:01:33PM +0200, Tomas Härdin wrote: > ons 2019-07-03 klockan 10:46 +0200 skrev Michael Niedermayer: > > On Wed, Jul 03, 2019 at 09:41:41AM +0200, Reimar Döffinger wrote: > > > > > > > > > On 03.07.2019, at 08:29, Michael Niedermayer <michael@niedermayer.cc> wrote: > > > > > > > On Tue, Jul 02, 2019 at 08:42:43PM -0300, James Almer wrote: > > > > > > > > > > How many cases are there in the codebase where cnt can be 0, and dst or > > > > > src NULL, without it having been checked before calling memcpy? And from > > > > > those, which would not be from situations where the code should have > > > > > instead aborted and returned ENOMEM, or EINVAL if either of them are > > > > > function arguments? > > > > > > > > There are around 2500 occurances of memcpy in the codebase > > > > To awnser your question it would be needed to review all of them and in many > > > > cases their surrounding code. > > > > So that is unlikely to be awnsered by anyone accuratly > > > > > > > > Also iam not sure i understand why you ask or why this would matter > > > > the suggested function allows to simplify cases where the NULL can > > > > occur, not where it cannot or should not. That is this is intended for > > > > the cases where we already have or are adding explicit checks to > > > > avoid the NULL case. > > > > > > > > i could rename this to av_memcpy_nullsafe which would make it clearer but > > > > also its more to write and read > > > > > > I admit I thought that a worthwhile idea originally, > > > but I have to think back to a long time ago that every function > > > added to our "API" has a cost of people having to know about it and > > > how to use it. > > > And if it's currently only 2 places that would benefit I think > > > James is right to ask if it makes sense. > > > Of course another question might be if it might make sense to > > > replace all memcpy uses with it. > > > I mean, isn't it naturally expected behaviour that the pointers > > > would be ignored if the copy amount is 0? There might be a lot of > > > code assuming that we do not know about... > > > > in addition to the 2 there are these related commits found by very dumb git log greps > > In further addition there would be cases that deal with src == dst, something we > > could add a check for in av_memcpy() too > > All of these proposed "solutions" are equally horrible > > I'm going to raise the issue of formal verification again (Frama-C and > friends), because crap like this should be checked at compile time, not > with flakey runtime checks. > > I gave adding ACSL markup to parts of lavu a stab a while back, and it > looked somewhat promising. Bit hacks present a bit of a problem > however.. As we are already off topic, heres an example to test static analysis, does this trigger undefined behavior by executing the memcpy for some user input ? void f(unsigned bigint a) { bigint i; for (i = 2; (((bigint)1 << a) + 1) % i; i++) ; if (a > 20 && i > ((bigint)1 << a)) memcpy(NULL, NULL, 0); } i know its a lame example but just to remind that static analysis has limitations. (your mail sounds a bit like static analysis could replace everything ...) Thanks [...]
lör 2019-07-06 klockan 00:08 +0200 skrev Michael Niedermayer: > As we are already off topic, heres an example to test static > analysis, does this trigger undefined behavior by executing the memcpy > for some user input ? > > void f(unsigned bigint a) { > bigint i; > for (i = 2; (((bigint)1 << a) + 1) % i; i++) > ; > if (a > 20 && i > ((bigint)1 << a)) > memcpy(NULL, NULL, 0); > } > > i know its a lame example but just to remind that static analysis has > limitations. (your mail sounds a bit like static analysis could replace > everything ...) That is acually perfectly legal since the intersection between [NULL,NULL) and [NULL,NULL) is empty and thus they do not overlap. Here's an example that validates: #include <stdint.h> #include <string.h> /*@ axiomatic Foo { axiom Bar: \forall integer a; 0 <= a <= 62 ==> 1 <= (1<<a) <= (1<<62); } */ /*@ requires 0 <= a <= 62; assigns ((char*)NULL)[0..-1]; */ void f(uint64_t a) { int64_t i = 2; int64_t a2 = (1LL << a) + 1; /*@ loop invariant 2 <= i <= a2; loop assigns i; */ for (; a2 % i; i++) ; //@ assert a2 % i == 0; if (a > 20 && i > ((int64_t)1 << a)) memcpy(NULL, NULL, 0); } I got a bit lazy with the axiom. I think newer versions of Frama-C are better able to reason about shifts. I'm on version Silicon-20161101. To validate, put in a file called mini.c and run: frama-c -wp -wp-rte -wp-prover z3,cvc4,alt-ergo,qed mini.c You'll need the frama-c package installed, and all the provers listed. Curiously, replacing the assignment in the contract with assigns \nothing doesn't work (should eq. assigning the empty set). Maybe a newer version of Frama-C fixes this. If you change the memcpy() to copy n=1 bytes then the verification fails, as expected. It also fails if let src and dst equal to the same valid memory area, because of overlap. The point of all this is that it's possible to specify and verify whether a function has side effects, that it doesn't write out of bounds, that it terminates etc. This removes the need for unit tests (if FFmpeg were to have any) /Tomas
On Sat, Jul 06, 2019 at 02:34:34PM +0200, Tomas Härdin wrote: > lör 2019-07-06 klockan 00:08 +0200 skrev Michael Niedermayer: > > As we are already off topic, heres an example to test static > > analysis, does this trigger undefined behavior by executing the memcpy > > for some user input ? > > > > void f(unsigned bigint a) { > > bigint i; > > for (i = 2; (((bigint)1 << a) + 1) % i; i++) > > ; > > if (a > 20 && i > ((bigint)1 << a)) > > memcpy(NULL, NULL, 0); > > } > > > > i know its a lame example but just to remind that static analysis has > > limitations. (your mail sounds a bit like static analysis could replace > > everything ...) > > That is acually perfectly legal since the intersection between > [NULL,NULL) and [NULL,NULL) is empty and thus they do not overlap. > Here's an example that validates: > > #include <stdint.h> > #include <string.h> > > /*@ axiomatic Foo { > axiom Bar: \forall integer a; > 0 <= a <= 62 ==> > 1 <= (1<<a) <= (1<<62); > } > */ > > /*@ requires 0 <= a <= 62; > assigns ((char*)NULL)[0..-1]; > */ > void f(uint64_t a) { > int64_t i = 2; > int64_t a2 = (1LL << a) + 1; > /*@ loop invariant 2 <= i <= a2; > loop assigns i; > */ > for (; a2 % i; i++) > ; > //@ assert a2 % i == 0; > if (a > 20 && i > ((int64_t)1 << a)) > memcpy(NULL, NULL, 0); > } this code is wrong. Imagine this was real code, and to make it fit in the static analyzer one changes it like this why is it worng ? the range should not have a upper bound of 62 in fact there is no reason to run this function with input below 1LL<<33 that is not 33 that is 1LL<<33 also you can restrict a to powers of 2 if you like thats for (b = 33; ; b++) f((bigint)1<<b); can the analyzer help check this ? I am pretty sure its not UB below, and i suspect its not UB for most values above but iam really curious and like to know for sure. [...] > If you change the memcpy() to copy n=1 bytes then the verification > fails, as expected. It also fails if let src and dst equal to the same > valid memory area, because of overlap. are you saying that code which never executes causes failure of the analyzer because if it did execute it would be wrong ? [...] Thanks
lör 2019-07-06 klockan 18:34 +0200 skrev Michael Niedermayer: > On Sat, Jul 06, 2019 at 02:34:34PM +0200, Tomas Härdin wrote: > > lör 2019-07-06 klockan 00:08 +0200 skrev Michael Niedermayer: > > > As we are already off topic, heres an example to test static > > > analysis, does this trigger undefined behavior by executing the > > > memcpy > > > for some user input ? > > > > > > void f(unsigned bigint a) { > > > bigint i; > > > for (i = 2; (((bigint)1 << a) + 1) % i; i++) > > > ; > > > if (a > 20 && i > ((bigint)1 << a)) > > > memcpy(NULL, NULL, 0); > > > } > > > > > > i know its a lame example but just to remind that static analysis > > > has > > > limitations. (your mail sounds a bit like static analysis could > > > replace > > > everything ...) > > > > That is acually perfectly legal since the intersection between > > [NULL,NULL) and [NULL,NULL) is empty and thus they do not overlap. > > Here's an example that validates: > > > > #include <stdint.h> > > #include <string.h> > > > > /*@ axiomatic Foo { > > axiom Bar: \forall integer a; > > 0 <= a <= 62 ==> > > 1 <= (1<<a) <= (1<<62); > > } > > */ > > > > /*@ requires 0 <= a <= 62; > > assigns ((char*)NULL)[0..-1]; > > */ > > void f(uint64_t a) { > > int64_t i = 2; > > int64_t a2 = (1LL << a) + 1; > > /*@ loop invariant 2 <= i <= a2; > > loop assigns i; > > */ > > for (; a2 % i; i++) > > ; > > //@ assert a2 % i == 0; > > if (a > 20 && i > ((int64_t)1 << a)) > > memcpy(NULL, NULL, 0); > > } > > this code is wrong. > Imagine this was real code, and to make it fit in the static > analyzer one changes it like this > > why is it worng ? > the range should not have a upper bound of 62 in fact there is no > reason to run this function with input below 1LL<<33 that is not > 33 that is 1LL<<33 All bignum implementations I've seen have upper bounds. Maybe you have an infinite tape laying around somewhere? It is possible to define custom data types and reason about them just as well as integers. In fact, I've seen elliptic curve implementations that do just that. It's just that it's a bit of work > > also you can restrict a to powers of 2 if you like > thats > for (b = 33; ; b++) > f((bigint)1<<b); > > can the analyzer help check this ? If you have a bigint_shift() function with an appropriate contract, sure. b will overflow at some point howeveer, which the prover will catch by default if b is signed. It can also be told to catch unsigned overflow. > I am pretty sure its not UB below, and i suspect its not UB for most > values > above but iam really curious and like to know for sure. > > > [...] > > > If you change the memcpy() to copy n=1 bytes then the verification > > fails, as expected. It also fails if let src and dst equal to the > > same > > valid memory area, because of overlap. > > are you saying that code which never executes causes failure of the > analyzer because if it did execute it would be wrong ? This is equivalent to asking the prover to show that there are no Fermat primes above (1<<20), which is a tall order. If your code depends on tricky-to-prove theorems then obviously the prover is not going to do that job for you. Whether dead code is removed is a good question nonetheless, let me check: void f(uint64_t a) { [...] if (a > 20 && i > ((int64_t)1 << a)) { char aa[1]; memcpy(aa, aa, 1); } } fails void f(uint64_t a) { [...] if (a > 62 && i > ((int64_t)1 << a)) { char aa[1]; memcpy(aa, aa, 1); } } succeeds (because 0 <= a <= 62) /Tomas
On Sat, Jul 06, 2019 at 11:22:59PM +0200, Tomas Härdin wrote: > lör 2019-07-06 klockan 18:34 +0200 skrev Michael Niedermayer: > > On Sat, Jul 06, 2019 at 02:34:34PM +0200, Tomas Härdin wrote: > > > lör 2019-07-06 klockan 00:08 +0200 skrev Michael Niedermayer: > > > > As we are already off topic, heres an example to test static > > > > analysis, does this trigger undefined behavior by executing the > > > > memcpy > > > > for some user input ? > > > > > > > > void f(unsigned bigint a) { > > > > bigint i; > > > > for (i = 2; (((bigint)1 << a) + 1) % i; i++) > > > > ; > > > > if (a > 20 && i > ((bigint)1 << a)) > > > > memcpy(NULL, NULL, 0); > > > > } > > > > > > > > i know its a lame example but just to remind that static analysis > > > > has > > > > limitations. (your mail sounds a bit like static analysis could > > > > replace > > > > everything ...) > > > > > > That is acually perfectly legal since the intersection between > > > [NULL,NULL) and [NULL,NULL) is empty and thus they do not overlap. > > > Here's an example that validates: > > > > > > #include <stdint.h> > > > #include <string.h> > > > > > > /*@ axiomatic Foo { > > > axiom Bar: \forall integer a; > > > 0 <= a <= 62 ==> > > > 1 <= (1<<a) <= (1<<62); > > > } > > > */ > > > > > > /*@ requires 0 <= a <= 62; > > > assigns ((char*)NULL)[0..-1]; > > > */ > > > void f(uint64_t a) { > > > int64_t i = 2; > > > int64_t a2 = (1LL << a) + 1; > > > /*@ loop invariant 2 <= i <= a2; > > > loop assigns i; > > > */ > > > for (; a2 % i; i++) > > > ; > > > //@ assert a2 % i == 0; > > > if (a > 20 && i > ((int64_t)1 << a)) > > > memcpy(NULL, NULL, 0); > > > } > > > > this code is wrong. > > Imagine this was real code, and to make it fit in the static > > analyzer one changes it like this > > > > why is it worng ? > > the range should not have a upper bound of 62 in fact there is no > > reason to run this function with input below 1LL<<33 that is not > > 33 that is 1LL<<33 > > All bignum implementations I've seen have upper bounds. How can this help ? assume we proof avutil is free of UB, then we update some external lib or change to a platform with a larger INT_MAX and boom, the proof is no longer valid (this could happen if such implementation limits are used in a proof) Thanks [...]
sön 2019-07-07 klockan 18:19 +0200 skrev Michael Niedermayer: > On Sat, Jul 06, 2019 at 11:22:59PM +0200, Tomas Härdin wrote: > > lör 2019-07-06 klockan 18:34 +0200 skrev Michael Niedermayer: > > > On Sat, Jul 06, 2019 at 02:34:34PM +0200, Tomas Härdin wrote: > > > > lör 2019-07-06 klockan 00:08 +0200 skrev Michael Niedermayer: > > > > > As we are already off topic, heres an example to test static > > > > > analysis, does this trigger undefined behavior by executing the > > > > > memcpy > > > > > for some user input ? > > > > > > > > > > void f(unsigned bigint a) { > > > > > bigint i; > > > > > for (i = 2; (((bigint)1 << a) + 1) % i; i++) > > > > > ; > > > > > if (a > 20 && i > ((bigint)1 << a)) > > > > > memcpy(NULL, NULL, 0); > > > > > } > > > > > > > > > > i know its a lame example but just to remind that static analysis > > > > > has > > > > > limitations. (your mail sounds a bit like static analysis could > > > > > replace > > > > > everything ...) > > > > > > > > That is acually perfectly legal since the intersection between > > > > [NULL,NULL) and [NULL,NULL) is empty and thus they do not overlap. > > > > Here's an example that validates: > > > > > > > > #include <stdint.h> > > > > #include <string.h> > > > > > > > > /*@ axiomatic Foo { > > > > axiom Bar: \forall integer a; > > > > 0 <= a <= 62 ==> > > > > 1 <= (1<<a) <= (1<<62); > > > > } > > > > */ > > > > > > > > /*@ requires 0 <= a <= 62; > > > > assigns ((char*)NULL)[0..-1]; > > > > */ > > > > void f(uint64_t a) { > > > > int64_t i = 2; > > > > int64_t a2 = (1LL << a) + 1; > > > > /*@ loop invariant 2 <= i <= a2; > > > > loop assigns i; > > > > */ > > > > for (; a2 % i; i++) > > > > ; > > > > //@ assert a2 % i == 0; > > > > if (a > 20 && i > ((int64_t)1 << a)) > > > > memcpy(NULL, NULL, 0); > > > > } > > > > > > this code is wrong. > > > Imagine this was real code, and to make it fit in the static > > > analyzer one changes it like this > > > > > > why is it worng ? > > > the range should not have a upper bound of 62 in fact there is no > > > reason to run this function with input below 1LL<<33 that is not > > > 33 that is 1LL<<33 > > > > All bignum implementations I've seen have upper bounds. > > How can this help ? > > assume we proof avutil is free of UB, then we update some external lib > or change to a platform with a larger INT_MAX and boom, the proof is > no longer valid > (this could happen if such implementation limits are used in a proof) Yes, you have to target the prover to every architecture you intend to support. I ran into this exact issue when verifying some embedded code where int is 16-bit. Adding the option -machdep x86_16 was enough to fix that. I've seen posts on the Frama-C ML about adding support for other platforms, such as the 128-bit mode in RISC-V. You also can't verify external libraries of course. But FFmpeg is notoriously NIH:y, so that wouldn't be an issue for a very long time. But, you can also augment function prototypes with guesstimated contracts. This is analogous to reading headers and forming your own mental model about how a given library works. /Tomas
diff --git a/doc/APIchanges b/doc/APIchanges index b5fadc2a48..65b8ed74ad 100644 --- a/doc/APIchanges +++ b/doc/APIchanges @@ -15,6 +15,9 @@ libavutil: 2017-10-21 API changes, most recent first: +2019-07-XX - XXXXXXXXXX - lavu 56.31.100 - mem.h + Add av_memcpy() + 2019-06-21 - XXXXXXXXXX - lavu 56.30.100 - frame.h Add FF_DECODE_ERROR_DECODE_SLICES diff --git a/libavutil/mem.h b/libavutil/mem.h index 5fb1a02dd9..a35230360d 100644 --- a/libavutil/mem.h +++ b/libavutil/mem.h @@ -506,6 +506,19 @@ void *av_memdup(const void *p, size_t size); */ void av_memcpy_backptr(uint8_t *dst, int back, int cnt); +/** + * memcpy() implementation without a NULL pointer special case + * + * @param dst Destination buffer + * @param src Source buffer + * @param cnt Number of bytes to copy; must be >= 0 + */ +static inline void av_memcpy(void *dst, const void *src, size_t cnt) +{ + if (cnt) + memcpy(dst, src, cnt); +} + /** * @} */ diff --git a/libavutil/version.h b/libavutil/version.h index e16b93e877..24ca8ab7db 100644 --- a/libavutil/version.h +++ b/libavutil/version.h @@ -79,7 +79,7 @@ */ #define LIBAVUTIL_VERSION_MAJOR 56 -#define LIBAVUTIL_VERSION_MINOR 30 +#define LIBAVUTIL_VERSION_MINOR 31 #define LIBAVUTIL_VERSION_MICRO 100 #define LIBAVUTIL_VERSION_INT AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> --- doc/APIchanges | 3 +++ libavutil/mem.h | 13 +++++++++++++ libavutil/version.h | 2 +- 3 files changed, 17 insertions(+), 1 deletion(-)