Message ID | 20191226190408.26419-1-andreas.rheinhardt@gmail.com |
---|---|
State | New |
Headers | show |
Andreas Rheinhardt (12019-12-26): > b) It guarantees to not allocated more than UINT_MAX - 1 elements, so > the caller needn't check for overflow if the desired size is increased > in steps of one. This is preparing trouble for later, and as Michael pointed, it will not work when the number of elements is stored in a signed integer. (It was a mistake to use a signed integer in the first place, but that particular mistake is already all over the place). I strongly suggest we try to make things properly: - Make nb_allocated and min_nb size_t as they should be. - Do not hard-code UINT_MAX-1, make it an argument, max_nb: that way, if the index is an integer, just pass INT_MAX-1, and if it is properly size_t, pass SIZE_MAX-1. - Since we can't pass a pointer to int as a pointer to size_t, add a small convenience wrapper av_fast_realloc_array_int() just to convert the type of nb_allocated. That's ugly, but I do not see a better solution. Or we can decide we must change the type of the size to size_t whenever we update code to use av_fast_realloc_array(). And while we are at it, I would like better: - Alter ptr the same way nb_allocated is altered, and return a real error code in case of error. Otherwise, the caller has to guess that only AVERROR(ENOMEM) is possible, and we have seen that these assumptions and hard-coded error code have a way of overstaying their welcome. Poor choice of types have caused problems in the past. They will cause more problems as the memory of computers, and therefore the task we give them, grow. Let us not cultivate them. Regards,
On Thu, Dec 26, 2019 at 8:35 PM Nicolas George <george@nsup.org> wrote: > Andreas Rheinhardt (12019-12-26): > > b) It guarantees to not allocated more than UINT_MAX - 1 elements, so > > the caller needn't check for overflow if the desired size is increased > > in steps of one. > > This is preparing trouble for later, I don't understand. Do you think that callers will take this as a blank cheque to not check at all? > and as Michael pointed, it will not > work when the number of elements is stored in a signed integer. My intention was actually to convert the types for every user of this function when using this function (see e.g. the -Matroska demuxer patch). > (It was > a mistake to use a signed integer in the first place, but that > particular mistake is already all over the place). I agree. > I strongly suggest we > try to make things properly: > > - Make nb_allocated and min_nb size_t as they should be. > Do we have arrays where there is a need to go beyond the range of ordinary integers for the number of elements? > > - Do not hard-code UINT_MAX-1, make it an argument, max_nb: that way, if > the index is an integer, just pass INT_MAX-1, and if it is properly > size_t, pass SIZE_MAX-1. > > - Since we can't pass a pointer to int as a pointer to size_t, add a > small convenience wrapper av_fast_realloc_array_int() just to convert > the type of nb_allocated. That's ugly, but I do not see a better > solution. Or we can decide we must change the type of the size to > size_t whenever we update code to use av_fast_realloc_array(). > I'd like not to turn the int/unsigned version of av_fast_realloc_array() into a wrapper until there is a real case where an array is needed that doesn't fit into an int (or an unsigned). And switching to size_t everywhere would increase the size of certain structures which I very much like to avoid (e.g. the size of every MatroskaIndex element would increase by eight bytes (for 64 bit systems)). > > And while we are at it, I would like better: > > - Alter ptr the same way nb_allocated is altered, and return a real > error code in case of error. Otherwise, the caller has to guess that > only AVERROR(ENOMEM) is possible, and we have seen that these > assumptions and hard-coded error code have a way of overstaying their > welcome. > That is certainly possible (if "alter ptr the same way nb_allocated is altered" means: Via a pointer to a pointer. (The other interpretation (that the array should be automatically freed on failure and ptr set to NULL) would have elicited a different response.)) > > Poor choice of types have caused problems in the past. They will cause > more problems as the memory of computers, and therefore the task we give > them, grow. Let us not cultivate them. > > Regards, > > -- > Nicolas George > Thanks for looking over this. - Andreas
Andreas Rheinhardt (12019-12-26): > I don't understand. Do you think that callers will take this as a blank > cheque to not check at all? No, I mean that using the wrong type will impose limitations on us later, that will take work to resolve. > My intention was actually to convert the types for every user of this > function when using this function (see e.g. the -Matroska demuxer patch). Good. Then by all means, let us use the right type directly. > Do we have arrays where there is a need to go beyond the range of ordinary > integers for the number of elements? Probably not many right now. But the power an memory of computers increase, the needs increase too: it is becoming more and more likely to cause an actual problem; and at the same time, the cost of using the proper type decreases. > I'd like not to turn the int/unsigned version of av_fast_realloc_array() > into a wrapper until there is a real case where an array is needed that > doesn't fit into an int (or an unsigned). This is a public API, once it is there, we are stuck with it, including the types, good or bad. Therefore, I insist we use the proper type from the start. > And switching to size_t > everywhere would increase the size of certain structures which I very much > like to avoid (e.g. the size of every MatroskaIndex element would increase > by eight bytes (for 64 bit systems)). If you are worried about eight measly bytes, you should start by rethinking the memory allocation strategy: having a mallocated dynamic array costs way more than eight bytes. But if you really need to scrimp and save, you still can, you just need to use an intermediate variable when calling av_fast_realloc_array(). > That is certainly possible (if "alter ptr the same way nb_allocated is > altered" means: Via a pointer to a pointer. Yes, altering the pointer by pointer instead of returning it: that frees the return value for an error code. Regards,
diff --git a/doc/APIchanges b/doc/APIchanges index 5b8d801f06..b7cede329a 100644 --- a/doc/APIchanges +++ b/doc/APIchanges @@ -15,6 +15,9 @@ libavutil: 2017-10-21 API changes, most recent first: +2019-12-26 - xxxxxxxxxx - lavu 56.38.100 - mem.h + Add av_fast_realloc_array(). + 2019-12-xx - xxxxxxxxxx - lavu 56.37.100 - buffer.h Add av_buffer_pool_buffer_get_opaque(). diff --git a/libavutil/mem.c b/libavutil/mem.c index 88fe09b179..151ab586c4 100644 --- a/libavutil/mem.c +++ b/libavutil/mem.c @@ -497,6 +497,36 @@ void *av_fast_realloc(void *ptr, unsigned int *size, size_t min_size) return ptr; } +void *av_fast_realloc_array(void *ptr, unsigned *nb_allocated, + unsigned min_nb, size_t elsize) +{ + unsigned max_nb, nb; + + if (min_nb <= *nb_allocated) + return ptr; + + /* The 16 / 17 implies that we do not need to check + * for overflow in the calculation of nb below. */ + max_nb = FFMIN((UINT_MAX - 1) * 16 / 17, (max_alloc_size - 32) / elsize); + + if (min_nb > max_nb) { + *nb_allocated = 0; + return NULL; + } + + nb = FFMIN(max_nb, min_nb + (min_nb + 14) / 16); + + ptr = av_realloc(ptr, nb * elsize); + /* We could keep *nb_allocated in case of failure, but this is safer + * if the user lost the ptr and uses NULL now. */ + if (!ptr) + nb = 0; + + *nb_allocated = nb; + + return ptr; +} + void av_fast_malloc(void *ptr, unsigned int *size, size_t min_size) { ff_fast_malloc(ptr, size, min_size, 0); diff --git a/libavutil/mem.h b/libavutil/mem.h index 5fb1a02dd9..539a030387 100644 --- a/libavutil/mem.h +++ b/libavutil/mem.h @@ -370,11 +370,41 @@ int av_reallocp_array(void *ptr, size_t nmemb, size_t size); * @return `ptr` if the buffer is large enough, a pointer to newly reallocated * buffer if the buffer was not large enough, or `NULL` in case of * error + * @see av_fast_realloc_array() * @see av_realloc() * @see av_fast_malloc() */ void *av_fast_realloc(void *ptr, unsigned int *size, size_t min_size); +/** + * Reallocate the given array if it is not large enough, otherwise do nothing. + * + * If the given array is `NULL`, then a new uninitialized array is allocated. + * + * If the given array is not large enough, and reallocation fails, `NULL` is + * returned and `*nb_allocated` is set to 0, but the original array is not + * changed or freed. + * + * @param[in,out] ptr Already allocated array, or `NULL` + * @param[in,out] nb_allocated Pointer to the number of elements of the array + * `ptr`. `*nb_allocated` is updated to the new + * number of allocated elements, in particular 0 + * in case of failure. + * @param[in] min_nb Desired minimal amount of elements in array `ptr` + * @param[in] elsize Size of a single element of the array. + * Must not be zero. + * @return `ptr` if the array is large enough, a pointer to newly reallocated + * array if the array was not large enough, or `NULL` in case of + * error + * @note This function will never allocate more than `UINT_MAX - 1` elements, + * so incrementing in steps of one need not be checked for overflow. + * @see av_fast_realloc() + * @see av_realloc() + * @see av_fast_malloc() + */ +void *av_fast_realloc_array(void *ptr, unsigned *nb_allocated, + unsigned min_nb, size_t elsize); + /** * Allocate a buffer, reusing the given one if large enough. * diff --git a/libavutil/version.h b/libavutil/version.h index 4de0fa1fc3..af8f614aff 100644 --- a/libavutil/version.h +++ b/libavutil/version.h @@ -79,7 +79,7 @@ */ #define LIBAVUTIL_VERSION_MAJOR 56 -#define LIBAVUTIL_VERSION_MINOR 37 +#define LIBAVUTIL_VERSION_MINOR 38 #define LIBAVUTIL_VERSION_MICRO 100 #define LIBAVUTIL_VERSION_INT AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \
This is an array-equivalent of av_fast_realloc(). Its advantages compared to using av_fast_realloc() for allocating arrays are as follows: a) It performs its own overflow checks for the multiplication that is implicit in array allocations. (And it only needs to perform these checks (as well as the multiplication itself) in case the array needs to be reallocated.) b) It guarantees to not allocated more than UINT_MAX - 1 elements, so the caller needn't check for overflow if the desired size is increased in steps of one. c) Should the caller have increased max_alloc_size, av_fast_realloc() could come in a situation where it allocates more than what fits into an unsigned. In this case the variable containing the allocated size (an unsigned) won't accurately reflect how much has been allocated. After this point, lots of reallocations will happen despite the buffer actually being big enough. d) av_fast_realloc_array() will always allocate in multiples of array elements; no memory is wasted with partial elements. av_fast_realloc() usually allocates size + size / 16 + 32 bytes if size bytes are desired and if the already existing buffer isn't big enough. av_fast_realloc_array() instead allocates nb + (nb + 14) / 16. Rounding up is done in order not to reallocate in steps of one if the current number is < 16; adding 14 instead of 15 has the effect of only allocating one element if one element is desired. This is done with an eye towards applications where arrays might commonly only contain one element (as happens with the Matroska CueTrackPositions). Which of the two functions allocates faster depends upon the size of the elements. E.g. if the elements have a size of 32B and the desired size is incremented in steps of one, allocations happen at 1, 3, 5, 7, 9, 11, 13, 15, 17, 20, 23, 26 ... for av_fast_realloc(), 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 21, 24 ... for av_fast_realloc_array(). For element sizes of 96B, the numbers are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 21, 23, 25, 27, 30 ... for av_fast_realloc() whereas the pattern for av_fast_realloc_array() is unchanged. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> --- I missed that Marton has already pushed his patch for avutil/buffer which also included changes to doc/APIchanges and the lavu version number. Therefore I am resending this to solve the ensuing merge conflict. doc/APIchanges | 3 +++ libavutil/mem.c | 30 ++++++++++++++++++++++++++++++ libavutil/mem.h | 30 ++++++++++++++++++++++++++++++ libavutil/version.h | 2 +- 4 files changed, 64 insertions(+), 1 deletion(-)