Message ID | 20201110104851.321029-2-andreas.rheinhardt@gmail.com |
---|---|
State | Superseded |
Series | VLC, esp. init_vlc patches |
Context | Check | Description |
---|---|---|
andriy/x86_make | success | Make finished |
andriy/x86_make_fate | success | Make fate finished |
On 10/11/2020 10:46, Andreas Rheinhardt wrote:
>
> +#define INIT_VLC_STATIC_FROM_LENGTHS(vlc, bits, nb_codes, lens, len_wrap, \
> +                                     symbols, symbols_wrap, symbols_size, \
> +                                     offset, flags, static_size) \
> +    do { \
> +        static VLC_TYPE table[static_size][2]; \
> +        (vlc)->table = table; \
> +        (vlc)->table_allocated = static_size; \
> +        ff_init_vlc_from_lengths(vlc, bits, nb_codes, lens, len_wrap, \
> +                                 symbols, symbols_wrap, symbols_size, \
> +                                 offset, flags | INIT_VLC_USE_NEW_STATIC); \
> +    } while (0)

If I am reading correctly, wherever you add/use this, you are adding non-thread
safe global init code to a decoder. This is a huge step back and not acceptable.

It should be made to properly use ff_thread_once if possible, or reworked.

- Derek

Derek Buitenhuis:
> On 10/11/2020 10:46, Andreas Rheinhardt wrote:
>>
>> +#define INIT_VLC_STATIC_FROM_LENGTHS(vlc, bits, nb_codes, lens, len_wrap, \
>> +                                     symbols, symbols_wrap, symbols_size, \
>> +                                     offset, flags, static_size) \
>> +    do { \
>> +        static VLC_TYPE table[static_size][2]; \
>> +        (vlc)->table = table; \
>> +        (vlc)->table_allocated = static_size; \
>> +        ff_init_vlc_from_lengths(vlc, bits, nb_codes, lens, len_wrap, \
>> +                                 symbols, symbols_wrap, symbols_size, \
>> +                                 offset, flags | INIT_VLC_USE_NEW_STATIC); \
>> +    } while (0)
>
> If I am reading correctly, wherever you add/use this, you are adding non-thread
> safe global init code to a decoder. This is a huge step back and not acceptable.
>
> It should be made to properly use ff_thread_once if possible, or reworked.
>

The ff_init_vlc_... functions are inherently thread-safe: Everything is
modified only once and directly set to its final value; so it's no
problem if two threads are initializing the same static VLC at the same
time.

- Andreas

On 12/11/2020 20:51, Andreas Rheinhardt wrote:
> The ff_init_vlc_... functions are inherently thread-safe: Everything is
> modified only once and directly set to its final value; so it's no
> problem if two threads are initializing the same static VLC at the same
> time.

Hmm, indeed. At the very least it is doing repeated work and allocations on every init
that needn't happen. Wrapping in ff_thread_once is still preferable IMO...

Unrelatedly: It looks like buf may not be freed when ff_init_vlc_from_lengths is called
with INIT_VLC_USE_NEW_STATIC with more than LOCALBUF_ELEMS codes (which, while currently
set to something higher than the max used, could accidentally be called with more than
1500 in the future). This should be fixed, or maybe an assert should be added so that
nobody ends up doing that in the first place by accident.

- Derek

Derek Buitenhuis:
> On 12/11/2020 20:51, Andreas Rheinhardt wrote:
>> The ff_init_vlc_... functions are inherently thread-safe: Everything is
>> modified only once and directly set to its final value; so it's no
>> problem if two threads are initializing the same static VLC at the same
>> time.
>
> Hmm, indeed. At the very least it is doing repeated work and allocations on every init
> that needn't happen. Wrapping in ff_thread_once is still preferable IMO...
>
> Unrelatedly: It looks like buf may not be freed when ff_init_vlc_from_lengths is called
> with INIT_VLC_USE_NEW_STATIC with more than LOCALBUF_ELEMS codes (which, while currently
> set to something higher than the max used, could accidentally be called with more than
> 1500 in the future). This should be fixed, or maybe an assert should be added so that
> nobody ends up doing that in the first place by accident.
>

It is currently asserted that this case doesn't happen. My patch doesn't
change that.
(on2avc could use static tables, yet it would need more than 3000
elements in the localbuf, therefore I did not switch it to static
tables. One would either have to make localbuf bigger or allow for
static table initialization to fail due to an allocation error. The
latter would mean that the caller would need to check the return value,
which is currently not done and is unnecessary for all other static
tables; a middle way would be to define a constant and document that
initialization of static tables with nb_codes smaller than this constant
can't fail. Said constant would then of course be set equal to the size
of the local buffer.)

- Andreas

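The buffer handling discussed in this exchange (a fixed-size local buffer, a heap fallback above a documented limit, and a free that only touches the heap copy) can be sketched in isolation. This is a minimal illustration in plain C, not FFmpeg code: `DEMO_LOCALBUF_ELEMS`, `demo_sum_squares()` and the use of `malloc()`/`free()` instead of `av_malloc_array()`/`av_free()` are assumptions made for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical limit; the patch uses LOCALBUF_ELEMS = 1500. */
#define DEMO_LOCALBUF_ELEMS 8

/*
 * Sums the squares of n inputs using scratch space.
 * Returns 0 on success and -1 if the heap allocation fails.
 * Documented guarantee: cannot fail when n <= DEMO_LOCALBUF_ELEMS,
 * because then no allocation happens at all.
 */
static int demo_sum_squares(const int *in, size_t n, long *result)
{
    int localbuf[DEMO_LOCALBUF_ELEMS], *buf = localbuf;
    long sum = 0;

    if (n > DEMO_LOCALBUF_ELEMS) {
        /* Input too large for the stack buffer: fall back to the heap. */
        buf = malloc(n * sizeof(*buf));
        if (!buf)
            return -1;
    }

    for (size_t i = 0; i < n; i++)
        buf[i] = in[i] * in[i];
    for (size_t i = 0; i < n; i++)
        sum += buf[i];

    /* Free on every exit path, but only if the heap copy was used. */
    if (buf != localbuf)
        free(buf);

    *result = sum;
    return 0;
}
```

A caller that stays at or below the documented limit may ignore the return value; anything larger has to check it, which is the trade-off described for on2avc above.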
Quoting Andreas Rheinhardt (2020-11-12 21:51:28)
> Derek Buitenhuis:
> > On 10/11/2020 10:46, Andreas Rheinhardt wrote:
> >>
> >> +#define INIT_VLC_STATIC_FROM_LENGTHS(vlc, bits, nb_codes, lens, len_wrap, \
> >> +                                     symbols, symbols_wrap, symbols_size, \
> >> +                                     offset, flags, static_size) \
> >> +    do { \
> >> +        static VLC_TYPE table[static_size][2]; \
> >> +        (vlc)->table = table; \
> >> +        (vlc)->table_allocated = static_size; \
> >> +        ff_init_vlc_from_lengths(vlc, bits, nb_codes, lens, len_wrap, \
> >> +                                 symbols, symbols_wrap, symbols_size, \
> >> +                                 offset, flags | INIT_VLC_USE_NEW_STATIC); \
> >> +    } while (0)
> >
> > If I am reading correctly, wherever you add/use this, you are adding non-thread
> > safe global init code to a decoder. This is a huge step back and not acceptable.
> >
> > It should be made to properly use ff_thread_once if possible, or reworked.
> >
> The ff_init_vlc_... functions are inherently thread-safe: Everything is
> modified only once and directly set to its final value; so it's no
> problem if two threads are initializing the same static VLC at the same
> time.

Strictly speaking it's still a race (and therefore UB), even if you
store the same values. I suspect tools like tsan will not like it
either.

Quoting Andreas Rheinhardt (2020-11-10 11:46:58)
> +int ff_init_vlc_from_lengths(VLC *vlc_arg, int nb_bits, int nb_codes,
> +                             const int8_t *lens, int lens_wrap,
> +                             const void *symbols, int symbols_wrap, int symbols_size,
> +                             int offset, int flags)
> +{
> +    VLCcode localbuf[LOCALBUF_ELEMS], *buf = localbuf;
> +    VLC localvlc, *vlc;
> +    uint64_t code;
> +    int ret, j, len_max = FFMIN(32, 3 * nb_bits);
> +
> +    ret = vlc_common_init(vlc_arg, nb_bits, nb_codes, &vlc, &localvlc,
> +                          &buf, flags);
> +    if (ret < 0)
> +        return ret;
> +
> +    j = code = 0;
> +    for (int i = 0; i < nb_codes; i++, lens += lens_wrap) {
> +        int len = *lens;
> +        if (len > 0) {
> +            unsigned sym;
> +
> +            buf[j].bits = len;
> +            if (symbols)
> +                GET_DATA(sym, symbols, i, symbols_wrap, symbols_size)
> +            else
> +                sym = i;
> +            buf[j].symbol = sym + offset;
> +            buf[j++].code = code;
> +        } else if (len < 0) {
> +            len = -len;
> +        } else
> +            continue;
> +        if (len > len_max || code & ((1U << (32 - len)) - 1)) {
> +            av_log(NULL, AV_LOG_ERROR, "Invalid VLC (length %u)\n", len);

Can you use a proper logging context here?

Anton Khirnov:
> Quoting Andreas Rheinhardt (2020-11-10 11:46:58)
>> +int ff_init_vlc_from_lengths(VLC *vlc_arg, int nb_bits, int nb_codes,
>> +                             const int8_t *lens, int lens_wrap,
>> +                             const void *symbols, int symbols_wrap, int symbols_size,
>> +                             int offset, int flags)
>> +{
>> +    VLCcode localbuf[LOCALBUF_ELEMS], *buf = localbuf;
>> +    VLC localvlc, *vlc;
>> +    uint64_t code;
>> +    int ret, j, len_max = FFMIN(32, 3 * nb_bits);
>> +
>> +    ret = vlc_common_init(vlc_arg, nb_bits, nb_codes, &vlc, &localvlc,
>> +                          &buf, flags);
>> +    if (ret < 0)
>> +        return ret;
>> +
>> +    j = code = 0;
>> +    for (int i = 0; i < nb_codes; i++, lens += lens_wrap) {
>> +        int len = *lens;
>> +        if (len > 0) {
>> +            unsigned sym;
>> +
>> +            buf[j].bits = len;
>> +            if (symbols)
>> +                GET_DATA(sym, symbols, i, symbols_wrap, symbols_size)
>> +            else
>> +                sym = i;
>> +            buf[j].symbol = sym + offset;
>> +            buf[j++].code = code;
>> +        } else if (len < 0) {
>> +            len = -len;
>> +        } else
>> +            continue;
>> +        if (len > len_max || code & ((1U << (32 - len)) - 1)) {
>> +            av_log(NULL, AV_LOG_ERROR, "Invalid VLC (length %u)\n", len);
>
> Can you use a proper logging context here?
>

Yes. This will of course mean that I will have to go with the flag in
patch four.

- Andreas

Anton Khirnov:
> Quoting Andreas Rheinhardt (2020-11-12 21:51:28)
>> Derek Buitenhuis:
>>> On 10/11/2020 10:46, Andreas Rheinhardt wrote:
>>>>
>>>> +#define INIT_VLC_STATIC_FROM_LENGTHS(vlc, bits, nb_codes, lens, len_wrap, \
>>>> +                                     symbols, symbols_wrap, symbols_size, \
>>>> +                                     offset, flags, static_size) \
>>>> +    do { \
>>>> +        static VLC_TYPE table[static_size][2]; \
>>>> +        (vlc)->table = table; \
>>>> +        (vlc)->table_allocated = static_size; \
>>>> +        ff_init_vlc_from_lengths(vlc, bits, nb_codes, lens, len_wrap, \
>>>> +                                 symbols, symbols_wrap, symbols_size, \
>>>> +                                 offset, flags | INIT_VLC_USE_NEW_STATIC); \
>>>> +    } while (0)
>>>
>>> If I am reading correctly, wherever you add/use this, you are adding non-thread
>>> safe global init code to a decoder. This is a huge step back and not acceptable.
>>>
>>> It should be made to properly use ff_thread_once if possible, or reworked.
>>>
>> The ff_init_vlc_... functions are inherently thread-safe: Everything is
>> modified only once and directly set to its final value; so it's no
>> problem if two threads are initializing the same static VLC at the same
>> time.
>
> Strictly speaking it's still a race (and therefore UB), even if you
> store the same values. I suspect tools like tsan will not like it
> either.
>

I at first thought it was not so, because the definition of data races
in C11 speaks of conflicting actions in different threads; but you are
right: It is conflicting according to the definition: "Two expression
evaluations conflict if one of them modifies a memory location and the
other one reads or modifies the same memory location."
Furthermore, the current code has the problem of not using atomic
operations to modify the VLC table.

So I'll use ff_thread_once() for the cases affected by this patchset; a
later patchset will then fix the other ones and also implement the
simplifications that will be possible once this is done (no volatile!).
This will also improve performance in general.

- Andreas

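The once-only initialization that the thread settles on can be illustrated outside FFmpeg. The sketch below is not the code from the patchset: it uses POSIX `pthread_once()` as a stand-in for `ff_thread_once()`, and the `demo_*` names, the table size and the table contents are all hypothetical. The point it shows is that every thread may request the table, yet the fill routine runs exactly once and no thread reads the table before it is complete.

```c
#include <pthread.h>
#include <stdint.h>

#define DEMO_TABLE_SIZE 256
#define DEMO_MAX_THREADS 16

/* Hypothetical static table standing in for a static VLC table. */
static uint8_t demo_table[DEMO_TABLE_SIZE];
static int demo_init_calls; /* how often the init routine ran */
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

/* Fills the table; pthread_once() guarantees a single execution. */
static void demo_init_table(void)
{
    for (int i = 0; i < DEMO_TABLE_SIZE; i++)
        demo_table[i] = (uint8_t)(i ^ (i >> 1)); /* arbitrary content */
    demo_init_calls++;
}

/* What a decoder's init function would call; safe from any thread. */
static const uint8_t *demo_get_table(void)
{
    pthread_once(&demo_once, demo_init_table);
    return demo_table;
}

static void *demo_worker(void *arg)
{
    (void)arg;
    demo_get_table();
    return NULL;
}

/* Starts several threads that all request the table and
 * returns how many times the table was actually initialized. */
static int demo_run_threads(int nb_threads)
{
    pthread_t threads[DEMO_MAX_THREADS];
    int started = 0;

    if (nb_threads > DEMO_MAX_THREADS)
        nb_threads = DEMO_MAX_THREADS;
    for (int i = 0; i < nb_threads; i++) {
        if (pthread_create(&threads[started], NULL, demo_worker, NULL) == 0)
            started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return demo_init_calls;
}
```

Compared with letting every thread write identical values, this removes the conflicting writes altogether, so there is no data race in the C11 sense and the repeated work Derek mentions disappears as well.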
diff --git a/libavcodec/bitstream.c b/libavcodec/bitstream.c
index c7a87734e5..03d39ad88c 100644
--- a/libavcodec/bitstream.c
+++ b/libavcodec/bitstream.c
@@ -132,6 +132,8 @@ static int alloc_table(VLC *vlc, int size, int use_static)
     return index;
 }
 
+#define LOCALBUF_ELEMS 1500 // the maximum currently needed is 1296 by rv34
+
 typedef struct VLCcode {
     uint8_t bits;
     VLC_TYPE symbol;
@@ -140,6 +142,31 @@ typedef struct VLCcode {
     uint32_t code;
 } VLCcode;
 
+static int vlc_common_init(VLC *vlc_arg, int nb_bits, int nb_codes,
+                           VLC **vlc, VLC *localvlc, VLCcode **buf,
+                           int flags)
+{
+    *vlc = vlc_arg;
+    (*vlc)->bits = nb_bits;
+    if (flags & INIT_VLC_USE_NEW_STATIC) {
+        av_assert0(nb_codes <= LOCALBUF_ELEMS);
+        *localvlc = *vlc_arg;
+        *vlc = localvlc;
+        (*vlc)->table_size = 0;
+    } else {
+        (*vlc)->table = NULL;
+        (*vlc)->table_allocated = 0;
+        (*vlc)->table_size = 0;
+    }
+    if (nb_codes > LOCALBUF_ELEMS) {
+        *buf = av_malloc_array(nb_codes, sizeof(VLCcode));
+        if (!*buf)
+            return AVERROR(ENOMEM);
+    }
+
+    return 0;
+}
+
 static int compare_vlcspec(const void *a, const void *b)
 {
     const VLCcode *sa = a, *sb = b;
@@ -248,6 +275,27 @@ static int build_table(VLC *vlc, int table_nb_bits, int nb_codes,
     return table_index;
 }
 
+static int vlc_common_end(VLC *vlc, int nb_bits, int nb_codes, VLCcode *codes,
+                          int flags, VLC *vlc_arg, VLCcode localbuf[LOCALBUF_ELEMS])
+{
+    int ret = build_table(vlc, nb_bits, nb_codes, codes, flags);
+
+    if (flags & INIT_VLC_USE_NEW_STATIC) {
+        if(vlc->table_size != vlc->table_allocated)
+            av_log(NULL, AV_LOG_ERROR, "needed %d had %d\n", vlc->table_size, vlc->table_allocated);
+
+        av_assert0(ret >= 0);
+        *vlc_arg = *vlc;
+    } else {
+        if (codes != localbuf)
+            av_free(codes);
+        if (ret < 0) {
+            av_freep(&vlc->table);
+            return ret;
+        }
+    }
+    return 0;
+}
 
 /* Build VLC decoding tables suitable for use with get_vlc().
 
@@ -278,30 +326,14 @@ int ff_init_vlc_sparse(VLC *vlc_arg, int nb_bits, int nb_codes,
                        const void *symbols, int symbols_wrap, int symbols_size,
                        int flags)
 {
-    VLCcode *buf;
+    VLCcode localbuf[LOCALBUF_ELEMS], *buf = localbuf;
     int i, j, ret;
-    VLCcode localbuf[1500]; // the maximum currently needed is 1296 by rv34
     VLC localvlc, *vlc;
 
-    vlc = vlc_arg;
-    vlc->bits = nb_bits;
-    if (flags & INIT_VLC_USE_NEW_STATIC) {
-        av_assert0(nb_codes <= FF_ARRAY_ELEMS(localbuf));
-        localvlc = *vlc_arg;
-        vlc = &localvlc;
-        vlc->table_size = 0;
-    } else {
-        vlc->table = NULL;
-        vlc->table_allocated = 0;
-        vlc->table_size = 0;
-    }
-    if (nb_codes > FF_ARRAY_ELEMS(localbuf)) {
-        buf = av_malloc_array(nb_codes, sizeof(VLCcode));
-        if (!buf)
-            return AVERROR(ENOMEM);
-    } else
-        buf = localbuf;
-
+    ret = vlc_common_init(vlc_arg, nb_bits, nb_codes, &vlc, &localvlc,
+                          &buf, flags);
+    if (ret < 0)
+        return ret;
 
     av_assert0(symbols_size <= 2 || !symbols);
     j = 0;
@@ -342,26 +374,60 @@ int ff_init_vlc_sparse(VLC *vlc_arg, int nb_bits, int nb_codes,
     COPY(len && len <= nb_bits);
     nb_codes = j;
 
-    ret = build_table(vlc, nb_bits, nb_codes, buf, flags);
-
-    if (flags & INIT_VLC_USE_NEW_STATIC) {
-        if(vlc->table_size != vlc->table_allocated)
-            av_log(NULL, AV_LOG_ERROR, "needed %d had %d\n", vlc->table_size, vlc->table_allocated);
+    return vlc_common_end(vlc, nb_bits, nb_codes, buf,
+                          flags, vlc_arg, localbuf);
+}
 
-        av_assert0(ret >= 0);
-        *vlc_arg = *vlc;
-    } else {
-        if (buf != localbuf)
-            av_free(buf);
-        if (ret < 0) {
-            av_freep(&vlc->table);
-            return ret;
+int ff_init_vlc_from_lengths(VLC *vlc_arg, int nb_bits, int nb_codes,
+                             const int8_t *lens, int lens_wrap,
+                             const void *symbols, int symbols_wrap, int symbols_size,
+                             int offset, int flags)
+{
+    VLCcode localbuf[LOCALBUF_ELEMS], *buf = localbuf;
+    VLC localvlc, *vlc;
+    uint64_t code;
+    int ret, j, len_max = FFMIN(32, 3 * nb_bits);
+
+    ret = vlc_common_init(vlc_arg, nb_bits, nb_codes, &vlc, &localvlc,
+                          &buf, flags);
+    if (ret < 0)
+        return ret;
+
+    j = code = 0;
+    for (int i = 0; i < nb_codes; i++, lens += lens_wrap) {
+        int len = *lens;
+        if (len > 0) {
+            unsigned sym;
+
+            buf[j].bits = len;
+            if (symbols)
+                GET_DATA(sym, symbols, i, symbols_wrap, symbols_size)
+            else
+                sym = i;
+            buf[j].symbol = sym + offset;
+            buf[j++].code = code;
+        } else if (len < 0) {
+            len = -len;
+        } else
+            continue;
+        if (len > len_max || code & ((1U << (32 - len)) - 1)) {
+            av_log(NULL, AV_LOG_ERROR, "Invalid VLC (length %u)\n", len);
+            goto fail;
+        }
+        code += 1U << (32 - len);
+        if (code > UINT32_MAX + 1ULL) {
+            av_log(NULL, AV_LOG_ERROR, "Overdetermined VLC tree\n");
+            goto fail;
         }
     }
-    return 0;
+    return vlc_common_end(vlc, nb_bits, j, buf,
+                          flags, vlc_arg, localbuf);
+fail:
+    if (buf != localbuf)
+        av_free(buf);
+    return AVERROR_INVALIDDATA;
 }
 
-
 void ff_free_vlc(VLC *vlc)
 {
     av_freep(&vlc->table);
diff --git a/libavcodec/vlc.h b/libavcodec/vlc.h
index 22d3e33485..b5a8c371bf 100644
--- a/libavcodec/vlc.h
+++ b/libavcodec/vlc.h
@@ -49,6 +49,41 @@ int ff_init_vlc_sparse(VLC *vlc, int nb_bits, int nb_codes,
                        const void *codes, int codes_wrap, int codes_size,
                        const void *symbols, int symbols_wrap, int symbols_size,
                        int flags);
+
+/**
+ * Build VLC decoding tables suitable for use with get_vlc2()
+ *
+ * This function takes lengths and symbols and calculates the codes from them.
+ * For this the input lengths and symbols have to be sorted according to "left
+ * nodes in the corresponding tree first".
+ *
+ * @param[in,out] vlc      The VLC to be initialized; table and table_allocated
+ *                         must have been set when initializing a static VLC,
+ *                         otherwise this will be treated as uninitialized.
+ * @param[in] nb_bits      The number of bits to use for the VLC table;
+ *                         higher values take up more memory and cache, but
+ *                         allow to read codes with fewer reads.
+ * @param[in] nb_codes     The number of provided length and (if supplied) symbol
+ *                         entries.
+ * @param[in] lens         The lengths of the codes. Entries > 0 correspond to
+ *                         valid codes; entries == 0 will be skipped and entries
+ *                         with len < 0 indicate that the tree is incomplete and
+ *                         has an open end of length -len at this position.
+ * @param[in] lens_wrap    Stride (in bytes) of the lengths.
+ * @param[in] symbols      The symbols, i.e. what is returned from get_vlc2()
+ *                         when the corresponding code is encountered.
+ *                         May be NULL, then 0, 1, 2, 3, 4,... will be used.
+ * @param[in] symbols_wrap Stride (in bytes) of the symbols.
+ * @param[in] symbols_size Size of the symbols. 1 and 2 are supported.
+ * @param[in] offset       An offset to apply to all the valid symbols.
+ * @param[in] flags        A combination of the INIT_VLC_* flags; notice that
+ *                         INIT_VLC_INPUT_LE is pointless and ignored.
+ */
+int ff_init_vlc_from_lengths(VLC *vlc, int nb_bits, int nb_codes,
+                             const int8_t *lens, int lens_wrap,
+                             const void *symbols, int symbols_wrap, int symbols_size,
+                             int offset, int flags);
+
 void ff_free_vlc(VLC *vlc);
 
 /* If INIT_VLC_INPUT_LE is set, the LSB bit of the codes used to
@@ -87,4 +122,16 @@ void ff_free_vlc(VLC *vlc);
 #define INIT_LE_VLC_STATIC(vlc, bits, a, b, c, d, e, f, g, static_size) \
     INIT_LE_VLC_SPARSE_STATIC(vlc, bits, a, b, c, d, e, f, g, NULL, 0, 0, static_size)
 
+#define INIT_VLC_STATIC_FROM_LENGTHS(vlc, bits, nb_codes, lens, len_wrap, \
+                                     symbols, symbols_wrap, symbols_size, \
+                                     offset, flags, static_size) \
+    do { \
+        static VLC_TYPE table[static_size][2]; \
+        (vlc)->table = table; \
+        (vlc)->table_allocated = static_size; \
+        ff_init_vlc_from_lengths(vlc, bits, nb_codes, lens, len_wrap, \
+                                 symbols, symbols_wrap, symbols_size, \
+                                 offset, flags | INIT_VLC_USE_NEW_STATIC); \
+    } while (0)
+
 #endif /* AVCODEC_VLC_H */
When using ff_init_vlc_sparse() to create a VLC, three input tables are
used: A table for lengths, one for codes and one for symbols; the latter
one can be omitted, then a default one will be used. These input tables
will be traversed twice, once to get the long codes (which will be put
into subtables) and once for the small codes. The long codes are then
sorted so that entries that should be in the same subtable are
contiguous.

This commit adds an alternative to ff_init_vlc_sparse():
ff_init_vlc_from_lengths(). It is based upon the observation that if
lengths, codes and symbols tables are permuted (in the same way) so that
the codes are ordered from left to right in the corresponding tree and
if said tree is complete (i.e. every non-leaf node has two children),
the codes can be easily computed from the lengths and are therefore
redundant. This means that if one initializes such a VLC with explicitly
coded lengths, codes and symbols, the codes can be avoided; and even if
one has no explicitly coded symbols, it might still be beneficial to
remove the codes even when one has to add a new symbol table, because
codes are typically longer than symbols so that the latter often fit
into a smaller type, saving space.

Furthermore, given that the codes here are by definition ordered from
left to right, it is unnecessary to sort them again; for the same
reason, one does not have to traverse the input twice. This function
proved to be faster than ff_init_vlc_sparse() whenever it has been
benchmarked.

This function is usable for static tables (they can simply be permuted
once) as well as in scenarios where the tables are naturally ordered
from left to right in the tree; the latter e.g. happens with Smacker,
Theora and several other formats.

In order to make it also usable for (static) tables with incomplete
trees, negative lengths are used to indicate that there is an open end
of a certain length.

Finally, ff_init_vlc_from_lengths() has one downside compared to
ff_init_vlc_sparse(): The latter uses tables that can be reused by
encoders. Of course, one could calculate the needed table at runtime if
one so wishes, but it is nevertheless an obstacle.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
---
 libavcodec/bitstream.c | 138 ++++++++++++++++++++++++++++++-----------
 libavcodec/vlc.h       |  47 ++++++++++++++
 2 files changed, 149 insertions(+), 36 deletions(-)
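
The observation in the commit message, that left-to-right ordered lengths determine the codes, can be tried out in a few lines of standalone C. The sketch below mirrors the loop of ff_init_vlc_from_lengths() from the patch but is not that function: `DemoCode`, `demo_codes_from_lengths()`, the fixed 32-bit length limit and the use of the input index as the symbol are simplifying assumptions for illustration. Codes are stored left-aligned in 32 bits, so a running 64-bit counter that reaches exactly 2^32 corresponds to a complete tree.

```c
#include <stdint.h>

/* Hypothetical stand-in for the patch's VLCcode entries. */
typedef struct DemoCode {
    uint8_t  bits;   /* code length in bits */
    uint16_t symbol; /* here simply the index into lens[] */
    uint32_t code;   /* left-aligned: first bit read is the MSB */
} DemoCode;

/*
 * Derives codes from lengths sorted "left nodes first".
 * lens[i] > 0: a code of that length; lens[i] == 0: skipped;
 * lens[i] < 0: an open end of length -lens[i] (no code is emitted).
 * Returns the number of codes written to out, or -1 if the lengths
 * do not describe a valid (possibly incomplete) prefix tree.
 */
static int demo_codes_from_lengths(DemoCode *out, const int8_t *lens,
                                   int nb_codes)
{
    uint64_t code = 0; /* next free position in the tree, left-aligned */
    int j = 0;

    for (int i = 0; i < nb_codes; i++) {
        int len = lens[i];

        if (len > 0) {
            out[j].bits   = (uint8_t)len;
            out[j].symbol = (uint16_t)i;
            out[j].code   = (uint32_t)code;
            j++;
        } else if (len < 0) {
            len = -len; /* open end: advance without emitting a code */
        } else {
            continue;
        }
        /* The position must be aligned to a node at depth len. */
        if (len > 32 || (code & ((UINT64_C(1) << (32 - len)) - 1)))
            return -1;
        code += UINT64_C(1) << (32 - len);
        /* More codes than the tree has room for. */
        if (code > (UINT64_C(1) << 32))
            return -1;
    }
    return j;
}
```

For the lengths {1, 2, 3, 3} this yields the codes 0, 10, 110 and 111; with {1, -2, 2} the position 10 is left open and the single 2-bit code becomes 11. An input such as {2, 1} is rejected because it is not in left-first order without an explicit open end.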