Message ID | 20200102152225.182072-1-derek.buitenhuis@gmail.com |
---|---|
State | Superseded |
Headers | show |
Series | [FFmpeg-devel] lavc/ffv1: Properly check that the 4th and 5th quant tables are zeroes | expand |
Context | Check | Description |
---|---|---|
andriy/ffmpeg-patchwork | success | Make fate finished |
On 02/01/2020 16:22, Derek Buitenhuis wrote: > Currently, the decoder checks the 128th value of the 4th quant table during > while deriving the context on each sample, in order to speed itself up. This > is due to relying on the behavior of FFmpeg's FFV1 encoder, in which if that > value is zero, the entire 4th and 5th quant tables are assumed to be entirely > zero. > > This does not match the FFV1 spec, which has no such restriction, and after > some discussion, it was decided to fix FFmpeg to abide by the spec, rather > than change the spec. Thanks for having taken care of this issue. > [...] > @@ -735,6 +737,7 @@ static int read_header(FFV1Context *f) > f->chroma_h_shift, f->chroma_v_shift, f->avctx->pix_fmt); > if (f->version < 2) { > context_count = read_quant_tables(c, f->quant_table); > + f->three_quant_table = is_three_quant_tables(f->quant_table); > if (context_count < 0) { > av_log(f->avctx, AV_LOG_ERROR, "read_quant_table error\n"); > return AVERROR_INVALIDDATA; > @@ -797,9 +800,11 @@ static int read_header(FFV1Context *f) > p->quant_table_index = idx; > memcpy(p->quant_table, f->quant_tables[idx], > sizeof(p->quant_table)); > + p->three_quant_tables = f->three_quant_tables[idx]; > context_count = f->context_count[idx]; > } else { > memcpy(p->quant_table, f->quant_table, sizeof(p->quant_table)); > + p->three_quant_tables = f->three_quant_table; > } Couldn't three_quant_table be a local value instead of being stored in the context? > diff --git a/libavcodec/ffv1enc.c b/libavcodec/ffv1enc.c I may have missed something, but I don't get what the additional code in the encoding part does, value seems not used. And I am reluctant to change anything in the encoder in this patch as we consider that the issue is "only" a too aggressive optimization in the decoder. Jérôme
On 02/01/2020 16:17, Jerome Martinez wrote: >> @@ -797,9 +800,11 @@ static int read_header(FFV1Context *f) >> p->quant_table_index = idx; >> memcpy(p->quant_table, f->quant_tables[idx], >> sizeof(p->quant_table)); >> + p->three_quant_tables = f->three_quant_tables[idx]; >> context_count = f->context_count[idx]; >> } else { >> memcpy(p->quant_table, f->quant_table, sizeof(p->quant_table)); >> + p->three_quant_tables = f->three_quant_table; >> } > > Couldn't three_quant_table be a local value instead of being stored in > the context? I'm not sure that is possible - there is a subtle ifference between the ->quant_table and ->quant_tables (i.e. one is for version 2, which does not have extradata). I followed what is done for the the quant table itself. Michael can likely comment on it. >> diff --git a/libavcodec/ffv1enc.c b/libavcodec/ffv1enc.c > > I may have missed something, but I don't get what the additional code in > the encoding part does, value seems not used. > And I am reluctant to change anything in the encoder in this patch as we > consider that the issue is "only" a too aggressive optimization in the > decoder. The encoder does use it, because the encoder calls get_context, of which the template is shared between the encoder and decoder, so it has to be changed. The optimization is/was for both encoding and decoding. - Derek
On Thu, Jan 02, 2020 at 03:22:25PM +0000, Derek Buitenhuis wrote: > Currently, the decoder checks the 128th value of the 4th quant table during > while deriving the context on each sample, in order to speed itself up. This > is due to relying on the behavior of FFmpeg's FFV1 encoder, in which if that > value is zero, the entire 4th and 5th quant tables are assumed to be entirely > zero. I think if entry 128 is 0 then the whole table must be 0. If that is the case, checking the entry 128 of table 4 and 5 would be enough and caching the entry comparission is maybe not needed. thx [...]
On 02/01/2020 23:09, Michael Niedermayer wrote: > I think if entry 128 is 0 then the whole table must be 0. > If that is the case, checking the entry 128 of table 4 and 5 would be enough > and caching the entry comparission is maybe not needed. Is this guaranteed somehow? It isn't mentioned in the spec. - Derek
On 03/01/2020 22:59, Derek Buitenhuis wrote: > On 02/01/2020 23:09, Michael Niedermayer wrote: >> I think if entry 128 is 0 then the whole table must be 0. >> If that is the case, checking the entry 128 of table 4 and 5 would be enough >> and caching the entry comparission is maybe not needed. > Is this guaranteed somehow? It isn't mentioned in the spec. Checking better the spec about how the quant table is stored in bitstream, we may have missed in our past discussion that it is actually not stored as 256 independent values, "QuantizationTable(i, j, scale)" permits only increasing numbers from index 0 to 127 (so 128, as index 128 is the negative of index 127), so it seems that by design of how the quant table is stored it is impossible to have 0 at index 128 without having 0 at indexes 0-127 (one value of "len - 1", 127, seems to be the only way to have 0 at index 128). The need to check the 5th quant table is still valid. Jérôme
On 03/01/2020 21:59, Derek Buitenhuis wrote:
> Is this guaranteed somehow? It isn't mentioned in the spec.
Looks like it's inherent to how it's stored.
I'll send a v2 with this.
Thanks,
- Derek
diff --git a/libavcodec/ffv1.h b/libavcodec/ffv1.h index f0bb19350a..115ea24120 100644 --- a/libavcodec/ffv1.h +++ b/libavcodec/ffv1.h @@ -67,6 +67,7 @@ typedef struct VlcState { typedef struct PlaneContext { int16_t quant_table[MAX_CONTEXT_INPUTS][256]; + int three_quant_tables; int quant_table_index; int context_count; uint8_t (*state)[CONTEXT_SIZE]; @@ -102,7 +103,9 @@ typedef struct FFV1Context { int ac_byte_count; ///< number of bytes used for AC coding PlaneContext plane[MAX_PLANES]; int16_t quant_table[MAX_CONTEXT_INPUTS][256]; + int three_quant_table; int16_t quant_tables[MAX_QUANT_TABLES][MAX_CONTEXT_INPUTS][256]; + int three_quant_tables[MAX_QUANT_TABLES]; int context_count[MAX_QUANT_TABLES]; uint8_t state_transition[256]; uint8_t (*initial_states[MAX_QUANT_TABLES])[32]; @@ -148,6 +151,18 @@ int ff_ffv1_allocate_initial_states(FFV1Context *f); void ff_ffv1_clear_slice_state(FFV1Context *f, FFV1Context *fs); int ff_ffv1_close(AVCodecContext *avctx); +/* + * The last two quant tables are often zeroes, so we can do a single check + * here, to speed up every get_context call on such files. + */ +static av_always_inline int is_three_quant_tables(const int16_t quant_table[MAX_CONTEXT_INPUTS][256]) +{ + const int16_t zero_quant_table[256] = { 0 }; + + return !memcmp(zero_quant_table, quant_table[3], sizeof(zero_quant_table)) && + !memcmp(zero_quant_table, quant_table[4], sizeof(zero_quant_table)); +} + static av_always_inline int fold(int diff, int bits) { if (bits == 8) diff --git a/libavcodec/ffv1_template.c b/libavcodec/ffv1_template.c index f2ab93313a..6304b54ab1 100644 --- a/libavcodec/ffv1_template.c +++ b/libavcodec/ffv1_template.c @@ -37,7 +37,7 @@ static inline int RENAME(get_context)(PlaneContext *p, TYPE *src, const int RT = last[1]; const int L = src[-1]; - if (p->quant_table[3][127]) { + if (!p->three_quant_tables) { const int TT = last2[0]; const int LL = src[-2]; return p->quant_table[0][(L - LT) & 0xFF] + diff --git a/libavcodec/ffv1dec.c b/libavcodec/ffv1dec.c index e465ed49d7..5ac67daaa8 100644 --- a/libavcodec/ffv1dec.c +++ b/libavcodec/ffv1dec.c @@ -197,6 +197,7 @@ static int decode_slice_header(FFV1Context *f, FFV1Context *fs) } p->quant_table_index = idx; memcpy(p->quant_table, f->quant_tables[idx], sizeof(p->quant_table)); + p->three_quant_tables = f->three_quant_tables[idx]; context_count = f->context_count[idx]; if (p->context_count < context_count) { @@ -474,6 +475,7 @@ static int read_extra_header(FFV1Context *f) for (i = 0; i < f->quant_table_count; i++) { f->context_count[i] = read_quant_tables(c, f->quant_tables[i]); + f->three_quant_tables[i] = is_three_quant_tables(f->quant_tables[i]); if (f->context_count[i] < 0) { av_log(f->avctx, AV_LOG_ERROR, "read_quant_table error\n"); return AVERROR_INVALIDDATA; @@ -735,6 +737,7 @@ static int read_header(FFV1Context *f) f->chroma_h_shift, f->chroma_v_shift, f->avctx->pix_fmt); if (f->version < 2) { context_count = read_quant_tables(c, f->quant_table); + f->three_quant_table = is_three_quant_tables(f->quant_table); if (context_count < 0) { av_log(f->avctx, AV_LOG_ERROR, "read_quant_table error\n"); return AVERROR_INVALIDDATA; @@ -797,9 +800,11 @@ static int read_header(FFV1Context *f) p->quant_table_index = idx; memcpy(p->quant_table, f->quant_tables[idx], sizeof(p->quant_table)); + p->three_quant_tables = f->three_quant_tables[idx]; context_count = f->context_count[idx]; } else { memcpy(p->quant_table, f->quant_table, sizeof(p->quant_table)); + p->three_quant_tables = f->three_quant_table; } if (f->version <= 2) { diff --git a/libavcodec/ffv1enc.c b/libavcodec/ffv1enc.c index c521b7d445..e6e5171c8f 100644 --- a/libavcodec/ffv1enc.c +++ b/libavcodec/ffv1enc.c @@ -733,31 +733,37 @@ FF_ENABLE_DEPRECATION_WARNINGS s->quant_tables[0][0][i]= quant11[i]; s->quant_tables[0][1][i]= 11*quant11[i]; s->quant_tables[0][2][i]= 11*11*quant11[i]; + s->three_quant_tables[0]= 1; s->quant_tables[1][0][i]= quant11[i]; s->quant_tables[1][1][i]= 11*quant11[i]; s->quant_tables[1][2][i]= 11*11*quant5 [i]; s->quant_tables[1][3][i]= 5*11*11*quant5 [i]; s->quant_tables[1][4][i]= 5*5*11*11*quant5 [i]; + s->three_quant_tables[1]= 0; } else { s->quant_tables[0][0][i]= quant9_10bit[i]; s->quant_tables[0][1][i]= 11*quant9_10bit[i]; s->quant_tables[0][2][i]= 11*11*quant9_10bit[i]; + s->three_quant_tables[0]= 1; s->quant_tables[1][0][i]= quant9_10bit[i]; s->quant_tables[1][1][i]= 11*quant9_10bit[i]; s->quant_tables[1][2][i]= 11*11*quant5_10bit[i]; s->quant_tables[1][3][i]= 5*11*11*quant5_10bit[i]; s->quant_tables[1][4][i]= 5*5*11*11*quant5_10bit[i]; + s->three_quant_tables[1]= 0; } } s->context_count[0] = (11 * 11 * 11 + 1) / 2; s->context_count[1] = (11 * 11 * 5 * 5 * 5 + 1) / 2; memcpy(s->quant_table, s->quant_tables[s->context_model], sizeof(s->quant_table)); + s->three_quant_table = s->three_quant_tables[s->context_model]; for (i = 0; i < s->plane_count; i++) { PlaneContext *const p = &s->plane[i]; memcpy(p->quant_table, s->quant_table, sizeof(p->quant_table)); + p->three_quant_tables= s->three_quant_table; p->quant_table_index = s->context_model; p->context_count = s->context_count[p->quant_table_index]; }
Currently, the decoder checks the 128th value of the 4th quant table during while deriving the context on each sample, in order to speed itself up. This is due to relying on the behavior of FFmpeg's FFV1 encoder, in which if that value is zero, the entire 4th and 5th quant tables are assumed to be entirely zero. This does not match the FFV1 spec, which has no such restriction, and after some discussion, it was decided to fix FFmpeg to abide by the spec, rather than change the spec. We will now check whether the 4th and 5th quant tables are zero properly, using a full table compare, once, when they are read in. There should be no speed loss with this method. For further context, the FFV1 issue in question is located at: https://github.com/FFmpeg/FFV1/issues/169 Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com> --- libavcodec/ffv1.h | 15 +++++++++++++++ libavcodec/ffv1_template.c | 2 +- libavcodec/ffv1dec.c | 5 +++++ libavcodec/ffv1enc.c | 6 ++++++ 4 files changed, 27 insertions(+), 1 deletion(-)