diff mbox series

[FFmpeg-devel,09/18] avcodec/hevcdec: Add stat_coeffs to HEVCABACState

Message ID DB6PR0101MB22142A7A49F01D9EB61518608FBA9@DB6PR0101MB2214.eurprd01.prod.exchangelabs.com
State Accepted
Commit 72d5ce9fa6cd67497fbfffc97031cb6b7ab23c6d
Headers show
Series [FFmpeg-devel,v2,01/18] avcodec/pthread_slice: Don't reinitialise initialised mutex | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Andreas Rheinhardt June 30, 2022, 10:29 p.m. UTC
The HEVC decoder has both HEVCContext and HEVCLocalContext
structures. The latter is supposed to be the structure
containing the per-slicethread state.

Yet that is not how it is handled in practice: Each HEVCLocalContext
has a unique HEVCContext allocated for it and each of these
coincides with the main HEVCContext except in exactly one field:
The corresponding HEVCLocalContext.
This makes it possible to pass the HEVCContext everywhere where
logically a HEVCLocalContext should be used.

This led to confusion in the first version of what eventually became
commit c8bc0f66a875bc3708d8dc11b757f2198606ffd7:
Before said commit, the initialization of the Rice parameter derivation
state was incorrect; the fix for single-threaded as well as
frame-threaded decoding was to add backup stats to HEVCContext
that are used when the cabac state is updated*, see
https://ffmpeg.org/pipermail/ffmpeg-devel/2020-August/268861.html
Yet due to what has been said above, this does not work for
slice-threading, because the each HEVCLocalContext has its own
HEVCContext, so the Rice parameter state would not be transferred
between threads.

This is fixed in c8bc0f66a875bc3708d8dc11b757f2198606ffd7
by a hack: It rederives what the previous thread was and accesses
the corresponding HEVCContext.

Fix this by treating the Rice parameter state the same way
the ordinary CABAC parameters are shared between threads:
Make them part of the same struct that is shared between
slice threads. This does not cause races, because
the parts of the code that access these Rice parameters
are a subset of the parts of code that access the CABAC parameters.

*: And if the persistent_rice_adaptation_enabled_flag is set.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
---
 libavcodec/hevc_cabac.c | 17 ++++++++---------
 libavcodec/hevcdec.c    | 10 +++++-----
 libavcodec/hevcdec.h    | 10 +++++++---
 3 files changed, 20 insertions(+), 17 deletions(-)

Comments

Anton Khirnov July 1, 2022, 12:10 p.m. UTC | #1
Quoting Andreas Rheinhardt (2022-07-01 00:29:40)
> diff --git a/libavcodec/hevcdec.h b/libavcodec/hevcdec.h
> index e2dba54f26..b97b7d1354 100644
> --- a/libavcodec/hevcdec.h
> +++ b/libavcodec/hevcdec.h
> @@ -226,6 +226,11 @@ enum ScanType {
>      SCAN_VERT,
>  };
>  
> +typedef struct HEVCCABACState {
> +    uint8_t state[HEVC_CONTEXTS];
> +    uint8_t stat_coeff[HEVC_STAT_COEFFS];
> +} HEVCCABACState;

Might be good to mention why this is allocated separately from
HEVCContext.

Otherwise patch looks good.
Anton Khirnov July 2, 2022, 8:34 a.m. UTC | #2
Quoting Andreas Rheinhardt (2022-07-01 00:29:40)
> The HEVC decoder has both HEVCContext and HEVCLocalContext
> structures. The latter is supposed to be the structure
> containing the per-slicethread state.
> 
> Yet that is not how it is handled in practice: Each HEVCLocalContext
> has a unique HEVCContext allocated for it and each of these
> coincides with the main HEVCContext except in exactly one field:
> The corresponding HEVCLocalContext.
> This makes it possible to pass the HEVCContext everywhere where
> logically a HEVCLocalContext should be used.
> 
> This led to confusion in the first version of what eventually became
> commit c8bc0f66a875bc3708d8dc11b757f2198606ffd7:
> Before said commit, the initialization of the Rice parameter derivation
> state was incorrect; the fix for single-threaded as well as
> frame-threaded decoding was to add backup stats to HEVCContext
> that are used when the cabac state is updated*, see
> https://ffmpeg.org/pipermail/ffmpeg-devel/2020-August/268861.html
> Yet due to what has been said above, this does not work for
> slice-threading, because the each HEVCLocalContext has its own
> HEVCContext, so the Rice parameter state would not be transferred
> between threads.
> 
> This is fixed in c8bc0f66a875bc3708d8dc11b757f2198606ffd7
> by a hack: It rederives what the previous thread was and accesses
> the corresponding HEVCContext.
> 
> Fix this by treating the Rice parameter state the same way
> the ordinary CABAC parameters are shared between threads:
> Make them part of the same struct that is shared between
> slice threads. This does not cause races, because
> the parts of the code that access these Rice parameters
> are a subset of the parts of code that access the CABAC parameters.
> 
> *: And if the persistent_rice_adaptation_enabled_flag is set.
> 
> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
> ---
>  libavcodec/hevc_cabac.c | 17 ++++++++---------
>  libavcodec/hevcdec.c    | 10 +++++-----
>  libavcodec/hevcdec.h    | 10 +++++++---
>  3 files changed, 20 insertions(+), 17 deletions(-)
> 
> diff --git a/libavcodec/hevc_cabac.c b/libavcodec/hevc_cabac.c
> index a194f8a02a..985c97ef2a 100644
> --- a/libavcodec/hevc_cabac.c
> +++ b/libavcodec/hevc_cabac.c
> @@ -453,19 +453,18 @@ void ff_hevc_save_states(HEVCContext *s, int ctb_addr_ts)
>          (ctb_addr_ts % s->ps.sps->ctb_width == 2 ||
>           (s->ps.sps->ctb_width == 2 &&
>            ctb_addr_ts % s->ps.sps->ctb_width == 0))) {
> -        memcpy(s->cabac_state, s->HEVClc->cabac_state, HEVC_CONTEXTS);
> +        memcpy(s->cabac->state, s->HEVClc->cabac_state, HEVC_CONTEXTS);

So if I'm reading this right, this copies the per-slice-context state
into the decoder-global state. And it's done from slice threads with no
locks. So how is this not racy?
Andreas Rheinhardt July 2, 2022, 10:40 a.m. UTC | #3
Anton Khirnov:
> Quoting Andreas Rheinhardt (2022-07-01 00:29:40)
>> The HEVC decoder has both HEVCContext and HEVCLocalContext
>> structures. The latter is supposed to be the structure
>> containing the per-slicethread state.
>>
>> Yet that is not how it is handled in practice: Each HEVCLocalContext
>> has a unique HEVCContext allocated for it and each of these
>> coincides with the main HEVCContext except in exactly one field:
>> The corresponding HEVCLocalContext.
>> This makes it possible to pass the HEVCContext everywhere where
>> logically a HEVCLocalContext should be used.
>>
>> This led to confusion in the first version of what eventually became
>> commit c8bc0f66a875bc3708d8dc11b757f2198606ffd7:
>> Before said commit, the initialization of the Rice parameter derivation
>> state was incorrect; the fix for single-threaded as well as
>> frame-threaded decoding was to add backup stats to HEVCContext
>> that are used when the cabac state is updated*, see
>> https://ffmpeg.org/pipermail/ffmpeg-devel/2020-August/268861.html
>> Yet due to what has been said above, this does not work for
>> slice-threading, because the each HEVCLocalContext has its own
>> HEVCContext, so the Rice parameter state would not be transferred
>> between threads.
>>
>> This is fixed in c8bc0f66a875bc3708d8dc11b757f2198606ffd7
>> by a hack: It rederives what the previous thread was and accesses
>> the corresponding HEVCContext.
>>
>> Fix this by treating the Rice parameter state the same way
>> the ordinary CABAC parameters are shared between threads:
>> Make them part of the same struct that is shared between
>> slice threads. This does not cause races, because
>> the parts of the code that access these Rice parameters
>> are a subset of the parts of code that access the CABAC parameters.
>>
>> *: And if the persistent_rice_adaptation_enabled_flag is set.
>>
>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
>> ---
>>  libavcodec/hevc_cabac.c | 17 ++++++++---------
>>  libavcodec/hevcdec.c    | 10 +++++-----
>>  libavcodec/hevcdec.h    | 10 +++++++---
>>  3 files changed, 20 insertions(+), 17 deletions(-)
>>
>> diff --git a/libavcodec/hevc_cabac.c b/libavcodec/hevc_cabac.c
>> index a194f8a02a..985c97ef2a 100644
>> --- a/libavcodec/hevc_cabac.c
>> +++ b/libavcodec/hevc_cabac.c
>> @@ -453,19 +453,18 @@ void ff_hevc_save_states(HEVCContext *s, int ctb_addr_ts)
>>          (ctb_addr_ts % s->ps.sps->ctb_width == 2 ||
>>           (s->ps.sps->ctb_width == 2 &&
>>            ctb_addr_ts % s->ps.sps->ctb_width == 0))) {
>> -        memcpy(s->cabac_state, s->HEVClc->cabac_state, HEVC_CONTEXTS);
>> +        memcpy(s->cabac->state, s->HEVClc->cabac_state, HEVC_CONTEXTS);
> 
> So if I'm reading this right, this copies the per-slice-context state
> into the decoder-global state. And it's done from slice threads with no
> locks. So how is this not racy?
> 

a) I am not claiming that this is not racy; I am merely claiming that it
does not introduce a new race, because the parts of the code that access
these Rice parameters are a subset of the parts of code that access the
CABAC parameters and tsan has never shown a race for me when updating
cabac state or the rice parameters in general.

b) (i) I readily admit that HEVC is not my forte*. WPP is supposed to be
as follows: Given that HEVC needs the upper right and upper block/ctu
for prediction, each row can only start decoding after the first two
ctus of the row above it have been decoded. And then the cabac state of
the row below is initialized from the cabac state of the row above after
decoding its first two ctus. Therefore only one cabac state needs to be
cached at any given time.
(ii) You can see this in ff_hevc_save_states, where it is only saving
the state after the second row (this presumes that the initial
ctb_addr_ts in hls_decode_entry_wpp is a multiple of ctb_width which
seems to be the case when tiles are disabled; I don't know what happens
when both tiles and wpp are enabled. According to
https://github.com/ultravideo/kvazaar/issues/201#issuecomment-391329526
this is not even allowed in any currently legal profile, but I don't
think our decoder checks for that).
(iii) After having saved the state, it is signalling that it is done
with this ctu via ff_thread_report_progress2. The next thread waits for
this event via ff_thread_await_progress2 and initializes its cabac state
(if necessary). So there is your synchronization.
(iv) Looking at ff_hevc_cabac_init, one can see that the first branch is
always true when run from the first job; whereas ctb_addr_ts %
s->ps.sps->ctb_width == 0 is true when tiles are disabled and when one
is decoding the first ctu of a row (I don't know what happens in case
tiles are enabled; probably mayhem. It seems kvazaar can produce such
files, see above link.).

c) The current state of affairs is btw weird: Given that the secondary
HEVCContexts are overwritten by the main HEVCContext in
hls_slice_data_wpp, the rice state that every HEVCContext starts with is
given by the state of the first HEVCContext. And which row (of the last
picture) this corresponds to depends upon the number of slice threads in
use. This might cause problems if dependent_slice_segment_flag is enabled.

d) See also the comment to patch #2.

- Andreas

*: I only wanted to share the common SEI parts of H.264 and HEVC due to
softworkz's horrible way of sharing it.
Andreas Rheinhardt July 2, 2022, 11:31 a.m. UTC | #4
Andreas Rheinhardt:
> Anton Khirnov:
> c) The current state of affairs is btw weird: Given that the secondary
> HEVCContexts are overwritten by the main HEVCContext in
> hls_slice_data_wpp, the rice state that every HEVCContext starts with is
> given by the state of the first HEVCContext. And which row (of the last
> picture) this corresponds to depends upon the number of slice threads in
> use. This might cause problems if dependent_slice_segment_flag is enabled.
> 

kvazaar allows to create files with dependent_slice_segment_flag == 1;
but it puts one row into one slice and so the above issue would probably
not manifest itself. The caveat "probably" is necessary, because
decoding such files with slice threading leads to "entry_point_offset
table is corrupted" errors (which is probably a decoder bug).

- Andreas
Andreas Rheinhardt July 2, 2022, 11:43 a.m. UTC | #5
Andreas Rheinhardt:
> Andreas Rheinhardt:
>> Anton Khirnov:
>> c) The current state of affairs is btw weird: Given that the secondary
>> HEVCContexts are overwritten by the main HEVCContext in
>> hls_slice_data_wpp, the rice state that every HEVCContext starts with is
>> given by the state of the first HEVCContext. And which row (of the last
>> picture) this corresponds to depends upon the number of slice threads in
>> use. This might cause problems if dependent_slice_segment_flag is enabled.
>>
> 
> kvazaar allows to create files with dependent_slice_segment_flag == 1;
> but it puts one row into one slice and so the above issue would probably
> not manifest itself. The caveat "probably" is necessary, because
> decoding such files with slice threading leads to "entry_point_offset
> table is corrupted" errors (which is probably a decoder bug).
> 

The first slice segment header of a picture has entry points for all
rows, even for those that are actually in other slice segments. I don't
know whether this is how it is supposed to be.

- Andreas
diff mbox series

Patch

diff --git a/libavcodec/hevc_cabac.c b/libavcodec/hevc_cabac.c
index a194f8a02a..985c97ef2a 100644
--- a/libavcodec/hevc_cabac.c
+++ b/libavcodec/hevc_cabac.c
@@ -453,19 +453,18 @@  void ff_hevc_save_states(HEVCContext *s, int ctb_addr_ts)
         (ctb_addr_ts % s->ps.sps->ctb_width == 2 ||
          (s->ps.sps->ctb_width == 2 &&
           ctb_addr_ts % s->ps.sps->ctb_width == 0))) {
-        memcpy(s->cabac_state, s->HEVClc->cabac_state, HEVC_CONTEXTS);
+        memcpy(s->cabac->state, s->HEVClc->cabac_state, HEVC_CONTEXTS);
         if (s->ps.sps->persistent_rice_adaptation_enabled_flag) {
-            memcpy(s->stat_coeff, s->HEVClc->stat_coeff, HEVC_STAT_COEFFS);
+            memcpy(s->cabac->stat_coeff, s->HEVClc->stat_coeff, HEVC_STAT_COEFFS);
         }
     }
 }
 
-static void load_states(HEVCContext *s, int thread)
+static void load_states(HEVCContext *s)
 {
-    memcpy(s->HEVClc->cabac_state, s->cabac_state, HEVC_CONTEXTS);
+    memcpy(s->HEVClc->cabac_state, s->cabac->state, HEVC_CONTEXTS);
     if (s->ps.sps->persistent_rice_adaptation_enabled_flag) {
-        const HEVCContext *prev = s->sList[(thread + s->threads_number - 1) % s->threads_number];
-        memcpy(s->HEVClc->stat_coeff, prev->stat_coeff, HEVC_STAT_COEFFS);
+        memcpy(s->HEVClc->stat_coeff, s->cabac->stat_coeff, HEVC_STAT_COEFFS);
     }
 }
 
@@ -508,7 +507,7 @@  static void cabac_init_state(HEVCContext *s)
         s->HEVClc->stat_coeff[i] = 0;
 }
 
-int ff_hevc_cabac_init(HEVCContext *s, int ctb_addr_ts, int thread)
+int ff_hevc_cabac_init(HEVCContext *s, int ctb_addr_ts)
 {
     if (ctb_addr_ts == s->ps.pps->ctb_addr_rs_to_ts[s->sh.slice_ctb_addr_rs]) {
         int ret = cabac_init_decoder(s);
@@ -525,7 +524,7 @@  int ff_hevc_cabac_init(HEVCContext *s, int ctb_addr_ts, int thread)
                 if (s->ps.sps->ctb_width == 1)
                     cabac_init_state(s);
                 else if (s->sh.dependent_slice_segment_flag == 1)
-                    load_states(s, thread);
+                    load_states(s);
             }
         }
     } else {
@@ -556,7 +555,7 @@  int ff_hevc_cabac_init(HEVCContext *s, int ctb_addr_ts, int thread)
                 if (s->ps.sps->ctb_width == 1)
                     cabac_init_state(s);
                 else
-                    load_states(s, thread);
+                    load_states(s);
             }
         }
     }
diff --git a/libavcodec/hevcdec.c b/libavcodec/hevcdec.c
index 33c9d74e5f..7d4de414cd 100644
--- a/libavcodec/hevcdec.c
+++ b/libavcodec/hevcdec.c
@@ -2501,7 +2501,7 @@  static int hls_decode_entry(AVCodecContext *avctxt, void *isFilterThread)
         y_ctb = (ctb_addr_rs / ((s->ps.sps->width + ctb_size - 1) >> s->ps.sps->log2_ctb_size)) << s->ps.sps->log2_ctb_size;
         hls_decode_neighbour(s, x_ctb, y_ctb, ctb_addr_ts);
 
-        ret = ff_hevc_cabac_init(s, ctb_addr_ts, 0);
+        ret = ff_hevc_cabac_init(s, ctb_addr_ts);
         if (ret < 0) {
             s->tab_slice_address[ctb_addr_rs] = -1;
             return ret;
@@ -2579,7 +2579,7 @@  static int hls_decode_entry_wpp(AVCodecContext *avctxt, void *input_ctb_row, int
             return 0;
         }
 
-        ret = ff_hevc_cabac_init(s, ctb_addr_ts, thread);
+        ret = ff_hevc_cabac_init(s, ctb_addr_ts);
         if (ret < 0)
             goto error;
         hls_sao_param(s, x_ctb >> s->ps.sps->log2_ctb_size, y_ctb >> s->ps.sps->log2_ctb_size);
@@ -3601,7 +3601,7 @@  static av_cold int hevc_decode_free(AVCodecContext *avctx)
 
     av_freep(&s->md5_ctx);
 
-    av_freep(&s->cabac_state);
+    av_freep(&s->cabac);
 
     for (i = 0; i < 3; i++) {
         av_freep(&s->sao_pixel_buffer_h[i]);
@@ -3655,8 +3655,8 @@  static av_cold int hevc_init_context(AVCodecContext *avctx)
     s->HEVClcList[0] = s->HEVClc;
     s->sList[0] = s;
 
-    s->cabac_state = av_malloc(HEVC_CONTEXTS);
-    if (!s->cabac_state)
+    s->cabac = av_malloc(sizeof(*s->cabac));
+    if (!s->cabac)
         return AVERROR(ENOMEM);
 
     s->output_frame = av_frame_alloc();
diff --git a/libavcodec/hevcdec.h b/libavcodec/hevcdec.h
index e2dba54f26..b97b7d1354 100644
--- a/libavcodec/hevcdec.h
+++ b/libavcodec/hevcdec.h
@@ -226,6 +226,11 @@  enum ScanType {
     SCAN_VERT,
 };
 
+typedef struct HEVCCABACState {
+    uint8_t state[HEVC_CONTEXTS];
+    uint8_t stat_coeff[HEVC_STAT_COEFFS];
+} HEVCCABACState;
+
 typedef struct LongTermRPS {
     int     poc[32];
     uint8_t poc_msb_present[32];
@@ -482,8 +487,7 @@  typedef struct HEVCContext {
     int                 width;
     int                 height;
 
-    uint8_t *cabac_state;
-    uint8_t stat_coeff[HEVC_STAT_COEFFS];
+    HEVCCABACState *cabac;
 
     /** 1 if the independent slice segment header was successfully parsed */
     uint8_t slice_initialized;
@@ -601,7 +605,7 @@  int ff_hevc_frame_rps(HEVCContext *s);
 int ff_hevc_slice_rpl(HEVCContext *s);
 
 void ff_hevc_save_states(HEVCContext *s, int ctb_addr_ts);
-int ff_hevc_cabac_init(HEVCContext *s, int ctb_addr_ts, int thread);
+int ff_hevc_cabac_init(HEVCContext *s, int ctb_addr_ts);
 int ff_hevc_sao_merge_flag_decode(HEVCContext *s);
 int ff_hevc_sao_type_idx_decode(HEVCContext *s);
 int ff_hevc_sao_band_position_decode(HEVCContext *s);