diff mbox series

[FFmpeg-devel,27/39] lavc/ffv1: change FFV1SliceContext.plane into a RefStruct object

Message ID 20240716171155.31838-27-anton@khirnov.net
State New
Headers show
Series [FFmpeg-devel,01/39] tests/fate/vcodec: add vsynth tests for FFV1 version 2 | expand

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Anton Khirnov July 16, 2024, 5:11 p.m. UTC
Frame threading in the FFV1 decoder works in a very unusual way - the
state that needs to be propagated from the previous frame is not decoded
pixels(¹), but each slice's entropy coder state after decoding the slice.

For that purpose, the decoder's update_thread_context() callback stores
a pointer to the previous frame thread's private data. Then, when
decoding each slice, the frame thread uses the standard progress
mechanism to wait for the corresponding slice in the previous frame to
be completed, then copies the entropy coder state from the
previously-stored pointer.

This approach is highly dubious, as update_thread_context() should be
the only point where frame-thread contexts come into direct contact.
There are no guarantees that the stored pointer will be valid at all, or
will contain any particular data after update_thread_context() finishes.

More specifically, this code can break due to the fact that keyframes
reset entropy coder state and thus do not need to wait for the previous
frame. As an example, consider a decoder process with 2 frame threads -
thread 0 with its context 0, and thread 1 with context 1 - decoding a
previous frame P, current frame F, followed by a keyframe K. Then
consider concurrent execution consistent with the following sequence of
events:
* thread 0 starts decoding P
* thread 0 reads P's slice header, then calls
  ff_thread_finish_setup() allowing next frame thread to start
* main thread calls update_thread_context() to transfer state from
  context 0 to context 1; context 1 stores a pointer to context 0's private
  data
* thread 1 starts decoding F
* thread 1 reads F's slice header, then calls
  ff_thread_finish_setup() allowing the next frame thread to start
  decoding
* thread 0 finishes decoding P
* thread 0 starts decoding K; since K is a keyframe, it does not
  wait for F and reallocates the arrays holding entropy coder state
* thread 0 finishes decoding K
* thread 1 reads entropy coder state from its stored pointer to context
  0, however it finds state from K rather than from P

This execution is currently prevented by special-casing FFV1 in the
generic frame threading code, however that is supremely ugly. It also
involves unnecessary copies of the state arrays, when in fact they can
only be used by one thread at a time.

This commit addresses these deficiencies by changing the array of
PlaneContext (each of which contains the allocated state arrays)
embedded in FFV1SliceContext into a RefStruct object. This object can
then be propagated across frame threads in standard manner. Since the
code structure guarantees only one thread accesses it at a time, no
copies are necessary. It is also re-created for keyframes, solving the
above issue cleanly.

Special-casing of FFV1 in the generic frame threading code will be
removed in a later commit.

(¹) except in the case of a damaged slice, when previous frame's pixels
    are used directly
---
 libavcodec/ffv1.c    | 30 ++++++++++++++++++++++++------
 libavcodec/ffv1.h    |  4 +++-
 libavcodec/ffv1dec.c | 33 +++++++++------------------------
 3 files changed, 36 insertions(+), 31 deletions(-)

Comments

Michael Niedermayer July 24, 2024, 7:53 p.m. UTC | #1
On Tue, Jul 16, 2024 at 07:11:42PM +0200, Anton Khirnov wrote:
> Frame threading in the FFV1 decoder works in a very unusual way - the
> state that needs to be propagated from the previous frame is not decoded
> pixels(¹), but each slice's entropy coder state after decoding the slice.
> 
> For that purpose, the decoder's update_thread_context() callback stores
> a pointer to the previous frame thread's private data. Then, when
> decoding each slice, the frame thread uses the standard progress
> mechanism to wait for the corresponding slice in the previous frame to
> be completed, then copies the entropy coder state from the
> previously-stored pointer.
> 
> This approach is highly dubious, as update_thread_context() should be
> the only point where frame-thread contexts come into direct contact.
> There are no guarantees that the stored pointer will be valid at all, or
> will contain any particular data after update_thread_context() finishes.
> 
> More specifically, this code can break due to the fact that keyframes
> reset entropy coder state and thus do not need to wait for the previous
> frame. As an example, consider a decoder process with 2 frame threads -
> thread 0 with its context 0, and thread 1 with context 1 - decoding a
> previous frame P, current frame F, followed by a keyframe K. Then
> consider concurrent execution consistent with the following sequence of
> events:
> * thread 0 starts decoding P
> * thread 0 reads P's slice header, then calls
>   ff_thread_finish_setup() allowing next frame thread to start
> * main thread calls update_thread_context() to transfer state from
>   context 0 to context 1; context 1 stores a pointer to context 0's private
>   data
> * thread 1 starts decoding F
> * thread 1 reads F's slice header, then calls
>   ff_thread_finish_setup() allowing the next frame thread to start
>   decoding
> * thread 0 finishes decoding P
> * thread 0 starts decoding K; since K is a keyframe, it does not
>   wait for F and reallocates the arrays holding entropy coder state
> * thread 0 finishes decoding K
> * thread 1 reads entropy coder state from its stored pointer to context
>   0, however it finds state from K rather than from P
> 
> This execution is currently prevented by special-casing FFV1 in the
> generic frame threading code, however that is supremely ugly. It also
> involves unnecessary copies of the state arrays, when in fact they can
> only be used by one thread at a time.
> 
> This commit addresses these deficiencies by changing the array of
> PlaneContext (each of which contains the allocated state arrays)
> embedded in FFV1SliceContext into a RefStruct object. This object can
> then be propagated across frame threads in standard manner. Since the
> code structure guarantees only one thread accesses it at a time, no
> copies are necessary. It is also re-created for keyframes, solving the
> above issue cleanly.
> 
> Special-casing of FFV1 in the generic frame threading code will be
> removed in a later commit.
> 
> (¹) except in the case of a damaged slice, when previous frame's pixels
>     are used directly
> ---
>  libavcodec/ffv1.c    | 30 ++++++++++++++++++++++++------
>  libavcodec/ffv1.h    |  4 +++-
>  libavcodec/ffv1dec.c | 33 +++++++++------------------------
>  3 files changed, 36 insertions(+), 31 deletions(-)

LGTM

thx

[...]
Anton Khirnov Aug. 1, 2024, 8:17 a.m. UTC | #2
pushed all patches up to this one, as they have been approved.
diff mbox series

Patch

diff --git a/libavcodec/ffv1.c b/libavcodec/ffv1.c
index 07cf5564cc..9c219b5ddb 100644
--- a/libavcodec/ffv1.c
+++ b/libavcodec/ffv1.c
@@ -31,6 +31,7 @@ 
 
 #include "avcodec.h"
 #include "ffv1.h"
+#include "refstruct.h"
 
 av_cold int ff_ffv1_common_init(AVCodecContext *avctx)
 {
@@ -52,6 +53,24 @@  av_cold int ff_ffv1_common_init(AVCodecContext *avctx)
     return 0;
 }
 
+static void planes_free(FFRefStructOpaque opaque, void *obj)
+{
+    PlaneContext *planes = obj;
+
+    for (int i = 0; i < MAX_PLANES; i++) {
+        PlaneContext *p = &planes[i];
+
+        av_freep(&p->state);
+        av_freep(&p->vlc_state);
+    }
+}
+
+PlaneContext* ff_ffv1_planes_alloc(void)
+{
+    return ff_refstruct_alloc_ext(sizeof(PlaneContext) * MAX_PLANES,
+                                  0, NULL, planes_free);
+}
+
 av_cold int ff_ffv1_init_slice_state(const FFV1Context *f,
                                      FFV1SliceContext *sc)
 {
@@ -132,6 +151,10 @@  av_cold int ff_ffv1_init_slice_contexts(FFV1Context *f)
                                               sizeof(*sc->sample_buffer32));
         if (!sc->sample_buffer || !sc->sample_buffer32)
             return AVERROR(ENOMEM);
+
+        sc->plane = ff_ffv1_planes_alloc();
+        if (!sc->plane)
+            return AVERROR(ENOMEM);
     }
 
     return 0;
@@ -188,12 +211,7 @@  av_cold int ff_ffv1_close(AVCodecContext *avctx)
         av_freep(&sc->sample_buffer);
         av_freep(&sc->sample_buffer32);
 
-        for (i = 0; i < s->plane_count; i++) {
-            PlaneContext *p = &sc->plane[i];
-
-            av_freep(&p->state);
-            av_freep(&p->vlc_state);
-        }
+        ff_refstruct_unref(&sc->plane);
     }
 
     av_freep(&avctx->stats_out);
diff --git a/libavcodec/ffv1.h b/libavcodec/ffv1.h
index 9d79219921..edc3f6aef0 100644
--- a/libavcodec/ffv1.h
+++ b/libavcodec/ffv1.h
@@ -81,7 +81,8 @@  typedef struct FFV1SliceContext {
     int slice_rct_by_coef;
     int slice_rct_ry_coef;
 
-    PlaneContext plane[MAX_PLANES];
+    // RefStruct reference, array of MAX_PLANES elements
+    PlaneContext *plane;
     PutBitContext pb;
     RangeCoder c;
 
@@ -153,6 +154,7 @@  int ff_ffv1_common_init(AVCodecContext *avctx);
 int ff_ffv1_init_slice_state(const FFV1Context *f, FFV1SliceContext *sc);
 int ff_ffv1_init_slices_state(FFV1Context *f);
 int ff_ffv1_init_slice_contexts(FFV1Context *f);
+PlaneContext *ff_ffv1_planes_alloc(void);
 int ff_ffv1_allocate_initial_states(FFV1Context *f);
 void ff_ffv1_clear_slice_state(const FFV1Context *f, FFV1SliceContext *sc);
 int ff_ffv1_close(AVCodecContext *avctx);
diff --git a/libavcodec/ffv1dec.c b/libavcodec/ffv1dec.c
index be4a1a2bf3..7dc4a537a9 100644
--- a/libavcodec/ffv1dec.c
+++ b/libavcodec/ffv1dec.c
@@ -38,6 +38,7 @@ 
 #include "mathops.h"
 #include "ffv1.h"
 #include "progressframe.h"
+#include "refstruct.h"
 #include "thread.h"
 
 static inline av_flatten int get_symbol_inline(RangeCoder *c, uint8_t *state,
@@ -265,30 +266,11 @@  static int decode_slice(AVCodecContext *c, void *arg)
     if (f->fsrc && !(p->flags & AV_FRAME_FLAG_KEY) && f->last_picture.f)
         ff_progress_frame_await(&f->last_picture, si);
 
-    if(f->fsrc && !(p->flags & AV_FRAME_FLAG_KEY)) {
+    if (f->fsrc) {
         const FFV1SliceContext *scsrc = &f->fsrc->slices[si];
 
         if (!(p->flags & AV_FRAME_FLAG_KEY))
             sc->slice_damaged |= scsrc->slice_damaged;
-
-        for (int i = 0; i < f->plane_count; i++) {
-            const PlaneContext *psrc = &scsrc->plane[i];
-            PlaneContext *pdst = &sc->plane[i];
-
-            av_free(pdst->state);
-            av_free(pdst->vlc_state);
-            memcpy(pdst, psrc, sizeof(*pdst));
-            pdst->state = NULL;
-            pdst->vlc_state = NULL;
-
-            if (f->ac) {
-                pdst->state = av_malloc_array(CONTEXT_SIZE,  psrc->context_count);
-                memcpy(pdst->state, psrc->state, CONTEXT_SIZE * psrc->context_count);
-            } else {
-                pdst->vlc_state = av_malloc_array(sizeof(*pdst->vlc_state), psrc->context_count);
-                memcpy(pdst->vlc_state, psrc->vlc_state, sizeof(*pdst->vlc_state) * psrc->context_count);
-            }
-        }
     }
 
     sc->slice_rct_by_coef = 1;
@@ -812,6 +794,11 @@  static int read_header(FFV1Context *f)
                         && (unsigned)sc->slice_y + (uint64_t)sc->slice_height <= f->height);
         }
 
+        ff_refstruct_unref(&sc->plane);
+        sc->plane = ff_ffv1_planes_alloc();
+        if (!sc->plane)
+            return AVERROR(ENOMEM);
+
         for (int i = 0; i < f->plane_count; i++) {
             PlaneContext *const p = &sc->plane[i];
 
@@ -828,10 +815,6 @@  static int read_header(FFV1Context *f)
 
             if (f->version <= 2) {
                 av_assert0(context_count >= 0);
-                if (p->context_count < context_count) {
-                    av_freep(&p->state);
-                    av_freep(&p->vlc_state);
-                }
                 p->context_count = context_count;
             }
         }
@@ -1060,6 +1043,8 @@  static int update_thread_context(AVCodecContext *dst, const AVCodecContext *src)
 
         sc->slice_damaged = sc0->slice_damaged;
 
+        ff_refstruct_replace(&sc->plane, sc0->plane);
+
         if (fsrc->version < 3) {
             sc->slice_x             = sc0->slice_x;
             sc->slice_y             = sc0->slice_y;