[FFmpeg-devel,RFC] v4l2_m2m: Fix races around freeing data on close

Submitted by Mark Thompson on Oct. 19, 2017, 12:10 a.m.

Details

Message ID d1968608-8017-0d66-c8b0-61f27717552d@jkqxz.net
State New
Headers show

Commit Message

Mark Thompson Oct. 19, 2017, 12:10 a.m.
Refcount all of the context information.
---
As discussed in the other thread, something like this.  We move most of the context into a refcounted buffer and AVCodecContext.priv_data is left as a stub holding a reference to it.

Thoughts:
* Change is rather ugly; some structures and function arguments could probably be rearranged to improve it.
* Not very well tested - I'm only testing it with the decoder on s5p-mfc.  (Encoder might well be totally broken.)
* It currently adds an extra atomic to each buffer to keep track of the context-references that isn't really wanted.  Cleaning up the per-plane references so we only go through the buffer-free sequence once would remove it.

I found several more issues while looking at this (not treated here):
* The refsync process with the semaphore is racy - it will fall over if the buffer unrefs are called on multiple threads at the same time.
* Buffers are requeued once for every plane they have as a consequnce of the per-plane references (NV12 buffer -> enqueue twice).  The later enqueues are ignored by the driver (QBUF returns EINVAL; look at strace), but that should probably still be treated as a bug.
* It seems to be able to leak all of the input packets (if refcounted?) - valgrind shows this, but I didn't investigate further.

Thanks,

- Mark


 libavcodec/v4l2_buffers.c | 22 +++++++++++++------
 libavcodec/v4l2_buffers.h |  6 ++++++
 libavcodec/v4l2_m2m.c     | 55 ++++++++++++++++++++++++++++++++++++-----------
 libavcodec/v4l2_m2m.h     | 34 +++++++++++++++++++++++------
 libavcodec/v4l2_m2m_dec.c | 24 ++++++++++++++-------
 libavcodec/v4l2_m2m_enc.c | 24 ++++++++++++++-------
 6 files changed, 122 insertions(+), 43 deletions(-)

Comments

Jorge Ramirez-Ortiz Oct. 19, 2017, 7:28 a.m.
On 10/19/2017 02:10 AM, Mark Thompson wrote:
> Refcount all of the context information.
> ---
> As discussed in the other thread, something like this.  We move most of the context into a refcounted buffer and AVCodecContext.priv_data is left as a stub holding a reference to it.

um, ok ... please could you send a patch so I can apply it? I see 
several problems in v4l2_free_buffer.

To sum up the RFC, instead of using private_data to place the codec's 
context, it uses private data to place a _pointer_ to an allocated codec 
context "just" so it wont be deallocated after the codec is closed 
(codec close frees the private_data)

Personally I think adding AV_CODEC_FLAG2_PRESERVE_PRIVDATA_ONCLOSE and 
use private_data to hold the codec context is a cleaner/simpler approach.

- the codec requests private_data with a size (sizeof(type))
- the code explicitly informs the API whether all memory will be 
released on close or it will preserve it.
Jorge Ramirez-Ortiz Oct. 19, 2017, 8:50 a.m.
On 10/19/2017 02:10 AM, Mark Thompson wrote:
> Refcount all of the context information.
> ---
> As discussed in the other thread, something like this.  We move most of the context into a refcounted buffer and AVCodecContext.priv_data is left as a stub holding a reference to it.
>
> Thoughts:
> * Change is rather ugly; some structures and function arguments could probably be rearranged to improve it.
> * Not very well tested - I'm only testing it with the decoder on s5p-mfc.  (Encoder might well be totally broken.)
> * It currently adds an extra atomic to each buffer to keep track of the context-references that isn't really wanted.  Cleaning up the per-plane references so we only go through the buffer-free sequence once would remove it.
>
> I found several more issues while looking at this (not treated here):
> * The refsync process with the semaphore is racy - it will fall over if the buffer unrefs are called on multiple threads at the same time.
> * Buffers are requeued once for every plane they have as a consequnce of the per-plane references (NV12 buffer -> enqueue twice).  The later enqueues are ignored by the driver (QBUF returns EINVAL; look at strace), but that should probably still be treated as a bug.
> * It seems to be able to leak all of the input packets (if refcounted?) - valgrind shows this, but I didn't investigate further.

about the last issue, I think this is related of how bsf is used in 3.4 
codecs and not to the v4l2 codec implementation itself
Running the v4l2 m2m decoding in 3.4 vs running 3.3 shows the following 
(I am using the flag to preserve the private_data in both cases)

ffmpeg: 3.4
============

[AVIOContext @ 0x590ee20] Statistics: 3997696 bytes read, 0 seeks
==2525==    at 0x12A00F8: VALGRIND_PRINTF_BACKTRACE (valgrind.h:6303)
==2525==    by 0x12A0DD3: av_log_default_callback (log.c:355)
==2525==    by 0x21CD03: log_callback_report (cmdutils.c:110)
==2525==    by 0x12A0FAF: av_vlog (log.c:383)
==2525==    by 0x12A0F37: av_log (log.c:375)
==2525==    by 0x224287: term_exit (ffmpeg.c:317)
==2525==    by 0x224E0B: ffmpeg_cleanup (ffmpeg.c:618)
==2525==    by 0x21CDFB: exit_program (cmdutils.c:138)
==2525==    by 0x233F03: main (ffmpeg.c:4824)
==2525==
==2525== HEAP SUMMARY:
==2525==     in use at exit: 4,015,310 bytes in 617 blocks
==2525==   total heap usage: 16,338 allocs, 15,721 frees, 21,262,380 
bytes allocated
==2525==
==2525== 18,156 bytes in 1 blocks are possibly lost in loss record 5 of 8
==2525==    at 0x4844C38: malloc (vg_replace_malloc.c:298)
==2525==    by 0x4847197: realloc (vg_replace_malloc.c:785)
==2525==    by 0x12A4773: av_realloc (mem.c:144)
==2525==    by 0x128FABB: av_buffer_realloc (buffer.c:177)
==2525==    by 0x67F49F: packet_alloc (avpacket.c:77)
==2525==    by 0x68118F: av_packet_ref (avpacket.c:636)
==2525==    by 0x6078EB: add_to_pktbuf (utils.c:435)
==2525==    by 0x60B483: parse_packet (utils.c:1470)
==2525==    by 0x60BBE3: read_frame_internal (utils.c:1613)
==2525==    by 0x60C22B: av_read_frame (utils.c:1724)
==2525==    by 0x2317AF: get_input_packet (ffmpeg.c:4097)
==2525==    by 0x231CAB: process_input (ffmpeg.c:4217)
==2525==
==2525== 3,996,554 (4,920 direct, 3,991,634 indirect) bytes in 205 
blocks are definitely lost in loss record 8 of 8
==2525==    at 0x4847248: memalign (vg_replace_malloc.c:857)
==2525==    by 0x484735B: posix_memalign (vg_replace_malloc.c:1020)
==2525==    by 0x12A46D3: av_malloc (mem.c:87)
==2525==    by 0x12A49B3: av_mallocz (mem.c:224)
==2525==    by 0x128F7DF: av_buffer_ref (buffer.c:95)
==2525==    by 0x6811FF: av_packet_ref (avpacket.c:644)
==2525==    by 0x6CCEF3: avcodec_send_packet (decode.c:666)
==2525==    by 0x22B043: decode (ffmpeg.c:2265)
==2525==    by 0x22B7BF: decode_video (ffmpeg.c:2409)
==2525==    by 0x22C573: process_input_packet (ffmpeg.c:2650)
==2525==    by 0x232F93: process_input (ffmpeg.c:4442)
==2525==    by 0x23348F: transcode_step (ffmpeg.c:4553)
==2525==
==2525== LEAK SUMMARY:
==2525==    definitely lost: 4,920 bytes in 205 blocks
==2525==    indirectly lost: 3,991,634 bytes in 409 blocks
==2525==      possibly lost: 18,156 bytes in 1 blocks
==2525==    still reachable: 600 bytes in 2 blocks
==2525==         suppressed: 0 bytes in 0 blocks
==2525== Reachable blocks (those to which a pointer was found) are not 
shown.
==2525== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==2525==
==2525== For counts of detected and suppressed errors, rerun with: -v
==2525== Use --track-origins=yes to see where uninitialised values come 
from
==2525== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)


ffmpeg 3.3
==========

[AVIOContext @ 0x590ee20] Statistics: 2621440 bytes read, 0 seeks
==11047==    at 0x1246920: VALGRIND_PRINTF_BACKTRACE (valgrind.h:6303)
==11047==    by 0x12475FB: av_log_default_callback (log.c:355)
==11047==    by 0x203BEB: log_callback_report (cmdutils.c:110)
==11047==    by 0x12477D7: av_vlog (log.c:383)
==11047==    by 0x124775F: av_log (log.c:375)
==11047==    by 0x21C1A7: term_exit (ffmpeg.c:317)
==11047==    by 0x21CD37: ffmpeg_cleanup (ffmpeg.c:619)
==11047==    by 0x203CE3: exit_program (cmdutils.c:138)
==11047==    by 0x22BE7B: main (ffmpeg.c:4798)
==11047==
==11047== HEAP SUMMARY:
==11047==     in use at exit: 52,768 bytes in 3 blocks
==11047==   total heap usage: 11,550 allocs, 11,547 frees, 18,048,065 
bytes allocated
==11047==
==11047== 52,168 bytes in 1 blocks are definitely lost in loss record 3 
of 3
==11047==    at 0x4847248: memalign (vg_replace_malloc.c:857)
==11047==    by 0x484735B: posix_memalign (vg_replace_malloc.c:1020)
==11047==    by 0x124AE67: av_malloc (mem.c:87)
==11047==    by 0x124B147: av_mallocz (mem.c:224)
==11047==    by 0xAED317: avcodec_open2 (utils.c:1307)
==11047==    by 0x5DAA4B: avformat_find_stream_info (utils.c:3480)
==11047==    by 0x20DFD7: open_input_file (ffmpeg_opt.c:1013)
==11047==    by 0x21720F: open_files (ffmpeg_opt.c:3203)
==11047==    by 0x21737F: ffmpeg_parse_options (ffmpeg_opt.c:3243)
==11047==    by 0x22BC27: main (ffmpeg.c:4760)
==11047==
==11047== LEAK SUMMARY:
==11047==    definitely lost: 52,168 bytes in 1 blocks
==11047==    indirectly lost: 0 bytes in 0 blocks
==11047==      possibly lost: 0 bytes in 0 blocks
==11047==    still reachable: 600 bytes in 2 blocks
==11047==         suppressed: 0 bytes in 0 blocks
==11047== Reachable blocks (those to which a pointer was found) are not 
shown.
==11047== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==11047==
==11047== For counts of detected and suppressed errors, rerun with: -v
==11047== Use --track-origins=yes to see where uninitialised values come 
from
==11047== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Hendrik Leppkes Oct. 19, 2017, 9:30 a.m.
On Thu, Oct 19, 2017 at 9:28 AM, Jorge Ramirez
<jorge.ramirez-ortiz@linaro.org> wrote:
> On 10/19/2017 02:10 AM, Mark Thompson wrote:
>>
>> Refcount all of the context information.
>> ---
>> As discussed in the other thread, something like this.  We move most of
>> the context into a refcounted buffer and AVCodecContext.priv_data is left as
>> a stub holding a reference to it.
>
>
> um, ok ... please could you send a patch so I can apply it? I see several
> problems in v4l2_free_buffer.
>
> To sum up the RFC, instead of using private_data to place the codec's
> context, it uses private data to place a _pointer_ to an allocated codec
> context "just" so it wont be deallocated after the codec is closed (codec
> close frees the private_data)
>
> Personally I think adding AV_CODEC_FLAG2_PRESERVE_PRIVDATA_ONCLOSE and use
> private_data to hold the codec context is a cleaner/simpler approach.
>

Using ref-counting is a mechanism we already have in-place for this
kind of thing, adding another global flag only used by one decoder is
not a preferred approach.

- Hendrik
Mark Thompson Oct. 19, 2017, 9:49 a.m.
On 19/10/17 08:28, Jorge Ramirez wrote:
> On 10/19/2017 02:10 AM, Mark Thompson wrote:
>> Refcount all of the context information.
>> ---
>> As discussed in the other thread, something like this.  We move most of the context into a refcounted buffer and AVCodecContext.priv_data is left as a stub holding a reference to it.
> 
> um, ok ... please could you send a patch so I can apply it? I see several problems in v4l2_free_buffer.

What goes wrong?  It applies fine for me on current head (f4090940bd3024e69d236257d327f11d1e496229).

> To sum up the RFC, instead of using private_data to place the codec's context, it uses private data to place a _pointer_ to an allocated codec context "just" so it wont be deallocated after the codec is closed (codec close frees the private_data)
> 
> Personally I think adding AV_CODEC_FLAG2_PRESERVE_PRIVDATA_ONCLOSE and use private_data to hold the codec context is a cleaner/simpler approach.
> 
> - the codec requests private_data with a size (sizeof(type))
> - the code explicitly informs the API whether all memory will be released on close or it will preserve it.

- All APIs in ffmpeg with this sort of private data field use them in the same way - they are allocated at create/alloc time (with the given size, for AVOptions), and freed at close/destroy time.
- Using the well-tested reference-counted buffer implementation is IMO strongly preferable to making ad-hoc synchronisation with atomics and semaphores.
- All other decoders use the reference-counted buffer approach (e.g. look at rkmpp for a direct implementation, the hwaccels all do it via hwcontext).

Patch hide | download patch | download mbox

diff --git a/libavcodec/v4l2_buffers.c b/libavcodec/v4l2_buffers.c
index ba70c5d..9831bdb 100644
--- a/libavcodec/v4l2_buffers.c
+++ b/libavcodec/v4l2_buffers.c
@@ -211,16 +211,13 @@  static void v4l2_free_buffer(void *opaque, uint8_t *unused)
     if (s->reinit) {
         if (!atomic_load(&s->refcount))
             sem_post(&s->refsync);
-        return;
-    }
-
-    if (avbuf->context->streamon) {
+    } else if (avbuf->context->streamon) {
         ff_v4l2_buffer_enqueue(avbuf);
-        return;
     }
 
-    if (!atomic_load(&s->refcount))
-        ff_v4l2_m2m_codec_end(s->avctx);
+    if (atomic_fetch_sub(&avbuf->context_refcount, 1) == 1) {
+        av_buffer_unref(&avbuf->context_ref);
+    }
 }
 
 static int v4l2_buf_to_bufref(V4L2Buffer *in, int plane, AVBufferRef **buf)
@@ -236,6 +233,17 @@  static int v4l2_buf_to_bufref(V4L2Buffer *in, int plane, AVBufferRef **buf)
     if (!*buf)
         return AVERROR(ENOMEM);
 
+    if (in->context_ref) {
+        atomic_fetch_add(&in->context_refcount, 1);
+    } else {
+        in->context_ref = av_buffer_ref(s->self_ref);
+        if (!in->context_ref) {
+            av_buffer_unref(buf);
+            return AVERROR(ENOMEM);
+        }
+        in->context_refcount = 1;
+    }
+
     in->status = V4L2BUF_RET_USER;
     atomic_fetch_add_explicit(&s->refcount, 1, memory_order_relaxed);
 
diff --git a/libavcodec/v4l2_buffers.h b/libavcodec/v4l2_buffers.h
index e28a4a6..ce9f727 100644
--- a/libavcodec/v4l2_buffers.h
+++ b/libavcodec/v4l2_buffers.h
@@ -24,6 +24,7 @@ 
 #ifndef AVCODEC_V4L2_BUFFERS_H
 #define AVCODEC_V4L2_BUFFERS_H
 
+#include <stdatomic.h>
 #include <linux/videodev2.h>
 
 #include "avcodec.h"
@@ -41,6 +42,11 @@  typedef struct V4L2Buffer {
     /* each buffer needs to have a reference to its context */
     struct V4L2Context *context;
 
+    /* This object is refcounted per-plane, so we need to keep track
+     * of how many context-refs we are holding. */
+    AVBufferRef *context_ref;
+    atomic_uint  context_refcount;
+
     /* keep track of the mmap address and mmap length */
     struct V4L2Plane_info {
         int bytesperline;
diff --git a/libavcodec/v4l2_m2m.c b/libavcodec/v4l2_m2m.c
index 1d7a852..8a2da70 100644
--- a/libavcodec/v4l2_m2m.c
+++ b/libavcodec/v4l2_m2m.c
@@ -311,9 +311,22 @@  error:
     return ret;
 }
 
+static void v4l2_m2m_destroy_context(void *opaque, uint8_t *context)
+{
+    V4L2m2mContext *s = (V4L2m2mContext*)context;
+
+    ff_v4l2_context_release(&s->capture);
+    sem_destroy(&s->refsync);
+
+    close(s->fd);
+
+    av_free(s);
+}
+
 int ff_v4l2_m2m_codec_end(AVCodecContext *avctx)
 {
-    V4L2m2mContext* s = avctx->priv_data;
+    V4L2m2mPriv *priv = avctx->priv_data;
+    V4L2m2mContext* s = priv->context;
     int ret;
 
     ret = ff_v4l2_context_set_status(&s->output, VIDIOC_STREAMOFF);
@@ -326,17 +339,8 @@  int ff_v4l2_m2m_codec_end(AVCodecContext *avctx)
 
     ff_v4l2_context_release(&s->output);
 
-    if (atomic_load(&s->refcount))
-        av_log(avctx, AV_LOG_ERROR, "ff_v4l2m2m_codec_end leaving pending buffers\n");
-
-    ff_v4l2_context_release(&s->capture);
-    sem_destroy(&s->refsync);
-
-    /* release the hardware */
-    if (close(s->fd) < 0 )
-        av_log(avctx, AV_LOG_ERROR, "failure closing %s (%s)\n", s->devname, av_err2str(AVERROR(errno)));
-
-    s->fd = -1;
+    s->self_ref = NULL;
+    av_buffer_unref(&priv->context_ref);
 
     return 0;
 }
@@ -348,7 +352,7 @@  int ff_v4l2_m2m_codec_init(AVCodecContext *avctx)
     char node[PATH_MAX];
     DIR *dirp;
 
-    V4L2m2mContext *s = avctx->priv_data;
+    V4L2m2mContext *s = ((V4L2m2mPriv*)avctx->priv_data)->context;
     s->avctx = avctx;
 
     dirp = opendir("/dev");
@@ -381,3 +385,28 @@  int ff_v4l2_m2m_codec_init(AVCodecContext *avctx)
 
     return v4l2_configure_contexts(s);
 }
+
+int ff_v4l2_m2m_create_context(AVCodecContext *avctx)
+{
+    V4L2m2mPriv *priv = avctx->priv_data;
+    V4L2m2mContext *s;
+
+    s = av_mallocz(sizeof(*s));
+    if (!s)
+        return AVERROR(ENOMEM);
+    priv->context_ref =
+        av_buffer_create((uint8_t*)s, sizeof(*s),
+                         &v4l2_m2m_destroy_context, NULL, 0);
+    if (!priv->context_ref) {
+        av_free(s);
+        return AVERROR(ENOMEM);
+    }
+
+    priv->context = s;
+    s->self_ref   = priv->context_ref;
+
+    s->output.num_buffers  = priv->num_output_buffers;
+    s->capture.num_buffers = priv->num_capture_buffers;
+
+    return 0;
+}
diff --git a/libavcodec/v4l2_m2m.h b/libavcodec/v4l2_m2m.h
index afa3987..358c022 100644
--- a/libavcodec/v4l2_m2m.h
+++ b/libavcodec/v4l2_m2m.h
@@ -38,11 +38,9 @@ 
 
 #define V4L_M2M_DEFAULT_OPTS \
     { "num_output_buffers", "Number of buffers in the output context",\
-        OFFSET(output.num_buffers), AV_OPT_TYPE_INT, { .i64 = 16 }, 6, INT_MAX, FLAGS }
+        OFFSET(num_output_buffers), AV_OPT_TYPE_INT, { .i64 = 16 }, 6, INT_MAX, FLAGS }
 
-typedef struct V4L2m2mContext
-{
-    AVClass *class;
+typedef struct V4L2m2mContext {
     char devname[PATH_MAX];
     int fd;
 
@@ -50,18 +48,40 @@  typedef struct V4L2m2mContext
     V4L2Context capture;
     V4L2Context output;
 
-    /* refcount of buffers held by the user */
-    atomic_uint refcount;
-
     /* dynamic stream reconfig */
     AVCodecContext *avctx;
     sem_t refsync;
+    atomic_uint refcount;
     int reinit;
 
     /* null frame/packet received */
     int draining;
+
+    /* Reference to self; only valid while codec is active. */
+    AVBufferRef *self_ref;
 } V4L2m2mContext;
 
+typedef struct V4L2m2mPriv
+{
+    AVClass *class;
+
+    V4L2m2mContext *context;
+    AVBufferRef    *context_ref;
+
+    int num_output_buffers;
+    int num_capture_buffers;
+} V4L2m2mPriv;
+
+/**
+ * Allocate a new context and references for a V4L2 M2M instance.
+ *
+ * @param[in] ctx The AVCodecContext instantiated by the encoder/decoder.
+ *
+ * @returns 0 in success, a negative error code otherwise.
+ */
+int ff_v4l2_m2m_create_context(AVCodecContext *avctx);
+
+
 /**
  * Probes the video nodes looking for the required codec capabilities.
  *
diff --git a/libavcodec/v4l2_m2m_dec.c b/libavcodec/v4l2_m2m_dec.c
index 958cdc5..831fd81 100644
--- a/libavcodec/v4l2_m2m_dec.c
+++ b/libavcodec/v4l2_m2m_dec.c
@@ -35,7 +35,7 @@ 
 
 static int v4l2_try_start(AVCodecContext *avctx)
 {
-    V4L2m2mContext *s = avctx->priv_data;
+    V4L2m2mContext *s = ((V4L2m2mPriv*)avctx->priv_data)->context;
     V4L2Context *const capture = &s->capture;
     V4L2Context *const output = &s->output;
     struct v4l2_selection selection;
@@ -127,7 +127,7 @@  static int v4l2_prepare_decoder(V4L2m2mContext *s)
 
 static int v4l2_receive_frame(AVCodecContext *avctx, AVFrame *frame)
 {
-    V4L2m2mContext *s = avctx->priv_data;
+    V4L2m2mContext *s = ((V4L2m2mPriv*)avctx->priv_data)->context;
     V4L2Context *const capture = &s->capture;
     V4L2Context *const output = &s->output;
     AVPacket avpkt = {0};
@@ -159,11 +159,19 @@  dequeue:
 
 static av_cold int v4l2_decode_init(AVCodecContext *avctx)
 {
-    V4L2m2mContext *s = avctx->priv_data;
-    V4L2Context *capture = &s->capture;
-    V4L2Context *output = &s->output;
+    V4L2m2mPriv *priv = avctx->priv_data;
+    V4L2m2mContext *s;
+    V4L2Context *capture;
+    V4L2Context *output;
     int ret;
 
+    ret = ff_v4l2_m2m_create_context(avctx);
+    if (ret < 0)
+        return ret;
+    s       = priv->context;
+    capture = &s->capture;
+    output  = &s->output;
+
     /* if these dimensions are invalid (ie, 0 or too small) an event will be raised
      * by the v4l2 driver; this event will trigger a full pipeline reconfig and
      * the proper values will be retrieved from the kernel driver.
@@ -186,13 +194,13 @@  static av_cold int v4l2_decode_init(AVCodecContext *avctx)
     return v4l2_prepare_decoder(s);
 }
 
-#define OFFSET(x) offsetof(V4L2m2mContext, x)
+#define OFFSET(x) offsetof(V4L2m2mPriv, x)
 #define FLAGS AV_OPT_FLAG_VIDEO_PARAM | AV_OPT_FLAG_DECODING_PARAM
 
 static const AVOption options[] = {
     V4L_M2M_DEFAULT_OPTS,
     { "num_capture_buffers", "Number of buffers in the capture context",
-        OFFSET(capture.num_buffers), AV_OPT_TYPE_INT, {.i64 = 20}, 20, INT_MAX, FLAGS },
+        OFFSET(num_capture_buffers), AV_OPT_TYPE_INT, {.i64 = 20}, 20, INT_MAX, FLAGS },
     { NULL},
 };
 
@@ -209,7 +217,7 @@  AVCodec ff_ ## NAME ## _v4l2m2m_decoder = { \
     .long_name      = NULL_IF_CONFIG_SMALL("V4L2 mem2mem " LONGNAME " decoder wrapper"),\
     .type           = AVMEDIA_TYPE_VIDEO,\
     .id             = CODEC ,\
-    .priv_data_size = sizeof(V4L2m2mContext),\
+    .priv_data_size = sizeof(V4L2m2mPriv),\
     .priv_class     = &v4l2_m2m_ ## NAME ## _dec_class,\
     .init           = v4l2_decode_init,\
     .receive_frame  = v4l2_receive_frame,\
diff --git a/libavcodec/v4l2_m2m_enc.c b/libavcodec/v4l2_m2m_enc.c
index f71ce5f..c1478fb 100644
--- a/libavcodec/v4l2_m2m_enc.c
+++ b/libavcodec/v4l2_m2m_enc.c
@@ -242,7 +242,7 @@  static int v4l2_prepare_encoder(V4L2m2mContext *s)
 
 static int v4l2_send_frame(AVCodecContext *avctx, const AVFrame *frame)
 {
-    V4L2m2mContext *s = avctx->priv_data;
+    V4L2m2mContext *s = ((V4L2m2mPriv*)avctx->priv_data)->context;
     V4L2Context *const output = &s->output;
 
     return ff_v4l2_context_enqueue_frame(output, frame);
@@ -250,7 +250,7 @@  static int v4l2_send_frame(AVCodecContext *avctx, const AVFrame *frame)
 
 static int v4l2_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
 {
-    V4L2m2mContext *s = avctx->priv_data;
+    V4L2m2mContext *s = ((V4L2m2mPriv*)avctx->priv_data)->context;
     V4L2Context *const capture = &s->capture;
     V4L2Context *const output = &s->output;
     int ret;
@@ -280,11 +280,19 @@  dequeue:
 
 static av_cold int v4l2_encode_init(AVCodecContext *avctx)
 {
-    V4L2m2mContext *s = avctx->priv_data;
-    V4L2Context *capture = &s->capture;
-    V4L2Context *output = &s->output;
+    V4L2m2mPriv *priv = avctx->priv_data;
+    V4L2m2mContext *s;
+    V4L2Context *capture;
+    V4L2Context *output;
     int ret;
 
+    ret = ff_v4l2_m2m_create_context(avctx);
+    if (ret < 0)
+        return ret;
+    s       = priv->context;
+    capture = &s->capture;
+    output  = &s->output;
+
     /* common settings output/capture */
     output->height = capture->height = avctx->height;
     output->width = capture->width = avctx->width;
@@ -306,13 +314,13 @@  static av_cold int v4l2_encode_init(AVCodecContext *avctx)
     return v4l2_prepare_encoder(s);
 }
 
-#define OFFSET(x) offsetof(V4L2m2mContext, x)
+#define OFFSET(x) offsetof(V4L2m2mPriv, x)
 #define FLAGS AV_OPT_FLAG_VIDEO_PARAM | AV_OPT_FLAG_ENCODING_PARAM
 
 static const AVOption options[] = {
     V4L_M2M_DEFAULT_OPTS,
     { "num_capture_buffers", "Number of buffers in the capture context",
-        OFFSET(capture.num_buffers), AV_OPT_TYPE_INT, {.i64 = 4 }, 4, INT_MAX, FLAGS },
+        OFFSET(num_capture_buffers), AV_OPT_TYPE_INT, {.i64 = 4 }, 4, INT_MAX, FLAGS },
     { NULL },
 };
 
@@ -329,7 +337,7 @@  AVCodec ff_ ## NAME ## _v4l2m2m_encoder = { \
     .long_name      = NULL_IF_CONFIG_SMALL("V4L2 mem2mem " LONGNAME " encoder wrapper"),\
     .type           = AVMEDIA_TYPE_VIDEO,\
     .id             = CODEC ,\
-    .priv_data_size = sizeof(V4L2m2mContext),\
+    .priv_data_size = sizeof(V4L2m2mPriv),\
     .priv_class     = &v4l2_m2m_ ## NAME ##_enc_class,\
     .init           = v4l2_encode_init,\
     .send_frame     = v4l2_send_frame,\