From patchwork Wed Dec  7 11:43:29 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Timo Rothenpieler <timo@rothenpieler.org>
X-Patchwork-Id: 39644
From: Timo Rothenpieler <timo@rothenpieler.org>
To: ffmpeg-devel@ffmpeg.org
Date: Wed,  7 Dec 2022 12:43:29 +0100
Message-Id: <20221207114330.250-1-timo@rothenpieler.org>
X-Mailer: git-send-email 2.34.1
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH 1/2] lavc: convert frame threading to the
 receive_frame() pattern
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Anton Khirnov <anton@khirnov.net>
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
X-TUID: 2IqS3Bws7FG/

From: Anton Khirnov <anton@khirnov.net>

Reorganize the code such that the frame threading code does not call the
decoders directly, but instead calls back into the generic decoding
code. This avoids duplicating the logic that wraps the decoder
invocation and will be useful in the following commits.
---
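Note on scheduling: the reworked ff_thread_receive_frame() in the diff below
keeps two ring indices, next_decoding and next_finished, and only collects
output once every worker has been handed a packet (or the decoder is draining).
The following is a minimal sketch of that index arithmetic alone. The Ring type
and ring_submit() are hypothetical names used for illustration; the sketch
leaves out the locking, the worker state checks and the packet handling.

```c
#include <assert.h>

/* Hypothetical model of the two ring indices kept in FrameThreadContext. */
typedef struct Ring {
    int thread_count;
    int next_decoding;  /* next worker to receive a packet */
    int next_finished;  /* next worker to collect output from */
} Ring;

/*
 * Model one iteration of the submit loop: hand a packet to a worker, then
 * decide whether output may be collected. Returns the index of the worker
 * to collect from, or -1 when the caller should keep submitting packets.
 */
static int ring_submit(Ring *r, int draining)
{
    int collect;

    r->next_decoding = (r->next_decoding + 1) % r->thread_count;

    /* do not collect anything until all workers have something to do */
    if (r->next_decoding != r->next_finished && !draining)
        return -1;

    collect          = r->next_finished;
    r->next_finished = (r->next_finished + 1) % r->thread_count;
    return collect;
}
```

With three workers, the first two submissions return -1 and the third returns
worker 0; from then on each submission collects from exactly one worker. When
draining is set, collection starts immediately.
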
 libavcodec/decode.c        |  57 +++++---
 libavcodec/decode.h        |   7 +
 libavcodec/internal.h      |   7 +
 libavcodec/pthread_frame.c | 276 ++++++++++++++++++++++++-------------
 libavcodec/thread.h        |  18 +--
 5 files changed, 241 insertions(+), 124 deletions(-)
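
Note on output buffering: the DecodedFrames helpers added to pthread_frame.c
below keep filled frames at the front of a pointer array and recycle spare
frames at the tail, so a worker can emit several frames per packet without
reallocating each time. The sketch below shows the same slot-recycling idea
with a stand-in Frame type instead of AVFrame. All names in it (Frame,
FrameQueue, queue_*) are hypothetical, and plain libc allocation replaces the
av_* functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for AVFrame (hypothetical): an id and a "has data" flag. */
typedef struct Frame {
    int id;
    int filled;
} Frame;

typedef struct FrameQueue {
    Frame **f;
    size_t  nb_f;            /* slots currently holding output */
    size_t  nb_f_allocated;  /* slots allocated, filled or spare */
} FrameQueue;

/* Return the first spare slot, allocating one when none exists. */
static Frame *queue_get_free(FrameQueue *q)
{
    if (q->nb_f == q->nb_f_allocated) {
        Frame **tmp = realloc(q->f, (q->nb_f + 1) * sizeof(*q->f));
        if (!tmp)
            return NULL;
        q->f = tmp;

        q->f[q->nb_f] = calloc(1, sizeof(**q->f));
        if (!q->f[q->nb_f])
            return NULL;
        q->nb_f_allocated++;
    }
    /* reset the slot before reuse, as av_frame_unref() does in the patch */
    memset(q->f[q->nb_f], 0, sizeof(**q->f));
    return q->f[q->nb_f];
}

/* Mark the slot handed out by queue_get_free() as holding output. */
static void queue_commit(FrameQueue *q)
{
    q->nb_f++;
}

/* Move the oldest frame into dst and recycle its slot at the tail. */
static void queue_pop(FrameQueue *q, Frame *dst)
{
    Frame *tmp;

    assert(q->nb_f > 0);
    tmp  = q->f[0];
    *dst = *tmp;
    memset(tmp, 0, sizeof(*tmp));
    memmove(q->f, q->f + 1, (q->nb_f - 1) * sizeof(*q->f));
    q->f[--q->nb_f] = tmp;
}

/* Release every slot, filled or spare, and reset the queue. */
static void queue_free(FrameQueue *q)
{
    for (size_t i = 0; i < q->nb_f_allocated; i++)
        free(q->f[i]);
    free(q->f);
    memset(q, 0, sizeof(*q));
}
```

A caller fills the slot returned by queue_get_free(), calls queue_commit() only
when decoding succeeded, and drains with queue_pop() in FIFO order. Popped
slots are reused by later queue_get_free() calls, so nb_f_allocated stays at
the high-water mark.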

diff --git a/libavcodec/decode.c b/libavcodec/decode.c
index 6be2d3d6ed..bf3c0cbe0a 100644
--- a/libavcodec/decode.c
+++ b/libavcodec/decode.c
@@ -202,6 +202,10 @@ fail:
     return ret;
 }
 
+#if !HAVE_THREADS
+#define ff_thread_get_packet(avctx, pkt) (AVERROR_BUG)
+#endif
+
 int ff_decode_get_packet(AVCodecContext *avctx, AVPacket *pkt)
 {
     AVCodecInternal *avci = avctx->internal;
@@ -210,7 +214,14 @@ int ff_decode_get_packet(AVCodecContext *avctx, AVPacket *pkt)
     if (avci->draining)
         return AVERROR_EOF;
 
-    ret = av_bsf_receive_packet(avci->bsf, pkt);
+    /* If we are a worker thread, get the next packet from the threading
+     * context. Otherwise we are the main (user-facing) context, so we get the
+     * next packet from the input bitstream filter chain.
+     */
+    if (avci->is_frame_mt)
+        ret = ff_thread_get_packet(avctx, pkt);
+    else
+        ret = av_bsf_receive_packet(avci->bsf, pkt);
     if (ret == AVERROR_EOF)
         avci->draining = 1;
     if (ret < 0)
@@ -295,30 +306,25 @@ static inline int decode_simple_internal(AVCodecContext *avctx, AVFrame *frame,
         return AVERROR_EOF;
 
     if (!pkt->data &&
-        !(avctx->codec->capabilities & AV_CODEC_CAP_DELAY ||
-          avctx->active_thread_type & FF_THREAD_FRAME))
+        !(avctx->codec->capabilities & AV_CODEC_CAP_DELAY))
         return AVERROR_EOF;
 
     got_frame = 0;
 
-    if (HAVE_THREADS && avctx->active_thread_type & FF_THREAD_FRAME) {
-        ret = ff_thread_decode_frame(avctx, frame, &got_frame, pkt);
-    } else {
-        ret = codec->cb.decode(avctx, frame, &got_frame, pkt);
-
-        if (!(codec->caps_internal & FF_CODEC_CAP_SETS_PKT_DTS))
-            frame->pkt_dts = pkt->dts;
-        if (avctx->codec->type == AVMEDIA_TYPE_VIDEO) {
-            if(!avctx->has_b_frames)
-                frame->pkt_pos = pkt->pos;
-            //FIXME these should be under if(!avctx->has_b_frames)
-            /* get_buffer is supposed to set frame parameters */
-            if (!(avctx->codec->capabilities & AV_CODEC_CAP_DR1)) {
-                if (!frame->sample_aspect_ratio.num)  frame->sample_aspect_ratio = avctx->sample_aspect_ratio;
-                if (!frame->width)                    frame->width               = avctx->width;
-                if (!frame->height)                   frame->height              = avctx->height;
-                if (frame->format == AV_PIX_FMT_NONE) frame->format              = avctx->pix_fmt;
-            }
+    ret = codec->cb.decode(avctx, frame, &got_frame, pkt);
+
+    if (!(codec->caps_internal & FF_CODEC_CAP_SETS_PKT_DTS))
+        frame->pkt_dts = pkt->dts;
+    if (avctx->codec->type == AVMEDIA_TYPE_VIDEO) {
+        if(!avctx->has_b_frames)
+            frame->pkt_pos = pkt->pos;
+        //FIXME these should be under if(!avctx->has_b_frames)
+        /* get_buffer is supposed to set frame parameters */
+        if (!(avctx->codec->capabilities & AV_CODEC_CAP_DR1)) {
+            if (!frame->sample_aspect_ratio.num)  frame->sample_aspect_ratio = avctx->sample_aspect_ratio;
+            if (!frame->width)                    frame->width               = avctx->width;
+            if (!frame->height)                   frame->height              = avctx->height;
+            if (frame->format == AV_PIX_FMT_NONE) frame->format              = avctx->pix_fmt;
         }
     }
     emms_c();
@@ -568,7 +574,7 @@ static int decode_simple_receive_frame(AVCodecContext *avctx, AVFrame *frame)
     return 0;
 }
 
-static int decode_receive_frame_internal(AVCodecContext *avctx, AVFrame *frame)
+int ff_decode_receive_frame_internal(AVCodecContext *avctx, AVFrame *frame)
 {
     AVCodecInternal *avci = avctx->internal;
     const FFCodec *const codec = ffcodec(avctx->codec);
@@ -634,6 +640,13 @@ FF_ENABLE_DEPRECATION_WARNINGS
     return ret;
 }
 
+static int decode_receive_frame_internal(AVCodecContext *avctx, AVFrame *frame)
+{
+    if (HAVE_THREADS && avctx->active_thread_type & FF_THREAD_FRAME)
+        return ff_thread_receive_frame(avctx, frame);
+    return ff_decode_receive_frame_internal(avctx, frame);
+}
+
 int attribute_align_arg avcodec_send_packet(AVCodecContext *avctx, const AVPacket *avpkt)
 {
     AVCodecInternal *avci = avctx->internal;
diff --git a/libavcodec/decode.h b/libavcodec/decode.h
index 5d95369b5e..34beb70f97 100644
--- a/libavcodec/decode.h
+++ b/libavcodec/decode.h
@@ -58,6 +58,13 @@ typedef struct FrameDecodeData {
  */
 int ff_decode_receive_frame(AVCodecContext *avctx, AVFrame *frame);
 
+/**
+ * Do the actual decoding and obtain a decoded frame from the decoder, if
+ * available. When frame threading is used, this is invoked by the worker
+ * threads, otherwise by the top layer directly.
+ */
+int ff_decode_receive_frame_internal(AVCodecContext *avctx, AVFrame *frame);
+
 /**
  * Called by decoders to get the next packet for decoding.
  *
diff --git a/libavcodec/internal.h b/libavcodec/internal.h
index 76a6ea6bc6..99e4bb3095 100644
--- a/libavcodec/internal.h
+++ b/libavcodec/internal.h
@@ -56,6 +56,13 @@ typedef struct AVCodecInternal {
      */
     int is_copy;
 
+    /**
+     * This field is set to 1 when frame threading is being used and the parent
+     * AVCodecContext of this AVCodecInternal is a worker-thread context (i.e.
+     * one of those actually doing the decoding), 0 otherwise.
+     */
+    int is_frame_mt;
+
     /**
      * An audio frame with less than required samples has been submitted (and
      * potentially padded with silence). Reject all subsequent frames.
diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
index df82a4125f..08550fc728 100644
--- a/libavcodec/pthread_frame.c
+++ b/libavcodec/pthread_frame.c
@@ -46,6 +46,7 @@
 #include "libavutil/log.h"
 #include "libavutil/mem.h"
 #include "libavutil/opt.h"
+#include "libavutil/fifo.h"
 #include "libavutil/thread.h"
 
 enum {
@@ -73,6 +74,12 @@ enum {
     INITIALIZED,    ///< Thread has been properly set up
 };
 
+typedef struct DecodedFrames {
+    AVFrame  **f;
+    size_t  nb_f;
+    size_t  nb_f_allocated;
+} DecodedFrames;
+
 /**
  * Context used by codec threads and stored in their AVCodecInternal thread_ctx.
  */
@@ -93,8 +100,10 @@ typedef struct PerThreadContext {
 
     AVPacket       *avpkt;          ///< Input packet (for decoding) or output (for encoding).
 
-    AVFrame *frame;                 ///< Output frame (for decoding) or input (for encoding).
-    int     got_frame;              ///< The output of got_picture_ptr from the last avcodec_decode_video() call.
+    /**
+     * Decoded frames from a single decode iteration.
+     */
+    DecodedFrames df;
     int     result;                 ///< The result of the last codec decode/encode() call.
 
     atomic_int state;
@@ -141,6 +150,14 @@ typedef struct FrameThreadContext {
     pthread_cond_t async_cond;
     int async_lock;
 
+    DecodedFrames df;
+    int result;
+
+    /**
+     * Packet to be submitted to the next thread for decoding.
+     */
+    AVPacket *next_pkt;
+
     int next_decoding;             ///< The next context to submit a packet to.
     int next_finished;             ///< The next context to return output from.
 
@@ -190,6 +207,51 @@ static void thread_set_name(PerThreadContext *p)
     ff_thread_setname(name);
 }
 
+// get a free frame to decode into
+static AVFrame *decoded_frames_get_free(DecodedFrames *df)
+{
+    if (df->nb_f == df->nb_f_allocated) {
+        AVFrame **tmp = av_realloc_array(df->f, df->nb_f + 1,
+                                         sizeof(*df->f));
+        if (!tmp)
+            return NULL;
+        df->f = tmp;
+
+        df->f[df->nb_f] = av_frame_alloc();
+        if (!df->f[df->nb_f])
+            return NULL;
+
+        df->nb_f_allocated++;
+    }
+
+    av_frame_unref(df->f[df->nb_f]);
+    return df->f[df->nb_f];
+}
+
+static void decoded_frames_pop(DecodedFrames *df, AVFrame *dst)
+{
+    AVFrame *tmp_frame = df->f[0];
+    av_frame_move_ref(dst, tmp_frame);
+    memmove(df->f, df->f + 1, (df->nb_f - 1) * sizeof(*df->f));
+    df->f[--df->nb_f] = tmp_frame;
+}
+
+static void decoded_frames_flush(DecodedFrames *df)
+{
+    for (size_t i = 0; i < df->nb_f; i++)
+        av_frame_unref(df->f[i]);
+    df->nb_f = 0;
+}
+
+static void decoded_frames_free(DecodedFrames *df)
+{
+    for (size_t i = 0; i < df->nb_f_allocated; i++)
+        av_frame_free(&df->f[i]);
+    av_freep(&df->f);
+    df->nb_f           = 0;
+    df->nb_f_allocated = 0;
+}
+
 /**
  * Codec worker thread.
  *
@@ -202,6 +264,7 @@ static attribute_align_arg void *frame_worker_thread(void *arg)
     PerThreadContext *p = arg;
     AVCodecContext *avctx = p->avctx;
     const FFCodec *codec = ffcodec(avctx->codec);
+    int ret;
 
     thread_set_name(p);
 
@@ -236,16 +299,31 @@ FF_ENABLE_DEPRECATION_WARNINGS
             p->hwaccel_serializing = 1;
         }
 
-        av_frame_unref(p->frame);
-        p->got_frame = 0;
-        p->result = codec->cb.decode(avctx, p->frame, &p->got_frame, p->avpkt);
+        ret = 0;
+        while (ret >= 0) {
+            AVFrame *frame;
 
-        if ((p->result < 0 || !p->got_frame) && p->frame->buf[0])
-            ff_thread_release_buffer(avctx, p->frame);
+            /* get the frame which will store the output */
+            frame = decoded_frames_get_free(&p->df);
+            if (!frame) {
+                p->result = AVERROR(ENOMEM);
+                goto alloc_fail;
+            }
+
+            /* do the actual decoding */
+            ret = ff_decode_receive_frame_internal(avctx, frame);
+            if (ret == 0)
+                p->df.nb_f++;
+            else if (ret < 0 && frame->buf[0])
+                ff_thread_release_buffer(avctx, frame);
+
+            p->result = (ret == AVERROR(EAGAIN)) ? 0 : ret;
+        }
 
         if (atomic_load(&p->state) == STATE_SETTING_UP)
             ff_thread_finish_setup(avctx);
 
+alloc_fail:
         if (p->hwaccel_serializing) {
             /* wipe hwaccel state to avoid stale pointers lying around;
              * the state was transferred to FrameThreadContext in
@@ -433,23 +511,26 @@ static void release_delayed_buffers(PerThreadContext *p)
 #endif
 
 static int submit_packet(PerThreadContext *p, AVCodecContext *user_avctx,
-                         AVPacket *avpkt)
+                         AVPacket *in_pkt)
 {
     FrameThreadContext *fctx = p->parent;
     PerThreadContext *prev_thread = fctx->prev_thread;
-    const AVCodec *codec = p->avctx->codec;
-    int ret;
-
-    if (!avpkt->size && !(codec->capabilities & AV_CODEC_CAP_DELAY))
-        return 0;
+    int err;
 
     pthread_mutex_lock(&p->mutex);
 
-    ret = update_context_from_user(p->avctx, user_avctx);
-    if (ret) {
+    av_packet_unref(p->avpkt);
+    av_packet_move_ref(p->avpkt, in_pkt);
+
+    p->avctx->internal->draining      = user_avctx->internal->draining;
+    p->avctx->internal->draining_done = user_avctx->internal->draining_done;
+
+    err = update_context_from_user(p->avctx, user_avctx);
+    if (err < 0) {
         pthread_mutex_unlock(&p->mutex);
-        return ret;
+        return err;
     }
+
     atomic_store_explicit(&p->debug_threads,
                           (p->avctx->debug & FF_DEBUG_THREADS) != 0,
                           memory_order_relaxed);
@@ -459,7 +540,6 @@ static int submit_packet(PerThreadContext *p, AVCodecContext *user_avctx,
 #endif
 
     if (prev_thread) {
-        int err;
         if (atomic_load(&prev_thread->state) == STATE_SETTING_UP) {
             pthread_mutex_lock(&prev_thread->progress_mutex);
             while (atomic_load(&prev_thread->state) == STATE_SETTING_UP)
@@ -480,14 +560,6 @@ static int submit_packet(PerThreadContext *p, AVCodecContext *user_avctx,
     FFSWAP(void*,            p->avctx->hwaccel_context,             fctx->stash_hwaccel_context);
     FFSWAP(void*,            p->avctx->internal->hwaccel_priv_data, fctx->stash_hwaccel_priv);
 
-    av_packet_unref(p->avpkt);
-    ret = av_packet_ref(p->avpkt, avpkt);
-    if (ret < 0) {
-        pthread_mutex_unlock(&p->mutex);
-        av_log(p->avctx, AV_LOG_ERROR, "av_packet_ref() failed in submit_packet()\n");
-        return ret;
-    }
-
     atomic_store(&p->state, STATE_SETTING_UP);
     pthread_cond_signal(&p->input_cond);
     pthread_mutex_unlock(&p->mutex);
@@ -531,57 +603,42 @@ FF_ENABLE_DEPRECATION_WARNINGS
 #endif
 
     fctx->prev_thread = p;
-    fctx->next_decoding++;
+    fctx->next_decoding = (fctx->next_decoding + 1) % p->avctx->thread_count;
 
     return 0;
 }
 
-int ff_thread_decode_frame(AVCodecContext *avctx,
-                           AVFrame *picture, int *got_picture_ptr,
-                           AVPacket *avpkt)
+int ff_thread_receive_frame(AVCodecContext *avctx, AVFrame *frame)
 {
     FrameThreadContext *fctx = avctx->internal->thread_ctx;
-    int finished = fctx->next_finished;
-    PerThreadContext *p;
-    int err;
+    int ret = 0;
 
     /* release the async lock, permitting blocked hwaccel threads to
      * go forward while we are in this function */
     async_unlock(fctx);
 
-    /*
-     * Submit a packet to the next decoding thread.
-     */
-
-    p = &fctx->threads[fctx->next_decoding];
-    err = submit_packet(p, avctx, avpkt);
-    if (err)
-        goto finish;
-
-    /*
-     * If we're still receiving the initial packets, don't return a frame.
-     */
+    /* submit packets to threads while there are no buffered results to return */
+    while (!fctx->df.nb_f && !fctx->result) {
+        PerThreadContext *p;
 
-    if (fctx->next_decoding > (avctx->thread_count-1-(avctx->codec_id == AV_CODEC_ID_FFV1)))
-        fctx->delaying = 0;
+        /* get a packet to be submitted to the next thread */
+        av_packet_unref(fctx->next_pkt);
+        ret = ff_decode_get_packet(avctx, fctx->next_pkt);
+        if (ret < 0 && ret != AVERROR_EOF)
+            goto finish;
 
-    if (fctx->delaying) {
-        *got_picture_ptr=0;
-        if (avpkt->size) {
-            err = avpkt->size;
+        ret = submit_packet(&fctx->threads[fctx->next_decoding], avctx,
+                            fctx->next_pkt);
+        if (ret < 0)
             goto finish;
-        }
-    }
 
-    /*
-     * Return the next available frame from the oldest thread.
-     * If we're at the end of the stream, then we have to skip threads that
-     * didn't output a frame/error, because we don't want to accidentally signal
-     * EOF (avpkt->size == 0 && *got_picture_ptr == 0 && err >= 0).
-     */
+        /* do not return any frames until all threads have something to do */
+        if (fctx->next_decoding != fctx->next_finished &&
+            !avctx->internal->draining)
+            continue;
 
-    do {
-        p = &fctx->threads[finished++];
+        p                   = &fctx->threads[fctx->next_finished];
+        fctx->next_finished = (fctx->next_finished + 1) % avctx->thread_count;
 
         if (atomic_load(&p->state) != STATE_INPUT_READY) {
             pthread_mutex_lock(&p->progress_mutex);
@@ -590,35 +647,26 @@ int ff_thread_decode_frame(AVCodecContext *avctx,
             pthread_mutex_unlock(&p->progress_mutex);
         }
 
-        av_frame_move_ref(picture, p->frame);
-        *got_picture_ptr = p->got_frame;
-        picture->pkt_dts = p->avpkt->dts;
-        err = p->result;
-
-        /*
-         * A later call with avkpt->size == 0 may loop over all threads,
-         * including this one, searching for a frame/error to return before being
-         * stopped by the "finished != fctx->next_finished" condition.
-         * Make sure we don't mistakenly return the same frame/error again.
-         */
-        p->got_frame = 0;
-        p->result = 0;
-
-        if (finished >= avctx->thread_count) finished = 0;
-    } while (!avpkt->size && !*got_picture_ptr && err >= 0 && finished != fctx->next_finished);
+        fctx->result = p->result;
+        p->result    = 0;
 
-    update_context_from_thread(avctx, p->avctx, 1);
-
-    if (fctx->next_decoding >= avctx->thread_count) fctx->next_decoding = 0;
+        if (p->df.nb_f)
+            FFSWAP(DecodedFrames, fctx->df, p->df);
+    }
 
-    fctx->next_finished = finished;
+    /* A thread may return multiple frames AND an error;
+     * we first return all the frames, then the error. */
+    if (fctx->df.nb_f) {
+        decoded_frames_pop(&fctx->df, frame);
+        ret = 0;
+    } else {
+        ret = fctx->result;
+        fctx->result = 0;
+    }
 
-    /* return the size of the consumed packet if no error occurred */
-    if (err >= 0)
-        err = avpkt->size;
 finish:
     async_lock(fctx);
-    return err;
+    return ret;
 }
 
 void ff_thread_report_progress(ThreadFrame *f, int n, int field)
@@ -718,7 +766,6 @@ static void park_frame_worker_threads(FrameThreadContext *fctx, int thread_count
                 pthread_cond_wait(&p->output_cond, &p->progress_mutex);
             pthread_mutex_unlock(&p->progress_mutex);
         }
-        p->got_frame = 0;
     }
 
     async_lock(fctx);
@@ -772,6 +819,17 @@ void ff_frame_thread_free(AVCodecContext *avctx, int thread_count)
                 av_freep(&ctx->priv_data);
             }
 
+            if (ctx->internal->pkt_props) {
+                while (av_fifo_can_read(ctx->internal->pkt_props)) {
+                    av_packet_unref(ctx->internal->last_pkt_props);
+                    av_fifo_read(ctx->internal->pkt_props, ctx->internal->last_pkt_props, 1);
+                }
+                av_fifo_freep2(&ctx->internal->pkt_props);
+            }
+
+            av_packet_free(&ctx->internal->last_pkt_props);
+            av_packet_free(&ctx->internal->in_pkt);
+
             av_freep(&ctx->slice_offset);
 
             av_buffer_unref(&ctx->internal->pool);
@@ -779,7 +837,7 @@ void ff_frame_thread_free(AVCodecContext *avctx, int thread_count)
             av_buffer_unref(&ctx->hw_frames_ctx);
         }
 
-        av_frame_free(&p->frame);
+        decoded_frames_free(&p->df);
 
         ff_pthread_free(p, per_thread_offsets);
         av_packet_free(&p->avpkt);
@@ -787,6 +845,9 @@ void ff_frame_thread_free(AVCodecContext *avctx, int thread_count)
         av_freep(&p->avctx);
     }
 
+    decoded_frames_free(&fctx->df);
+    av_packet_free(&fctx->next_pkt);
+
     av_freep(&fctx->threads);
     ff_pthread_free(fctx, thread_ctx_offsets);
 
@@ -845,14 +906,26 @@ static av_cold int init_thread(PerThreadContext *p, int *threads_to_free,
     if (err < 0)
         return err;
 
-    if (!(p->frame = av_frame_alloc()) ||
-        !(p->avpkt = av_packet_alloc()))
+    if (!(p->avpkt = av_packet_alloc()))
         return AVERROR(ENOMEM);
-    copy->internal->last_pkt_props = p->avpkt;
 
+    copy->internal->is_frame_mt = 1;
     if (!first)
         copy->internal->is_copy = 1;
 
+    copy->internal->in_pkt = av_packet_alloc();
+    if (!copy->internal->in_pkt)
+        return AVERROR(ENOMEM);
+
+    copy->internal->last_pkt_props = av_packet_alloc();
+    if (!copy->internal->last_pkt_props)
+        return AVERROR(ENOMEM);
+
+    copy->internal->pkt_props = av_fifo_alloc2(1, sizeof(*copy->internal->last_pkt_props),
+                                               AV_FIFO_FLAG_AUTO_GROW);
+    if (!copy->internal->pkt_props)
+        return AVERROR(ENOMEM);
+
     if (codec->init) {
         err = codec->init(copy);
         if (err < 0) {
@@ -908,6 +981,10 @@ int ff_frame_thread_init(AVCodecContext *avctx)
         return err;
     }
 
+    fctx->next_pkt = av_packet_alloc();
+    if (!fctx->next_pkt)
+        return AVERROR(ENOMEM);
+
     fctx->async_lock = 1;
     fctx->delaying = 1;
 
@@ -952,12 +1029,13 @@ void ff_thread_flush(AVCodecContext *avctx)
     fctx->next_decoding = fctx->next_finished = 0;
     fctx->delaying = 1;
     fctx->prev_thread = NULL;
+
+    decoded_frames_flush(&fctx->df);
+
     for (i = 0; i < avctx->thread_count; i++) {
         PerThreadContext *p = &fctx->threads[i];
-        // Make sure decode flush calls with size=0 won't return old frames
-        p->got_frame = 0;
-        av_frame_unref(p->frame);
-        p->result = 0;
+
+        decoded_frames_flush(&p->df);
 
 #if FF_API_THREAD_SAFE_CALLBACKS
         release_delayed_buffers(p);
@@ -1181,3 +1259,15 @@ void ff_thread_release_ext_buffer(AVCodecContext *avctx, ThreadFrame *f)
     f->owner[0] = f->owner[1] = NULL;
     ff_thread_release_buffer(avctx, f->f);
 }
+
+int ff_thread_get_packet(AVCodecContext *avctx, AVPacket *pkt)
+{
+    PerThreadContext *p = avctx->internal->thread_ctx;
+
+    if (p->avpkt->buf) {
+        av_packet_move_ref(pkt, p->avpkt);
+        return 0;
+    }
+
+    return avctx->internal->draining ? AVERROR_EOF : AVERROR(EAGAIN);
+}
diff --git a/libavcodec/thread.h b/libavcodec/thread.h
index d5673f25ea..7ae69990fb 100644
--- a/libavcodec/thread.h
+++ b/libavcodec/thread.h
@@ -40,17 +40,12 @@
 void ff_thread_flush(AVCodecContext *avctx);
 
 /**
- * Submit a new frame to a decoding thread.
- * Returns the next available frame in picture. *got_picture_ptr
- * will be 0 if none is available.
- * The return value on success is the size of the consumed packet for
- * compatibility with FFCodec.decode. This means the decoder
- * has to consume the full packet.
+ * Submit available packets for decoding to worker threads, return a
+ * decoded frame if available. Returns AVERROR(EAGAIN) if none is available.
  *
- * Parameters are the same as FFCodec.decode.
+ * Parameters are the same as FFCodec.receive_frame.
  */
-int ff_thread_decode_frame(AVCodecContext *avctx, AVFrame *picture,
-                           int *got_picture_ptr, AVPacket *avpkt);
+int ff_thread_receive_frame(AVCodecContext *avctx, AVFrame *frame);
 
 /**
  * If the codec defines update_thread_context(), call this
@@ -99,6 +94,11 @@ int ff_thread_get_buffer(AVCodecContext *avctx, AVFrame *f, int flags);
  */
 void ff_thread_release_buffer(AVCodecContext *avctx, AVFrame *f);
 
+/**
+ * Get a packet for decoding. This gets invoked by the worker threads.
+ */
+int ff_thread_get_packet(AVCodecContext *avctx, AVPacket *pkt);
+
 int ff_thread_init(AVCodecContext *s);
 int ff_slice_thread_execute_with_mainfunc(AVCodecContext *avctx,
         int (*action_func2)(AVCodecContext *c, void *arg, int jobnr, int threadnr),

From patchwork Wed Dec  7 11:43:30 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Timo Rothenpieler <timo@rothenpieler.org>
X-Patchwork-Id: 39643
From: Timo Rothenpieler <timo@rothenpieler.org>
To: ffmpeg-devel@ffmpeg.org
Date: Wed,  7 Dec 2022 12:43:30 +0100
Message-Id: <20221207114330.250-2-timo@rothenpieler.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20221207114330.250-1-timo@rothenpieler.org>
References: <20221207114330.250-1-timo@rothenpieler.org>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH 2/2] avcodec/mjpegdec: add support for frame
 threading
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Timo Rothenpieler <timo@rothenpieler.org>
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
X-TUID: DhkbmiewtEGQ

In my tests, this led to a notable speed increase that scales with the
number of threads used. Decoding a 720p sample gave the following results:

1 Thread: 1428 FPS
2 Threads: 2501 FPS
8 Threads: 7575 FPS
Automatic: 11326 FPS (on a 16-core/32-thread system)
---
 libavcodec/jpeglsdec.c |  2 +-
 libavcodec/mjpegdec.c  | 11 ++++++-----
 libavcodec/sp5xdec.c   |  4 ++--
 3 files changed, 9 insertions(+), 8 deletions(-)
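
As a reading aid for the FPS figures in the commit message, here is a
minimal sketch of how speedup and per-thread efficiency can be derived
from them. The helper names speedup() and efficiency() are hypothetical
and are not part of this patch or of libavcodec. With the numbers above,
the speedup over one thread is roughly 1.75x at 2 threads, 5.3x at
8 threads and 7.9x in automatic mode. Efficiency is only meaningful for
the 2- and 8-thread runs, since the thread count chosen in automatic
mode is not stated.

```c
#include <assert.h>

/* Hypothetical helper: speedup of a threaded run relative to the
 * single-threaded baseline, both given in frames per second. */
double speedup(double fps, double baseline_fps)
{
    assert(baseline_fps > 0.0);
    return fps / baseline_fps;
}

/* Hypothetical helper: parallel efficiency, i.e. the speedup divided
 * by the number of threads used. 1.0 would be perfect linear scaling. */
double efficiency(double fps, double baseline_fps, int threads)
{
    assert(threads > 0);
    return speedup(fps, baseline_fps) / threads;
}
```

For example, efficiency(7575.0, 1428.0, 8) evaluates to about 0.66,
meaning each of the 8 threads contributes about two thirds of what a
single thread achieves on its own.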

diff --git a/libavcodec/jpeglsdec.c b/libavcodec/jpeglsdec.c
index 2e6d018ea6..c0642e8e30 100644
--- a/libavcodec/jpeglsdec.c
+++ b/libavcodec/jpeglsdec.c
@@ -559,7 +559,7 @@ const FFCodec ff_jpegls_decoder = {
     .init           = ff_mjpeg_decode_init,
     .close          = ff_mjpeg_decode_end,
     FF_CODEC_RECEIVE_FRAME_CB(ff_mjpeg_receive_frame),
-    .p.capabilities = AV_CODEC_CAP_DR1,
+    .p.capabilities = AV_CODEC_CAP_DR1 | AV_CODEC_CAP_FRAME_THREADS,
     .caps_internal  = FF_CODEC_CAP_INIT_CLEANUP |
                       FF_CODEC_CAP_SETS_PKT_DTS,
 };
diff --git a/libavcodec/mjpegdec.c b/libavcodec/mjpegdec.c
index 9b7465abe7..54605e04cb 100644
--- a/libavcodec/mjpegdec.c
+++ b/libavcodec/mjpegdec.c
@@ -54,6 +54,7 @@
 #include "exif.h"
 #include "bytestream.h"
 #include "tiff_common.h"
+#include "thread.h"
 
 
 static int init_default_huffman_tables(MJpegDecodeContext *s)
@@ -713,7 +714,7 @@ int ff_mjpeg_decode_sof(MJpegDecodeContext *s)
                 s->avctx->pix_fmt,
                 AV_PIX_FMT_NONE,
             };
-            s->hwaccel_pix_fmt = ff_get_format(s->avctx, pix_fmts);
+            s->hwaccel_pix_fmt = ff_thread_get_format(s->avctx, pix_fmts);
             if (s->hwaccel_pix_fmt < 0)
                 return AVERROR(EINVAL);
 
@@ -729,7 +730,7 @@ int ff_mjpeg_decode_sof(MJpegDecodeContext *s)
         }
 
         av_frame_unref(s->picture_ptr);
-        if (ff_get_buffer(s->avctx, s->picture_ptr, AV_GET_BUFFER_FLAG_REF) < 0)
+        if (ff_thread_get_buffer(s->avctx, s->picture_ptr, AV_GET_BUFFER_FLAG_REF) < 0)
             return -1;
         s->picture_ptr->pict_type = AV_PICTURE_TYPE_I;
         s->picture_ptr->key_frame = 1;
@@ -3020,7 +3021,7 @@ const FFCodec ff_mjpeg_decoder = {
     .close          = ff_mjpeg_decode_end,
     FF_CODEC_RECEIVE_FRAME_CB(ff_mjpeg_receive_frame),
     .flush          = decode_flush,
-    .p.capabilities = AV_CODEC_CAP_DR1,
+    .p.capabilities = AV_CODEC_CAP_DR1 | AV_CODEC_CAP_FRAME_THREADS,
     .p.max_lowres   = 3,
     .p.priv_class   = &mjpegdec_class,
     .p.profiles     = NULL_IF_CONFIG_SMALL(ff_mjpeg_profiles),
@@ -3050,7 +3051,7 @@ const FFCodec ff_thp_decoder = {
     .close          = ff_mjpeg_decode_end,
     FF_CODEC_RECEIVE_FRAME_CB(ff_mjpeg_receive_frame),
     .flush          = decode_flush,
-    .p.capabilities = AV_CODEC_CAP_DR1,
+    .p.capabilities = AV_CODEC_CAP_DR1 | AV_CODEC_CAP_FRAME_THREADS,
     .p.max_lowres   = 3,
     .caps_internal  = FF_CODEC_CAP_INIT_CLEANUP |
                       FF_CODEC_CAP_SETS_PKT_DTS,
@@ -3068,7 +3069,7 @@ const FFCodec ff_smvjpeg_decoder = {
     .close          = ff_mjpeg_decode_end,
     FF_CODEC_RECEIVE_FRAME_CB(ff_mjpeg_receive_frame),
     .flush          = decode_flush,
-    .p.capabilities = AV_CODEC_CAP_DR1,
+    .p.capabilities = AV_CODEC_CAP_DR1 | AV_CODEC_CAP_FRAME_THREADS,
     .caps_internal  = FF_CODEC_CAP_EXPORTS_CROPPING |
                       FF_CODEC_CAP_SETS_PKT_DTS | FF_CODEC_CAP_INIT_CLEANUP,
 };
diff --git a/libavcodec/sp5xdec.c b/libavcodec/sp5xdec.c
index 394448c5a9..8b08dc672a 100644
--- a/libavcodec/sp5xdec.c
+++ b/libavcodec/sp5xdec.c
@@ -101,7 +101,7 @@ const FFCodec ff_sp5x_decoder = {
     .init           = ff_mjpeg_decode_init,
     .close          = ff_mjpeg_decode_end,
     FF_CODEC_RECEIVE_FRAME_CB(ff_mjpeg_receive_frame),
-    .p.capabilities = AV_CODEC_CAP_DR1,
+    .p.capabilities = AV_CODEC_CAP_DR1 | AV_CODEC_CAP_FRAME_THREADS,
     .p.max_lowres   = 3,
     .caps_internal  = FF_CODEC_CAP_INIT_CLEANUP |
                       FF_CODEC_CAP_SETS_PKT_DTS,
@@ -118,7 +118,7 @@ const FFCodec ff_amv_decoder = {
     .close          = ff_mjpeg_decode_end,
     FF_CODEC_RECEIVE_FRAME_CB(ff_mjpeg_receive_frame),
     .p.max_lowres   = 3,
-    .p.capabilities = AV_CODEC_CAP_DR1,
+    .p.capabilities = AV_CODEC_CAP_DR1 | AV_CODEC_CAP_FRAME_THREADS,
     .caps_internal  = FF_CODEC_CAP_INIT_CLEANUP |
                       FF_CODEC_CAP_SETS_PKT_DTS,
 };