From patchwork Tue Oct 1 06:55:58 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 51954
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Cc: Nuo Mi
Date: Tue, 1 Oct 2024 14:55:58 +0800
Message-Id: <20241001065558.56890-3-nuomi2021@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20241001065558.56890-1-nuomi2021@gmail.com>
References: <20241001065558.56890-1-nuomi2021@gmail.com>
Subject: [FFmpeg-devel] [PATCH v2 3/3] avcodec/vvc: simplify priority logical to improve performance for 4K/8K

For 4K/8K video processing, it's possible to have over 1,000 tasks pending on the executor.
In such cases, O(n) and O(log(n)) insertion times are too costly.
Reducing this to O(1) will significantly decrease the time spent in critical sections.

clip                                                        | before | after  | delta
------------------------------------------------------------|--------|--------|-------
VVC_HDR_UHDTV2_OpenGOP_7680x4320_50fps_HLG10.bit            | 24     | 27     | 12.5%
VVC_HDR_UHDTV2_OpenGOP_7680x4320_50fps_HLG10_HighBitrate.bit| 12     | 17     | 41.7%
tears_of_steel_4k_8M_8bit_2000.vvc                          | 34     | 102    | 200.0%
VVC_UHDTV1_OpenGOP_3840x2160_60fps_HLG10.bit                | 126    | 128    | 1.6%
RitualDance_1920x1080_60_10_420_37_RA.266                   | 350    | 378    | 8.0%
NovosobornayaSquare_1920x1080.bin                           | 341    | 369    | 8.2%
Tango2_3840x2160_60_10_420_27_LD.266                        | 69     | 70     | 1.4%
RitualDance_1920x1080_60_10_420_32_LD.266                   | 243    | 259    | 6.6%
Chimera_8bit_1080P_1000_frames.vvc                          | 420    | 392    | -6.7%
BQTerrace_1920x1080_60_10_420_22_RA.vvc                     | 148    | 144    | -2.7%
---
 libavcodec/executor.c   | 54 ++++++++++++++++++++++++++---------------
 libavcodec/executor.h   |  5 ++--
 libavcodec/vvc/thread.c | 48 +++++++++++++++---------------------
 3 files changed, 57 insertions(+), 50 deletions(-)

diff --git a/libavcodec/executor.c b/libavcodec/executor.c
index 84d52e7e3b..362e961c4f 100644
--- a/libavcodec/executor.c
+++ b/libavcodec/executor.c
@@ -48,6 +48,11 @@ typedef struct ThreadInfo {
     ExecutorThread thread;
 } ThreadInfo;
 
+typedef struct Queue {
+    AVTask *head;
+    AVTask *tail;
+} Queue;
+
 struct AVExecutor {
     AVTaskCallbacks cb;
     int thread_count;
@@ -60,29 +65,41 @@ struct AVExecutor {
     AVCond cond;
     int die;
 
-    AVTask *tasks;
+    Queue *q;
 };
 
-static AVTask* remove_task(AVTask **prev, AVTask *t)
+static AVTask* remove_task(Queue *q)
 {
-    *prev = t->next;
-    t->next = NULL;
+    AVTask *t = q->head;
+    if (t) {
+        q->head = t->next;
+        t->next = NULL;
+        if (!q->head)
+            q->tail = NULL;
+    }
     return t;
 }
 
-static void add_task(AVTask **prev, AVTask *t)
+static void add_task(Queue *q, AVTask *t)
 {
-    t->next = *prev;
-    *prev = t;
+    t->next = NULL;
+    if (!q->head) {
+        q->tail = q->head = t;
+    } else {
+        q->tail->next = t;
+        q->tail = t;
+    }
 }
 
 static int run_one_task(AVExecutor *e, void *lc)
 {
     AVTaskCallbacks *cb = &e->cb;
-    AVTask **prev = &e->tasks;
+    AVTask *t = NULL;
 
-    if (*prev) {
-        AVTask *t = remove_task(prev, *prev);
+    for (int i = 0; i < e->cb.priorities && !t; i++)
+        t = remove_task(e->q + i);
+
+    if (t) {
         if (e->thread_count > 0)
             ff_mutex_unlock(&e->lock);
         cb->run(t, lc, cb->user_data);
@@ -132,6 +149,7 @@ static void executor_free(AVExecutor *e, const int has_lock, const int has_cond)
         ff_mutex_destroy(&e->lock);
 
     av_free(e->threads);
+    av_free(e->q);
     av_free(e->local_contexts);
 
     av_free(e);
@@ -141,7 +159,7 @@ AVExecutor* av_executor_alloc(const AVTaskCallbacks *cb, int thread_count)
 {
     AVExecutor *e;
     int has_lock = 0, has_cond = 0;
-    if (!cb || !cb->user_data || !cb->run || !cb->priority_higher)
+    if (!cb || !cb->user_data || !cb->run || !cb->priorities)
         return NULL;
 
     e = av_mallocz(sizeof(*e));
@@ -153,6 +171,10 @@ AVExecutor* av_executor_alloc(const AVTaskCallbacks *cb, int thread_count)
     if (!e->local_contexts)
         goto free_executor;
 
+    e->q = av_calloc(e->cb.priorities, sizeof(Queue));
+    if (!e->q)
+        goto free_executor;
+
     e->threads = av_calloc(FFMAX(thread_count, 1), sizeof(*e->threads));
     if (!e->threads)
         goto free_executor;
@@ -192,16 +214,10 @@ void av_executor_free(AVExecutor **executor)
 
 void av_executor_execute(AVExecutor *e, AVTask *t)
 {
-    AVTaskCallbacks *cb = &e->cb;
-    AVTask **prev;
-
     if (e->thread_count)
         ff_mutex_lock(&e->lock);
-    if (t) {
-        for (prev = &e->tasks; *prev && cb->priority_higher(*prev, t); prev = &(*prev)->next)
-            /* nothing */;
-        add_task(prev, t);
-    }
+    if (t)
+        add_task(e->q + t->priority % e->cb.priorities, t);
     if (e->thread_count) {
         ff_cond_signal(&e->cond);
         ff_mutex_unlock(&e->lock);
diff --git a/libavcodec/executor.h b/libavcodec/executor.h
index 29fb55f66b..2398acd56c 100644
--- a/libavcodec/executor.h
+++ b/libavcodec/executor.h
@@ -31,6 +31,7 @@ typedef struct AVExecutor AVExecutor;
 typedef struct AVTask AVTask;
 
 struct AVTask {
+    int priority; // task priority should >= 0 and < AVTaskCallbacks.priorities
     AVTask *next;
 };
 
@@ -39,8 +40,8 @@ typedef struct AVTaskCallbacks {
 
     int local_context_size;
 
-    // return 1 if a's priority > b's priority
-    int (*priority_higher)(const AVTask *a, const AVTask *b);
+    // How many priorities do we have?
+    int priorities;
 
     // run the task
     int (*run)(AVTask *t, void *local_context, void *user_data);
diff --git a/libavcodec/vvc/thread.c b/libavcodec/vvc/thread.c
index 993b682e1b..1736092abe 100644
--- a/libavcodec/vvc/thread.c
+++ b/libavcodec/vvc/thread.c
@@ -103,13 +103,28 @@ typedef struct VVCFrameThread {
     AVCond cond;
 } VVCFrameThread;
 
+#define PRIORITY_LOWEST 2
 static void add_task(VVCContext *s, VVCTask *t)
 {
-    VVCFrameThread *ft = t->fc->ft;
+    VVCFrameThread *ft = t->fc->ft;
+    AVTask *task = &t->u.task;
+    const int priorities[] = {
+        0, // VVC_TASK_STAGE_INIT,
+        0, // VVC_TASK_STAGE_PARSE,
+        // For an 8K clip, a CTU line completed in the reference frame may trigger 64 and more inter tasks.
+        // We assign these tasks the lowest priority to avoid being overwhelmed with inter tasks.
+        PRIORITY_LOWEST, // VVC_TASK_STAGE_INTER
+        1, // VVC_TASK_STAGE_RECON,
+        1, // VVC_TASK_STAGE_LMCS,
+        1, // VVC_TASK_STAGE_DEBLOCK_V,
+        1, // VVC_TASK_STAGE_DEBLOCK_H,
+        1, // VVC_TASK_STAGE_SAO,
+        1, // VVC_TASK_STAGE_ALF,
+    };
 
     atomic_fetch_add(&ft->nb_scheduled_tasks, 1);
-
-    av_executor_execute(s->executor, &t->u.task);
+    task->priority = priorities[t->stage];
+    av_executor_execute(s->executor, task);
 }
 
 static void task_init(VVCTask *t, VVCTaskStage stage, VVCFrameContext *fc, const int rx, const int ry)
@@ -372,31 +387,6 @@ static int task_is_stage_ready(VVCTask *t, int add)
     return task_has_target_score(t, stage, score);
 }
 
-#define CHECK(a, b) \
-    do { \
-        if ((a) != (b)) \
-            return (a) < (b); \
-    } while (0)
-
-static int task_priority_higher(const AVTask *_a, const AVTask *_b)
-{
-    const VVCTask *a = (const VVCTask*)_a;
-    const VVCTask *b = (const VVCTask*)_b;
-
-
-    if (a->stage <= VVC_TASK_STAGE_PARSE || b->stage <= VVC_TASK_STAGE_PARSE) {
-        CHECK(a->stage, b->stage);
-        CHECK(a->fc->decode_order, b->fc->decode_order); //decode order
-        CHECK(a->ry, b->ry);
-        return a->rx < b->rx;
-    }
-
-    CHECK(a->fc->decode_order, b->fc->decode_order); //decode order
-    CHECK(a->rx + a->ry + a->stage, b->rx + b->ry + b->stage); //zigzag with type
-    CHECK(a->rx + a->ry, b->rx + b->ry); //zigzag
-    return a->ry < b->ry;
-}
-
 static void check_colocation(VVCContext *s, VVCTask *t)
 {
     const VVCFrameContext *fc = t->fc;
@@ -681,7 +671,7 @@ AVExecutor* ff_vvc_executor_alloc(VVCContext *s, const int thread_count)
     AVTaskCallbacks callbacks = {
         s,
         sizeof(VVCLocalContext),
-        task_priority_higher,
+        PRIORITY_LOWEST + 1,
         task_run,
     };
     return av_executor_alloc(&callbacks, thread_count);
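
Note for readers of the archive: the scheduling idea in this patch (one FIFO per priority level, scanned from the most urgent level down) can be sketched in standalone C roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: Task, sched_add, sched_next, NUM_PRIORITIES and the id field are hypothetical names, there is no locking, and an out-of-range priority is rejected with assert() rather than wrapped with a modulo as av_executor_execute() does above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical level count; the patch passes PRIORITY_LOWEST + 1. */
#define NUM_PRIORITIES 3

typedef struct Task {
    int priority;        /* 0 is the most urgent level */
    int id;              /* hypothetical payload, used only to observe ordering */
    struct Task *next;
} Task;

typedef struct Queue {
    Task *head;
    Task *tail;
} Queue;

/* Append at the tail of one FIFO: O(1) no matter how many tasks are pending. */
static void queue_push(Queue *q, Task *t)
{
    t->next = NULL;
    if (!q->head) {
        q->head = q->tail = t;
    } else {
        q->tail->next = t;
        q->tail = t;
    }
}

/* Detach and return the head of one FIFO, or NULL if it is empty: O(1). */
static Task *queue_pop(Queue *q)
{
    Task *t = q->head;
    if (t) {
        q->head = t->next;
        t->next = NULL;
        if (!q->head)
            q->tail = NULL;
    }
    return t;
}

/* Insertion selects a bucket by priority; no comparison against other tasks. */
static void sched_add(Queue *qs, Task *t)
{
    assert(t->priority >= 0 && t->priority < NUM_PRIORITIES);
    queue_push(&qs[t->priority], t);
}

/* Removal scans the buckets from most to least urgent: O(NUM_PRIORITIES). */
static Task *sched_next(Queue *qs)
{
    for (int i = 0; i < NUM_PRIORITIES; i++) {
        Task *t = queue_pop(&qs[i]);
        if (t)
            return t;
    }
    return NULL;
}
```

With a zero-initialized array of NUM_PRIORITIES queues, tasks come back grouped by level and in insertion order within a level. The trade-off is that the fine-grained ordering of the removed comparator (decode order, zigzag position) is no longer expressed; only the coarse level is.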