From patchwork Tue Oct 1 13:54:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 51977
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 1 Oct 2024 21:54:38 +0800
Message-Id: <20241001135438.39385-3-nuomi2021@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20241001135438.39385-1-nuomi2021@gmail.com>
References: <20241001135438.39385-1-nuomi2021@gmail.com>
Subject: [FFmpeg-devel] [PATCH v3 3/3] avcodec/vvc: simplify priority logical to improve performance for 4K/8K
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Nuo Mi

For 4K/8K video processing, it's possible to have over 1,000 tasks pending on the executor.
In such cases, O(n) and O(log(n)) insertion times are too costly.
Reducing this to O(1) will significantly decrease the time spent in critical sections.

clip                                                        | before | after  | delta
------------------------------------------------------------|--------|--------|-------
VVC_HDR_UHDTV2_OpenGOP_7680x4320_50fps_HLG10.bit            |     24 |     27 |  12.5%
VVC_HDR_UHDTV2_OpenGOP_7680x4320_50fps_HLG10_HighBitrate.bit|     12 |     17 |  41.7%
tears_of_steel_4k_8M_8bit_2000.vvc                          |     34 |    102 | 200.0%
VVC_UHDTV1_OpenGOP_3840x2160_60fps_HLG10.bit                |    126 |    128 |   1.6%
RitualDance_1920x1080_60_10_420_37_RA.266                   |    350 |    378 |   8.0%
NovosobornayaSquare_1920x1080.bin                           |    341 |    369 |   8.2%
Tango2_3840x2160_60_10_420_27_LD.266                        |     69 |     70 |   1.4%
RitualDance_1920x1080_60_10_420_32_LD.266                   |    243 |    259 |   6.6%
Chimera_8bit_1080P_1000_frames.vvc                          |    420 |    392 |  -6.7%
BQTerrace_1920x1080_60_10_420_22_RA.vvc                     |    148 |    144 |  -2.7%
---
 libavcodec/executor.c   | 52 ++++++++++++++++++++++++++---------------
 libavcodec/executor.h   |  5 ++--
 libavcodec/vvc/thread.c | 48 +++++++++++++++----------------------
 3 files changed, 55 insertions(+), 50 deletions(-)

diff --git a/libavcodec/executor.c b/libavcodec/executor.c
index 8e6c134ba7..6cea851e6a 100644
--- a/libavcodec/executor.c
+++ b/libavcodec/executor.c
@@ -48,6 +48,11 @@ typedef struct ThreadInfo {
     ExecutorThread thread;
 } ThreadInfo;
 
+typedef struct Queue {
+    AVTask *head;
+    AVTask *tail;
+} Queue;
+
 struct AVExecutor {
     AVTaskCallbacks cb;
     int thread_count;
@@ -60,29 +65,39 @@ struct AVExecutor {
     AVCond cond;
     int die;
 
-    AVTask *tasks;
+    Queue *q;
 };
 
-static AVTask* remove_task(AVTask **prev, AVTask *t)
+static AVTask* remove_task(Queue *q)
 {
-    *prev = t->next;
-    t->next = NULL;
+    AVTask *t = q->head;
+    if (t) {
+        q->head = t->next;
+        t->next = NULL;
+        if (!q->head)
+            q->tail = NULL;
+    }
     return t;
 }
 
-static void add_task(AVTask **prev, AVTask *t)
+static void add_task(Queue *q, AVTask *t)
 {
-    t->next = *prev;
-    *prev = t;
+    t->next = NULL;
+    if (!q->head)
+        q->tail = q->head = t;
+    else
+        q->tail = q->tail->next = t;
 }
 
 static int run_one_task(AVExecutor *e, void *lc)
 {
     AVTaskCallbacks *cb = &e->cb;
-    AVTask **prev = &e->tasks;
+    AVTask *t = NULL;
+
+    for (int i = 0; i < e->cb.priorities && !t; i++)
+        t = remove_task(e->q + i);
 
-    if (*prev) {
-        AVTask *t = remove_task(prev, *prev);
+    if (t) {
         if (e->thread_count > 0)
             ff_mutex_unlock(&e->lock);
         cb->run(t, lc, cb->user_data);
@@ -132,6 +147,7 @@ static void executor_free(AVExecutor *e, const int has_lock, const int has_cond)
         ff_mutex_destroy(&e->lock);
 
     av_free(e->threads);
+    av_free(e->q);
     av_free(e->local_contexts);
 
     av_free(e);
@@ -141,7 +157,7 @@ AVExecutor* ff_executor_alloc(const AVTaskCallbacks *cb, int thread_count)
 {
     AVExecutor *e;
     int has_lock = 0, has_cond = 0;
-    if (!cb || !cb->user_data || !cb->run || !cb->priority_higher)
+    if (!cb || !cb->user_data || !cb->run || !cb->priorities)
         return NULL;
 
     e = av_mallocz(sizeof(*e));
@@ -153,6 +169,10 @@ AVExecutor* ff_executor_alloc(const AVTaskCallbacks *cb, int thread_count)
     if (!e->local_contexts)
         goto free_executor;
 
+    e->q = av_calloc(e->cb.priorities, sizeof(Queue));
+    if (!e->q)
+        goto free_executor;
+
     e->threads = av_calloc(FFMAX(thread_count, 1), sizeof(*e->threads));
     if (!e->threads)
         goto free_executor;
@@ -192,16 +212,10 @@ void ff_executor_free(AVExecutor **executor)
 
 void ff_executor_execute(AVExecutor *e, AVTask *t)
 {
-    AVTaskCallbacks *cb = &e->cb;
-    AVTask **prev;
-
     if (e->thread_count)
         ff_mutex_lock(&e->lock);
-    if (t) {
-        for (prev = &e->tasks; *prev && cb->priority_higher(*prev, t); prev = &(*prev)->next)
-            /* nothing */;
-        add_task(prev, t);
-    }
+    if (t)
+        add_task(e->q + t->priority % e->cb.priorities, t);
     if (e->thread_count) {
         ff_cond_signal(&e->cond);
         ff_mutex_unlock(&e->lock);
diff --git a/libavcodec/executor.h b/libavcodec/executor.h
index c4688c86e6..9d53534079 100644
--- a/libavcodec/executor.h
+++ b/libavcodec/executor.h
@@ -31,6 +31,7 @@ typedef struct AVExecutor AVExecutor;
 typedef struct AVTask AVTask;
 
 struct AVTask {
+    int priority; // task priority should >= 0 and < AVTaskCallbacks.priorities
     AVTask *next;
 };
 
@@ -39,8 +40,8 @@ typedef struct AVTaskCallbacks {
 
     int local_context_size;
 
-    // return 1 if a's priority > b's priority
-    int (*priority_higher)(const AVTask *a, const AVTask *b);
+    // How many priorities do we have?
+    int priorities;
 
     // run the task
     int (*run)(AVTask *t, void *local_context, void *user_data);
diff --git a/libavcodec/vvc/thread.c b/libavcodec/vvc/thread.c
index da7fafed74..a477c06ccb 100644
--- a/libavcodec/vvc/thread.c
+++ b/libavcodec/vvc/thread.c
@@ -103,13 +103,28 @@ typedef struct VVCFrameThread {
     AVCond cond;
 } VVCFrameThread;
 
+#define PRIORITY_LOWEST 2
 static void add_task(VVCContext *s, VVCTask *t)
 {
-    VVCFrameThread *ft = t->fc->ft;
+    VVCFrameThread *ft = t->fc->ft;
+    AVTask *task = &t->u.task;
+    const int priorities[] = {
+        0, // VVC_TASK_STAGE_INIT,
+        0, // VVC_TASK_STAGE_PARSE,
+        // For an 8K clip, a CTU line completed in the reference frame may trigger 64 and more inter tasks.
+        // We assign these tasks the lowest priority to avoid being overwhelmed with inter tasks.
+        PRIORITY_LOWEST, // VVC_TASK_STAGE_INTER
+        1, // VVC_TASK_STAGE_RECON,
+        1, // VVC_TASK_STAGE_LMCS,
+        1, // VVC_TASK_STAGE_DEBLOCK_V,
+        1, // VVC_TASK_STAGE_DEBLOCK_H,
+        1, // VVC_TASK_STAGE_SAO,
+        1, // VVC_TASK_STAGE_ALF,
+    };
 
     atomic_fetch_add(&ft->nb_scheduled_tasks, 1);
-
-    ff_executor_execute(s->executor, &t->u.task);
+    task->priority = priorities[t->stage];
+    ff_executor_execute(s->executor, task);
 }
 
 static void task_init(VVCTask *t, VVCTaskStage stage, VVCFrameContext *fc, const int rx, const int ry)
@@ -372,31 +387,6 @@ static int task_is_stage_ready(VVCTask *t, int add)
     return task_has_target_score(t, stage, score);
 }
 
-#define CHECK(a, b) \
-    do { \
-        if ((a) != (b)) \
-            return (a) < (b); \
-    } while (0)
-
-static int task_priority_higher(const AVTask *_a, const AVTask *_b)
-{
-    const VVCTask *a = (const VVCTask*)_a;
-    const VVCTask *b = (const VVCTask*)_b;
-
-
-    if (a->stage <= VVC_TASK_STAGE_PARSE || b->stage <= VVC_TASK_STAGE_PARSE) {
-        CHECK(a->stage, b->stage);
-        CHECK(a->fc->decode_order, b->fc->decode_order); //decode order
-        CHECK(a->ry, b->ry);
-        return a->rx < b->rx;
-    }
-
-    CHECK(a->fc->decode_order, b->fc->decode_order); //decode order
-    CHECK(a->rx + a->ry + a->stage, b->rx + b->ry + b->stage); //zigzag with type
-    CHECK(a->rx + a->ry, b->rx + b->ry); //zigzag
-    return a->ry < b->ry;
-}
-
 static void check_colocation(VVCContext *s, VVCTask *t)
 {
     const VVCFrameContext *fc = t->fc;
@@ -681,7 +671,7 @@ AVExecutor* ff_vvc_executor_alloc(VVCContext *s, const int thread_count)
     AVTaskCallbacks callbacks = {
         s,
         sizeof(VVCLocalContext),
-        task_priority_higher,
+        PRIORITY_LOWEST + 1,
         task_run,
     };
     return ff_executor_alloc(&callbacks, thread_count);
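
Note for readers: the scheduling idea in this patch (one FIFO queue per priority level, O(1) append, pop from the first non-empty queue) can be looked at in isolation. The following is only a minimal standalone sketch under stated assumptions, not the FFmpeg code: Task, Queue, sched_push, sched_pop and NB_PRIORITIES are names made up for illustration, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant; the patch passes PRIORITY_LOWEST + 1 instead. */
#define NB_PRIORITIES 3

/* Illustrative stand-in for a schedulable task (not AVTask). */
typedef struct Task {
    int priority;           /* 0 is served first; must be < NB_PRIORITIES */
    struct Task *next;
} Task;

/* Singly linked FIFO with a tail pointer so appends need no traversal. */
typedef struct Queue {
    Task *head;
    Task *tail;
} Queue;

/* Append the task to the queue of its priority: O(1). */
static void sched_push(Queue *qs, Task *t)
{
    Queue *q;

    assert(t->priority >= 0 && t->priority < NB_PRIORITIES);
    q = &qs[t->priority];

    t->next = NULL;
    if (q->head)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
}

/* Pop from the first non-empty queue. The cost depends only on
 * NB_PRIORITIES, not on how many tasks are pending.
 * Returns NULL when nothing is pending. */
static Task *sched_pop(Queue *qs)
{
    for (int i = 0; i < NB_PRIORITIES; i++) {
        Queue *q = &qs[i];
        Task *t  = q->head;

        if (t) {
            q->head = t->next;
            if (!q->head)
                q->tail = NULL;
            t->next = NULL;
            return t;
        }
    }
    return NULL;
}
```

With this layout, tasks that share a priority are served in submission order. That is the trade-off visible in the diff: the removed task_priority_higher() ordered tasks by decode order and zigzag position, while the bucketed version only distinguishes the priority level.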