diff mbox series

[FFmpeg-devel,v2,13/14] vvcdec: add CTU thread logic

Message ID TYSPR06MB64334C5B5D3906CAE6A9FBCEAA2DA@TYSPR06MB6433.apcprd06.prod.outlook.com
State Superseded
Series add vvc decoder

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 fail Make fate failed
andriy/make_x86 success Make finished
andriy/make_fate_x86 fail Make fate failed

Commit Message

Nuo Mi July 7, 2023, 2:05 p.m. UTC
This is the main entry point for the CTU (Coding Tree Unit) decoder.
The code divides CTU decoding into several stages.
For each stage, it checks the stage's dependencies and then runs the stage decoder.
---
 libavcodec/vvc/Makefile     |   3 +-
 libavcodec/vvc/vvc_thread.c | 804 ++++++++++++++++++++++++++++++++++++
 libavcodec/vvc/vvc_thread.h |  73 ++++
 3 files changed, 879 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/vvc/vvc_thread.c
 create mode 100644 libavcodec/vvc/vvc_thread.h

Comments

Michael Niedermayer July 8, 2023, 9:41 p.m. UTC | #1
On Fri, Jul 07, 2023 at 10:05:39PM +0800, Nuo Mi wrote:
> This is the main entry point for the CTU (Coding Tree Unit) decoder.
> The code will divide the CTU decoder into several stages.
> It will check the stage dependencies and run the stage decoder.
> ---
>  libavcodec/vvc/Makefile     |   3 +-
>  libavcodec/vvc/vvc_thread.c | 804 ++++++++++++++++++++++++++++++++++++
>  libavcodec/vvc/vvc_thread.h |  73 ++++
>  3 files changed, 879 insertions(+), 1 deletion(-)
>  create mode 100644 libavcodec/vvc/vvc_thread.c
>  create mode 100644 libavcodec/vvc/vvc_thread.h

seems not to build with enable-shared

src/libavcodec/vvc/vvc_thread.c:235:9: error: address argument to atomic operation must be a pointer to non-const _Atomic type ('const atomic_int *' (aka 'const _Atomic(int) *') invalid)
    if (atomic_load(&ft->ret))
        ^           ~~~~~~~~
/usr/lib/llvm-6.0/lib/clang/6.0.0/include/stdatomic.h:134:29: note: expanded from macro 'atomic_load'
#define atomic_load(object) __c11_atomic_load(object, __ATOMIC_SEQ_CST)
                            ^                 ~~~~~~
1 error generated.
src/ffbuild/common.mak:81: recipe for target 'libavcodec/vvc/vvc_thread.o' failed
make: *** [libavcodec/vvc/vvc_thread.o] Error 1
make: *** Waiting for unfinished jobs....


[...]
Andreas Rheinhardt July 9, 2023, 1:04 a.m. UTC | #2
Michael Niedermayer:
> On Fri, Jul 07, 2023 at 10:05:39PM +0800, Nuo Mi wrote:
>> This is the main entry point for the CTU (Coding Tree Unit) decoder.
>> The code will divide the CTU decoder into several stages.
>> It will check the stage dependencies and run the stage decoder.
>> ---
>>  libavcodec/vvc/Makefile     |   3 +-
>>  libavcodec/vvc/vvc_thread.c | 804 ++++++++++++++++++++++++++++++++++++
>>  libavcodec/vvc/vvc_thread.h |  73 ++++
>>  3 files changed, 879 insertions(+), 1 deletion(-)
>>  create mode 100644 libavcodec/vvc/vvc_thread.c
>>  create mode 100644 libavcodec/vvc/vvc_thread.h
> 
> seems not to build with enable-shared
> 
> src/libavcodec/vvc/vvc_thread.c:235:9: error: address argument to atomic operation must be a pointer to non-const _Atomic type ('const atomic_int *' (aka 'const _Atomic(int) *') invalid)
>     if (atomic_load(&ft->ret))
>         ^           ~~~~~~~~
> /usr/lib/llvm-6.0/lib/clang/6.0.0/include/stdatomic.h:134:29: note: expanded from macro 'atomic_load'
> #define atomic_load(object) __c11_atomic_load(object, __ATOMIC_SEQ_CST)
>                             ^                 ~~~~~~
> 1 error generated.
> src/ffbuild/common.mak:81: recipe for target 'libavcodec/vvc/vvc_thread.o' failed
> make: *** [libavcodec/vvc/vvc_thread.o] Error 1
> make: *** Waiting for unfinished jobs....
> 
> 

atomic_load() does not accept pointers to const atomic objects in
the original C11 spec (presumably the reason for this was that on
systems that lack atomics of the appropriate size an atomic would need
to be emulated somehow and this may involve locking and therefore
require the object to be writable). Your system is old and abides by the
original spec; AFAIK this point has been changed in later specs.
The solution is to use a cast.

- Andreas

PS: Exactly the same thing happened in the HEVC decoder.
Nuo Mi July 10, 2023, 7:45 a.m. UTC | #3
On Sun, Jul 9, 2023 at 9:03 AM Andreas Rheinhardt <
andreas.rheinhardt@outlook.com> wrote:

> Michael Niedermayer:
> > On Fri, Jul 07, 2023 at 10:05:39PM +0800, Nuo Mi wrote:
> >> This is the main entry point for the CTU (Coding Tree Unit) decoder.
> >> The code will divide the CTU decoder into several stages.
> >> It will check the stage dependencies and run the stage decoder.
> >> ---
> >>  libavcodec/vvc/Makefile     |   3 +-
> >>  libavcodec/vvc/vvc_thread.c | 804 ++++++++++++++++++++++++++++++++++++
> >>  libavcodec/vvc/vvc_thread.h |  73 ++++
> >>  3 files changed, 879 insertions(+), 1 deletion(-)
> >>  create mode 100644 libavcodec/vvc/vvc_thread.c
> >>  create mode 100644 libavcodec/vvc/vvc_thread.h
> >
> > seems not to build with enable-shared
> >
> > src/libavcodec/vvc/vvc_thread.c:235:9: error: address argument to atomic
> operation must be a pointer to non-const _Atomic type ('const atomic_int *'
> (aka 'const _Atomic(int) *') invalid)
> >     if (atomic_load(&ft->ret))
> >         ^           ~~~~~~~~
> > /usr/lib/llvm-6.0/lib/clang/6.0.0/include/stdatomic.h:134:29: note:
> expanded from macro 'atomic_load'
> > #define atomic_load(object) __c11_atomic_load(object, __ATOMIC_SEQ_CST)
> >                             ^                 ~~~~~~
> > 1 error generated.
> > src/ffbuild/common.mak:81: recipe for target
> 'libavcodec/vvc/vvc_thread.o' failed
> > make: *** [libavcodec/vvc/vvc_thread.o] Error 1
> > make: *** Waiting for unfinished jobs....
> >
> >
>
> atomic_load() does not accept pointers to const atomic objects in
> the original C11 spec (presumably the reason for this was that on
> systems that lack atomics of the appropriate size an atomic would need
> to be emulated somehow and this may involve locking and therefore
> require the object to be writable). Your system is old and abides by the
> original spec; AFAIK this point has been changed in later specs.
> The solution is to use a cast.
>
> - Andreas
>
> PS: Exactly the same thing happened in the HEVC decoder.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Hi Michael and Andreas,
Thank you for the report and suggestion. I will check the HEVC decoder and fix it.
Rémi Denis-Courmont July 10, 2023, 3:04 p.m. UTC | #4
On Sunday, 9 July 2023 at 0:41:35 EEST, Michael Niedermayer wrote:
> On Fri, Jul 07, 2023 at 10:05:39PM +0800, Nuo Mi wrote:
> > This is the main entry point for the CTU (Coding Tree Unit) decoder.
> > The code will divide the CTU decoder into several stages.
> > It will check the stage dependencies and run the stage decoder.
> > ---
> > 
> >  libavcodec/vvc/Makefile     |   3 +-
> >  libavcodec/vvc/vvc_thread.c | 804 ++++++++++++++++++++++++++++++++++++
> >  libavcodec/vvc/vvc_thread.h |  73 ++++
> >  3 files changed, 879 insertions(+), 1 deletion(-)
> >  create mode 100644 libavcodec/vvc/vvc_thread.c
> >  create mode 100644 libavcodec/vvc/vvc_thread.h
> 
> seems not to build with enable-shared
> 
> src/libavcodec/vvc/vvc_thread.c:235:9: error: address argument to atomic
> operation must be a pointer to non-const _Atomic type ('const atomic_int *'
> (aka 'const _Atomic(int) *') invalid) if (atomic_load(&ft->ret))
>         ^           ~~~~~~~~

That is a known bug in the Clang compiler that was fixed in newer versions, 
AFAIK. You seem to be using something ancient...

> /usr/lib/llvm-6.0/lib/clang/6.0.0/include/stdatomic.h:134:29: note: expanded
> from macro 'atomic_load' #define atomic_load(object)
> __c11_atomic_load(object, __ATOMIC_SEQ_CST) ^                 ~~~~~~
> 1 error generated.
> src/ffbuild/common.mak:81: recipe for target 'libavcodec/vvc/vvc_thread.o'
> failed make: *** [libavcodec/vvc/vvc_thread.o] Error 1
> make: *** Waiting for unfinished jobs....
> 
> 
> [...]
Michael Niedermayer July 11, 2023, 5:28 p.m. UTC | #5
On Mon, Jul 10, 2023 at 06:04:06PM +0300, Rémi Denis-Courmont wrote:
> On Sunday, 9 July 2023 at 0:41:35 EEST, Michael Niedermayer wrote:
> > On Fri, Jul 07, 2023 at 10:05:39PM +0800, Nuo Mi wrote:
> > > This is the main entry point for the CTU (Coding Tree Unit) decoder.
> > > The code will divide the CTU decoder into several stages.
> > > It will check the stage dependencies and run the stage decoder.
> > > ---
> > > 
> > >  libavcodec/vvc/Makefile     |   3 +-
> > >  libavcodec/vvc/vvc_thread.c | 804 ++++++++++++++++++++++++++++++++++++
> > >  libavcodec/vvc/vvc_thread.h |  73 ++++
> > >  3 files changed, 879 insertions(+), 1 deletion(-)
> > >  create mode 100644 libavcodec/vvc/vvc_thread.c
> > >  create mode 100644 libavcodec/vvc/vvc_thread.h
> > 
> > seems not to build with enable-shared
> > 
> > src/libavcodec/vvc/vvc_thread.c:235:9: error: address argument to atomic
> > operation must be a pointer to non-const _Atomic type ('const atomic_int *'
> > (aka 'const _Atomic(int) *') invalid) if (atomic_load(&ft->ret))
> >         ^           ~~~~~~~~
> 
> That is a known bug in the Clang compiler that was fixed in newer versions, 
> AFAIK. You seem to be using something ancient...

I have many versions of clang on that box, several of them newer. But you are correct,
and I intend to update that. It's just that updating the OS will break some scripts,
and I need time to fix that; time ATM is limited.
As a side effect, we get more testing with ancient clang, which has its advantages
too.


[...]

Patch

diff --git a/libavcodec/vvc/Makefile b/libavcodec/vvc/Makefile
index e8076329f9..9d13ae5b48 100644
--- a/libavcodec/vvc/Makefile
+++ b/libavcodec/vvc/Makefile
@@ -13,4 +13,5 @@  OBJS-$(CONFIG_VVC_DECODER)          +=  vvc/vvcdec.o            \
                                         vvc/vvc_itx_1d.o        \
                                         vvc/vvc_mvs.o           \
                                         vvc/vvc_ps.o            \
-                                        vvc/vvc_refs.o
\ No newline at end of file
+                                        vvc/vvc_refs.o          \
+                                        vvc/vvc_thread.o
\ No newline at end of file
diff --git a/libavcodec/vvc/vvc_thread.c b/libavcodec/vvc/vvc_thread.c
new file mode 100644
index 0000000000..1a0d465e7d
--- /dev/null
+++ b/libavcodec/vvc/vvc_thread.c
@@ -0,0 +1,804 @@ 
+/*
+ * VVC thread logic
+ *
+ * Copyright (C) 2023 Nuo Mi
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <stdatomic.h>
+
+#include "libavutil/thread.h"
+
+#include "vvc_thread.h"
+#include "vvc_ctu.h"
+#include "vvc_filter.h"
+#include "vvc_inter.h"
+#include "vvc_intra.h"
+#include "vvc_refs.h"
+
+typedef struct VVCRowThread {
+    VVCTask reconstruct_task;
+    VVCTask deblock_v_task;
+    VVCTask sao_task;
+    atomic_int progress[VVC_PROGRESS_LAST];
+} VVCRowThread;
+
+typedef struct VVCColThread {
+    VVCTask deblock_h_task;
+} VVCColThread;
+
+struct VVCFrameThread {
+    // error return for tasks
+    atomic_int ret;
+
+    atomic_uchar *avails;
+
+    VVCRowThread *rows;
+    VVCColThread *cols;
+    VVCTask *tasks;
+
+    int ctu_size;
+    int ctu_width;
+    int ctu_height;
+    int ctu_count;
+
+    //protected by lock
+    int nb_scheduled_tasks;
+    int nb_parse_tasks;
+    int row_progress[VVC_PROGRESS_LAST];
+
+    pthread_mutex_t lock;
+    pthread_cond_t  cond;
+};
+
+static int get_avail(const VVCFrameThread *ft, const int rx, const int ry, const VVCTaskType type)
+{
+    atomic_uchar *avail;
+    if (rx < 0 || ry < 0)
+        return 1;
+    avail = ft->avails + FFMIN(ry,  ft->ctu_height - 1)* ft->ctu_width + FFMIN(rx, ft->ctu_width - 1);
+    return atomic_load(avail) & (1 << type);
+}
+
+static void set_avail(const VVCFrameThread *ft, const int rx, const int ry, const VVCTaskType type)
+{
+    atomic_uchar *avail = ft->avails + ry * ft->ctu_width + rx;
+    if (rx < 0 || rx >= ft->ctu_width || ry < 0 || ry >= ft->ctu_height)
+        return;
+    atomic_fetch_or(avail, 1 << type);
+}
+
+void ff_vvc_task_init(VVCTask *task, VVCTaskType type, VVCFrameContext *fc)
+{
+    memset(task, 0, sizeof(*task));
+    task->type           = type;
+    task->fc             = fc;
+    task->decode_order   = fc->decode_order;
+}
+
+void ff_vvc_parse_task_init(VVCTask *t, VVCTaskType type, VVCFrameContext *fc,
+    SliceContext *sc, EntryPoint *ep, const int ctu_idx)
+{
+    const VVCFrameThread *ft = fc->frame_thread;
+    const int rs = sc->sh.ctb_addr_in_curr_slice[ctu_idx];
+
+    ff_vvc_task_init(t, type, fc);
+    t->sc = sc;
+    t->ep = ep;
+    t->ctu_idx = ctu_idx;
+    t->rx = rs % ft->ctu_width;
+    t->ry = rs / ft->ctu_width;
+}
+
+VVCTask* ff_vvc_task_alloc(void)
+{
+    return av_malloc(sizeof(VVCTask));
+}
+
+static int check_colocation_ctu(const VVCFrameContext *fc, const VVCTask *t)
+{
+    if (fc->ps.ph->temporal_mvp_enabled_flag || fc->ps.sps->sbtmvp_enabled_flag) {
+        //-1 to avoid waiting for the next CTU line.
+        const int y = (t->ry << fc->ps.sps->ctb_log2_size_y) - 1;
+        VVCFrame *col = fc->ref->collocated_ref;
+        if (col && !ff_vvc_check_progress(col, VVC_PROGRESS_MV, y))
+            return 0;
+    }
+    return 1;
+}
+
+static int is_parse_ready(const VVCFrameContext *fc, const VVCTask *t)
+{
+    av_assert0(t->type == VVC_TASK_TYPE_PARSE);
+    if (!check_colocation_ctu(fc, t))
+        return 0;
+    if (fc->ps.sps->entropy_coding_sync_enabled_flag && t->ry != fc->ps.pps->ctb_to_row_bd[t->ry])
+        return get_avail(fc->frame_thread, t->rx, t->ry - 1, VVC_TASK_TYPE_PARSE);
+    return 1;
+}
+
+static int is_inter_ready(const VVCFrameContext *fc, const VVCTask *t)
+{
+    VVCFrameThread *ft  = fc->frame_thread;
+    const int rs        = t->ry * ft->ctu_width + t->rx;
+    const int slice_idx = fc->tab.slice_idx[rs];
+
+    av_assert0(t->type == VVC_TASK_TYPE_INTER);
+
+    if (slice_idx != -1) {
+        const SliceContext *sc = fc->slices[slice_idx];
+        const VVCSH *sh        = &sc->sh;
+        CTU *ctu               = fc->tab.ctus + rs;
+        if (!IS_I(sh)) {
+            for (int lx = 0; lx < 2; lx++) {
+                for (int i = ctu->max_y_idx[lx]; i < sh->nb_refs[lx]; i++) {
+                    const int y = ctu->max_y[lx][i];
+                    VVCFrame *ref = sc->rpl[lx].ref[i];
+                    if (ref && y >= 0) {
+                        if (!ff_vvc_check_progress(ref, VVC_PROGRESS_PIXEL, y + LUMA_EXTRA_AFTER))
+                            return 0;
+                    }
+                    ctu->max_y_idx[lx]++;
+                }
+            }
+        }
+    }
+    return 1;
+}
+
+static int is_recon_ready(const VVCFrameContext *fc, const VVCTask *t)
+{
+    const VVCFrameThread *ft = fc->frame_thread;
+
+    av_assert0(t->type == VVC_TASK_TYPE_RECON);
+    return get_avail(ft, t->rx, t->ry, VVC_TASK_TYPE_INTER) &&
+        get_avail(ft, t->rx + 1, t->ry - 1, VVC_TASK_TYPE_RECON) &&
+        get_avail(ft, t->rx - 1, t->ry, VVC_TASK_TYPE_RECON);
+}
+
+static int is_lmcs_ready(const VVCFrameContext *fc, const VVCTask *t)
+{
+    const VVCFrameThread *ft = fc->frame_thread;
+
+    av_assert0(t->type == VVC_TASK_TYPE_LMCS);
+    return get_avail(ft, t->rx + 1, t->ry + 1, VVC_TASK_TYPE_RECON) &&
+        get_avail(ft, t->rx, t->ry + 1, VVC_TASK_TYPE_RECON) &&
+        get_avail(ft, t->rx + 1, t->ry, VVC_TASK_TYPE_RECON);
+}
+
+static int is_deblock_v_ready(const VVCFrameContext *fc, const VVCTask *t)
+{
+    const VVCFrameThread *ft  = fc->frame_thread;
+
+    av_assert0(t->type == VVC_TASK_TYPE_DEBLOCK_V);
+    return get_avail(ft, t->rx, t->ry, VVC_TASK_TYPE_LMCS) &&
+        get_avail(ft, t->rx + 1, t->ry, VVC_TASK_TYPE_LMCS);
+}
+
+static int is_deblock_h_ready(const VVCFrameContext *fc, const VVCTask *t)
+{
+    const VVCFrameThread *ft = fc->frame_thread;
+
+    av_assert0(t->type == VVC_TASK_TYPE_DEBLOCK_H);
+    return get_avail(ft, t->rx - 1, t->ry, VVC_TASK_TYPE_DEBLOCK_H) &&
+        get_avail(ft, t->rx, t->ry, VVC_TASK_TYPE_DEBLOCK_V);
+}
+
+static int is_sao_ready(const VVCFrameContext *fc, const VVCTask *t)
+{
+    av_assert0(t->type == VVC_TASK_TYPE_SAO);
+    return get_avail(fc->frame_thread, t->rx + 1, t->ry - 1, VVC_TASK_TYPE_SAO) &&
+        get_avail(fc->frame_thread, t->rx - 1, t->ry + 1, VVC_TASK_TYPE_DEBLOCK_H) &&
+        get_avail(fc->frame_thread, t->rx, t->ry + 1, VVC_TASK_TYPE_DEBLOCK_H) &&
+        get_avail(fc->frame_thread, t->rx + 1, t->ry + 1, VVC_TASK_TYPE_DEBLOCK_H);
+}
+
+static int is_alf_ready(const VVCFrameContext *fc, const VVCTask *t)
+{
+    av_assert0(t->type == VVC_TASK_TYPE_ALF);
+    return 1;
+}
+
+typedef int (*is_ready_func)(const VVCFrameContext *fc, const VVCTask *t);
+
+int ff_vvc_task_ready(const Tasklet *_t, void *user_data)
+{
+    const VVCTask *t            = (const VVCTask*)_t;
+    const VVCFrameThread *ft    = t->fc->frame_thread;
+    int ready;
+    is_ready_func is_ready[]    = {
+        is_parse_ready,
+        is_inter_ready,
+        is_recon_ready,
+        is_lmcs_ready,
+        is_deblock_v_ready,
+        is_deblock_h_ready,
+        is_sao_ready,
+        is_alf_ready,
+    };
+
+    if (atomic_load(&ft->ret))
+        return 1;
+    ready = is_ready[t->type](t->fc, t);
+
+    return ready;
+}
+
+#define CHECK(a, b)                         \
+    do {                                    \
+        if ((a) != (b))                     \
+            return (a) < (b);               \
+    } while (0)
+
+int ff_vvc_task_priority_higher(const Tasklet *_a, const Tasklet *_b)
+{
+    const VVCTask *a = (const VVCTask*)_a;
+    const VVCTask *b = (const VVCTask*)_b;
+
+    CHECK(a->decode_order, b->decode_order);                //decode order
+
+    if (a->type == VVC_TASK_TYPE_PARSE || b->type == VVC_TASK_TYPE_PARSE) {
+        CHECK(a->type, b->type);
+        CHECK(a->ry, b->ry);
+        return a->rx < b->rx;
+    }
+
+    CHECK(a->rx + a->ry + a->type, b->rx + b->ry + b->type);    //zigzag with type
+    CHECK(a->rx + a->ry, b->rx + b->ry);                        //zigzag
+    return a->ry < b->ry;
+}
+
+static void add_task(VVCContext *s, VVCTask *t, const VVCTaskType type)
+{
+    t->type = type;
+    ff_vvc_frame_add_task(s, t);
+}
+
+static int run_parse(VVCContext *s, VVCLocalContext *lc, VVCTask *t)
+{
+    VVCFrameContext *fc     = lc->fc;
+    const VVCSPS *sps       = fc->ps.sps;
+    const VVCPPS *pps       = fc->ps.pps;
+    SliceContext *sc        = t->sc;
+    const VVCSH *sh         = &sc->sh;
+    EntryPoint *ep          = t->ep;
+    VVCFrameThread *ft      = fc->frame_thread;
+    int ret, rs, prev_ry;
+
+    lc->sc = sc;
+    lc->ep = ep;
+
+    //reconstruct one line a time
+    rs = sh->ctb_addr_in_curr_slice[t->ctu_idx];
+    do {
+
+        prev_ry = t->ry;
+
+        ret = ff_vvc_coding_tree_unit(lc, t->ctu_idx, rs, t->rx, t->ry);
+        if (ret < 0)
+            return ret;
+
+        set_avail(ft, t->rx, t->ry, VVC_TASK_TYPE_PARSE);
+        add_task(s, ft->tasks + rs, VVC_TASK_TYPE_INTER);
+
+        if (fc->ps.sps->entropy_coding_sync_enabled_flag && t->rx == pps->ctb_to_col_bd[t->rx]) {
+            EntryPoint *next = ep + 1;
+            if (next < sc->eps + sc->nb_eps) {
+                memcpy(next->cabac_state, ep->cabac_state, sizeof(next->cabac_state));
+                av_assert0(!next->parse_task->type);
+                ff_vvc_ep_init_stat_coeff(lc->ep, sps->bit_depth, sps->persistent_rice_adaptation_enabled_flag);
+                ff_vvc_frame_add_task(s, next->parse_task);
+            }
+        }
+
+        t->ctu_idx++;
+        if (t->ctu_idx >= ep->ctu_end)
+            break;
+
+        rs = sh->ctb_addr_in_curr_slice[t->ctu_idx];
+        t->rx = rs % ft->ctu_width;
+        t->ry = rs / ft->ctu_width;
+    } while (t->ry == prev_ry && is_parse_ready(fc, t));
+
+    if (t->ctu_idx < ep->ctu_end)
+        ff_vvc_frame_add_task(s, t);
+
+    return 0;
+}
+
+static void report_frame_progress(VVCFrameContext *fc, VVCTask *t)
+{
+    VVCFrameThread *ft  = fc->frame_thread;
+    const int ctu_size  = ft->ctu_size;
+    const int idx       = t->type == VVC_TASK_TYPE_INTER ? VVC_PROGRESS_MV : VVC_PROGRESS_PIXEL;
+    int old;
+
+    if (atomic_fetch_add(&ft->rows[t->ry].progress[idx], 1) == ft->ctu_width - 1) {
+        int y;
+        pthread_mutex_lock(&ft->lock);
+        y = old = ft->row_progress[idx];
+        while (y < ft->ctu_height && atomic_load(&ft->rows[y].progress[idx]) == ft->ctu_width)
+            y++;
+        if (old != y) {
+            const int progress = y == ft->ctu_height ? INT_MAX : y * ctu_size;
+            ft->row_progress[idx] = y;
+            ff_vvc_report_progress(fc->ref, idx, progress);
+        }
+        pthread_mutex_unlock(&ft->lock);
+    }
+}
+
+static int run_inter(VVCContext *s, VVCLocalContext *lc, VVCTask *t)
+{
+    VVCFrameContext *fc = lc->fc;
+    VVCFrameThread *ft  = fc->frame_thread;
+    const int rs        = t->ry * ft->ctu_width + t->rx;
+    const int slice_idx = fc->tab.slice_idx[rs];
+
+    if (slice_idx != -1) {
+        lc->sc = fc->slices[slice_idx];
+        ff_vvc_predict_inter(lc, rs);
+        if (!t->rx)
+            ff_vvc_frame_add_task(s, &ft->rows[t->ry].reconstruct_task);
+    }
+    set_avail(ft, t->rx, t->ry, VVC_TASK_TYPE_INTER);
+    report_frame_progress(fc, t);
+
+    return 0;
+}
+
+static int run_recon(VVCContext *s, VVCLocalContext *lc, VVCTask *t)
+{
+    VVCFrameContext *fc = lc->fc;
+    VVCFrameThread *ft  = fc->frame_thread;
+
+    do {
+        const int rs = t->ry * ft->ctu_width + t->rx;
+        const int slice_idx = fc->tab.slice_idx[rs];
+
+        if (slice_idx != -1) {
+            lc->sc = fc->slices[slice_idx];
+            ff_vvc_reconstruct(lc, rs, t->rx, t->ry);
+        }
+
+        set_avail(ft, t->rx, t->ry, VVC_TASK_TYPE_RECON);
+        add_task(s, ft->tasks + rs, VVC_TASK_TYPE_LMCS);
+
+        t->rx++;
+    } while (t->rx < ft->ctu_width && is_recon_ready(fc, t));
+
+    if (t->rx < ft->ctu_width)
+        ff_vvc_frame_add_task(s, t);
+    return 0;
+}
+
+static int run_lmcs(VVCContext *s, VVCLocalContext *lc, VVCTask *t)
+{
+    VVCFrameContext *fc = lc->fc;
+    VVCFrameThread *ft  = fc->frame_thread;
+    const int ctu_size  = ft->ctu_size;
+    const int x0        = t->rx * ctu_size;
+    const int y0        = t->ry * ctu_size;
+    const int rs        = t->ry * ft->ctu_width + t->rx;
+    const int slice_idx = fc->tab.slice_idx[rs];
+
+    if (slice_idx != -1) {
+        lc->sc = fc->slices[slice_idx];
+        ff_vvc_lmcs_filter(lc, x0, y0);
+    }
+    set_avail(ft, t->rx, t->ry, VVC_TASK_TYPE_LMCS);
+    if (!t->rx)
+        add_task(s, &ft->rows[t->ry].deblock_v_task, VVC_TASK_TYPE_DEBLOCK_V);
+
+    return 0;
+}
+
+static int run_deblock_v(VVCContext *s, VVCLocalContext *lc, VVCTask *t)
+{
+    VVCFrameContext *fc = lc->fc;
+    VVCFrameThread *ft  = fc->frame_thread;
+    int rs              = t->ry * ft->ctu_width + t->rx;
+    const int ctb_size  = ft->ctu_size;
+
+    do {
+        const int x0        = t->rx * ctb_size;
+        const int y0        = t->ry * ctb_size;
+        const int slice_idx = fc->tab.slice_idx[rs];
+
+        if (slice_idx != -1) {
+            lc->sc = fc->slices[slice_idx];
+            if (!lc->sc->sh.deblocking_filter_disabled_flag) {
+                ff_vvc_decode_neighbour(lc, x0, y0, t->rx, t->ry, rs);
+                ff_vvc_deblock_vertical(lc, x0, y0);
+            }
+        }
+
+        set_avail(ft, t->rx, t->ry, VVC_TASK_TYPE_DEBLOCK_V);
+
+        if (!t->ry)
+            add_task(s, &ft->cols[t->rx].deblock_h_task , VVC_TASK_TYPE_DEBLOCK_H);
+
+        t->rx++;
+        rs++;
+    } while (t->rx < ft->ctu_width && is_deblock_v_ready(fc, t));
+
+    if (t->rx < ft->ctu_width)
+        ff_vvc_frame_add_task(s, t);
+
+    return 0;
+}
+
+static int run_deblock_h(VVCContext *s, VVCLocalContext *lc, VVCTask *t)
+{
+    VVCFrameContext *fc = lc->fc;
+    VVCFrameThread *ft  = fc->frame_thread;
+    const int ctb_size  = ft->ctu_size;
+    int rs              = t->ry * ft->ctu_width + t->rx;
+
+    do {
+        const int x0 = t->rx * ctb_size;
+        const int y0 = t->ry * ctb_size;
+        const int slice_idx = fc->tab.slice_idx[rs];
+
+        if (slice_idx != -1) {
+            lc->sc = fc->slices[slice_idx];
+            if (!lc->sc->sh.deblocking_filter_disabled_flag) {
+                ff_vvc_decode_neighbour(lc, x0, y0, t->rx, t->ry, rs);
+                ff_vvc_deblock_horizontal(lc, x0, y0);
+            }
+        }
+
+        set_avail(ft, t->rx, t->ry, VVC_TASK_TYPE_DEBLOCK_H);
+
+        if (!t->rx)
+            add_task(s, &ft->rows[t->ry].sao_task, VVC_TASK_TYPE_SAO);
+
+        rs += ft->ctu_width;
+        t->ry++;
+    } while (t->ry < ft->ctu_height && is_deblock_h_ready(fc, t));
+
+    if (t->ry < ft->ctu_height)
+        ff_vvc_frame_add_task(s, t);
+
+    return 0;
+}
+
+static void add_alf_tasks(VVCContext *s, VVCLocalContext *lc, VVCTask *t)
+{
+    VVCFrameContext *fc = lc->fc;
+    VVCFrameThread *ft  = fc->frame_thread;
+    VVCTask *at = ft->tasks + ft->ctu_width * t->ry + t->rx;
+    if (t->ry > 0) {
+        VVCTask *top = at - ft->ctu_width;
+        if (t->rx > 0)
+            add_task(s, top - 1, VVC_TASK_TYPE_ALF);
+        if (t->rx == ft->ctu_width - 1)
+            add_task(s, top, VVC_TASK_TYPE_ALF);
+    }
+    if (t->ry == ft->ctu_height - 1) {
+        if (t->rx > 0)
+            add_task(s, at - 1, VVC_TASK_TYPE_ALF);
+        if (t->rx == ft->ctu_width - 1)
+            add_task(s, at, VVC_TASK_TYPE_ALF);
+    }
+
+}
+
+static int run_sao(VVCContext *s, VVCLocalContext *lc, VVCTask *t)
+{
+    VVCFrameContext *fc = lc->fc;
+    VVCFrameThread *ft  = fc->frame_thread;
+    int rs              = t->ry * fc->ps.pps->ctb_width + t->rx;
+    const int ctb_size  = ft->ctu_size;
+
+    do {
+        const int x0 = t->rx * ctb_size;
+        const int y0 = t->ry * ctb_size;
+
+        if (fc->ps.sps->sao_enabled_flag) {
+            ff_vvc_decode_neighbour(lc, x0, y0, t->rx, t->ry, rs);
+            ff_vvc_sao_filter(lc, x0, y0);
+        }
+
+        if (fc->ps.sps->alf_enabled_flag)
+            ff_vvc_alf_copy_ctu_to_hv(lc, x0, y0);
+
+        set_avail(ft, t->rx, t->ry, VVC_TASK_TYPE_SAO);
+
+        add_alf_tasks(s, lc, t);
+
+        rs++;
+        t->rx++;
+    } while (t->rx < ft->ctu_width && is_sao_ready(fc, t));
+
+    if (t->rx < ft->ctu_width)
+        ff_vvc_frame_add_task(s, t);
+
+    return 0;
+}
+
+static int run_alf(VVCContext *s, VVCLocalContext *lc, VVCTask *t)
+{
+    VVCFrameContext *fc = lc->fc;
+    VVCFrameThread *ft  = fc->frame_thread;
+    const int ctu_size  = ft->ctu_size;
+    const int x0 = t->rx * ctu_size;
+    const int y0 = t->ry * ctu_size;
+
+    if (fc->ps.sps->alf_enabled_flag) {
+        const int slice_idx = CTB(fc->tab.slice_idx, t->rx, t->ry);
+        if (slice_idx != -1) {
+            const int rs = t->ry * fc->ps.pps->ctb_width + t->rx;
+            lc->sc = fc->slices[slice_idx];
+            ff_vvc_decode_neighbour(lc, x0, y0, t->rx, t->ry, rs);
+            ff_vvc_alf_filter(lc, x0, y0);
+        }
+    }
+    set_avail(ft, t->rx, t->ry, VVC_TASK_TYPE_ALF);
+    report_frame_progress(fc, t);
+
+    return 0;
+}
+
+static void finished_one_task(VVCFrameThread *ft, const VVCTaskType type)
+{
+    int parse_done = 0;
+    pthread_mutex_lock(&ft->lock);
+
+    av_assert0(ft->nb_scheduled_tasks);
+    ft->nb_scheduled_tasks--;
+
+    if (type == VVC_TASK_TYPE_PARSE) {
+        av_assert0(ft->nb_parse_tasks);
+        ft->nb_parse_tasks--;
+        if (!ft->nb_parse_tasks)
+            parse_done = 1;
+    }
+    if (parse_done || !ft->nb_scheduled_tasks)
+        pthread_cond_broadcast(&ft->cond);
+
+    pthread_mutex_unlock(&ft->lock);
+}
+
+
+#define VVC_THREAD_DEBUG
+#ifdef VVC_THREAD_DEBUG
+const static char* task_name[] = {
+    "P",
+    "I",
+    "R",
+    "L",
+    "V",
+    "H",
+    "S",
+    "A"
+};
+#endif
+
+typedef int (*run_func)(VVCContext *s, VVCLocalContext *lc, VVCTask *t);
+
+int ff_vvc_task_run(Tasklet *_t, void *local_context, void *user_data)
+{
+    VVCTask *t              = (VVCTask*)_t;
+    VVCContext *s           = (VVCContext *)user_data;
+    VVCLocalContext *lc     = local_context;
+    VVCFrameThread *ft      = t->fc->frame_thread;
+    const VVCTaskType type  = t->type;
+    int ret = 0;
+    run_func run[] = {
+        run_parse,
+        run_inter,
+        run_recon,
+        run_lmcs,
+        run_deblock_v,
+        run_deblock_h,
+        run_sao,
+        run_alf,
+    };
+
+    lc->fc = t->fc;
+
+#ifdef VVC_THREAD_DEBUG
+    av_log(s->avctx, AV_LOG_DEBUG, "frame %5d, %s(%3d, %3d)\r\n", (int)t->fc->decode_order, task_name[t->type], t->rx, t->ry);
+#endif
+
+    if (!atomic_load(&ft->ret)) {
+        if ((ret = run[t->type](s, lc, t)) < 0) {
+#ifdef COMPAT_ATOMICS_WIN32_STDATOMIC_H
+            intptr_t zero = 0;
+#else
+            int zero = 0;
+#endif
+            atomic_compare_exchange_strong(&ft->ret, &zero, ret);
+        }
+    }
+
+    // t->type may be changed by run(), so we use a local copy of t->type
+    finished_one_task(ft, type);
+
+    return ret;
+}
+
+void ff_vvc_frame_thread_free(VVCFrameContext *fc)
+{
+    VVCFrameThread *ft = fc->frame_thread;
+
+    if (!ft)
+        return;
+
+    pthread_mutex_destroy(&ft->lock);
+    pthread_cond_destroy(&ft->cond);
+    av_freep(&ft->avails);
+    av_freep(&ft->cols);
+    av_freep(&ft->rows);
+    av_freep(&ft->tasks);
+    av_freep(&ft);
+}
+
+int ff_vvc_frame_thread_init(VVCFrameContext *fc)
+{
+    const VVCSPS *sps = fc->ps.sps;
+    const VVCPPS *pps = fc->ps.pps;
+    VVCFrameThread *ft = fc->frame_thread;
+    int ret;
+
+    if (!ft || ft->ctu_width != pps->ctb_width ||
+        ft->ctu_height != pps->ctb_height ||
+        ft->ctu_size != sps->ctb_size_y) {
+
+        ff_vvc_frame_thread_free(fc);
+        ft = av_calloc(1, sizeof(*fc->frame_thread));
+        if (!ft)
+            return AVERROR(ENOMEM);
+
+        ft->ctu_width  = fc->ps.pps->ctb_width;
+        ft->ctu_height = fc->ps.pps->ctb_height;
+        ft->ctu_count  = fc->ps.pps->ctb_count;
+        ft->ctu_size   = fc->ps.sps->ctb_size_y;
+
+        ft->rows = av_calloc(ft->ctu_height, sizeof(*ft->rows));
+        if (!ft->rows)
+            goto fail;
+
+        for (int y = 0; y < ft->ctu_height; y++) {
+            VVCRowThread *row = ft->rows + y;
+            ff_vvc_task_init(&row->deblock_v_task, VVC_TASK_TYPE_DEBLOCK_V, fc);
+            row->deblock_v_task.ry = y;
+            ff_vvc_task_init(&row->sao_task, VVC_TASK_TYPE_SAO, fc);
+            row->sao_task.ry = y;
+            ff_vvc_task_init(&row->reconstruct_task, VVC_TASK_TYPE_RECON, fc);
+            row->reconstruct_task.ry = y;
+        }
+
+        ft->cols = av_calloc(ft->ctu_width, sizeof(*ft->cols));
+        if (!ft->cols)
+            goto fail;
+        for (int x = 0; x < ft->ctu_width; x++) {
+            VVCColThread *col = ft->cols + x;
+            ff_vvc_task_init(&col->deblock_h_task, VVC_TASK_TYPE_DEBLOCK_H, fc);
+            col->deblock_h_task.rx = x;
+        }
+
+        ft->avails = av_calloc(ft->ctu_count, sizeof(*ft->avails));
+        if (!ft->avails)
+            goto fail;
+
+        ft->tasks = av_calloc(ft->ctu_count, sizeof(*ft->tasks));
+        if (!ft->tasks)
+            goto fail;
+        for (int rs = 0; rs < ft->ctu_count; rs++) {
+            VVCTask *t = ft->tasks + rs;
+            t->rx = rs % ft->ctu_width;
+            t->ry = rs / ft->ctu_width;
+            t->fc = fc;
+        }
+
+        if ((ret = pthread_cond_init(&ft->cond, NULL)))
+            goto fail;
+
+        if ((ret = pthread_mutex_init(&ft->lock, NULL))) {
+            pthread_cond_destroy(&ft->cond);
+            goto fail;
+        }
+    }
+
+    ft->ret = 0;
+    for (int y = 0; y < ft->ctu_height; y++) {
+        VVCRowThread *row = ft->rows + y;
+
+        row->reconstruct_task.rx = 0;
+        memset(&row->progress[0], 0, sizeof(row->progress));
+        row->deblock_v_task.rx = 0;
+        row->sao_task.rx = 0;
+    }
+
+    for (int x = 0; x < ft->ctu_width; x++) {
+        VVCColThread *col = ft->cols + x;
+        col->deblock_h_task.ry = 0;
+    }
+
+    for (int rs = 0; rs < ft->ctu_count; rs++) {
+        ft->avails[rs] = 0;
+        ft->tasks[rs].decode_order = fc->decode_order;
+    }
+
+    memset(&ft->row_progress[0], 0, sizeof(ft->row_progress));
+    fc->frame_thread = ft;
+
+    return 0;
+
+fail:
+    if (ft) {
+        av_freep(&ft->avails);
+        av_freep(&ft->cols);
+        av_freep(&ft->rows);
+        av_freep(&ft->tasks);
+        av_freep(&ft);
+    }
+
+    return AVERROR(ENOMEM);
+}
+
+void ff_vvc_frame_add_task(VVCContext *s, VVCTask *t)
+{
+    VVCFrameContext *fc = t->fc;
+    VVCFrameThread *ft  = fc->frame_thread;
+
+    pthread_mutex_lock(&ft->lock);
+
+    ft->nb_scheduled_tasks++;
+    if (t->type == VVC_TASK_TYPE_PARSE)
+        ft->nb_parse_tasks++;
+
+    pthread_mutex_unlock(&ft->lock);
+
+    ff_executor_execute(s->executor, &t->task);
+}
+
+int ff_vvc_frame_wait(VVCContext *s, VVCFrameContext *fc)
+{
+    VVCFrameThread *ft = fc->frame_thread;
+    int check_missed_slices = 1;
+
+    pthread_mutex_lock(&ft->lock);
+
+    while (ft->nb_scheduled_tasks) {
+        if (check_missed_slices && !ft->nb_parse_tasks) {
+            // abort if any CTU was never parsed (missed slices)
+            for (int rs = 0; rs < ft->ctu_count; rs++) {
+                const uint8_t mask = 1 << VVC_TASK_TYPE_PARSE;
+                if (!(atomic_load(ft->avails + rs) & mask)) {
+                    atomic_store(&ft->ret, AVERROR_INVALIDDATA);
+                    // all threads may be waiting; wake one up
+                    ff_executor_execute(s->executor, NULL);
+                    break;
+                }
+            }
+            check_missed_slices = 0;
+        }
+        pthread_cond_wait(&ft->cond, &ft->lock);
+    }
+
+    pthread_mutex_unlock(&ft->lock);
+    ff_vvc_report_frame_finished(fc->ref);
+
+#ifdef VVC_THREAD_DEBUG
+    av_log(s->avctx, AV_LOG_DEBUG, "frame %5d done\n", (int)fc->decode_order);
+#endif
+    return ft->ret;
+}
diff --git a/libavcodec/vvc/vvc_thread.h b/libavcodec/vvc/vvc_thread.h
new file mode 100644
index 0000000000..dafee0133b
--- /dev/null
+++ b/libavcodec/vvc/vvc_thread.h
@@ -0,0 +1,73 @@ 
+/*
+ * VVC thread logic
+ *
+ * Copyright (C) 2023 Nuo Mi
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_VVC_THREAD_H
+#define AVCODEC_VVC_THREAD_H
+
+#include "vvcdec.h"
+
+typedef enum VVCTaskType {
+    VVC_TASK_TYPE_PARSE,
+    VVC_TASK_TYPE_INTER,
+    VVC_TASK_TYPE_RECON,
+    VVC_TASK_TYPE_LMCS,
+    VVC_TASK_TYPE_DEBLOCK_V,
+    VVC_TASK_TYPE_DEBLOCK_H,
+    VVC_TASK_TYPE_SAO,
+    VVC_TASK_TYPE_ALF,
+    VVC_TASK_TYPE_LAST
+} VVCTaskType;
+
+struct VVCTask {
+    union {
+        VVCTask *next;                // for executor debug only
+        Tasklet task;
+    };
+
+    VVCTaskType type;
+    uint64_t decode_order;
+
+    // CTU x, y coordinates (derived from the raster-scan CTU address)
+    int rx, ry;
+    VVCFrameContext *fc;
+
+    // reconstruct task only
+    SliceContext *sc;
+    EntryPoint *ep;
+    int ctu_idx;                    // CTU index within the current slice
+};
+
+void ff_vvc_task_init(VVCTask *task, VVCTaskType type, VVCFrameContext *fc);
+void ff_vvc_parse_task_init(VVCTask *task, VVCTaskType type, VVCFrameContext *fc,
+    SliceContext *sc,  EntryPoint *ep, int ctu_addr);
+VVCTask *ff_vvc_task_alloc(void);
+
+int ff_vvc_task_ready(const Tasklet *t, void *user_data);
+int ff_vvc_task_priority_higher(const Tasklet *a, const Tasklet *b);
+int ff_vvc_task_run(Tasklet *t, void *local_context, void *user_data);
+
+int ff_vvc_frame_thread_init(VVCFrameContext *fc);
+void ff_vvc_frame_thread_free(VVCFrameContext *fc);
+void ff_vvc_frame_add_task(VVCContext *s, VVCTask *t);
+int ff_vvc_frame_wait(VVCContext *s, VVCFrameContext *fc);
+
+#endif // AVCODEC_VVC_THREAD_H