From patchwork Sun Feb 26 05:48:34 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nuo Mi <nuomi2021@gmail.com>
X-Patchwork-Id: 40525
From: Nuo Mi <nuomi2021@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 26 Feb 2023 13:48:34 +0800
Message-Id: <20230226054835.14201-1-nuomi2021@gmail.com>
X-Mailer: git-send-email 2.25.1
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH 1/2] vvcdec: alf,
 add avx2 for luma and chroma filter
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Nuo Mi <nuomi2021@gmail.com>
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
X-TUID: ft/BXejNKx6X

Improves decoding performance by 11%~26% on 1080p and 4K clips.

clip                                        before      after   delta
RitualDance_1920x1080_60_10_420_32_LD.266       35          43    22.8%
RitualDance_1920x1080_60_10_420_37_RA.266       43          48    11.6%
Tango2_3840x2160_60_10_420_27_LD.266            7.9         10    26.5%
---
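For reviewers, here is a minimal scalar sketch of what the AVX2 luma path appears to compute for a single output pixel. It is derived from the FILTER macros and the rounding/shift sequence in vvc_alf.asm below, not copied from the C reference decoder. The names alf_luma_pixel_ref, clip_int, ALF_LUMA_TAPS and ALF_SHIFT are hypothetical, and the stride is given in pixels rather than bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALF_LUMA_TAPS 12   /* hypothetical name: tap pairs used by the luma path */
#define ALF_SHIFT      7   /* same value as SHIFT in vvc_alf.asm */

static int clip_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Scalar model of one luma output pixel (a sketch, not the decoder's code).
 * src points at the centre pixel p0; stride is in pixels, not bytes. */
static uint16_t alf_luma_pixel_ref(const uint16_t *src, ptrdiff_t stride,
                                   const int8_t *filter, const int16_t *clip,
                                   int pixel_max)
{
    /* Tap offsets, in the order of the four "FILTER n, ..." luma lines. */
    const ptrdiff_t off[ALF_LUMA_TAPS] = {
        3 * stride,     2 * stride + 1, 2 * stride,
        2 * stride - 1, stride + 2,     stride + 1,
        stride,         stride - 1,     stride - 2,
        3,              2,              1,
    };
    const int p0 = src[0];
    int sum = 1 << (ALF_SHIFT - 1);          /* rounding term, dw_64 in the asm */

    assert(pixel_max > 0);
    for (int i = 0; i < ALF_LUMA_TAPS; i++) {
        const int c  = clip[i];
        /* Differences to p0 are clipped to [-c, c] before weighting. */
        const int d0 = clip_int(src[-off[i]] - p0, -c, c);
        const int d1 = clip_int(src[ off[i]] - p0, -c, c);
        sum += filter[i] * (d0 + d1);
    }
    /* Assumes >> is an arithmetic shift for negative sums, like vpsrad. */
    return (uint16_t)clip_int(p0 + (sum >> ALF_SHIFT), 0, pixel_max);
}
```

A flat neighbourhood leaves the pixel unchanged because every clipped difference is zero, which makes a quick sanity check when comparing against the assembly output.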
 libavcodec/vvcdsp.c          |   3 +
 libavcodec/x86/Makefile      |   2 +
 libavcodec/x86/vvc_alf.asm   | 301 +++++++++++++++++++++++++++++++++++
 libavcodec/x86/vvcdsp.h      |  44 +++++
 libavcodec/x86/vvcdsp_init.c |  81 ++++++++++
 5 files changed, 431 insertions(+)
 create mode 100644 libavcodec/x86/vvc_alf.asm
 create mode 100644 libavcodec/x86/vvcdsp.h
 create mode 100644 libavcodec/x86/vvcdsp_init.c

diff --git a/libavcodec/vvcdsp.c b/libavcodec/vvcdsp.c
index 801bd0189d..399631503f 100644
--- a/libavcodec/vvcdsp.c
+++ b/libavcodec/vvcdsp.c
@@ -313,4 +313,7 @@ void ff_vvc_dsp_init(VVCDSPContext *vvcdsp, int bit_depth)
         VVC_DSP(8);
         break;
     }
+#if ARCH_X86
+    ff_vvc_dsp_init_x86(vvcdsp, bit_depth);
+#endif
 }
diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
index 118daca333..23b2fb42bb 100644
--- a/libavcodec/x86/Makefile
+++ b/libavcodec/x86/Makefile
@@ -82,6 +82,7 @@ OBJS-$(CONFIG_VP9_DECODER)             += x86/vp9dsp_init.o            \
                                           x86/vp9dsp_init_12bpp.o      \
                                           x86/vp9dsp_init_16bpp.o
 OBJS-$(CONFIG_WEBP_DECODER)            += x86/vp8dsp_init.o
+OBJS-$(CONFIG_VVC_DECODER)             += x86/vvcdsp_init.o
 
 
 # GCC inline assembly optimizations
@@ -202,4 +203,5 @@ X86ASM-OBJS-$(CONFIG_VP9_DECODER)      += x86/vp9intrapred.o            \
                                           x86/vp9lpf_16bpp.o            \
                                           x86/vp9mc.o                   \
                                           x86/vp9mc_16bpp.o
+X86ASM-OBJS-$(CONFIG_VVC_DECODER)      += x86/vvc_alf.o
 X86ASM-OBJS-$(CONFIG_WEBP_DECODER)     += x86/vp8dsp.o
diff --git a/libavcodec/x86/vvc_alf.asm b/libavcodec/x86/vvc_alf.asm
new file mode 100644
index 0000000000..c3e4074be7
--- /dev/null
+++ b/libavcodec/x86/vvc_alf.asm
@@ -0,0 +1,301 @@
+;******************************************************************************
+;* VVC Adaptive Loop Filter SIMD optimizations
+;*
+;* Copyright (c) 2023 Nuo Mi <nuomi2021@gmail.com>
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;******************************************************************************
+
+%include "libavutil/x86/x86util.asm"
+
+SECTION_RODATA
+
+%macro PARAM_SHUFFE 1
+%assign i (%1  * 2)
+%assign j ((i + 1) << 8) + (i)
+param_shuffe_%+%1:
+%rep 2
+    times 4 dw j
+    times 4 dw (j + 0x0808)
+%endrep
+%endmacro
+
+PARAM_SHUFFE 0
+PARAM_SHUFFE 1
+PARAM_SHUFFE 2
+PARAM_SHUFFE 3
+
+dw_64:                  dd 64
+
+SECTION .text
+
+%if HAVE_AVX2_EXTERNAL
+
+;%1-%3 out
+;%4 clip or filter
+%macro LOAD_LUMA_PARAMS_W16 4
+    %ifidn clip, %4
+        movu            m%1, [%4q + 0 * 32]
+        movu            m%2, [%4q + 1 * 32]
+        movu            m%3, [%4q + 2 * 32]
+    %elifidn filter, %4
+        movu            xm%1, [%4q + 0 * 16]
+        movu            xm%2, [%4q + 1 * 16]
+        movu            xm%3, [%4q + 2 * 16]
+        pmovsxbw        m%1, xm%1
+        pmovsxbw        m%2, xm%2
+        pmovsxbw        m%3, xm%3
+    %else
+        %error "need filter or clip for the fourth param"
+    %endif
+%endmacro
+
+%macro LOAD_LUMA_PARAMS_W16 6
+    LOAD_LUMA_PARAMS_W16    %1, %2, %3, %4
+    ;m%1 = 03 02 01 00
+    ;m%2 = 07 06 05 04
+    ;m%3 = 11 10 09 08
+
+    vshufpd                 m%5, m%1, m%2, 0b0011       ;06 02 05 01
+    vshufpd                 m%6, m%3, m%5, 0b1001       ;06 10 01 09
+
+    vshufpd                 m%1, m%1, m%6, 0b1100       ;06 03 09 00
+    vshufpd                 m%2, m%2, m%6, 0b0110       ;10 07 01 04
+    vshufpd                 m%3, m%3, m%5, 0b0110       ;02 11 05 08
+
+    vpermpd                 m%1, m%1, 0b01_11_10_00     ;09 06 03 00
+    vshufpd                 m%2, m%2, m%2, 0b1001       ;10 07 04 01
+    vpermpd                 m%3, m%3, 0b10_00_01_11     ;11 08 05 02
+%endmacro
+
+%macro LOAD_LUMA_PARAMS_W4 6
+    %ifidn clip, %4
+        movq                xm%1, [%4q + 0 * 8]
+        movq                xm%2, [%4q + 1 * 8]
+        movq                xm%3, [%4q + 2 * 8]
+    %elifidn filter, %4
+        movd                xm%1, [%4q + 0 * 4]
+        movd                xm%2, [%4q + 1 * 4]
+        movd                xm%3, [%4q + 2 * 4]
+        pmovsxbw            xm%1, xm%1
+        pmovsxbw            xm%2, xm%2
+        pmovsxbw            xm%3, xm%3
+    %else
+        %error "need filter or clip for the fourth param"
+    %endif
+    vpbroadcastq            m%1, xm%1
+    vpbroadcastq            m%2, xm%2
+    vpbroadcastq            m%3, xm%3
+%endmacro
+
+;%1-%3 out
+;%4 clip or filter
+;%5, %6 tmp
+%macro LOAD_LUMA_PARAMS 6
+    LOAD_LUMA_PARAMS_W %+ WIDTH %1, %2, %3, %4, %5, %6
+%endmacro
+
+%macro LOAD_CHROMA_PARAMS 4
+    ;LOAD_CHROMA_PARAMS_W %+ WIDTH %1, %2, %3, %4
+    %ifidn clip, %3
+        movq            xm%1, [%3q]
+        movd            xm%2, [%3q + 8]
+    %elifidn filter, %3
+        movd            xm%1, [%3q + 0]
+        pinsrw          xm%2, [%3q + 4], 0
+        vpmovsxbw       m%1, xm%1
+        vpmovsxbw       m%2, xm%2
+    %else
+        %error "need filter or clip for the third param"
+    %endif
+    vpbroadcastq    m%1, xm%1
+    vpbroadcastq    m%2, xm%2
+%endmacro
+
+%macro LOAD_PARAMS 0
+    %if LUMA
+        LOAD_LUMA_PARAMS     3, 4, 5, filter, 6, 7
+        LOAD_LUMA_PARAMS     6, 7, 8, clip,   9, 10
+    %else
+        LOAD_CHROMA_PARAMS   3, 4, filter, 5
+        LOAD_CHROMA_PARAMS   6, 7, clip, 8
+    %endif
+%endmacro
+
+;FILTER(param_idx)
+;input: m2, m9, m10
+;output: m0, m1
+;m12 ~ m15: tmp
+%macro FILTER 1
+    %assign i (%1 % 4)
+    %assign j (%1 / 4 + 3)
+    %assign k (%1 / 4 + 6)
+    %define filters m%+j
+    %define clips m%+k
+
+    movu            m12, [param_shuffe_%+i]
+    pshufb          m14, clips, m12          ;clip
+    pxor            m13, m13
+    psubw           m13, m14                 ;-clip
+
+    vpsubw          m9, m2
+    CLIPW           m9, m13, m14
+
+    vpsubw          m10, m2
+    CLIPW           m10, m13, m14
+
+    vpunpckhwd      m15, m9, m10
+    vpunpcklwd      m9, m9, m10
+
+    pshufb          m14, filters, m12       ;filter
+    vpunpcklwd      m10, m14, m14
+    vpunpckhwd      m14, m14, m14
+
+    vpmaddwd        m9, m10
+    vpmaddwd        m14, m15
+
+    paddd           m0, m9
+    paddd           m1, m14
+%endmacro
+
+;FILTER(param_start, off0~off2)
+%macro FILTER 4
+    %assign %%i (%1)
+    %rep 3
+        lea             offsetq, [%2]
+        mov             topq, srcq
+        mov             bottomq, srcq
+        sub             topq, offsetq
+        add             bottomq, offsetq
+        LOAD_PIXELS     9, topq, 11
+        LOAD_PIXELS     10, bottomq, 12
+        FILTER  %%i
+        %assign %%i %%i+1
+        %rotate 1
+    %endrep
+%endmacro
+
+;filter pixels for luma and chroma
+%macro FILTER 0
+    %if LUMA
+        FILTER          0, src_stride3q ,           src_strideq  * 2 + ps,  src_strideq  * 2
+        FILTER          3, src_strideq  * 2 - ps,   src_strideq  + 2 * ps,  src_strideq  + ps
+        FILTER          6, src_strideq,             src_strideq  - ps,      src_strideq  - 2 * ps
+        FILTER          9, src_stride0q + 3 * ps,   src_stride0q + 2 * ps,  src_stride0q + ps
+    %else
+        FILTER          0, src_strideq * 2,         src_strideq  + ps,      src_strideq
+        FILTER          3, src_strideq - ps,        src_stride0q + 2 * ps,  src_stride0q + ps
+    %endif
+%endmacro
+
+%define SHIFT 7
+
+;LOAD_PIXELS(dest, src, tmp)
+%macro LOAD_PIXELS 3
+    %if WIDTH == 16
+        movu            m%1, [%2]
+    %else
+        pinsrq          xm%1, [%2], 0
+        pinsrq          xm%1, [%2 + src_strideq], 1
+        pinsrq          xm%3, [%2 + src_strideq * 2], 0
+        pinsrq          xm%3, [%2 + src_stride3q], 1
+        vinsertf128     m%1, xm%3, 1
+    %endif
+%endmacro
+
+;STORE_PIXELS(dest, src, tmp)
+%macro STORE_PIXELS 3
+    %if WIDTH == 16
+        movu            [%1], m%2
+    %else
+        pextrq          [%1], xm%2, 0
+        pextrq          [%1 + src_strideq], xm%2, 1
+        vperm2f128      m%2, m%2, 1
+        pextrq          [%1 + src_strideq * 2], xm%2, 0
+        pextrq          [%1 + src_stride3q], xm%2, 1
+    %endif
+%endmacro
+
+;ALF_FILTER_16BPP(luma or chroma, width)
+%macro ALF_FILTER_16BPP 2
+%ifidn %1, luma
+    %xdefine LUMA 1
+%else
+    %xdefine LUMA 0
+%endif
+%xdefine WIDTH %2
+; void vvc_alf_filter_%1_w%2_16bpp_avx2(uint8_t *dst, ptrdiff_t dst_stride,
+;    const uint8_t *src, ptrdiff_t src_stride, int height,
+;    const int8_t *filter, const int16_t *clip, ptrdiff_t stride, uint16_t pixel_max);
+
+; see c code for p0 to p6
+
+INIT_YMM avx2
+cglobal vvc_alf_filter_%1_w%2_16bpp, 9, 15, 16, dst, dst_stride, src, src_stride, height, filter, clip, stride, pixel_max, \
+    top, bottom, offset, src_stride3, src_stride0
+%define ps 2
+    lea             src_stride3q, [src_strideq * 2 + src_strideq]
+    mov             src_stride0q, 0
+    shr             heightd, 2                  ; height is an int: the 32-bit op also zeroes the upper half
+
+.loop:
+    LOAD_PARAMS
+
+;loop 4 times for a 16x4 block (one 16-pixel row per pass), once for a 4x4 block
+%define rep_num (WIDTH / 4)
+%define lines  (4 / rep_num)
+%rep rep_num
+    VPBROADCASTD    m0, [dw_64]
+    VPBROADCASTD    m1, [dw_64]
+
+    LOAD_PIXELS     2, srcq, 9   ;p0
+
+    FILTER
+
+    vpsrad          m0, SHIFT
+    vpsrad          m1, SHIFT
+
+    vpackssdw       m0, m0, m1
+    paddw           m0, m2
+
+    ;clip to pixel
+    pinsrw          xm2, pixel_maxw, 0
+    vpbroadcastw    m2, xm2
+    pxor            m1, m1
+    CLIPW           m0, m1, m2
+
+    STORE_PIXELS    dstq, 0, 1
+
+    lea             srcq, [srcq + lines * src_strideq]
+    lea             dstq, [dstq + lines * dst_strideq]
+%endrep
+
+    lea             filterq, [filterq + strideq]
+    lea             clipq, [clipq + 2 * strideq]
+
+    dec             heightq
+    jg              .loop
+    RET
+%endmacro
+
+ALF_FILTER_16BPP luma, 16
+ALF_FILTER_16BPP luma, 4
+ALF_FILTER_16BPP chroma, 16
+ALF_FILTER_16BPP chroma, 4
+
+%endif
+
diff --git a/libavcodec/x86/vvcdsp.h b/libavcodec/x86/vvcdsp.h
new file mode 100644
index 0000000000..8589d4ae97
--- /dev/null
+++ b/libavcodec/x86/vvcdsp.h
@@ -0,0 +1,44 @@
+/*
+ * VVC DSP for x86
+ *
+ * Copyright (C) 2022 Nuo Mi
+ *
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_X86_VVCDSP_H
+#define AVCODEC_X86_VVCDSP_H
+
+void ff_vvc_alf_filter_luma_w16_16bpp_avx2(uint8_t *dst, ptrdiff_t dst_stride,
+    const uint8_t *src, ptrdiff_t src_stride, int height,
+    const int8_t *filter, const int16_t *clip, ptrdiff_t stride, uint16_t pixel_max);
+
+void ff_vvc_alf_filter_luma_w4_16bpp_avx2(uint8_t *dst, ptrdiff_t dst_stride,
+    const uint8_t *src, ptrdiff_t src_stride, int height,
+    const int8_t *filter, const int16_t *clip, ptrdiff_t stride, uint16_t pixel_max);
+
+void ff_vvc_alf_filter_chroma_w16_16bpp_avx2(uint8_t *dst, ptrdiff_t dst_stride,
+    const uint8_t *src, ptrdiff_t src_stride, int height,
+    const int8_t *filter, const int16_t *clip, ptrdiff_t stride, uint16_t pixel_max);
+
+void ff_vvc_alf_filter_chroma_w4_16bpp_avx2(uint8_t *dst, ptrdiff_t dst_stride,
+    const uint8_t *src, ptrdiff_t src_stride, int height,
+    const int8_t *filter, const int16_t *clip, ptrdiff_t stride, uint16_t pixel_max);
+
+#endif //AVCODEC_X86_VVCDSP_H
+
diff --git a/libavcodec/x86/vvcdsp_init.c b/libavcodec/x86/vvcdsp_init.c
new file mode 100644
index 0000000000..c595ed55fa
--- /dev/null
+++ b/libavcodec/x86/vvcdsp_init.c
@@ -0,0 +1,81 @@
+/*
+ * VVC DSP init for x86
+ *
+ * Copyright (C) 2022 Nuo Mi
+ *
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+
+#include "libavutil/cpu.h"
+#include "libavutil/x86/asm.h"
+#include "libavutil/x86/cpu.h"
+#include "libavcodec/vvcdec.h"
+#include "libavcodec/vvcdsp.h"
+#include "libavcodec/x86/vvcdsp.h"
+
+static void alf_filter_luma_10_avx2(uint8_t *dst, ptrdiff_t dst_stride, const uint8_t *src, ptrdiff_t src_stride,
+    int width, int height, const int8_t *filter, const int16_t *clip)
+{
+    const int ps            = 1;                                    //pixel shift
+    const int pixel_max     = (1 << 10) - 1;
+    const int param_stride  = (width >> 2) * ALF_NUM_COEFF_LUMA;
+    int w;
+
+    for (w = 0; w + 16 <= width; w += 16) {
+        const int param_offset = w * ALF_NUM_COEFF_LUMA / ALF_BLOCK_SIZE;
+        ff_vvc_alf_filter_luma_w16_16bpp_avx2(dst + (w << ps), dst_stride, src + (w << ps), src_stride,
+            height, filter + param_offset, clip + param_offset, param_stride, pixel_max);
+    }
+    for ( /* nothing */; w < width; w += 4) {
+        const int param_offset = w * ALF_NUM_COEFF_LUMA / ALF_BLOCK_SIZE;
+        ff_vvc_alf_filter_luma_w4_16bpp_avx2(dst + (w << ps), dst_stride, src + (w << ps), src_stride,
+            height, filter + param_offset, clip + param_offset, param_stride, pixel_max);
+    }
+}
+
+static void alf_filter_chroma_10_avx2(uint8_t *dst, ptrdiff_t dst_stride, const uint8_t *src, ptrdiff_t src_stride,
+    int width, int height, const int8_t *filter, const int16_t *clip)
+{
+    const int ps            = 1;                                    //pixel shift
+    const int pixel_max     = (1 << 10) - 1;
+    int w;
+
+    for (w = 0; w + 16 <= width; w += 16) {
+        ff_vvc_alf_filter_chroma_w16_16bpp_avx2(dst + (w << ps), dst_stride, src + (w << ps), src_stride,
+            height, filter, clip, 0, pixel_max);
+    }
+    for ( /* nothing */; w < width; w += 4) {
+        ff_vvc_alf_filter_chroma_w4_16bpp_avx2(dst + (w << ps), dst_stride, src + (w << ps), src_stride,
+            height, filter, clip, 0, pixel_max);
+    }
+}
+
+void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bit_depth)
+{
+    const int cpu_flags = av_get_cpu_flags();
+
+    if (bit_depth == 10) {
+        if (EXTERNAL_AVX2(cpu_flags)) {
+            c->alf.filter[LUMA] = alf_filter_luma_10_avx2;
+            c->alf.filter[CHROMA] = alf_filter_chroma_10_avx2;
+        }
+    }
+}
+
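The column split used by alf_filter_luma_10_avx2 above can be sketched in isolation as follows. This is an illustrative model only: AlfTile and alf_split_row are hypothetical names, the values of ALF_BLOCK_SIZE and ALF_NUM_COEFF_LUMA are assumptions (they come from vvcdec.h, which is not part of this patch), and the width is assumed to be a multiple of 4, as the w4 tail loop implies.

```c
#include <assert.h>

#define ALF_BLOCK_SIZE       4   /* assumed value, defined in vvcdec.h */
#define ALF_NUM_COEFF_LUMA  12   /* assumed value, defined in vvcdec.h */

/* Hypothetical record of one call into the w16 or w4 kernel. */
typedef struct AlfTile {
    int x;             /* first column handled by this call */
    int width;         /* 16 or 4 */
    int param_offset;  /* offset into the filter/clip arrays */
} AlfTile;

/* Lists the kernel calls for one row of blocks: 16-wide tiles first,
 * then 4-wide tiles for the remainder. Returns the number of tiles. */
static int alf_split_row(int width, AlfTile *tiles, int max_tiles)
{
    int n = 0;
    int w;

    assert(width % ALF_BLOCK_SIZE == 0);
    for (w = 0; w + 16 <= width; w += 16) {
        assert(n < max_tiles);
        tiles[n].x            = w;
        tiles[n].width        = 16;
        tiles[n].param_offset = w * ALF_NUM_COEFF_LUMA / ALF_BLOCK_SIZE;
        n++;
    }
    for (; w < width; w += 4) {
        assert(n < max_tiles);
        tiles[n].x            = w;
        tiles[n].width        = 4;
        tiles[n].param_offset = w * ALF_NUM_COEFF_LUMA / ALF_BLOCK_SIZE;
        n++;
    }
    return n;
}
```

Under these assumptions a 40-pixel-wide block row maps to two w16 calls followed by two w4 calls, with the parameter offset advancing by 12 coefficients per 4 columns.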