From patchwork Wed Aug 28 15:21:01 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51203
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 28 Aug 2024 23:21:01 +0800
X-OQ-MSGID: <20240828152101.91510-2-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240828152101.91510-1-quinkblack@foxmail.com>
References: <20240828152101.91510-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH 2/2] aarch64/vvc: Bind h26x/sao filter implementation to vvc
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Zhao Zhili

From: Zhao Zhili

---
 libavcodec/aarch64/h26x/dsp.h             |  6 +++-
 libavcodec/aarch64/h26x/sao_neon.S        | 44 +++++++++++++++++------
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  2 +-
 libavcodec/aarch64/vvc/Makefile           |  5 +--
 libavcodec/aarch64/vvc/dsp_init.c         |  6 ++++
 5 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index 4dcaf0e6bb..d3f7a4dfe3 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -24,7 +24,7 @@
 #include
 #include
 
-void ff_hevc_sao_band_filter_8x8_8_neon(uint8_t *_dst, const uint8_t *_src,
+void ff_h26x_sao_band_filter_8x8_8_neon(uint8_t *_dst, const uint8_t *_src,
                                   ptrdiff_t stride_dst, ptrdiff_t stride_src,
                                   const int16_t *sao_offset_val, int sao_left_class,
                                   int width, int height);
@@ -33,4 +33,8 @@ void ff_hevc_sao_edge_filter_16x16_8_neon(uint8_t *dst, const uint8_t *src, ptrd
 void ff_hevc_sao_edge_filter_8x8_8_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride_dst,
                                         const int16_t *sao_offset_val, int eo, int width, int height);
+void ff_vvc_sao_edge_filter_16x16_8_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride_dst,
+                                         const int16_t *sao_offset_val, int eo, int width, int height);
+void ff_vvc_sao_edge_filter_8x8_8_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride_dst,
+                                       const int16_t *sao_offset_val, int eo, int width, int height);
 
 #endif
diff --git a/libavcodec/aarch64/h26x/sao_neon.S b/libavcodec/aarch64/h26x/sao_neon.S
index dc407484de..c43820135e 100644
--- a/libavcodec/aarch64/h26x/sao_neon.S
+++ b/libavcodec/aarch64/h26x/sao_neon.S
@@ -24,15 +24,17 @@
 
 #include "libavutil/aarch64/asm.S"
 
-#define MAX_PB_SIZE 64
+#define HEVC_MAX_PB_SIZE 64
+#define VVC_MAX_PB_SIZE 128
 #define AV_INPUT_BUFFER_PADDING_SIZE 64
-#define SAO_STRIDE (2*MAX_PB_SIZE + AV_INPUT_BUFFER_PADDING_SIZE)
+#define HEVC_SAO_STRIDE (2 * HEVC_MAX_PB_SIZE + AV_INPUT_BUFFER_PADDING_SIZE)
+#define VVC_SAO_STRIDE (2 * VVC_MAX_PB_SIZE + AV_INPUT_BUFFER_PADDING_SIZE)
 
 // void sao_band_filter(uint8_t *_dst, uint8_t *_src,
 //                      ptrdiff_t stride_dst, ptrdiff_t stride_src,
 //                      int16_t *sao_offset_val, int sao_left_class,
 //                      int width, int height)
-function ff_hevc_sao_band_filter_8x8_8_neon, export=1
+function ff_h26x_sao_band_filter_8x8_8_neon, export=1
         stp             xzr, xzr, [sp, #-64]!
         stp             xzr, xzr, [sp, #16]
         stp             xzr, xzr, [sp, #32]
@@ -79,16 +81,30 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         ret
 endfunc
 
-.Lsao_edge_pos:
+.Lhevc_sao_edge_pos:
 .word 1 // horizontal
-.word SAO_STRIDE // vertical
-.word SAO_STRIDE + 1 // 45 degree
-.word SAO_STRIDE - 1 // 135 degree
+.word HEVC_SAO_STRIDE // vertical
+.word HEVC_SAO_STRIDE + 1 // 45 degree
+.word HEVC_SAO_STRIDE - 1 // 135 degree
+
+.Lvvc_sao_edge_pos:
+.word 1 // horizontal
+.word VVC_SAO_STRIDE // vertical
+.word VVC_SAO_STRIDE + 1 // 45 degree
+.word VVC_SAO_STRIDE - 1 // 135 degree
+
+function ff_vvc_sao_edge_filter_16x16_8_neon, export=1
+        adr             x7, .Lvvc_sao_edge_pos
+        mov             x15, #VVC_SAO_STRIDE
+        b               1f
+endfunc
 
 // ff_hevc_sao_edge_filter_16x16_8_neon(char *dst, char *src, ptrdiff stride_dst,
 //                                      int16 *sao_offset_val, int eo, int width, int height)
 function ff_hevc_sao_edge_filter_16x16_8_neon, export=1
-        adr             x7, .Lsao_edge_pos
+        adr             x7, .Lhevc_sao_edge_pos
+        mov             x15, #HEVC_SAO_STRIDE
+1:
         ld1             {v3.8h}, [x3]              // load sao_offset_val
         add             w5, w5, #0xF
         bic             w5, w5, #0xF
@@ -101,7 +117,6 @@ function ff_hevc_sao_edge_filter_16x16_8_neon, export=1
         uzp2            v1.16b, v3.16b, v3.16b     // sao_offset_val -> upper
         uzp1            v0.16b, v3.16b, v3.16b     // sao_offset_val -> lower
         movi            v2.16b, #2
-        mov             x15, #SAO_STRIDE
         // strides between end of line and next src/dst
         sub             x15, x15, x5               // stride_src - width
         sub             x16, x2, x5                // stride_dst - width
@@ -145,10 +160,18 @@ function ff_hevc_sao_edge_filter_16x16_8_neon, export=1
         ret
 endfunc
 
+function ff_vvc_sao_edge_filter_8x8_8_neon, export=1
+        adr             x7, .Lvvc_sao_edge_pos
+        mov             x15, #VVC_SAO_STRIDE
+        b               1f
+endfunc
+
 // ff_hevc_sao_edge_filter_8x8_8_neon(char *dst, char *src, ptrdiff stride_dst,
 //                                    int16 *sao_offset_val, int eo, int width, int height)
 function ff_hevc_sao_edge_filter_8x8_8_neon, export=1
-        adr             x7, .Lsao_edge_pos
+        adr             x7, .Lhevc_sao_edge_pos
+        mov             x15, #HEVC_SAO_STRIDE
+1:
         ldr             w4, [x7, w4, uxtw #2]
         ld1             {v3.8h}, [x3]
         mov             v3.h[7], v3.h[0]
@@ -160,7 +183,6 @@ function ff_hevc_sao_edge_filter_8x8_8_neon, export=1
         movi            v2.16b, #2
         add             x16, x0, x2
         lsl             x2, x2, #1
-        mov             x15, #SAO_STRIDE
         mov             x8, x1
         sub             x9, x1, x4
         add             x10, x1, x4
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 7efae0f740..a90da0246e 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -384,7 +384,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->sao_band_filter[1] =
         c->sao_band_filter[2] =
         c->sao_band_filter[3] =
-        c->sao_band_filter[4] = ff_hevc_sao_band_filter_8x8_8_neon;
+        c->sao_band_filter[4] = ff_h26x_sao_band_filter_8x8_8_neon;
         c->sao_edge_filter[0] = ff_hevc_sao_edge_filter_8x8_8_neon;
         c->sao_edge_filter[1] =
         c->sao_edge_filter[2] =
diff --git a/libavcodec/aarch64/vvc/Makefile b/libavcodec/aarch64/vvc/Makefile
index 58398d6e3d..54c49fea92 100644
--- a/libavcodec/aarch64/vvc/Makefile
+++ b/libavcodec/aarch64/vvc/Makefile
@@ -1,5 +1,6 @@
 clean::
 	$(RM) $(CLEANSUFFIXES:%=libavcodec/aarch64/vvc/%)
 
-OBJS-$(CONFIG_VVC_DECODER)            += aarch64/vvc/dsp_init.o
-NEON-OBJS-$(CONFIG_VVC_DECODER)       += aarch64/vvc/alf.o
+OBJS-$(CONFIG_VVC_DECODER)              += aarch64/vvc/dsp_init.o
+NEON-OBJS-$(CONFIG_VVC_DECODER)         += aarch64/vvc/alf.o \
+                                           aarch64/h26x/sao_neon.o
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index 2a9f25911f..0aac140a8f 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -22,6 +22,7 @@
 
 #include "libavutil/cpu.h"
 #include "libavutil/aarch64/cpu.h"
+#include "libavcodec/aarch64/h26x/dsp.h"
 #include "libavcodec/vvc/dsp.h"
 #include "libavcodec/vvc/dec.h"
 #include "libavcodec/vvc/ctu.h"
@@ -45,6 +46,11 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         return;
 
     if (bd == 8) {
+        for (int i = 0; i < FF_ARRAY_ELEMS(c->sao.band_filter); i++)
+            c->sao.band_filter[i] = ff_h26x_sao_band_filter_8x8_8_neon;
+        c->sao.edge_filter[0] = ff_vvc_sao_edge_filter_8x8_8_neon;
+        for (int i = 1; i < FF_ARRAY_ELEMS(c->sao.edge_filter); i++)
+            c->sao.edge_filter[i] = ff_vvc_sao_edge_filter_16x16_8_neon;
         c->alf.filter[LUMA] = alf_filter_luma_8_neon;
         c->alf.filter[CHROMA] = alf_filter_chroma_8_neon;
     } else if (bd == 10) {