From patchwork Mon Sep 23 09:05:40 2024
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51734
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 23 Sep 2024 17:05:40 +0800
Subject: [FFmpeg-devel] [PATCH v2 3/3] aarch64/vvc: Add dmvr

dmvr_8_12x20_c:       2.2 ( 1.00x)
dmvr_8_12x20_neon:    0.5 ( 4.50x)
dmvr_8_20x12_c:       2.0 ( 1.00x)
dmvr_8_20x12_neon:    0.2 ( 8.00x)
dmvr_8_20x20_c:       3.2 ( 1.00x)
dmvr_8_20x20_neon:    0.5 ( 6.50x)
dmvr_12_12x20_c:      2.2 ( 1.00x)
dmvr_12_12x20_neon:   0.5 ( 4.50x)
dmvr_12_20x12_c:      2.2 ( 1.00x)
dmvr_12_20x12_neon:   0.5 ( 4.50x)
dmvr_12_20x20_c:      3.2 ( 1.00x)
dmvr_12_20x20_neon:   0.8 ( 4.33x)
---
 libavcodec/aarch64/vvc/dsp_init.c |  4 ++
 libavcodec/aarch64/vvc/inter.S    | 94 ++++++++++++++++++++++++++++++-
 2 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index 995e26d163..e9bb65bd0f 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -88,6 +88,8 @@ W_AVG_FUN(12)
         const uint8_t *_src, const ptrdiff_t _src_stride, const int height, \
         const intptr_t mx, const intptr_t my, const int width);
 
+DMVR_FUN(, 8)
+DMVR_FUN(, 12)
 DMVR_FUN(hv_, 8)
 DMVR_FUN(hv_, 10)
 DMVR_FUN(hv_, 12)
@@ -164,6 +166,7 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
 
         c->inter.avg = ff_vvc_avg_8_neon;
         c->inter.w_avg = vvc_w_avg_8;
+        c->inter.dmvr[0][0] = ff_vvc_dmvr_8_neon;
         c->inter.dmvr[1][1] = ff_vvc_dmvr_hv_8_neon;
 
         for (int i = 0; i < FF_ARRAY_ELEMS(c->sao.band_filter); i++)
@@ -213,6 +216,7 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
     } else if (bd == 12) {
         c->inter.avg = ff_vvc_avg_12_neon;
         c->inter.w_avg = vvc_w_avg_12;
+        c->inter.dmvr[0][0] = ff_vvc_dmvr_12_neon;
         c->inter.dmvr[1][1] = ff_vvc_dmvr_hv_12_neon;
 
         c->alf.filter[LUMA] = alf_filter_luma_12_neon;
diff --git a/libavcodec/aarch64/vvc/inter.S b/libavcodec/aarch64/vvc/inter.S
index a0bb356f07..1b3fb5b468 100644
--- a/libavcodec/aarch64/vvc/inter.S
+++ b/libavcodec/aarch64/vvc/inter.S
@@ -235,7 +235,7 @@ vvc_avg w_avg, 12
  * x5: const intptr_t my
  * w6: const int width
  */
-function ff_vvc_dmvr_hv_8_neon, export=1
+function ff_vvc_dmvr_8_neon, export=1
         dst             .req x0
         src             .req x1
         src_stride      .req x2
@@ -243,6 +243,98 @@ function ff_vvc_dmvr_hv_8_neon, export=1
         mx              .req x4
         my              .req x5
         width           .req w6
+
+        sxtw            x6, w6
+        mov             x7, #(VVC_MAX_PB_SIZE * 2 + 8)
+        cmp             width, #16
+        sub             src_stride, src_stride, x6
+        cset            w15, gt                     // width > 16
+        movi            v16.8h, #2                  // DMVR_SHIFT
+        sub             x7, x7, x6, lsl #1
+1:
+        cbz             w15, 2f
+        ldr             q0, [src], #16
+        uxtl            v1.8h, v0.8b
+        uxtl2           v2.8h, v0.16b
+        ushl            v1.8h, v1.8h, v16.8h
+        ushl            v2.8h, v2.8h, v16.8h
+        stp             q1, q2, [dst], #32
+        b               3f
+2:
+        ldr             d0, [src], #8
+        uxtl            v1.8h, v0.8b
+        ushl            v1.8h, v1.8h, v16.8h
+        str             q1, [dst], #16
+3:
+        subs            height, height, #1
+        ldr             s3, [src], #4
+        uxtl            v4.8h, v3.8b
+        ushl            v4.4h, v4.4h, v16.4h
+        st1             {v4.4h}, [dst], x7
+
+        add             src, src, src_stride
+        b.ne            1b
+
+        ret
+endfunc
+
+function ff_vvc_dmvr_12_neon, export=1
+        sxtw            x6, w6
+        mov             x7, #(VVC_MAX_PB_SIZE * 2 + 8)
+        cmp             width, #16
+        sub             src_stride, src_stride, x6, lsl #1
+        cset            w15, gt                     // width > 16
+        movi            v16.4s, #2                  // offset4
+        sub             x7, x7, x6, lsl #1
+1:
+        cbz             w15, 2f
+        ldp             q0, q1, [src], #32
+        uxtl            v2.4s, v0.4h
+        uxtl2           v3.4s, v0.8h
+        uxtl            v4.4s, v1.4h
+        uxtl2           v5.4s, v1.8h
+        add             v2.4s, v2.4s, v16.4s
+        add             v3.4s, v3.4s, v16.4s
+        add             v4.4s, v4.4s, v16.4s
+        add             v5.4s, v5.4s, v16.4s
+        ushr            v2.4s, v2.4s, #2
+        ushr            v3.4s, v3.4s, #2
+        ushr            v4.4s, v4.4s, #2
+        ushr            v5.4s, v5.4s, #2
+        uqxtn           v2.4h, v2.4s
+        uqxtn2          v2.8h, v3.4s
+        uqxtn           v4.4h, v4.4s
+        uqxtn2          v4.8h, v5.4s
+
+        stp             q2, q4, [dst], #32
+        b               3f
+2:
+        ldr             q0, [src], #16
+        uxtl            v2.4s, v0.4h
+        uxtl2           v3.4s, v0.8h
+        add             v2.4s, v2.4s, v16.4s
+        add             v3.4s, v3.4s, v16.4s
+        ushr            v2.4s, v2.4s, #2
+        ushr            v3.4s, v3.4s, #2
+        uqxtn           v2.4h, v2.4s
+        uqxtn2          v2.8h, v3.4s
+        str             q2, [dst], #16
+3:
+        subs            height, height, #1
+        ldr             d0, [src], #8
+        uxtl            v3.4s, v0.4h
+        add             v3.4s, v3.4s, v16.4s
+        ushr            v3.4s, v3.4s, #2
+        uqxtn           v3.4h, v3.4s
+        st1             {v3.4h}, [dst], x7
+
+        add             src, src, src_stride
+        b.ne            1b
+
+        ret
+endfunc
+
+function ff_vvc_dmvr_hv_8_neon, export=1
         tmp0            .req x7
         tmp1            .req x8
 
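For review purposes, the per-sample work of the two new NEON functions can be restated as a scalar C model. This is a sketch inferred from reading the assembly above, not FFmpeg's own C template: the names `dmvr_sketch_8`, `dmvr_sketch_12` and `neon_dst_row_advance` are hypothetical, `SKETCH_MAX_PB_SIZE` stands in for `VVC_MAX_PB_SIZE` with an assumed value of 128, and the width restriction to 12 or 20 is taken from the benchmarked block sizes. The 8-bit path widens and shifts left by 2; the 12-bit path adds the rounding offset 2 and shifts right by 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of VVC_MAX_PB_SIZE (destination row pitch in int16_t
 * elements); the patch itself only references the symbol. */
#define SKETCH_MAX_PB_SIZE 128

/* Hypothetical scalar model of ff_vvc_dmvr_8_neon: widen each 8-bit sample
 * to 16 bits and shift left by 2 (the "DMVR_SHIFT" value loaded into v16). */
static void dmvr_sketch_8(int16_t *dst, const uint8_t *src,
                          ptrdiff_t src_stride, int height, int width)
{
    /* The NEON loop handles (width - 4) == 16 or 8 samples plus a 4-sample
     * tail, so only the benchmarked widths 12 and 20 are modeled here. */
    assert(width == 12 || width == 20);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)(src[x] << 2);
        src += src_stride;          /* byte stride == sample stride at 8 bits */
        dst += SKETCH_MAX_PB_SIZE;  /* fixed pitch, independent of width */
    }
}

/* Hypothetical scalar model of ff_vvc_dmvr_12_neon: add the rounding offset
 * 2 ("offset4" in the asm), then shift right by 2. Here src_stride counts
 * uint16_t samples, whereas the asm receives it in bytes in x2. */
static void dmvr_sketch_12(int16_t *dst, const uint16_t *src,
                           ptrdiff_t src_stride, int height, int width)
{
    assert(width == 12 || width == 20);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)((src[x] + 2) >> 2);
        src += src_stride;
        dst += SKETCH_MAX_PB_SIZE;
    }
}

/* Bytes the NEON loop moves dst forward per row: (width - 4) samples via the
 * post-incremented stp/str, then x7 = MAX*2 + 8 - 2*width on the final
 * 4-sample st1. The sum should equal one full destination row pitch. */
static ptrdiff_t neon_dst_row_advance(int width)
{
    ptrdiff_t x7 = SKETCH_MAX_PB_SIZE * 2 + 8 - 2 * (ptrdiff_t)width;
    return 2 * (ptrdiff_t)(width - 4) + x7;
}
```

The `neon_dst_row_advance` helper mirrors the `x7` arithmetic in the prologue of both functions: because the last store post-increments by `x7` rather than by 8, the per-row advance works out to `2 * VVC_MAX_PB_SIZE` bytes for either width, so no separate width-dependent pointer fix-up is needed inside the loop.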