From patchwork Sat Sep 21 17:41:46 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51688
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 22 Sep 2024 01:41:46 +0800
X-OQ-MSGID: <20240921174146.10928-4-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240921174146.10928-1-quinkblack@foxmail.com>
References: <20240921174146.10928-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH 4/4] aarch64/vvc: Add dmvr
Cc: Zhao Zhili

From: Zhao Zhili

dmvr_8_12x20_c:       2.2 ( 1.00x)
dmvr_8_12x20_neon:    0.5 ( 4.50x)
dmvr_8_20x12_c:       2.0 ( 1.00x)
dmvr_8_20x12_neon:    0.2 ( 8.00x)
dmvr_8_20x20_c:       3.2 ( 1.00x)
dmvr_8_20x20_neon:    0.5 ( 6.50x)
dmvr_12_12x20_c:      2.2 ( 1.00x)
dmvr_12_12x20_neon:   0.5 ( 4.50x)
dmvr_12_20x12_c:      2.2 ( 1.00x)
dmvr_12_20x12_neon:   0.5 ( 4.50x)
dmvr_12_20x20_c:      3.2 ( 1.00x)
dmvr_12_20x20_neon:   0.8 ( 4.33x)
---
 libavcodec/aarch64/vvc/dsp_init.c |  4 ++
 libavcodec/aarch64/vvc/inter.S    | 94 ++++++++++++++++++++++++++++++-
 2 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index 48642e98e6..36611a6f5d 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -94,6 +94,8 @@ W_AVG_FUN(12)
     const uint8_t *_src, const ptrdiff_t _src_stride, const int height, \
     const intptr_t mx, const intptr_t my, const int width);
 
+DMVR_FUN(, 8)
+DMVR_FUN(, 12)
 DMVR_FUN(hv_, 8)
 DMVR_FUN(hv_, 10)
 DMVR_FUN(hv_, 12)
@@ -171,6 +173,7 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         c->inter.avg = ff_vvc_avg_8_neon;
         c->inter.w_avg = vvc_w_avg_8;
         c->inter.apply_bdof = apply_bdof_8;
+        c->inter.dmvr[0][0] = ff_vvc_dmvr_8_neon;
         c->inter.dmvr[1][1] = ff_vvc_dmvr_hv_8_neon;
 
         for (int i = 0; i < FF_ARRAY_ELEMS(c->sao.band_filter); i++)
@@ -222,6 +225,7 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         c->inter.avg = ff_vvc_avg_12_neon;
         c->inter.w_avg = vvc_w_avg_12;
         c->inter.apply_bdof = apply_bdof_12;
+        c->inter.dmvr[0][0] = ff_vvc_dmvr_12_neon;
         c->inter.dmvr[1][1] = ff_vvc_dmvr_hv_12_neon;
 
         c->alf.filter[LUMA] = alf_filter_luma_12_neon;
diff --git a/libavcodec/aarch64/vvc/inter.S b/libavcodec/aarch64/vvc/inter.S
index b652e0d609..1f4706e2fa 100644
--- a/libavcodec/aarch64/vvc/inter.S
+++ b/libavcodec/aarch64/vvc/inter.S
@@ -578,7 +578,7 @@ endfunc
  * x5: const intptr_t my
  * w6: const int width
  */
-function ff_vvc_dmvr_hv_8_neon, export=1
+function ff_vvc_dmvr_8_neon, export=1
         dst             .req x0
         src             .req x1
         src_stride      .req x2
@@ -586,6 +586,98 @@ function ff_vvc_dmvr_hv_8_neon, export=1
         mx              .req x4
         my              .req x5
         width           .req w6
+
+        sxtw            x6, w6
+        mov             x7, #(VVC_MAX_PB_SIZE * 2 + 8)
+        cmp             width, #16
+        sub             src_stride, src_stride, x6
+        cset            w15, gt                     // width > 16
+        movi            v16.8h, #2                  // DMVR_SHIFT
+        sub             x7, x7, x6, lsl #1
+1:
+        cbz             w15, 2f
+        ldr             q0, [src], #16
+        uxtl            v1.8h, v0.8b
+        uxtl2           v2.8h, v0.16b
+        ushl            v1.8h, v1.8h, v16.8h
+        ushl            v2.8h, v2.8h, v16.8h
+        stp             q1, q2, [dst], #32
+        b               3f
+2:
+        ldr             d0, [src], #8
+        uxtl            v1.8h, v0.8b
+        ushl            v1.8h, v1.8h, v16.8h
+        str             q1, [dst], #16
+3:
+        subs            height, height, #1
+        ldr             s3, [src], #4
+        uxtl            v4.8h, v3.8b
+        ushl            v4.4h, v4.4h, v16.4h
+        st1             {v4.4h}, [dst], x7
+
+        add             src, src, src_stride
+        b.ne            1b
+
+        ret
+endfunc
+
+function ff_vvc_dmvr_12_neon, export=1
+        sxtw            x6, w6
+        mov             x7, #(VVC_MAX_PB_SIZE * 2 + 8)
+        cmp             width, #16
+        sub             src_stride, src_stride, x6, lsl #1
+        cset            w15, gt                     // width > 16
+        movi            v16.4s, #2                  // offset4
+        sub             x7, x7, x6, lsl #1
+1:
+        cbz             w15, 2f
+        ldp             q0, q1, [src], #32
+        uxtl            v2.4s, v0.4h
+        uxtl2           v3.4s, v0.8h
+        uxtl            v4.4s, v1.4h
+        uxtl2           v5.4s, v1.8h
+        add             v2.4s, v2.4s, v16.4s
+        add             v3.4s, v3.4s, v16.4s
+        add             v4.4s, v4.4s, v16.4s
+        add             v5.4s, v5.4s, v16.4s
+        ushr            v2.4s, v2.4s, #2
+        ushr            v3.4s, v3.4s, #2
+        ushr            v4.4s, v4.4s, #2
+        ushr            v5.4s, v5.4s, #2
+        uqxtn           v2.4h, v2.4s
+        uqxtn2          v2.8h, v3.4s
+        uqxtn           v4.4h, v4.4s
+        uqxtn2          v4.8h, v5.4s
+
+        stp             q2, q4, [dst], #32
+        b               3f
+2:
+        ldr             q0, [src], #16
+        uxtl            v2.4s, v0.4h
+        uxtl2           v3.4s, v0.8h
+        add             v2.4s, v2.4s, v16.4s
+        add             v3.4s, v3.4s, v16.4s
+        ushr            v2.4s, v2.4s, #2
+        ushr            v3.4s, v3.4s, #2
+        uqxtn           v2.4h, v2.4s
+        uqxtn2          v2.8h, v3.4s
+        str             q2, [dst], #16
+3:
+        subs            height, height, #1
+        ldr             d0, [src], #8
+        uxtl            v3.4s, v0.4h
+        add             v3.4s, v3.4s, v16.4s
+        ushr            v3.4s, v3.4s, #2
+        uqxtn           v3.4h, v3.4s
+        st1             {v3.4h}, [dst], x7
+
+        add             src, src, src_stride
+        b.ne            1b
+
+        ret
+endfunc
+
+function ff_vvc_dmvr_hv_8_neon, export=1
         tmp0            .req x7
         tmp1            .req x8