From patchwork Tue Sep 10 17:35:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51496
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Cc: Zhao Zhili
Date: Wed, 11 Sep 2024 01:35:05 +0800
X-OQ-MSGID: <20240910173506.28876-2-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240910173506.28876-1-quinkblack@foxmail.com>
References: <20240910173506.28876-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH 2/3] aarch64/vvc: Add put_qpel_vx
List-Id: FFmpeg development discussions and patches

From: Zhao Zhili

put_luma_v_8_4x4_c:          1.0 ( 1.00x)
put_luma_v_8_4x4_neon:       0.0 ( 0.00x)
put_luma_v_8_8x8_c:          3.5 ( 1.00x)
put_luma_v_8_8x8_neon:       0.5 ( 7.00x)
put_luma_v_8_16x16_c:       13.8 ( 1.00x)
put_luma_v_8_16x16_neon:     1.2 (11.00x)
put_luma_v_8_32x32_c:       54.2 ( 1.00x)
put_luma_v_8_32x32_neon:     5.0 (10.85x)
put_luma_v_8_64x64_c:      217.5 ( 1.00x)
put_luma_v_8_64x64_neon:    18.8 (11.60x)
put_luma_v_8_128x128_c:    886.2 ( 1.00x)
put_luma_v_8_128x128_neon:  74.0 (11.98x)
---
 libavcodec/aarch64/h26x/dsp.h       |   8 +++
 libavcodec/aarch64/h26x/qpel_neon.S | 100 ++++++++++++++++++++++++++++
 libavcodec/aarch64/vvc/dsp_init.c   |   7 ++
 3 files changed, 115 insertions(+)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index 323a253257..881091f39a 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -274,4 +274,12 @@
 NEON8_FNPROTO_PARTIAL_6(qpel_h, (int16_t * dst, const uint8_t *_src, ptrdiff_t _srcstride,
         int height, const int8_t *hf, const int8_t *vf, int width), _i8mm);
 
+void ff_vvc_put_qpel_v4_8_neon(int16_t *dst, const uint8_t *_src,
+                               ptrdiff_t _srcstride, int height,
+                               const int8_t *hf, const int8_t *vf, int width);
+
+void ff_vvc_put_qpel_v8_8_neon(int16_t *dst, const uint8_t *_src,
+                               ptrdiff_t _srcstride, int height,
+                               const int8_t *hf, const int8_t *vf, int width);
+
 #endif
diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S
index 7868811b3b..671942109a 100644
--- a/libavcodec/aarch64/h26x/qpel_neon.S
+++ b/libavcodec/aarch64/h26x/qpel_neon.S
@@ -86,6 +86,11 @@ endconst
         sxtl            v0.8h, v0.8b
 .endm
 
+.macro vvc_load_qpel_filterh freg
+        ld1             {v0.8b}, [\freg]
+        sxtl            v0.8h, v0.8b
+.endm
+
 .macro calc_qpelh dst, src0, src1, src2, src3, src4, src5, src6, src7, op, shift=6
         smull           \dst\().4s, \src0\().4h, v0.h[0]
         smlal           \dst\().4s, \src1\().4h, v0.h[1]
@@ -95,11 +100,15 @@ endconst
         smlal           \dst\().4s, \src5\().4h, v0.h[5]
         smlal           \dst\().4s, \src6\().4h, v0.h[6]
         smlal           \dst\().4s, \src7\().4h, v0.h[7]
+.ifc \op, sqxtn
+        sqxtn           \dst\().4h, \dst\().4s
+.else
 .ifc \op, sshr
         sshr            \dst\().4s, \dst\().4s, \shift
 .else
         \op             \dst\().4h, \dst\().4s, \shift
 .endif
+.endif
 .endm
 
 .macro calc_qpelh2 dst, dstt, src0, src1, src2, src3, src4, src5, src6, src7, op, shift=6
@@ -111,11 +120,15 @@ endconst
         smlal2          \dstt\().4s, \src5\().8h, v0.h[5]
         smlal2          \dstt\().4s, \src6\().8h, v0.h[6]
         smlal2          \dstt\().4s, \src7\().8h, v0.h[7]
+.ifc \op, sqxtn2
+        sqxtn2          \dst\().8h, \dstt\().4s
+.else
 .ifc \op, sshr
         sshr            \dst\().4s, \dstt\().4s, \shift
 .else
         \op             \dst\().8h, \dstt\().4s, \shift
 .endif
+.endif
 .endm
 
 .macro calc_all
@@ -1000,6 +1013,93 @@ function ff_hevc_put_hevc_qpel_v64_8_neon, export=1
         ret
 endfunc
 
+/* ff_hevc_put_hevc_qpel_vx requires the signs of the filter parameters to be
+ * [-, +, -, +, +, -, +, -],
+ * and the VVC filters do not meet this requirement.
+ */
+function ff_vvc_put_qpel_v4_8_neon, export=1
+        vvc_load_qpel_filterh x5
+        sub             x1, x1, x2, lsl #1
+        mov             x9, #(VVC_MAX_PB_SIZE * 2)
+        sub             x1, x1, x2
+        ldr             s16, [x1]
+        ldr             s17, [x1, x2]
+        add             x1, x1, x2, lsl #1
+        ldr             s18, [x1]
+        ldr             s19, [x1, x2]
+        uxtl            v16.8h, v16.8b
+        uxtl            v17.8h, v17.8b
+        add             x1, x1, x2, lsl #1
+        ldr             s20, [x1]
+        ldr             s21, [x1, x2]
+        uxtl            v18.8h, v18.8b
+        uxtl            v19.8h, v19.8b
+        add             x1, x1, x2, lsl #1
+        ldr             s22, [x1]
+        add             x1, x1, x2
+        uxtl            v20.8h, v20.8b
+        uxtl            v21.8h, v21.8b
+        uxtl            v22.8h, v22.8b
+.macro calc tmp, src0, src1, src2, src3, src4, src5, src6, src7
+        ld1             {\tmp\().s}[0], [x1], x2
+        uxtl            \tmp\().8h, \tmp\().8b
+        calc_qpelh      v24, \src0, \src1, \src2, \src3, \src4, \src5, \src6, \src7, sqxtn
+        subs            w3, w3, #1
+        st1             {v24.4h}, [x0], x9
+.endm
+1:
+        calc_all
+.purgem calc
+2:
+        ret
+endfunc
+
+function ff_vvc_put_qpel_v8_8_neon, export=1
+        vvc_load_qpel_filterh x5
+        sub             x1, x1, x2, lsl #1
+        sub             x1, x1, x2
+        mov             x9, #(VVC_MAX_PB_SIZE * 2)
+0:
+        mov             x8, x1
+        ldr             d16, [x8]
+        ldr             d17, [x8, x2]
+        mov             x10, x0
+        mov             w11, w3
+        add             x8, x8, x2, lsl #1
+        ldr             d18, [x8]
+        ldr             d19, [x8, x2]
+        uxtl            v16.8h, v16.8b
+        uxtl            v17.8h, v17.8b
+        add             x8, x8, x2, lsl #1
+        ldr             d20, [x8]
+        ldr             d21, [x8, x2]
+        uxtl            v18.8h, v18.8b
+        uxtl            v19.8h, v19.8b
+        add             x8, x8, x2, lsl #1
+        ldr             d22, [x8]
+        add             x8, x8, x2
+        uxtl            v20.8h, v20.8b
+        uxtl            v21.8h, v21.8b
+        uxtl            v22.8h, v22.8b
+.macro calc tmp, src0, src1, src2, src3, src4, src5, src6, src7
+        ld1             {\tmp\().8b}, [x8], x2
+        uxtl            \tmp\().8h, \tmp\().8b
+        calc_qpelh      v24, \src0, \src1, \src2, \src3, \src4, \src5, \src6, \src7, sqxtn
+        calc_qpelh2     v24, v25, \src0, \src1, \src2, \src3, \src4, \src5, \src6, \src7, sqxtn2
+        subs            w11, w11, #1
+        st1             {v24.8h}, [x10], x9
+.endm
+1:
+        calc_all
+.purgem calc
+2:
+        sub             w6, w6, #8
+        add             x0, x0, #16
+        add             x1, x1, #8
+        cbnz            w6, 0b
+        ret
+endfunc
+
 function ff_hevc_put_hevc_qpel_bi_v4_8_neon, export=1
         load_qpel_filterb x7, x6
         sub             x2, x2, x3, lsl #1
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index bcc7df8f6c..ba3a49aa1a 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -60,6 +60,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         c->inter.put[0][5][0][1] =
         c->inter.put[0][6][0][1] = ff_vvc_put_qpel_h32_8_neon;
 
+        c->inter.put[0][1][1][0] = ff_vvc_put_qpel_v4_8_neon;
+        c->inter.put[0][2][1][0] =
+        c->inter.put[0][3][1][0] =
+        c->inter.put[0][4][1][0] =
+        c->inter.put[0][5][1][0] =
+        c->inter.put[0][6][1][0] = ff_vvc_put_qpel_v8_8_neon;
+
         c->inter.put_uni[0][1][0][0] = ff_vvc_put_pel_uni_pixels4_8_neon;
         c->inter.put_uni[0][2][0][0] = ff_vvc_put_pel_uni_pixels8_8_neon;
         c->inter.put_uni[0][3][0][0] = ff_vvc_put_pel_uni_pixels16_8_neon;
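
Note on what the new functions compute: the scalar sketch below models what the two NEON routines above appear to do for 8-bit input. It is a reading aid under stated assumptions, not the C reference from libavcodec. The function name sketch_put_qpel_v_8 and the constant SKETCH_MAX_PB_SIZE are hypothetical; the value 128 assumes that VVC_MAX_PB_SIZE is 128, which the asm implies only through its destination stride of VVC_MAX_PB_SIZE * 2 bytes. The taps are applied as signed values to rows -3..+4 around the current row, which is why the sign pattern of the coefficients does not matter here. The asm narrows with sqxtn (saturating); the sketch uses a plain cast on the assumption that 8-bit input with the VVC luma taps cannot overflow int16_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors VVC_MAX_PB_SIZE; dst rows are this many int16_t apart. */
#define SKETCH_MAX_PB_SIZE 128

/* Hypothetical scalar model of the 8-bit vertical luma filter.
 * src points at the current row; rows src[-3 * stride] .. src[+4 * stride]
 * must be readable. vf holds eight signed taps of any sign pattern. */
static void sketch_put_qpel_v_8(int16_t *dst, const uint8_t *src,
                                ptrdiff_t srcstride, int height,
                                const int8_t *vf, int width)
{
    assert(height > 0 && width > 0 && width <= SKETCH_MAX_PB_SIZE);

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            /* Signed multiply-accumulate over the eight vertical taps,
             * the scalar counterpart of the smull/smlal chain. */
            for (int k = 0; k < 8; k++)
                sum += vf[k] * src[x + (k - 3) * srcstride];
            /* No shift for 8-bit input; the asm narrows with sqxtn. */
            dst[x] = (int16_t)sum;
        }
        src += srcstride;
        dst += SKETCH_MAX_PB_SIZE;
    }
}
```

A filter such as {0, 3, 9, 20, 20, 9, 3, 0} (to my knowledge the VVC alternative half-sample filter) has no negative taps, so it cannot match the [-, +, -, +, +, -, +, -] pattern that the HEVC vertical functions expect; the signed accumulation above handles it without special cases.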