From patchwork Tue Sep 10 17:35:04 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51495
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 11 Sep 2024 01:35:04 +0800
X-OQ-MSGID: <20240910173506.28876-1-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
Subject: [FFmpeg-devel] [PATCH 1/3] aarch64/h26x: Remove duplicate b.eq instruction
Cc: Zhao Zhili

From: Zhao Zhili

calc_all already adds a b.eq after each calc, so the b.eq inside the
calc macro is a duplicate.
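To see why the extra branch is redundant, the loop shape can be modelled in C. This is only a sketch under assumptions: `CALC_STEP` and `run_rows` are hypothetical names that do not exist in FFmpeg, and the eight-way unroll is assumed for illustration rather than taken from this patch.

```c
#include <assert.h>

/* Hypothetical C model of the assembler macros; illustration only. */

/* One `calc` step: produce one output row and decrement the row counter,
 * like the `subs w3, w3, #1` at the end of the macro body. */
#define CALC_STEP(height, rows) do { (rows)++; (height)--; } while (0)

/* Model of the `calc_all` driver: the steps are unrolled (eight times is an
 * assumption) and the driver itself tests the counter after every step. */
int run_rows(int height)
{
    int rows = 0;

    assert(height > 0);   /* the model assumes at least one row */
    for (;;) {
        CALC_STEP(height, rows); if (height == 0) break;
        CALC_STEP(height, rows); if (height == 0) break;
        CALC_STEP(height, rows); if (height == 0) break;
        CALC_STEP(height, rows); if (height == 0) break;
        CALC_STEP(height, rows); if (height == 0) break;
        CALC_STEP(height, rows); if (height == 0) break;
        CALC_STEP(height, rows); if (height == 0) break;
        CALC_STEP(height, rows); if (height == 0) break;
    }
    return rows;
}
```

For any height of at least 1 the function returns exactly `height`, so adding a second zero test inside the step itself would not change the result.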
---
 libavcodec/aarch64/h26x/qpel_neon.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S
index 8a372a76be..7868811b3b 100644
--- a/libavcodec/aarch64/h26x/qpel_neon.S
+++ b/libavcodec/aarch64/h26x/qpel_neon.S
@@ -754,7 +754,6 @@ function ff_hevc_put_hevc_qpel_v4_8_neon, export=1
         calc_qpelb      v24, \src0, \src1, \src2, \src3, \src4, \src5, \src6, \src7
         st1             {v24.4h}, [x0], x9
         subs            w3, w3, #1
-        b.eq            2f
 .endm
 1:      calc_all
 .purgem calc

From patchwork Tue Sep 10 17:35:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51496
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 11 Sep 2024 01:35:05 +0800
X-OQ-MSGID: <20240910173506.28876-2-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240910173506.28876-1-quinkblack@foxmail.com>
References: <20240910173506.28876-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH 2/3] aarch64/vvc: Add put_qpel_vx
Cc: Zhao Zhili

From: Zhao Zhili

put_luma_v_8_4x4_c:             1.0 ( 1.00x)
put_luma_v_8_4x4_neon:          0.0 ( 0.00x)
put_luma_v_8_8x8_c:             3.5 ( 1.00x)
put_luma_v_8_8x8_neon:          0.5 ( 7.00x)
put_luma_v_8_16x16_c:          13.8 ( 1.00x)
put_luma_v_8_16x16_neon:        1.2 (11.00x)
put_luma_v_8_32x32_c:          54.2 ( 1.00x)
put_luma_v_8_32x32_neon:        5.0 (10.85x)
put_luma_v_8_64x64_c:         217.5 ( 1.00x)
put_luma_v_8_64x64_neon:       18.8 (11.60x)
put_luma_v_8_128x128_c:       886.2 ( 1.00x)
put_luma_v_8_128x128_neon:     74.0 (11.98x)
---
 libavcodec/aarch64/h26x/dsp.h       |   8 +++
 libavcodec/aarch64/h26x/qpel_neon.S | 100 ++++++++++++++++++++++++++++
 libavcodec/aarch64/vvc/dsp_init.c   |   7 ++
 3 files changed, 115 insertions(+)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index 323a253257..881091f39a 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -274,4 +274,12 @@ NEON8_FNPROTO_PARTIAL_6(qpel_h, (int16_t * dst, const uint8_t *_src,
         ptrdiff_t _srcstride, int height,
         const int8_t *hf, const int8_t *vf, int width), _i8mm);

+void ff_vvc_put_qpel_v4_8_neon(int16_t *dst, const uint8_t *_src,
+                               ptrdiff_t _srcstride, int height,
+                               const int8_t *hf, const int8_t *vf, int width);
+
+void ff_vvc_put_qpel_v8_8_neon(int16_t *dst, const uint8_t *_src,
+                               ptrdiff_t _srcstride, int height,
+                               const int8_t *hf, const int8_t *vf, int width);
+
 #endif
diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S
index 7868811b3b..671942109a 100644
--- a/libavcodec/aarch64/h26x/qpel_neon.S
+++ b/libavcodec/aarch64/h26x/qpel_neon.S
@@ -86,6 +86,11 @@ endconst
         sxtl            v0.8h, v0.8b
 .endm

+.macro vvc_load_qpel_filterh freg
+        ld1             {v0.8b}, [\freg]
+        sxtl            v0.8h, v0.8b
+.endm
+
 .macro calc_qpelh dst, src0, src1, src2, src3, src4, src5, src6, src7, op, shift=6
         smull           \dst\().4s, \src0\().4h, v0.h[0]
         smlal           \dst\().4s, \src1\().4h, v0.h[1]
@@ -95,11 +100,15 @@ endconst
         smlal           \dst\().4s, \src5\().4h, v0.h[5]
         smlal           \dst\().4s, \src6\().4h, v0.h[6]
         smlal           \dst\().4s, \src7\().4h, v0.h[7]
+.ifc \op, sqxtn
+        sqxtn           \dst\().4h, \dst\().4s
+.else
 .ifc \op, sshr
         sshr            \dst\().4s, \dst\().4s, \shift
 .else
         \op             \dst\().4h, \dst\().4s, \shift
 .endif
+.endif
 .endm

 .macro calc_qpelh2 dst, dstt, src0, src1, src2, src3, src4, src5, src6, src7, op, shift=6
@@ -111,11 +120,15 @@ endconst
         smlal2          \dstt\().4s, \src5\().8h, v0.h[5]
         smlal2          \dstt\().4s, \src6\().8h, v0.h[6]
         smlal2          \dstt\().4s, \src7\().8h, v0.h[7]
+.ifc \op, sqxtn2
+        sqxtn2          \dst\().8h, \dstt\().4s
+.else
 .ifc \op, sshr
         sshr            \dst\().4s, \dstt\().4s, \shift
 .else
         \op             \dst\().8h, \dstt\().4s, \shift
 .endif
+.endif
 .endm

 .macro calc_all
@@ -1000,6 +1013,93 @@ function ff_hevc_put_hevc_qpel_v64_8_neon, export=1
         ret
 endfunc

+/* ff_hevc_put_hevc_qpel_vx requires the filter coefficients to have the signs
+ *   [-, +, -, +, +, -, +, -],
+ * which the vvc filters do not satisfy.
+ */
+function ff_vvc_put_qpel_v4_8_neon, export=1
+        vvc_load_qpel_filterh x5
+        sub             x1, x1, x2, lsl #1
+        mov             x9, #(VVC_MAX_PB_SIZE * 2)
+        sub             x1, x1, x2
+        ldr             s16, [x1]
+        ldr             s17, [x1, x2]
+        add             x1, x1, x2, lsl #1
+        ldr             s18, [x1]
+        ldr             s19, [x1, x2]
+        uxtl            v16.8h, v16.8b
+        uxtl            v17.8h, v17.8b
+        add             x1, x1, x2, lsl #1
+        ldr             s20, [x1]
+        ldr             s21, [x1, x2]
+        uxtl            v18.8h, v18.8b
+        uxtl            v19.8h, v19.8b
+        add             x1, x1, x2, lsl #1
+        ldr             s22, [x1]
+        add             x1, x1, x2
+        uxtl            v20.8h, v20.8b
+        uxtl            v21.8h, v21.8b
+        uxtl            v22.8h, v22.8b
+.macro calc tmp, src0, src1, src2, src3, src4, src5, src6, src7
+        ld1             {\tmp\().s}[0], [x1], x2
+        uxtl            \tmp\().8h, \tmp\().8b
+        calc_qpelh      v24, \src0, \src1, \src2, \src3, \src4, \src5, \src6, \src7, sqxtn
+        subs            w3, w3, #1
+        st1             {v24.4h}, [x0], x9
+.endm
+1:
+        calc_all
+.purgem calc
+2:
+        ret
+endfunc
+
+function ff_vvc_put_qpel_v8_8_neon, export=1
+        vvc_load_qpel_filterh x5
+        sub             x1, x1, x2, lsl #1
+        sub             x1, x1, x2
+        mov             x9, #(VVC_MAX_PB_SIZE * 2)
+0:
+        mov             x8, x1
+        ldr             d16, [x8]
+        ldr             d17, [x8, x2]
+        mov             x10, x0
+        mov             w11, w3
+        add             x8, x8, x2, lsl #1
+        ldr             d18, [x8]
+        ldr             d19, [x8, x2]
+        uxtl            v16.8h, v16.8b
+        uxtl            v17.8h, v17.8b
+        add             x8, x8, x2, lsl #1
+        ldr             d20, [x8]
+        ldr             d21, [x8, x2]
+        uxtl            v18.8h, v18.8b
+        uxtl            v19.8h, v19.8b
+        add             x8, x8, x2, lsl #1
+        ldr             d22, [x8]
+        add             x8, x8, x2
+        uxtl            v20.8h, v20.8b
+        uxtl            v21.8h, v21.8b
+        uxtl            v22.8h, v22.8b
+.macro calc tmp, src0, src1, src2, src3, src4, src5, src6, src7
+        ld1             {\tmp\().8b}, [x8], x2
+        uxtl            \tmp\().8h, \tmp\().8b
+        calc_qpelh      v24, \src0, \src1, \src2, \src3, \src4, \src5, \src6, \src7, sqxtn
+        calc_qpelh2     v24, v25, \src0, \src1, \src2, \src3, \src4, \src5, \src6, \src7, sqxtn2
+        subs            w11, w11, #1
+        st1             {v24.8h}, [x10], x9
+.endm
+1:
+        calc_all
+.purgem calc
+2:
+        sub             w6, w6, #8
+        add             x0, x0, #16
+        add             x1, x1, #8
+        cbnz            w6, 0b
+        ret
+endfunc
+
 function ff_hevc_put_hevc_qpel_bi_v4_8_neon, export=1
         load_qpel_filterb x7, x6
         sub             x2, x2, x3, lsl #1
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index bcc7df8f6c..ba3a49aa1a 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -60,6 +60,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         c->inter.put[0][5][0][1] =
         c->inter.put[0][6][0][1] = ff_vvc_put_qpel_h32_8_neon;

+        c->inter.put[0][1][1][0] = ff_vvc_put_qpel_v4_8_neon;
+        c->inter.put[0][2][1][0] =
+        c->inter.put[0][3][1][0] =
+        c->inter.put[0][4][1][0] =
+        c->inter.put[0][5][1][0] =
+        c->inter.put[0][6][1][0] = ff_vvc_put_qpel_v8_8_neon;
+
         c->inter.put_uni[0][1][0][0] = ff_vvc_put_pel_uni_pixels4_8_neon;
         c->inter.put_uni[0][2][0][0] = ff_vvc_put_pel_uni_pixels8_8_neon;
         c->inter.put_uni[0][3][0][0] = ff_vvc_put_pel_uni_pixels16_8_neon;

From patchwork Tue Sep 10 17:35:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51497
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 11 Sep 2024 01:35:06 +0800
X-OQ-MSGID: <20240910173506.28876-3-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240910173506.28876-1-quinkblack@foxmail.com>
References: <20240910173506.28876-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH 3/3] aarch64/vvc: Add put_qpel_hv
Cc: Zhao Zhili

From: Zhao Zhili

With Apple M1 (no i8mm):

put_luma_hv_8_4x4_c:            2.2 ( 1.00x)
put_luma_hv_8_4x4_neon:         0.8 ( 3.00x)
put_luma_hv_8_8x8_c:            7.0 ( 1.00x)
put_luma_hv_8_8x8_neon:         0.8 ( 9.33x)
put_luma_hv_8_16x16_c:         22.8 ( 1.00x)
put_luma_hv_8_16x16_neon:       2.5 ( 9.10x)
put_luma_hv_8_32x32_c:         84.8 ( 1.00x)
put_luma_hv_8_32x32_neon:       9.5 ( 8.92x)
put_luma_hv_8_64x64_c:        333.0 ( 1.00x)
put_luma_hv_8_64x64_neon:      35.5 ( 9.38x)
put_luma_hv_8_128x128_c:     1294.5 ( 1.00x)
put_luma_hv_8_128x128_neon:   137.8 ( 9.40x)

With Pixel 8 Pro:

put_luma_hv_8_4x4_c:            5.0 ( 1.00x)
put_luma_hv_8_4x4_neon:         0.8 ( 6.67x)
put_luma_hv_8_4x4_i8mm:         0.2 (20.00x)
put_luma_hv_8_8x8_c:           13.2 ( 1.00x)
put_luma_hv_8_8x8_neon:         1.2 (10.60x)
put_luma_hv_8_8x8_i8mm:         1.2 (10.60x)
put_luma_hv_8_16x16_c:         44.2 ( 1.00x)
put_luma_hv_8_16x16_neon:       4.5 ( 9.83x)
put_luma_hv_8_16x16_i8mm:       4.2 (10.41x)
put_luma_hv_8_32x32_c:        160.8 ( 1.00x)
put_luma_hv_8_32x32_neon:      17.5 ( 9.19x)
put_luma_hv_8_32x32_i8mm:      16.0 (10.05x)
put_luma_hv_8_64x64_c:        611.2 ( 1.00x)
put_luma_hv_8_64x64_neon:      68.0 ( 8.99x)
put_luma_hv_8_64x64_i8mm:      62.2 ( 9.82x)
put_luma_hv_8_128x128_c:     2384.8 ( 1.00x)
put_luma_hv_8_128x128_neon:   268.8 ( 8.87x)
put_luma_hv_8_128x128_i8mm:   245.8 ( 9.70x)
---
 libavcodec/aarch64/h26x/dsp.h       |   8 ++
 libavcodec/aarch64/h26x/qpel_neon.S | 140 ++++++++++++++++++++++++++++
 libavcodec/aarch64/vvc/dsp_init.c   |  14 +++
 3 files changed, 162 insertions(+)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index 881091f39a..c54906dde2 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -282,4 +282,12 @@ void ff_vvc_put_qpel_v8_8_neon(int16_t *dst, const uint8_t *_src,
                                ptrdiff_t _srcstride, int height,
                                const int8_t *hf, const int8_t *vf, int width);

+NEON8_FNPROTO_PARTIAL_6(qpel_hv, (int16_t *dst,
+        const uint8_t *src, ptrdiff_t srcstride, int height,
+        const int8_t *hf, const int8_t *vf, int width),);
+
+NEON8_FNPROTO_PARTIAL_6(qpel_hv, (int16_t *dst,
+        const uint8_t *src, ptrdiff_t srcstride, int height,
+        const int8_t *hf, const int8_t *vf, int width), _i8mm);
+
 #endif
diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S
index 671942109a..1b3da375ba 100644
--- a/libavcodec/aarch64/h26x/qpel_neon.S
+++ b/libavcodec/aarch64/h26x/qpel_neon.S
@@ -4142,9 +4142,15 @@ endfunc

 DISABLE_I8MM
 #endif
+function vvc_put_qpel_hv4_8_end_neon
+        vvc_load_qpel_filterh x5
+        mov             x7, #(VVC_MAX_PB_SIZE * 2)
+        b               1f
+endfunc

 function hevc_put_hevc_qpel_hv4_8_end_neon
         load_qpel_filterh x5, x4
+1:
         ldr             d16, [sp]
         ldr             d17, [sp, x7]
         add             sp, sp, x7, lsl #1
@@ -4196,9 +4202,16 @@ function hevc_put_hevc_qpel_hv6_8_end_neon
         ret
 endfunc

+function vvc_put_qpel_hv8_8_end_neon
+        vvc_load_qpel_filterh x5
+        mov             x7, #(VVC_MAX_PB_SIZE * 2)
+        b               1f
+endfunc
+
 function hevc_put_hevc_qpel_hv8_8_end_neon
         mov             x7, #128
         load_qpel_filterh x5, x4
+1:
         ldr             q16, [sp]
         ldr             q17, [sp, x7]
         add             sp, sp, x7, lsl #1
@@ -4249,9 +4262,16 @@ function hevc_put_hevc_qpel_hv12_8_end_neon
         ret
 endfunc

+function vvc_put_qpel_hv16_8_end_neon
+        vvc_load_qpel_filterh x5
+        mov             x7, #(VVC_MAX_PB_SIZE * 2)
+        b               1f
+endfunc
+
 function hevc_put_hevc_qpel_hv16_8_end_neon
         mov             x7, #128
         load_qpel_filterh x5, x4
+1:
         ld1             {v16.8h, v17.8h}, [sp], x7
         ld1             {v18.8h, v19.8h}, [sp], x7
         ld1             {v20.8h, v21.8h}, [sp], x7
@@ -4274,6 +4294,12 @@ function hevc_put_hevc_qpel_hv16_8_end_neon
         ret
 endfunc

+function vvc_put_qpel_hv32_8_end_neon
+        vvc_load_qpel_filterh x5
+        mov             x7, #(VVC_MAX_PB_SIZE * 2)
+        b               0f
+endfunc
+
 function hevc_put_hevc_qpel_hv32_8_end_neon
         mov             x7, #128
         load_qpel_filterh x5, x4
@@ -4327,6 +4353,25 @@ function ff_hevc_put_hevc_qpel_hv4_8_\suffix, export=1
         b               hevc_put_hevc_qpel_hv4_8_end_neon
 endfunc

+function ff_vvc_put_qpel_hv4_8_\suffix, export=1
+        add             w10, w3, #8
+        lsl             x10, x10, #8
+        mov             x14, sp
+        sub             sp, sp, x10 // tmp_array
+        stp             x5, x30, [sp, #-48]!
+        stp             x0, x3, [sp, #16]
+        str             x14, [sp, #32]
+        add             x0, sp, #48
+        sub             x1, x1, x2, lsl #1
+        add             x3, x3, #7
+        sub             x1, x1, x2
+        bl              X(ff_vvc_put_qpel_h4_8_\suffix)
+        ldr             x14, [sp, #32]
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #48
+        b               vvc_put_qpel_hv4_8_end_neon
+endfunc
+
 function ff_hevc_put_hevc_qpel_hv6_8_\suffix, export=1
         add             w10, w3, #8
         mov             x7, #128
@@ -4366,6 +4411,25 @@ function ff_hevc_put_hevc_qpel_hv8_8_\suffix, export=1
         b               hevc_put_hevc_qpel_hv8_8_end_neon
 endfunc

+function ff_vvc_put_qpel_hv8_8_\suffix, export=1
+        add             w10, w3, #8
+        lsl             x10, x10, #8
+        sub             x1, x1, x2, lsl #1
+        mov             x14, sp
+        sub             sp, sp, x10 // tmp_array
+        stp             x5, x30, [sp, #-48]!
+        stp             x0, x3, [sp, #16]
+        str             x14, [sp, #32]
+        add             x0, sp, #48
+        add             x3, x3, #7
+        sub             x1, x1, x2
+        bl              X(ff_vvc_put_qpel_h8_8_\suffix)
+        ldr             x14, [sp, #32]
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #48
+        b               vvc_put_qpel_hv8_8_end_neon
+endfunc
+
 function ff_hevc_put_hevc_qpel_hv12_8_\suffix, export=1
         add             w10, w3, #8
         lsl             x10, x10, #7
@@ -4405,6 +4469,25 @@ function ff_hevc_put_hevc_qpel_hv16_8_\suffix, export=1
         b               hevc_put_hevc_qpel_hv16_8_end_neon
 endfunc

+function ff_vvc_put_qpel_hv16_8_\suffix, export=1
+        add             w10, w3, #8
+        lsl             x10, x10, #8
+        sub             x1, x1, x2, lsl #1
+        mov             x14, sp
+        sub             sp, sp, x10 // tmp_array
+        stp             x5, x30, [sp, #-48]!
+        stp             x0, x3, [sp, #16]
+        str             x14, [sp, #32]
+        add             x3, x3, #7
+        add             x0, sp, #48
+        sub             x1, x1, x2
+        bl              X(ff_vvc_put_qpel_h16_8_\suffix)
+        ldr             x14, [sp, #32]
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #48
+        b               vvc_put_qpel_hv16_8_end_neon
+endfunc
+
 function ff_hevc_put_hevc_qpel_hv24_8_\suffix, export=1
         stp             x4, x5, [sp, #-64]!
         stp             x2, x3, [sp, #16]
@@ -4441,6 +4524,26 @@ function ff_hevc_put_hevc_qpel_hv32_8_\suffix, export=1
         b               hevc_put_hevc_qpel_hv32_8_end_neon
 endfunc

+function ff_vvc_put_qpel_hv32_8_\suffix, export=1
+        add             w10, w3, #8
+        sub             x1, x1, x2, lsl #1
+        lsl             x10, x10, #8
+        sub             x1, x1, x2
+        mov             x14, sp
+        sub             sp, sp, x10 // tmp_array
+        stp             x5, x30, [sp, #-48]!
+        stp             x0, x3, [sp, #16]
+        str             x14, [sp, #32]
+        add             x3, x3, #7
+        add             x0, sp, #48
+        mov             w6, #32
+        bl              X(ff_vvc_put_qpel_h32_8_\suffix)
+        ldr             x14, [sp, #32]
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #48
+        b               vvc_put_qpel_hv32_8_end_neon
+endfunc
+
 function ff_hevc_put_hevc_qpel_hv48_8_\suffix, export=1
         stp             x4, x5, [sp, #-64]!
         stp             x2, x3, [sp, #16]
@@ -4474,6 +4577,43 @@ function ff_hevc_put_hevc_qpel_hv64_8_\suffix, export=1
         ldr             x30, [sp], #16
         ret
 endfunc
+
+function ff_vvc_put_qpel_hv64_8_\suffix, export=1
+        stp             x4, x5, [sp, #-64]!
+        stp             x2, x3, [sp, #16]
+        stp             x0, x1, [sp, #32]
+        str             x30, [sp, #48]
+        mov             x6, #32
+        bl              X(ff_vvc_put_qpel_hv32_8_\suffix)
+        ldp             x0, x1, [sp, #32]
+        ldp             x2, x3, [sp, #16]
+        ldp             x4, x5, [sp], #48
+        add             x1, x1, #32
+        add             x0, x0, #64
+        mov             x6, #32
+        bl              X(ff_vvc_put_qpel_hv32_8_\suffix)
+        ldr             x30, [sp], #16
+        ret
+endfunc
+
+function ff_vvc_put_qpel_hv128_8_\suffix, export=1
+        stp             x4, x5, [sp, #-64]!
+        stp             x2, x3, [sp, #16]
+        stp             x0, x1, [sp, #32]
+        str             x30, [sp, #48]
+        mov             x6, #64
+        bl              X(ff_vvc_put_qpel_hv64_8_\suffix)
+        ldp             x0, x1, [sp, #32]
+        ldp             x2, x3, [sp, #16]
+        ldp             x4, x5, [sp], #48
+        add             x1, x1, #64
+        add             x0, x0, #128
+        mov             x6, #64
+        bl              X(ff_vvc_put_qpel_hv64_8_\suffix)
+        ldr             x30, [sp], #16
+        ret
+endfunc
+
 .endm

 qpel_hv neon
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index ba3a49aa1a..934d918ffd 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -67,6 +67,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         c->inter.put[0][5][1][0] =
         c->inter.put[0][6][1][0] = ff_vvc_put_qpel_v8_8_neon;

+        c->inter.put[0][1][1][1] = ff_vvc_put_qpel_hv4_8_neon;
+        c->inter.put[0][2][1][1] = ff_vvc_put_qpel_hv8_8_neon;
+        c->inter.put[0][3][1][1] = ff_vvc_put_qpel_hv16_8_neon;
+        c->inter.put[0][4][1][1] = ff_vvc_put_qpel_hv32_8_neon;
+        c->inter.put[0][5][1][1] = ff_vvc_put_qpel_hv64_8_neon;
+        c->inter.put[0][6][1][1] = ff_vvc_put_qpel_hv128_8_neon;
+
         c->inter.put_uni[0][1][0][0] = ff_vvc_put_pel_uni_pixels4_8_neon;
         c->inter.put_uni[0][2][0][0] = ff_vvc_put_pel_uni_pixels8_8_neon;
         c->inter.put_uni[0][3][0][0] = ff_vvc_put_pel_uni_pixels16_8_neon;
@@ -103,6 +110,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
             c->inter.put[0][4][0][1] = ff_vvc_put_qpel_h32_8_neon_i8mm;
             c->inter.put[0][5][0][1] = ff_vvc_put_qpel_h64_8_neon_i8mm;
             c->inter.put[0][6][0][1] = ff_vvc_put_qpel_h128_8_neon_i8mm;
+
+            c->inter.put[0][1][1][1] = ff_vvc_put_qpel_hv4_8_neon_i8mm;
+            c->inter.put[0][2][1][1] = ff_vvc_put_qpel_hv8_8_neon_i8mm;
+            c->inter.put[0][3][1][1] = ff_vvc_put_qpel_hv16_8_neon_i8mm;
+            c->inter.put[0][4][1][1] = ff_vvc_put_qpel_hv32_8_neon_i8mm;
+            c->inter.put[0][5][1][1] = ff_vvc_put_qpel_hv64_8_neon_i8mm;
+            c->inter.put[0][6][1][1] = ff_vvc_put_qpel_hv128_8_neon_i8mm;
         }
     } else if (bd == 10) {
         c->alf.filter[LUMA] = alf_filter_luma_10_neon;
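
For readers checking the hv NEON code against a scalar version, the two-pass structure can be sketched in C. This is an illustrative model, not FFmpeg's reference code: `put_luma_hv_model`, `demo_flat_image`, `PB_SIZE` and the other macros are hypothetical names, and the shifts (none after the horizontal pass at 8 bits, 6 after the vertical pass) as well as the 128-sample row pitch are assumptions. It mirrors what the hv wrappers in this patch do: start three rows above the block, run the horizontal filter over height + 7 rows into a temporary array, then filter that array vertically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PB_SIZE 128 /* assumed row pitch of dst and tmp, in int16_t units */
#define TAPS    8
#define BEFORE  3   /* rows/columns read before the current sample */
#define EXTRA   7   /* TAPS - 1 extra rows needed by the vertical pass */

/* Hypothetical scalar model of an 8-bit two-pass (hv) luma filter. */
void put_luma_hv_model(int16_t *dst, const uint8_t *src, ptrdiff_t stride,
                       int height, const int8_t *hf, const int8_t *vf,
                       int width)
{
    static int16_t tmp[(PB_SIZE + EXTRA) * PB_SIZE];
    int16_t *t = tmp;

    assert(width <= PB_SIZE && height <= PB_SIZE);

    /* Pass 1: horizontal filter over height + 7 rows, starting 3 rows up. */
    src -= BEFORE * stride;
    for (int y = 0; y < height + EXTRA; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int k = 0; k < TAPS; k++)
                sum += hf[k] * src[x + k - BEFORE];
            t[x] = (int16_t)sum;          /* assumed: no shift at 8 bits */
        }
        src += stride;
        t   += PB_SIZE;
    }

    /* Pass 2: vertical filter on the intermediate rows, then >> 6. */
    t = tmp + BEFORE * PB_SIZE;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int k = 0; k < TAPS; k++)
                sum += vf[k] * t[x + (k - BEFORE) * PB_SIZE];
            dst[x] = (int16_t)(sum >> 6); /* assumed shift of 6 */
        }
        t   += PB_SIZE;
        dst += PB_SIZE;
    }
}

/* Usage example: a flat 4x4 block (with margins) and pass-through taps. */
int demo_flat_image(void)
{
    enum { W = 4, H = 4, STRIDE = W + EXTRA };
    static uint8_t img[(H + EXTRA) * STRIDE];
    static int16_t out[H * PB_SIZE];
    static const int8_t taps[TAPS] = { 0, 0, 0, 64, 0, 0, 0, 0 };

    for (size_t i = 0; i < sizeof(img); i++)
        img[i] = 10;
    put_luma_hv_model(out, img + BEFORE * STRIDE + BEFORE, STRIDE,
                      H, taps, taps, W);
    return out[(H - 1) * PB_SIZE + (W - 1)];
}
```

With the pass-through taps each pass multiplies by 64, and the final shift removes one factor of 64, so a flat input of 10 yields 640 at every output position. A model like this is only useful for reasoning about data layout; checkasm remains the way to validate the assembly itself.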