From patchwork Wed Sep 11 18:06:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51512
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 12 Sep 2024 02:06:06 +0800
X-OQ-MSGID: <20240911180618.28921-3-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
References: <20240911180618.28921-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 02/14] aarch64/hevc: Move epel/qpel to h26x directory
Cc: Zhao Zhili

From: Zhao Zhili

So vvc can reuse the
implementation. --- libavcodec/aarch64/Makefile | 4 +- libavcodec/aarch64/h26x/dsp.h | 198 ++++++++++++++++++ .../{hevcdsp_epel_neon.S => h26x/epel_neon.S} | 0 .../{hevcdsp_qpel_neon.S => h26x/qpel_neon.S} | 0 libavcodec/aarch64/hevcdsp_init_aarch64.c | 197 ----------------- 5 files changed, 200 insertions(+), 199 deletions(-) rename libavcodec/aarch64/{hevcdsp_epel_neon.S => h26x/epel_neon.S} (100%) rename libavcodec/aarch64/{hevcdsp_qpel_neon.S => h26x/qpel_neon.S} (100%) diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile index a01e665b55..9affb92789 100644 --- a/libavcodec/aarch64/Makefile +++ b/libavcodec/aarch64/Makefile @@ -71,6 +71,6 @@ NEON-OBJS-$(CONFIG_VP9_DECODER) += aarch64/vp9itxfm_16bpp_neon.o \ NEON-OBJS-$(CONFIG_HEVC_DECODER) += aarch64/hevcdsp_deblock_neon.o \ aarch64/hevcdsp_idct_neon.o \ aarch64/hevcdsp_init_aarch64.o \ - aarch64/hevcdsp_qpel_neon.o \ - aarch64/hevcdsp_epel_neon.o \ + aarch64/h26x/epel_neon.o \ + aarch64/h26x/qpel_neon.o \ aarch64/h26x/sao_neon.o diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h index d3f7a4dfe3..902286872d 100644 --- a/libavcodec/aarch64/h26x/dsp.h +++ b/libavcodec/aarch64/h26x/dsp.h @@ -37,4 +37,202 @@ void ff_vvc_sao_edge_filter_16x16_8_neon(uint8_t *dst, const uint8_t *src, ptrdi const int16_t *sao_offset_val, int eo, int width, int height); void ff_vvc_sao_edge_filter_8x8_8_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride_dst, const int16_t *sao_offset_val, int eo, int width, int height); + +#define NEON8_FNPROTO_PARTIAL_6(fn, args, ext) \ + void ff_hevc_put_hevc_##fn##_h4_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##_h6_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##_h8_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##_h12_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##_h16_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##_h32_8_neon##ext args; + +NEON8_FNPROTO_PARTIAL_6(qpel, (int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int 
height, + intptr_t mx, intptr_t my, int width),) + +NEON8_FNPROTO_PARTIAL_6(qpel_uni, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, + ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width),) + +NEON8_FNPROTO_PARTIAL_6(qpel_bi, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, + ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t + mx, intptr_t my, int width),) + +#define NEON8_FNPROTO(fn, args, ext) \ + void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##6_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##12_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##24_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##32_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##48_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##64_8_neon##ext args + +#define NEON8_FNPROTO_PARTIAL_4(fn, args, ext) \ + void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##64_8_neon##ext args + +#define NEON8_FNPROTO_PARTIAL_5(fn, args, ext) \ + void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##32_8_neon##ext args; \ + void ff_hevc_put_hevc_##fn##64_8_neon##ext args + +NEON8_FNPROTO(pel_pixels, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(pel_bi_pixels, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *_src, ptrdiff_t _srcstride, const int16_t *src2, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(epel_bi_h, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, + int height, intptr_t mx, intptr_t my, int 
width),); + +NEON8_FNPROTO(epel_bi_v, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(epel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(epel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, + int height, intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(epel_v, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(pel_uni_pixels, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(epel_uni_v, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(epel_uni_hv, (uint8_t *dst, ptrdiff_t _dststride, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(epel_uni_hv, (uint8_t *dst, ptrdiff_t _dststride, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(epel_uni_w_v, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO_PARTIAL_4(qpel_uni_w_v, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(epel_h, 
(int16_t *dst, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(epel_hv, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width), ); + +NEON8_FNPROTO(epel_h, (int16_t *dst, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(epel_hv, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(epel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(epel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(qpel_h, (int16_t *dst, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(qpel_v, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(qpel_hv, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(qpel_hv, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(qpel_uni_v, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(qpel_uni_hv, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(qpel_uni_hv, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width), _i8mm); 
+ +NEON8_FNPROTO(qpel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(qpel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(epel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(epel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO_PARTIAL_5(qpel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO_PARTIAL_5(qpel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(qpel_bi_v, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(qpel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(qpel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, + int height, intptr_t mx, intptr_t my, int width), _i8mm); + #endif diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/h26x/epel_neon.S similarity index 100% rename from libavcodec/aarch64/hevcdsp_epel_neon.S rename to libavcodec/aarch64/h26x/epel_neon.S 
diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S similarity index 100% rename from libavcodec/aarch64/hevcdsp_qpel_neon.S rename to libavcodec/aarch64/h26x/qpel_neon.S diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c index 26bbc8750f..386d7c59c8 100644 --- a/libavcodec/aarch64/hevcdsp_init_aarch64.c +++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c @@ -93,203 +93,6 @@ void ff_hevc_idct_16x16_dc_10_neon(int16_t *coeffs); void ff_hevc_idct_32x32_dc_10_neon(int16_t *coeffs); void ff_hevc_transform_luma_4x4_neon_8(int16_t *coeffs); -#define NEON8_FNPROTO_PARTIAL_6(fn, args, ext) \ - void ff_hevc_put_hevc_##fn##_h4_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##_h6_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##_h8_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##_h12_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##_h16_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##_h32_8_neon##ext args; - -NEON8_FNPROTO_PARTIAL_6(qpel, (int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height, - intptr_t mx, intptr_t my, int width),) - -NEON8_FNPROTO_PARTIAL_6(qpel_uni, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, - ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width),) - -NEON8_FNPROTO_PARTIAL_6(qpel_bi, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, - ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t - mx, intptr_t my, int width),) - -#define NEON8_FNPROTO(fn, args, ext) \ - void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##6_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##12_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##24_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##32_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##48_8_neon##ext args; \ - void 
ff_hevc_put_hevc_##fn##64_8_neon##ext args - -#define NEON8_FNPROTO_PARTIAL_4(fn, args, ext) \ - void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##64_8_neon##ext args - -#define NEON8_FNPROTO_PARTIAL_5(fn, args, ext) \ - void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##32_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##64_8_neon##ext args - -NEON8_FNPROTO(pel_pixels, (int16_t *dst, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(pel_bi_pixels, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *_src, ptrdiff_t _srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_bi_h, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_bi_v, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(epel_v, (int16_t *dst, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(pel_uni_pixels, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, intptr_t mx, intptr_t my, int width),); - 
-NEON8_FNPROTO(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_uni_v, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_uni_hv, (uint8_t *dst, ptrdiff_t _dststride, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_uni_hv, (uint8_t *dst, ptrdiff_t _dststride, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(epel_uni_w_v, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO_PARTIAL_4(qpel_uni_w_v, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_h, (int16_t *dst, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_hv, (int16_t *dst, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width), ); - -NEON8_FNPROTO(epel_h, (int16_t *dst, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(epel_hv, (int16_t *dst, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(epel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int 
denom, int wx, int ox, - intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(qpel_h, (int16_t *dst, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(qpel_v, (int16_t *dst, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(qpel_hv, (int16_t *dst, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(qpel_hv, (int16_t *dst, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(qpel_uni_v, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(qpel_uni_hv, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(qpel_uni_hv, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(qpel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(qpel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(epel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO_PARTIAL_5(qpel_uni_w_hv, (uint8_t *_dst, ptrdiff_t 
_dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO_PARTIAL_5(qpel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(qpel_bi_v, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(qpel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(qpel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - #define NEON8_FNASSIGN(member, v, h, fn, ext) \ member[1][v][h] = ff_hevc_put_hevc_##fn##4_8_neon##ext; \ member[2][v][h] = ff_hevc_put_hevc_##fn##6_8_neon##ext; \
From patchwork Wed Sep 11 18:06:07 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51519
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 12 Sep 2024 02:06:07 +0800
X-OQ-MSGID: <20240911180618.28921-4-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
References: <20240911180618.28921-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 03/14] aarch64/vvc: Add put_qpel_h_* and put_qpel_uni_h_*
Cc: Zhao Zhili

From: Zhao Zhili

Just share hevc implementation.
checkasm --test=vvc_mc --benchmark:
put_luma_h_8_4x4_c:                  0.2 ( 1.00x)
put_luma_h_8_4x4_neon:               0.2 ( 1.00x)
put_luma_h_8_8x8_c:                  1.0 ( 1.00x)
put_luma_h_8_8x8_neon:               0.2 ( 4.33x)
put_luma_h_8_16x16_c:                3.2 ( 1.00x)
put_luma_h_8_16x16_neon:             1.2 ( 2.63x)
put_luma_h_8_32x32_c:               13.7 ( 1.00x)
put_luma_h_8_32x32_neon:             4.0 ( 3.45x)
put_luma_h_8_64x64_c:               48.2 ( 1.00x)
put_luma_h_8_64x64_neon:            15.7 ( 3.07x)
put_luma_h_8_128x128_c:            203.5 ( 1.00x)
put_luma_h_8_128x128_neon:          62.0 ( 3.28x)
put_uni_h_luma_8_4x4_c:              0.2 ( 1.00x)
put_uni_h_luma_8_4x4_neon:           0.2 ( 1.00x)
put_uni_h_luma_8_8x8_c:              1.5 ( 1.00x)
put_uni_h_luma_8_8x8_neon:           0.2 ( 6.56x)
put_uni_h_luma_8_16x16_c:            5.7 ( 1.00x)
put_uni_h_luma_8_16x16_neon:         1.2 ( 4.67x)
put_uni_h_luma_8_32x32_c:           24.0 ( 1.00x)
put_uni_h_luma_8_32x32_neon:         4.7 ( 5.07x)
put_uni_h_luma_8_64x64_c:           90.0 ( 1.00x)
put_uni_h_luma_8_64x64_neon:        17.0 ( 5.30x)
put_uni_h_luma_8_128x128_c:        357.7 ( 1.00x)
put_uni_h_luma_8_128x128_neon:      67.5 ( 5.30x)
---
 libavcodec/aarch64/h26x/dsp.h       |  13 ++
 libavcodec/aarch64/h26x/qpel_neon.S | 202 ++++++++++++++++++++--------
 libavcodec/aarch64/vvc/Makefile     |   1 +
 libavcodec/aarch64/vvc/dsp_init.c   |  14 ++
 4 files changed, 171 insertions(+), 59 deletions(-)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index 902286872d..f72746ce03 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -235,4 +235,17 @@ NEON8_FNPROTO(qpel_bi_hv, (uint8_t *dst, ptrdiff_t dststride,
         const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2,
         int height, intptr_t mx, intptr_t my, int width), _i8mm);
 
+#undef NEON8_FNPROTO_PARTIAL_4
+#define NEON8_FNPROTO_PARTIAL_4(fn, args, ext) \
+    void ff_vvc_put_##fn##_h4_8_neon##ext args; \
+    void ff_vvc_put_##fn##_h8_8_neon##ext args; \
+    void ff_vvc_put_##fn##_h16_8_neon##ext args; \
+    void ff_vvc_put_##fn##_h32_8_neon##ext args;
+
+NEON8_FNPROTO_PARTIAL_4(qpel, (int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
+        const int8_t *hf, const
int8_t *vf, int width),)
+
+NEON8_FNPROTO_PARTIAL_4(qpel_uni, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+        ptrdiff_t _srcstride, int height, const int8_t *hf, const int8_t *vf, int width),)
+
 #endif
diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S
index 8ddaa32b70..a05009c9d6 100644
--- a/libavcodec/aarch64/h26x/qpel_neon.S
+++ b/libavcodec/aarch64/h26x/qpel_neon.S
@@ -21,7 +21,8 @@
  */

 #include "libavutil/aarch64/asm.S"
-#define MAX_PB_SIZE 64
+#define HEVC_MAX_PB_SIZE 64
+#define VVC_MAX_PB_SIZE 128

 const qpel_filters, align=4
         .byte           0,  0,  0,  0,  0,  0,  0,  0
@@ -44,6 +45,11 @@ endconst
         sxtl            v0.8h, v0.8b
 .endm

+.macro vvc_load_filter m
+        ld1             {v0.8b}, [\m]
+        sxtl            v0.8h, v0.8b
+.endm
+
 .macro load_qpel_filterb freg, xreg
         movrel          \xreg, qpel_filters_abs
         add             \xreg, \xreg, \freg, lsl #3
@@ -212,22 +218,40 @@ function ff_hevc_put_hevc_h4_8_neon, export=0
 endfunc
 .endif

+.ifnc \type, qpel_bi
+function ff_vvc_put_\type\()_h4_8_neon, export=1
+        vvc_load_filter mx
+        sub             src, src, #3
+        mov             mx, x30
+.ifc \type, qpel
+        mov             dststride, #(VVC_MAX_PB_SIZE << 1)
+        lsl             x13, srcstride, #1 // srcstridel
+        mov             x14, #(VVC_MAX_PB_SIZE << 2)
+.else
+        lsl             x14, dststride, #1 // dststridel
+        lsl             x13, srcstride, #1 // srcstridel
+.endif
+        b               1f
+endfunc
+.endif // !qpel_bi
+
 function ff_hevc_put_hevc_\type\()_h4_8_neon, export=1
         load_filter     mx
 .ifc \type, qpel_bi
-        mov             x16, #(MAX_PB_SIZE << 2) // src2bstridel
-        add             x15, x4, #(MAX_PB_SIZE << 1) // src2b
+        mov             x16, #(HEVC_MAX_PB_SIZE << 2) // src2bstridel
+        add             x15, x4, #(HEVC_MAX_PB_SIZE << 1) // src2b
 .endif
         sub             src, src, #3
         mov             mx, x30
 .ifc \type, qpel
-        mov             dststride, #(MAX_PB_SIZE << 1)
+        mov             dststride, #(HEVC_MAX_PB_SIZE << 1)
         lsl             x13, srcstride, #1 // srcstridel
-        mov             x14, #(MAX_PB_SIZE << 2)
+        mov             x14, #(HEVC_MAX_PB_SIZE << 2)
 .else
         lsl             x14, dststride, #1 // dststridel
         lsl             x13, srcstride, #1 // srcstridel
 .endif
+1:
         add             x10, dst, dststride // dstb
         add             x12, src, srcstride // srcb
 0:      ld1
{v16.8b, v17.8b}, [src], x13 @@ -283,15 +307,15 @@ endfunc function ff_hevc_put_hevc_\type\()_h6_8_neon, export=1 load_filter mx .ifc \type, qpel_bi - mov x16, #(MAX_PB_SIZE << 2) // src2bstridel - add x15, x4, #(MAX_PB_SIZE << 1) // src2b + mov x16, #(HEVC_MAX_PB_SIZE << 2) // src2bstridel + add x15, x4, #(HEVC_MAX_PB_SIZE << 1) // src2b .endif sub src, src, #3 mov mx, x30 .ifc \type, qpel - mov dststride, #(MAX_PB_SIZE << 1) + mov dststride, #(HEVC_MAX_PB_SIZE << 1) lsl x13, srcstride, #1 // srcstridel - mov x14, #((MAX_PB_SIZE << 2) - 8) + mov x14, #((HEVC_MAX_PB_SIZE << 2) - 8) .else lsl x14, dststride, #1 // dststridel lsl x13, srcstride, #1 // srcstridel @@ -333,22 +357,40 @@ function ff_hevc_put_hevc_\type\()_h6_8_neon, export=1 ret mx endfunc +.ifnc \type, qpel_bi +function ff_vvc_put_\type\()_h8_8_neon, export=1 + vvc_load_filter mx + sub src, src, #3 + mov mx, x30 +.ifc \type, qpel + mov dststride, #(VVC_MAX_PB_SIZE << 1) + lsl x13, srcstride, #1 // srcstridel + mov x14, #(VVC_MAX_PB_SIZE << 2) +.else + lsl x14, dststride, #1 // dststridel + lsl x13, srcstride, #1 // srcstridel +.endif + b 1f +endfunc +.endif // !qpel_bi + function ff_hevc_put_hevc_\type\()_h8_8_neon, export=1 load_filter mx .ifc \type, qpel_bi - mov x16, #(MAX_PB_SIZE << 2) // src2bstridel - add x15, x4, #(MAX_PB_SIZE << 1) // src2b + mov x16, #(HEVC_MAX_PB_SIZE << 2) // src2bstridel + add x15, x4, #(HEVC_MAX_PB_SIZE << 1) // src2b .endif sub src, src, #3 mov mx, x30 .ifc \type, qpel - mov dststride, #(MAX_PB_SIZE << 1) + mov dststride, #(HEVC_MAX_PB_SIZE << 1) lsl x13, srcstride, #1 // srcstridel - mov x14, #(MAX_PB_SIZE << 2) + mov x14, #(HEVC_MAX_PB_SIZE << 2) .else lsl x14, dststride, #1 // dststridel lsl x13, srcstride, #1 // srcstridel .endif +1: add x10, dst, dststride // dstb add x12, src, srcstride // srcb 0: ld1 {v16.8b, v17.8b}, [src], x13 @@ -415,16 +457,16 @@ function ff_hevc_put_hevc_\type\()_h12_8_neon, export=1 sxtw height, heightw .ifc \type, qpel_bi ldrh w8, [sp] // 
width - mov x16, #(MAX_PB_SIZE << 2) // src2bstridel - lsl x17, height, #7 // src2b reset (height * (MAX_PB_SIZE << 1)) - add x15, x4, #(MAX_PB_SIZE << 1) // src2b + mov x16, #(HEVC_MAX_PB_SIZE << 2) // src2bstridel + lsl x17, height, #7 // src2b reset (height * (HEVC_MAX_PB_SIZE << 1)) + add x15, x4, #(HEVC_MAX_PB_SIZE << 1) // src2b .endif sub src, src, #3 mov mx, x30 .ifc \type, qpel - mov dststride, #(MAX_PB_SIZE << 1) + mov dststride, #(HEVC_MAX_PB_SIZE << 1) lsl x13, srcstride, #1 // srcstridel - mov x14, #((MAX_PB_SIZE << 2) - 16) + mov x14, #((HEVC_MAX_PB_SIZE << 2) - 16) .else lsl x14, dststride, #1 // dststridel lsl x13, srcstride, #1 // srcstridel @@ -497,25 +539,45 @@ function ff_hevc_put_hevc_\type\()_h12_8_neon, export=1 ret mx endfunc +.ifnc \type, qpel_bi +function ff_vvc_put_\type\()_h16_8_neon, export=1 + vvc_load_filter mx + sxtw height, heightw + mov mx, x30 + sub src, src, #3 + mov mx, x30 +.ifc \type, qpel + mov dststride, #(VVC_MAX_PB_SIZE << 1) + lsl x13, srcstride, #1 // srcstridel + mov x14, #(VVC_MAX_PB_SIZE << 2) +.else + lsl x14, dststride, #1 // dststridel + lsl x13, srcstride, #1 // srcstridel +.endif + b 0f +endfunc +.endif // !qpel_bi + function ff_hevc_put_hevc_\type\()_h16_8_neon, export=1 load_filter mx sxtw height, heightw mov mx, x30 .ifc \type, qpel_bi ldrh w8, [sp] // width - mov x16, #(MAX_PB_SIZE << 2) // src2bstridel - add x15, x4, #(MAX_PB_SIZE << 1) // src2b + mov x16, #(HEVC_MAX_PB_SIZE << 2) // src2bstridel + add x15, x4, #(HEVC_MAX_PB_SIZE << 1) // src2b .endif sub src, src, #3 mov mx, x30 .ifc \type, qpel - mov dststride, #(MAX_PB_SIZE << 1) + mov dststride, #(HEVC_MAX_PB_SIZE << 1) lsl x13, srcstride, #1 // srcstridel - mov x14, #(MAX_PB_SIZE << 2) + mov x14, #(HEVC_MAX_PB_SIZE << 2) .else lsl x14, dststride, #1 // dststridel lsl x13, srcstride, #1 // srcstridel .endif +0: add x10, dst, dststride // dstb add x12, src, srcstride // srcb @@ -555,29 +617,51 @@ function ff_hevc_put_hevc_\type\()_h16_8_neon, export=1 ret 
mx endfunc +.ifnc \type, qpel_bi +function ff_vvc_put_\type\()_h32_8_neon, export=1 + vvc_load_filter mx + sxtw height, heightw + mov mx, x30 + sub src, src, #3 + mov mx, x30 +.ifc \type, qpel + mov dststride, #(VVC_MAX_PB_SIZE << 1) + lsl x13, srcstride, #1 // srcstridel + mov x14, #(VVC_MAX_PB_SIZE << 2) + sub x14, x14, width, uxtw #1 +.else + lsl x14, dststride, #1 // dststridel + lsl x13, srcstride, #1 // srcstridel + sub x14, x14, width, uxtw +.endif + b 1f +endfunc +.endif // !qpel_bi + function ff_hevc_put_hevc_\type\()_h32_8_neon, export=1 load_filter mx sxtw height, heightw mov mx, x30 .ifc \type, qpel_bi ldrh w8, [sp] // width - mov x16, #(MAX_PB_SIZE << 2) // src2bstridel + mov x16, #(HEVC_MAX_PB_SIZE << 2) // src2bstridel lsl x17, x5, #7 // src2b reset - add x15, x4, #(MAX_PB_SIZE << 1) // src2b + add x15, x4, #(HEVC_MAX_PB_SIZE << 1) // src2b sub x16, x16, width, uxtw #1 .endif sub src, src, #3 mov mx, x30 .ifc \type, qpel - mov dststride, #(MAX_PB_SIZE << 1) + mov dststride, #(HEVC_MAX_PB_SIZE << 1) lsl x13, srcstride, #1 // srcstridel - mov x14, #(MAX_PB_SIZE << 2) + mov x14, #(HEVC_MAX_PB_SIZE << 2) sub x14, x14, width, uxtw #1 .else lsl x14, dststride, #1 // dststridel lsl x13, srcstride, #1 // srcstridel sub x14, x14, width, uxtw .endif +1: sub x13, x13, width, uxtw sub x13, x13, #8 add x10, dst, dststride // dstb @@ -651,7 +735,7 @@ put_hevc qpel_bi function ff_hevc_put_hevc_qpel_v4_8_neon, export=1 load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) sub x1, x1, x2 ldr s16, [x1] ldr s17, [x1, x2] @@ -680,7 +764,7 @@ endfunc function ff_hevc_put_hevc_qpel_v6_8_neon, export=1 load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 - mov x9, #(MAX_PB_SIZE * 2 - 8) + mov x9, #(HEVC_MAX_PB_SIZE * 2 - 8) sub x1, x1, x2 ldr d16, [x1] ldr d17, [x1, x2] @@ -709,7 +793,7 @@ endfunc function ff_hevc_put_hevc_qpel_v8_8_neon, export=1 load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 - mov x9, #(MAX_PB_SIZE * 
2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) sub x1, x1, x2 ldr d16, [x1] ldr d17, [x1, x2] @@ -737,7 +821,7 @@ endfunc function ff_hevc_put_hevc_qpel_v12_8_neon, export=1 load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 - mov x9, #(MAX_PB_SIZE * 2 - 16) + mov x9, #(HEVC_MAX_PB_SIZE * 2 - 16) sub x1, x1, x2 ldr q16, [x1] ldr q17, [x1, x2] @@ -768,7 +852,7 @@ endfunc function ff_hevc_put_hevc_qpel_v16_8_neon, export=1 load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) sub x1, x1, x2 ldr q16, [x1] ldr q17, [x1, x2] @@ -802,7 +886,7 @@ function ff_hevc_put_hevc_qpel_v24_8_neon, export=1 load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 sub x1, x1, x2 - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.16b, v17.16b}, [x1], x2 ld1 {v18.16b, v19.16b}, [x1], x2 ld1 {v20.16b, v21.16b}, [x1], x2 @@ -833,7 +917,7 @@ function ff_hevc_put_hevc_qpel_v32_8_neon, export=1 st1 {v8.8b-v11.8b}, [sp] load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) sub x1, x1, x2 ld1 {v16.16b, v17.16b}, [x1], x2 ld1 {v18.16b, v19.16b}, [x1], x2 @@ -883,7 +967,7 @@ function ff_hevc_put_hevc_qpel_v64_8_neon, export=1 load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 sub x1, x1, x2 - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) 0: mov x8, x1 // src ld1 {v16.16b, v17.16b}, [x8], x2 mov w11, w3 // height @@ -921,7 +1005,7 @@ function ff_hevc_put_hevc_qpel_bi_v4_8_neon, export=1 load_qpel_filterb x7, x6 sub x2, x2, x3, lsl #1 sub x2, x2, x3 - mov x12, #(MAX_PB_SIZE * 2) + mov x12, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.s}[0], [x2], x3 ld1 {v17.s}[0], [x2], x3 ld1 {v18.s}[0], [x2], x3 @@ -951,7 +1035,7 @@ function ff_hevc_put_hevc_qpel_bi_v6_8_neon, export=1 ld1 {v16.8b}, [x2], x3 sub x1, x1, #4 ld1 {v17.8b}, [x2], x3 - mov x12, #(MAX_PB_SIZE * 2) + mov x12, #(HEVC_MAX_PB_SIZE * 2) ld1 {v18.8b}, [x2], x3 ld1 {v19.8b}, [x2], x3 ld1 {v20.8b}, [x2], x3 @@ -977,7 +1061,7 @@ 
function ff_hevc_put_hevc_qpel_bi_v8_8_neon, export=1 load_qpel_filterb x7, x6 sub x2, x2, x3, lsl #1 sub x2, x2, x3 - mov x12, #(MAX_PB_SIZE * 2) + mov x12, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8b}, [x2], x3 ld1 {v17.8b}, [x2], x3 ld1 {v18.8b}, [x2], x3 @@ -1006,7 +1090,7 @@ function ff_hevc_put_hevc_qpel_bi_v12_8_neon, export=1 sub x2, x2, x3 sub x1, x1, #8 ld1 {v16.16b}, [x2], x3 - mov x12, #(MAX_PB_SIZE * 2) + mov x12, #(HEVC_MAX_PB_SIZE * 2) ld1 {v17.16b}, [x2], x3 ld1 {v18.16b}, [x2], x3 ld1 {v19.16b}, [x2], x3 @@ -1037,7 +1121,7 @@ function ff_hevc_put_hevc_qpel_bi_v16_8_neon, export=1 load_qpel_filterb x7, x6 sub x2, x2, x3, lsl #1 sub x2, x2, x3 - mov x12, #(MAX_PB_SIZE * 2) + mov x12, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.16b}, [x2], x3 ld1 {v17.16b}, [x2], x3 ld1 {v18.16b}, [x2], x3 @@ -1092,7 +1176,7 @@ function ff_hevc_put_hevc_qpel_bi_v32_8_neon, export=1 sub x2, x2, x3 load_qpel_filterb x7, x6 ldr w6, [sp, #64] - mov x12, #(MAX_PB_SIZE * 2) + mov x12, #(HEVC_MAX_PB_SIZE * 2) 0: mov x8, x2 // src ld1 {v16.16b, v17.16b}, [x8], x3 mov w11, w5 // height @@ -2147,7 +2231,7 @@ function ff_hevc_put_hevc_qpel_uni_w_v64_8_neon, export=1 endfunc function hevc_put_hevc_qpel_uni_hv4_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 ldr d16, [sp] ldr d17, [sp, x9] @@ -2174,7 +2258,7 @@ function hevc_put_hevc_qpel_uni_hv4_8_end_neon endfunc function hevc_put_hevc_qpel_uni_hv6_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 sub x1, x1, #4 ldr q16, [sp] @@ -2204,7 +2288,7 @@ function hevc_put_hevc_qpel_uni_hv6_8_end_neon endfunc function hevc_put_hevc_qpel_uni_hv8_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 ldr q16, [sp] ldr q17, [sp, x9] @@ -2232,7 +2316,7 @@ function hevc_put_hevc_qpel_uni_hv8_8_end_neon endfunc function hevc_put_hevc_qpel_uni_hv12_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) 
load_qpel_filterh x6, x5 sub x1, x1, #8 ld1 {v16.8h, v17.8h}, [sp], x9 @@ -2260,7 +2344,7 @@ function hevc_put_hevc_qpel_uni_hv12_8_end_neon endfunc function hevc_put_hevc_qpel_uni_hv16_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 sub w12, w9, w7, lsl #1 0: mov x8, sp // src @@ -3355,7 +3439,7 @@ endfunc function ff_hevc_put_hevc_qpel_h4_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 + mov x10, #HEVC_MAX_PB_SIZE * 2 1: ld1 {v0.16b}, [x1], x2 ext v1.16b, v0.16b, v0.16b, #1 @@ -3378,7 +3462,7 @@ endfunc function ff_hevc_put_hevc_qpel_h6_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 + mov x10, #HEVC_MAX_PB_SIZE * 2 add x15, x0, #8 1: ld1 {v0.16b}, [x1], x2 @@ -3411,7 +3495,7 @@ endfunc function ff_hevc_put_hevc_qpel_h8_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 + mov x10, #HEVC_MAX_PB_SIZE * 2 1: ld1 {v0.16b}, [x1], x2 ext v1.16b, v0.16b, v0.16b, #1 @@ -3457,7 +3541,7 @@ endfunc function ff_hevc_put_hevc_qpel_h12_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 + mov x10, #HEVC_MAX_PB_SIZE * 2 add x15, x0, #16 1: ld1 {v16.16b, v17.16b}, [x1], x2 @@ -3495,7 +3579,7 @@ endfunc function ff_hevc_put_hevc_qpel_h16_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 + mov x10, #HEVC_MAX_PB_SIZE * 2 1: ld1 {v16.16b, v17.16b}, [x1], x2 ext v1.16b, v16.16b, v17.16b, #1 @@ -3533,7 +3617,7 @@ endfunc function ff_hevc_put_hevc_qpel_h24_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 + mov x10, #HEVC_MAX_PB_SIZE * 2 add x15, x0, #32 1: ld1 {v16.16b, v17.16b}, [x1], x2 @@ -3585,7 +3669,7 @@ endfunc function ff_hevc_put_hevc_qpel_h32_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 + mov x10, #HEVC_MAX_PB_SIZE * 2 add x15, x0, #32 1: ld1 {v16.16b, v17.16b, v18.16b}, [x1], x2 @@ -3642,7 +3726,7 @@ endfunc function ff_hevc_put_hevc_qpel_h48_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 - 64 + mov x10, 
#HEVC_MAX_PB_SIZE * 2 - 64 1: ld1 {v16.16b, v17.16b, v18.16b, v19.16b}, [x1], x2 ext v1.16b, v16.16b, v17.16b, #1 @@ -4173,7 +4257,7 @@ DISABLE_I8MM stp x24, x25, [sp, #48] stp x26, x27, [sp, #64] mov x19, sp - mov x11, #(MAX_PB_SIZE*(MAX_PB_SIZE+8)*2) + mov x11, #(HEVC_MAX_PB_SIZE*(HEVC_MAX_PB_SIZE+8)*2) sub sp, sp, x11 mov x20, x0 mov x21, x1 @@ -4204,7 +4288,7 @@ DISABLE_I8MM add x9, x9, x23, lsl #3 ld1 {v0.8b}, [x9] sxtl v0.8h, v0.8b - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) dup v28.4s, w24 dup v29.4s, w25 dup v30.4s, w26 @@ -4591,7 +4675,7 @@ endfunc qpel_uni_w_hv neon function hevc_put_hevc_qpel_bi_hv4_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) load_qpel_filterh x7, x6 ld1 {v16.4h}, [sp], x9 ld1 {v17.4h}, [sp], x9 @@ -4617,7 +4701,7 @@ function hevc_put_hevc_qpel_bi_hv4_8_end_neon endfunc function hevc_put_hevc_qpel_bi_hv6_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) load_qpel_filterh x7, x6 sub x1, x1, #4 ld1 {v16.8h}, [sp], x9 @@ -4648,7 +4732,7 @@ function hevc_put_hevc_qpel_bi_hv6_8_end_neon endfunc function hevc_put_hevc_qpel_bi_hv8_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) load_qpel_filterh x7, x6 ld1 {v16.8h}, [sp], x9 ld1 {v17.8h}, [sp], x9 @@ -4678,7 +4762,7 @@ endfunc function hevc_put_hevc_qpel_bi_hv16_8_end_neon load_qpel_filterh x7, x8 - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) mov x10, x6 0: mov x8, sp // src ld1 {v16.8h, v17.8h}, [x8], x9 diff --git a/libavcodec/aarch64/vvc/Makefile b/libavcodec/aarch64/vvc/Makefile index 54c49fea92..a5ad24dfc5 100644 --- a/libavcodec/aarch64/vvc/Makefile +++ b/libavcodec/aarch64/vvc/Makefile @@ -3,4 +3,5 @@ clean:: OBJS-$(CONFIG_VVC_DECODER) += aarch64/vvc/dsp_init.o NEON-OBJS-$(CONFIG_VVC_DECODER) += aarch64/vvc/alf.o \ + aarch64/h26x/qpel_neon.o \ aarch64/h26x/sao_neon.o diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c index 
0aac140a8f..ea6245d9a3 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -46,6 +46,20 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         return;

     if (bd == 8) {
+        c->inter.put[0][1][0][1] = ff_vvc_put_qpel_h4_8_neon;
+        c->inter.put[0][2][0][1] = ff_vvc_put_qpel_h8_8_neon;
+        c->inter.put[0][3][0][1] = ff_vvc_put_qpel_h16_8_neon;
+        c->inter.put[0][4][0][1] =
+        c->inter.put[0][5][0][1] =
+        c->inter.put[0][6][0][1] = ff_vvc_put_qpel_h32_8_neon;
+
+        c->inter.put_uni[0][1][0][1] = ff_vvc_put_qpel_uni_h4_8_neon;
+        c->inter.put_uni[0][2][0][1] = ff_vvc_put_qpel_uni_h8_8_neon;
+        c->inter.put_uni[0][3][0][1] = ff_vvc_put_qpel_uni_h16_8_neon;
+        c->inter.put_uni[0][4][0][1] =
+        c->inter.put_uni[0][5][0][1] =
+        c->inter.put_uni[0][6][0][1] = ff_vvc_put_qpel_uni_h32_8_neon;
+
         for (int i = 0; i < FF_ARRAY_ELEMS(c->sao.band_filter); i++)
             c->sao.band_filter[i] = ff_h26x_sao_band_filter_8x8_8_neon;
         c->sao.edge_filter[0] = ff_vvc_sao_edge_filter_8x8_8_neon;

From patchwork Wed Sep 11 18:06:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51511
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 12 Sep 2024 02:06:08 +0800
X-OQ-MSGID: <20240911180618.28921-5-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
References: <20240911180618.28921-1-quinkblack@foxmail.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v2 04/14] aarch64/vvc: Add put_pel/put_pel_uni/put_pel_uni_w
Cc: Zhao Zhili

From: Zhao Zhili

put_luma_pixels_8_4x4_c:               0.2 ( 1.00x)
put_luma_pixels_8_4x4_neon:            0.2 ( 1.00x)
put_luma_pixels_8_8x8_c:               0.7 ( 1.00x)
put_luma_pixels_8_8x8_neon:            0.2 ( 3.22x)
put_luma_pixels_8_16x16_c:             2.2 ( 1.00x)
put_luma_pixels_8_16x16_neon:          0.2 ( 9.89x)
put_luma_pixels_8_32x32_c:             8.2 ( 1.00x)
put_luma_pixels_8_32x32_neon:          1.2 ( 6.71x)
put_luma_pixels_8_64x64_c:            33.7 ( 1.00x)
put_luma_pixels_8_64x64_neon:          2.5 (13.63x)
put_luma_pixels_8_128x128_c:         145.5 ( 1.00x)
put_luma_pixels_8_128x128_neon:       10.2 (14.23x)
put_uni_pixels_luma_8_4x4_c:           0.5 ( 1.00x)
put_uni_pixels_luma_8_4x4_neon:        0.0 ( 0.00x)
put_uni_pixels_luma_8_8x8_c:           0.5 ( 1.00x)
put_uni_pixels_luma_8_8x8_neon:        0.2 ( 2.11x)
put_uni_pixels_luma_8_16x16_c:         1.2 ( 1.00x)
put_uni_pixels_luma_8_16x16_neon:      0.2 ( 5.44x)
put_uni_pixels_luma_8_32x32_c:         3.0 ( 1.00x)
put_uni_pixels_luma_8_32x32_neon:      0.5 ( 6.26x)
put_uni_pixels_luma_8_64x64_c:         3.0 ( 1.00x)
put_uni_pixels_luma_8_64x64_neon:      1.7 ( 1.72x)
put_uni_pixels_luma_8_128x128_c:       6.5 ( 1.00x)
put_uni_pixels_luma_8_128x128_neon:    6.5 ( 1.00x)
---
 libavcodec/aarch64/h26x/dsp.h       |  22 ++++
 libavcodec/aarch64/h26x/epel_neon.S | 189 +++++++++++++++++-----------
 libavcodec/aarch64/h26x/qpel_neon.S |  81 +++++++++++-
 libavcodec/aarch64/vvc/Makefile     |   1 +
 libavcodec/aarch64/vvc/dsp_init.c   |  21 ++++
 5 files changed, 241 insertions(+), 73 deletions(-)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index f72746ce03..076d01b477 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -248,4 +248,26 @@ NEON8_FNPROTO_PARTIAL_4(qpel, (int16_t *dst, const uint8_t *_src, ptrdiff_t _src
 NEON8_FNPROTO_PARTIAL_4(qpel_uni, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
         ptrdiff_t _srcstride, int height, const int8_t *hf, const int8_t *vf, int width),)

+#undef NEON8_FNPROTO_PARTIAL_6
+#define NEON8_FNPROTO_PARTIAL_6(fn, args, ext) \
+    void ff_vvc_put_##fn##4_8_neon##ext args; \
+    void ff_vvc_put_##fn##8_8_neon##ext args; \
+    void ff_vvc_put_##fn##16_8_neon##ext args; \
+    void ff_vvc_put_##fn##32_8_neon##ext args; \
+    void ff_vvc_put_##fn##64_8_neon##ext args; \
+    void ff_vvc_put_##fn##128_8_neon##ext args
+
+NEON8_FNPROTO_PARTIAL_6(pel_pixels, (int16_t *dst,
+        const uint8_t *src, ptrdiff_t srcstride, int height,
+        const int8_t *hf, const int8_t *vf, int width),);
+
+NEON8_FNPROTO_PARTIAL_6(pel_uni_pixels, (uint8_t *_dst, ptrdiff_t
_dststride,
+        const uint8_t *_src, ptrdiff_t _srcstride, int height,
+        const int8_t *hf, const int8_t *vf, int width),);
+
+NEON8_FNPROTO_PARTIAL_6(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride,
+        const uint8_t *_src, ptrdiff_t _srcstride,
+        int height, int denom, int wx, int ox,
+        const int8_t *hf, const int8_t *vf, int width),);
+
 #endif
diff --git a/libavcodec/aarch64/h26x/epel_neon.S b/libavcodec/aarch64/h26x/epel_neon.S
index 378b0f7fb2..8ca42a5c3a 100644
--- a/libavcodec/aarch64/h26x/epel_neon.S
+++ b/libavcodec/aarch64/h26x/epel_neon.S
@@ -19,7 +19,8 @@
  */

 #include "libavutil/aarch64/asm.S"
-#define MAX_PB_SIZE 64
+#define HEVC_MAX_PB_SIZE 64
+#define VVC_MAX_PB_SIZE 128

 const epel_filters, align=4
         .byte           0,  0,  0,  0
@@ -131,8 +132,13 @@ endconst
         b.ne            1b
 .endm

+function ff_vvc_put_pel_pixels4_8_neon, export=1
+        mov             x7, #(VVC_MAX_PB_SIZE * 2)
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_pel_pixels4_8_neon, export=1
-        mov             x7, #(MAX_PB_SIZE * 2)
+        mov             x7, #(HEVC_MAX_PB_SIZE * 2)
 1:      ld1             {v0.s}[0], [x1], x2
         ushll           v4.8h, v0.8b, #6
         subs            w3, w3, #1
@@ -142,7 +148,7 @@ function ff_hevc_put_hevc_pel_pixels4_8_neon, export=1
 endfunc

 function ff_hevc_put_hevc_pel_pixels6_8_neon, export=1
-        mov             x7, #(MAX_PB_SIZE * 2 - 8)
+        mov             x7, #(HEVC_MAX_PB_SIZE * 2 - 8)
 1:      ld1             {v0.8b}, [x1], x2
         ushll           v4.8h, v0.8b, #6
         st1             {v4.d}[0], [x0], #8
@@ -152,8 +158,13 @@ function ff_hevc_put_hevc_pel_pixels6_8_neon, export=1
         ret
 endfunc

+function ff_vvc_put_pel_pixels8_8_neon, export=1
+        mov             x7, #(VVC_MAX_PB_SIZE * 2)
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_pel_pixels8_8_neon, export=1
-        mov             x7, #(MAX_PB_SIZE * 2)
+        mov             x7, #(HEVC_MAX_PB_SIZE * 2)
 1:      ld1             {v0.8b}, [x1], x2
         ushll           v4.8h, v0.8b, #6
         subs            w3, w3, #1
@@ -163,7 +174,7 @@ function ff_hevc_put_hevc_pel_pixels8_8_neon, export=1
 endfunc

 function ff_hevc_put_hevc_pel_pixels12_8_neon, export=1
-        mov             x7, #(MAX_PB_SIZE * 2 - 16)
+        mov             x7, #(HEVC_MAX_PB_SIZE * 2 - 16)
 1:      ld1             {v0.8b, v1.8b}, [x1], x2
         ushll           v4.8h, v0.8b, #6
         st1
{v4.8h}, [x0], #16 @@ -174,8 +185,13 @@ function ff_hevc_put_hevc_pel_pixels12_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_pixels16_8_neon, export=1 + mov x7, #(VVC_MAX_PB_SIZE * 2) + b 1f +endfunc + function ff_hevc_put_hevc_pel_pixels16_8_neon, export=1 - mov x7, #(MAX_PB_SIZE * 2) + mov x7, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.8b, v1.8b}, [x1], x2 ushll v4.8h, v0.8b, #6 ushll v5.8h, v1.8b, #6 @@ -186,7 +202,7 @@ function ff_hevc_put_hevc_pel_pixels16_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_pixels24_8_neon, export=1 - mov x7, #(MAX_PB_SIZE * 2) + mov x7, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.8b-v2.8b}, [x1], x2 ushll v4.8h, v0.8b, #6 ushll v5.8h, v1.8b, #6 @@ -197,8 +213,13 @@ function ff_hevc_put_hevc_pel_pixels24_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_pixels32_8_neon, export=1 + mov x7, #(VVC_MAX_PB_SIZE * 2) + b 1f +endfunc + function ff_hevc_put_hevc_pel_pixels32_8_neon, export=1 - mov x7, #(MAX_PB_SIZE * 2) + mov x7, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.8b-v3.8b}, [x1], x2 ushll v4.8h, v0.8b, #6 ushll v5.8h, v1.8b, #6 @@ -211,7 +232,7 @@ function ff_hevc_put_hevc_pel_pixels32_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_pixels48_8_neon, export=1 - mov x7, #(MAX_PB_SIZE) + mov x7, #(HEVC_MAX_PB_SIZE) 1: ld1 {v0.16b-v2.16b}, [x1], x2 ushll v4.8h, v0.8b, #6 ushll2 v5.8h, v0.16b, #6 @@ -226,26 +247,50 @@ function ff_hevc_put_hevc_pel_pixels48_8_neon, export=1 ret endfunc -function ff_hevc_put_hevc_pel_pixels64_8_neon, export=1 -1: ld1 {v0.16b-v3.16b}, [x1], x2 +.macro put_pel_pixels64_8_neon ushll v4.8h, v0.8b, #6 ushll2 v5.8h, v0.16b, #6 ushll v6.8h, v1.8b, #6 ushll2 v7.8h, v1.16b, #6 - st1 {v4.8h-v7.8h}, [x0], #(MAX_PB_SIZE) + st1 {v4.8h-v7.8h}, [x0], #64 ushll v16.8h, v2.8b, #6 ushll2 v17.8h, v2.16b, #6 ushll v18.8h, v3.8b, #6 ushll2 v19.8h, v3.16b, #6 + st1 {v16.8h-v19.8h}, [x0], x7 +.endm + +function ff_vvc_put_pel_pixels64_8_neon, export=1 + mov x7, #(2 * VVC_MAX_PB_SIZE - 64) + b 1f +endfunc + +function 
ff_hevc_put_hevc_pel_pixels64_8_neon, export=1 + mov x7, #(HEVC_MAX_PB_SIZE) +1: + ld1 {v0.16b-v3.16b}, [x1], x2 subs w3, w3, #1 - st1 {v16.8h-v19.8h}, [x0], #(MAX_PB_SIZE) + put_pel_pixels64_8_neon b.ne 1b ret endfunc +function ff_vvc_put_pel_pixels128_8_neon, export=1 + mov x7, #64 +1: + mov x6, x1 + ld1 {v0.16b-v3.16b}, [x6], #64 + add x1, x1, x2 + subs w3, w3, #1 + put_pel_pixels64_8_neon + ld1 {v0.16b-v3.16b}, [x6], #64 + put_pel_pixels64_8_neon + b.ne 1b + ret +endfunc function ff_hevc_put_hevc_pel_bi_pixels4_8_neon, export=1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.s}[0], [x2], x3 // src ushll v16.8h, v0.8b, #6 ld1 {v20.4h}, [x4], x10 // src2 @@ -258,7 +303,7 @@ function ff_hevc_put_hevc_pel_bi_pixels4_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_bi_pixels6_8_neon, export=1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) sub x1, x1, #4 1: ld1 {v0.8b}, [x2], x3 ushll v16.8h, v0.8b, #6 @@ -273,7 +318,7 @@ function ff_hevc_put_hevc_pel_bi_pixels6_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_bi_pixels8_8_neon, export=1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.8b}, [x2], x3 // src ushll v16.8h, v0.8b, #6 ld1 {v20.8h}, [x4], x10 // src2 @@ -286,7 +331,7 @@ function ff_hevc_put_hevc_pel_bi_pixels8_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_bi_pixels12_8_neon, export=1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) sub x1, x1, #8 1: ld1 {v0.16b}, [x2], x3 ushll v16.8h, v0.8b, #6 @@ -304,7 +349,7 @@ function ff_hevc_put_hevc_pel_bi_pixels12_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_bi_pixels16_8_neon, export=1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.16b}, [x2], x3 // src ushll v16.8h, v0.8b, #6 ushll2 v17.8h, v0.16b, #6 @@ -320,7 +365,7 @@ function ff_hevc_put_hevc_pel_bi_pixels16_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_bi_pixels24_8_neon, export=1 - mov x10, 
#(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.8b-v2.8b}, [x2], x3 // src ushll v16.8h, v0.8b, #6 ushll v17.8h, v1.8b, #6 @@ -339,7 +384,7 @@ function ff_hevc_put_hevc_pel_bi_pixels24_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_bi_pixels32_8_neon, export=1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.16b-v1.16b}, [x2], x3 // src ushll v16.8h, v0.8b, #6 ushll2 v17.8h, v0.16b, #6 @@ -361,7 +406,7 @@ function ff_hevc_put_hevc_pel_bi_pixels32_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_bi_pixels48_8_neon, export=1 - mov x10, #(MAX_PB_SIZE) + mov x10, #(HEVC_MAX_PB_SIZE) 1: ld1 {v0.16b-v2.16b}, [x2], x3 // src ushll v16.8h, v0.8b, #6 ushll2 v17.8h, v0.16b, #6 @@ -369,7 +414,7 @@ function ff_hevc_put_hevc_pel_bi_pixels48_8_neon, export=1 ushll2 v19.8h, v1.16b, #6 ushll v20.8h, v2.8b, #6 ushll2 v21.8h, v2.16b, #6 - ld1 {v24.8h-v27.8h}, [x4], #(MAX_PB_SIZE) // src2 + ld1 {v24.8h-v27.8h}, [x4], #(HEVC_MAX_PB_SIZE) // src2 sqadd v16.8h, v16.8h, v24.8h sqadd v17.8h, v17.8h, v25.8h sqadd v18.8h, v18.8h, v26.8h @@ -399,12 +444,12 @@ function ff_hevc_put_hevc_pel_bi_pixels64_8_neon, export=1 ushll2 v21.8h, v2.16b, #6 ushll v22.8h, v3.8b, #6 ushll2 v23.8h, v3.16b, #6 - ld1 {v24.8h, v25.8h, v26.8h, v27.8h}, [x4], #(MAX_PB_SIZE) // src2 + ld1 {v24.8h, v25.8h, v26.8h, v27.8h}, [x4], #(HEVC_MAX_PB_SIZE) // src2 sqadd v16.8h, v16.8h, v24.8h sqadd v17.8h, v17.8h, v25.8h sqadd v18.8h, v18.8h, v26.8h sqadd v19.8h, v19.8h, v27.8h - ld1 {v24.8h, v25.8h, v26.8h, v27.8h}, [x4], #(MAX_PB_SIZE) + ld1 {v24.8h, v25.8h, v26.8h, v27.8h}, [x4], #(HEVC_MAX_PB_SIZE) sqadd v20.8h, v20.8h, v24.8h sqadd v21.8h, v21.8h, v25.8h sqadd v22.8h, v22.8h, v26.8h @@ -427,7 +472,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_h4_8_neon, export=1 load_epel_filterb x6, x7 sub x2, x2, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v4.8b}, [x2], x3 ext v5.8b, v4.8b, v4.8b, #1 ext v6.8b, v4.8b, v4.8b, #2 @@ -446,7 +491,7 
@@ function ff_hevc_put_hevc_epel_bi_h6_8_neon, export=1 load_epel_filterb x6, x7 sub w1, w1, #4 sub x2, x2, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v24.16b}, [x2], x3 ext v26.16b, v24.16b, v24.16b, #1 ext v27.16b, v24.16b, v24.16b, #2 @@ -465,7 +510,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_h8_8_neon, export=1 load_epel_filterb x6, x7 sub x2, x2, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v24.16b}, [x2], x3 ext v26.16b, v24.16b, v24.16b, #1 ext v27.16b, v24.16b, v24.16b, #2 @@ -484,7 +529,7 @@ function ff_hevc_put_hevc_epel_bi_h12_8_neon, export=1 load_epel_filterb x6, x7 sub x1, x1, #8 sub x2, x2, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v24.16b}, [x2], x3 ext v26.16b, v24.16b, v24.16b, #1 ext v27.16b, v24.16b, v24.16b, #2 @@ -506,7 +551,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_h16_8_neon, export=1 load_epel_filterb x6, x7 sub x2, x2, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ldr q24, [x2] ldr s25, [x2, #16] add x2, x2, x3 @@ -529,7 +574,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_h24_8_neon, export=1 load_epel_filterb x6, x7 sub x2, x2, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v24.16b, v25.16b}, [x2], x3 ext v26.16b, v24.16b, v25.16b, #1 ext v27.16b, v24.16b, v25.16b, #2 @@ -556,7 +601,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_h32_8_neon, export=1 load_epel_filterb x6, x7 sub x2, x2, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ldp q24, q25, [x2] ldr s26, [x2, #32] add x2, x2, x3 @@ -589,7 +634,7 @@ function ff_hevc_put_hevc_epel_bi_h48_8_neon, export=1 load_epel_filterb x6, x7 sub x2, x2, #1 mov x7, #24 - mov x10, #(MAX_PB_SIZE * 2 - 48) + mov x10, #(HEVC_MAX_PB_SIZE * 2 - 48) 1: ld1 {v24.16b, v25.16b, v26.16b}, [x2] ldr s27, [x2, #48] add x2, x2, x3 @@ -683,7 +728,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_v4_8_neon, export=1 load_epel_filterb x7, 
x6 sub x2, x2, x3 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.s}[0], [x2], x3 ld1 {v17.s}[0], [x2], x3 ld1 {v18.s}[0], [x2], x3 @@ -705,7 +750,7 @@ function ff_hevc_put_hevc_epel_bi_v6_8_neon, export=1 load_epel_filterb x7, x6 sub x2, x2, x3 sub x1, x1, #4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8b}, [x2], x3 ld1 {v17.8b}, [x2], x3 ld1 {v18.8b}, [x2], x3 @@ -727,7 +772,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_v8_8_neon, export=1 load_epel_filterb x7, x6 sub x2, x2, x3 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8b}, [x2], x3 ld1 {v17.8b}, [x2], x3 ld1 {v18.8b}, [x2], x3 @@ -749,7 +794,7 @@ function ff_hevc_put_hevc_epel_bi_v12_8_neon, export=1 load_epel_filterb x7, x6 sub x1, x1, #8 sub x2, x2, x3 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.16b}, [x2], x3 ld1 {v17.16b}, [x2], x3 ld1 {v18.16b}, [x2], x3 @@ -774,7 +819,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_v16_8_neon, export=1 load_epel_filterb x7, x6 sub x2, x2, x3 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.16b}, [x2], x3 ld1 {v17.16b}, [x2], x3 ld1 {v18.16b}, [x2], x3 @@ -798,7 +843,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_v24_8_neon, export=1 load_epel_filterb x7, x6 sub x2, x2, x3 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8b, v17.8b, v18.8b}, [x2], x3 ld1 {v19.8b, v20.8b, v21.8b}, [x2], x3 ld1 {v22.8b, v23.8b, v24.8b}, [x2], x3 @@ -825,7 +870,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_v32_8_neon, export=1 load_epel_filterb x7, x6 sub x2, x2, x3 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.16b, v17.16b}, [x2], x3 ld1 {v18.16b, v19.16b}, [x2], x3 ld1 {v20.16b, v21.16b}, [x2], x3 @@ -895,7 +940,7 @@ endfunc function ff_hevc_put_hevc_epel_v4_8_neon, export=1 load_epel_filterb x5, x4 sub x1, x1, x2 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ldr s16, [x1] ldr 
s17, [x1, x2] add x1, x1, x2, lsl #1 @@ -915,7 +960,7 @@ endfunc function ff_hevc_put_hevc_epel_v6_8_neon, export=1 load_epel_filterb x5, x4 sub x1, x1, x2 - mov x10, #(MAX_PB_SIZE * 2 - 8) + mov x10, #(HEVC_MAX_PB_SIZE * 2 - 8) ldr d16, [x1] ldr d17, [x1, x2] add x1, x1, x2, lsl #1 @@ -936,7 +981,7 @@ endfunc function ff_hevc_put_hevc_epel_v8_8_neon, export=1 load_epel_filterb x5, x4 sub x1, x1, x2 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ldr d16, [x1] ldr d17, [x1, x2] add x1, x1, x2, lsl #1 @@ -956,7 +1001,7 @@ endfunc function ff_hevc_put_hevc_epel_v12_8_neon, export=1 load_epel_filterb x5, x4 sub x1, x1, x2 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ldr q16, [x1] ldr q17, [x1, x2] add x1, x1, x2, lsl #1 @@ -980,7 +1025,7 @@ endfunc function ff_hevc_put_hevc_epel_v16_8_neon, export=1 load_epel_filterb x5, x4 sub x1, x1, x2 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ldr q16, [x1] ldr q17, [x1, x2] add x1, x1, x2, lsl #1 @@ -1002,7 +1047,7 @@ endfunc function ff_hevc_put_hevc_epel_v24_8_neon, export=1 load_epel_filterb x5, x4 sub x1, x1, x2 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8b, v17.8b, v18.8b}, [x1], x2 ld1 {v19.8b, v20.8b, v21.8b}, [x1], x2 ld1 {v22.8b, v23.8b, v24.8b}, [x1], x2 @@ -1025,7 +1070,7 @@ endfunc function ff_hevc_put_hevc_epel_v32_8_neon, export=1 load_epel_filterb x5, x4 sub x1, x1, x2 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.16b, v17.16b}, [x1], x2 ld1 {v18.16b, v19.16b}, [x1], x2 ld1 {v20.16b, v21.16b}, [x1], x2 @@ -1327,7 +1372,7 @@ endfunc add x5, x5, x4, lsl #2 ld1r {v30.4s}, [x5] sub x1, x1, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) .endm function ff_hevc_put_hevc_epel_h4_8_neon, export=1 @@ -2179,7 +2224,7 @@ DISABLE_I8MM function hevc_put_hevc_epel_hv4_8_end_neon load_epel_filterh x5, x4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ldr d16, [sp] ldr d17, 
[sp, x10] add sp, sp, x10, lsl #1 @@ -2198,7 +2243,7 @@ endfunc function hevc_put_hevc_epel_hv6_8_end_neon load_epel_filterh x5, x4 mov x5, #120 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ldr q16, [sp] ldr q17, [sp, x10] add sp, sp, x10, lsl #1 @@ -2218,7 +2263,7 @@ endfunc function hevc_put_hevc_epel_hv8_8_end_neon load_epel_filterh x5, x4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ldr q16, [sp] ldr q17, [sp, x10] add sp, sp, x10, lsl #1 @@ -2238,7 +2283,7 @@ endfunc function hevc_put_hevc_epel_hv12_8_end_neon load_epel_filterh x5, x4 mov x5, #112 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -2258,7 +2303,7 @@ endfunc function hevc_put_hevc_epel_hv16_8_end_neon load_epel_filterh x5, x4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -2278,7 +2323,7 @@ endfunc function hevc_put_hevc_epel_hv24_8_end_neon load_epel_filterh x5, x4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h, v18.8h}, [sp], x10 ld1 {v19.8h, v20.8h, v21.8h}, [sp], x10 ld1 {v22.8h, v23.8h, v24.8h}, [sp], x10 @@ -2462,7 +2507,7 @@ epel_hv neon function hevc_put_hevc_epel_uni_hv4_8_end_neon load_epel_filterh x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.4h}, [sp], x10 ld1 {v17.4h}, [sp], x10 ld1 {v18.4h}, [sp], x10 @@ -2481,7 +2526,7 @@ endfunc function hevc_put_hevc_epel_uni_hv6_8_end_neon load_epel_filterh x6, x5 sub x1, x1, #4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h}, [sp], x10 ld1 {v17.8h}, [sp], x10 ld1 {v18.8h}, [sp], x10 @@ -2501,7 +2546,7 @@ endfunc function hevc_put_hevc_epel_uni_hv8_8_end_neon load_epel_filterh x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h}, [sp], x10 ld1 
{v17.8h}, [sp], x10 ld1 {v18.8h}, [sp], x10 @@ -2521,7 +2566,7 @@ endfunc function hevc_put_hevc_epel_uni_hv12_8_end_neon load_epel_filterh x6, x5 sub x1, x1, #8 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -2543,7 +2588,7 @@ endfunc function hevc_put_hevc_epel_uni_hv16_8_end_neon load_epel_filterh x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -2565,7 +2610,7 @@ endfunc function hevc_put_hevc_epel_uni_hv24_8_end_neon load_epel_filterh x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h, v18.8h}, [sp], x10 ld1 {v19.8h, v20.8h, v21.8h}, [sp], x10 ld1 {v22.8h, v23.8h, v24.8h}, [sp], x10 @@ -3223,7 +3268,7 @@ DISABLE_I8MM function hevc_put_hevc_epel_uni_w_hv4_8_end_neon load_epel_filterh x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.4h}, [sp], x10 ld1 {v17.4h}, [sp], x10 ld1 {v18.4h}, [sp], x10 @@ -3273,7 +3318,7 @@ endfunc function hevc_put_hevc_epel_uni_w_hv6_8_end_neon load_epel_filterh x6, x5 sub x1, x1, #4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h}, [sp], x10 ld1 {v17.8h}, [sp], x10 ld1 {v18.8h}, [sp], x10 @@ -3326,7 +3371,7 @@ endfunc function hevc_put_hevc_epel_uni_w_hv8_8_end_neon load_epel_filterh x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h}, [sp], x10 ld1 {v17.8h}, [sp], x10 ld1 {v18.8h}, [sp], x10 @@ -3376,7 +3421,7 @@ endfunc function hevc_put_hevc_epel_uni_w_hv12_8_end_neon load_epel_filterh x6, x5 sub x1, x1, #8 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -3437,7 +3482,7 @@ endfunc function hevc_put_hevc_epel_uni_w_hv16_8_end_neon load_epel_filterh 
x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -3498,7 +3543,7 @@ endfunc function hevc_put_hevc_epel_uni_w_hv24_8_end_neon load_epel_filterh x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h, v18.8h}, [sp], x10 ld1 {v19.8h, v20.8h, v21.8h}, [sp], x10 ld1 {v22.8h, v23.8h, v24.8h}, [sp], x10 @@ -3795,7 +3840,7 @@ epel_uni_w_hv neon function hevc_put_hevc_epel_bi_hv4_8_end_neon load_epel_filterh x7, x6 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.4h}, [sp], x10 ld1 {v17.4h}, [sp], x10 ld1 {v18.4h}, [sp], x10 @@ -3816,7 +3861,7 @@ endfunc function hevc_put_hevc_epel_bi_hv6_8_end_neon load_epel_filterh x7, x6 sub x1, x1, #4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h}, [sp], x10 ld1 {v17.8h}, [sp], x10 ld1 {v18.8h}, [sp], x10 @@ -3838,7 +3883,7 @@ endfunc function hevc_put_hevc_epel_bi_hv8_8_end_neon load_epel_filterh x7, x6 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h}, [sp], x10 ld1 {v17.8h}, [sp], x10 ld1 {v18.8h}, [sp], x10 @@ -3860,7 +3905,7 @@ endfunc function hevc_put_hevc_epel_bi_hv12_8_end_neon load_epel_filterh x7, x6 sub x1, x1, #8 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -3885,7 +3930,7 @@ endfunc function hevc_put_hevc_epel_bi_hv16_8_end_neon load_epel_filterh x7, x6 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -3910,7 +3955,7 @@ endfunc function hevc_put_hevc_epel_bi_hv24_8_end_neon load_epel_filterh x7, x6 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h, v18.8h}, [sp], x10 ld1 {v19.8h, v20.8h, v21.8h}, [sp], x10 ld1 {v22.8h, 
v23.8h, v24.8h}, [sp], x10 @@ -3939,7 +3984,7 @@ endfunc function hevc_put_hevc_epel_bi_hv32_8_end_neon load_epel_filterh x7, x6 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h, v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h, v22.8h, v23.8h}, [sp], x10 ld1 {v24.8h, v25.8h, v26.8h, v27.8h}, [sp], x10 diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S index a05009c9d6..47b3948f8b 100644 --- a/libavcodec/aarch64/h26x/qpel_neon.S +++ b/libavcodec/aarch64/h26x/qpel_neon.S @@ -1250,6 +1250,10 @@ function ff_hevc_put_hevc_qpel_bi_v64_8_neon, export=1 b X(ff_hevc_put_hevc_qpel_bi_v32_8_neon) endfunc +function ff_vvc_put_pel_uni_pixels4_8_neon, export=1 + b X(ff_hevc_put_hevc_pel_uni_pixels4_8_neon) +endfunc + function ff_hevc_put_hevc_pel_uni_pixels4_8_neon, export=1 1: ldr s0, [x2] @@ -1278,6 +1282,10 @@ function ff_hevc_put_hevc_pel_uni_pixels6_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_uni_pixels8_8_neon, export=1 + b X(ff_hevc_put_hevc_pel_uni_pixels8_8_neon) +endfunc + function ff_hevc_put_hevc_pel_uni_pixels8_8_neon, export=1 1: ldr d0, [x2] @@ -1306,6 +1314,10 @@ function ff_hevc_put_hevc_pel_uni_pixels12_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_uni_pixels16_8_neon, export=1 + b X(ff_hevc_put_hevc_pel_uni_pixels16_8_neon) +endfunc + function ff_hevc_put_hevc_pel_uni_pixels16_8_neon, export=1 1: ldr q0, [x2] @@ -1328,6 +1340,10 @@ function ff_hevc_put_hevc_pel_uni_pixels24_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_uni_pixels32_8_neon, export=1 + b X(ff_hevc_put_hevc_pel_uni_pixels32_8_neon) +endfunc + function ff_hevc_put_hevc_pel_uni_pixels32_8_neon, export=1 1: ld1 {v0.16b, v1.16b}, [x2], x3 @@ -1346,6 +1362,10 @@ function ff_hevc_put_hevc_pel_uni_pixels48_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_uni_pixels64_8_neon, export=1 + b X(ff_hevc_put_hevc_pel_uni_pixels64_8_neon) +endfunc + function ff_hevc_put_hevc_pel_uni_pixels64_8_neon, export=1 1: 
         ld1             {v0.16b, v1.16b, v2.16b, v3.16b}, [x2], x3
@@ -1355,6 +1375,19 @@ function ff_hevc_put_hevc_pel_uni_pixels64_8_neon, export=1
         ret
 endfunc
 
+function ff_vvc_put_pel_uni_pixels128_8_neon, export=1
+        sub             x1, x1, #64
+        sub             x3, x3, #64
+1:
+        ld1             {v0.16b, v1.16b, v2.16b, v3.16b}, [x2], #64
+        subs            w4, w4, #1
+        ld1             {v4.16b, v5.16b, v6.16b, v7.16b}, [x2], x3
+        st1             {v0.16b, v1.16b, v2.16b, v3.16b}, [x0], #64
+        st1             {v4.16b, v5.16b, v6.16b, v7.16b}, [x0], x1
+        b.ne            1b
+        ret
+endfunc
+
 function ff_hevc_put_hevc_qpel_uni_v4_8_neon, export=1
         load_qpel_filterb x6, x5
         sub             x2, x2, x3, lsl #1
@@ -1528,6 +1561,10 @@ function ff_hevc_put_hevc_qpel_uni_v64_8_neon, export=1
         b               X(ff_hevc_put_hevc_qpel_uni_v16_8_neon)
 endfunc
 
+function ff_vvc_put_pel_uni_w_pixels4_8_neon, export=1
+        b               X(ff_hevc_put_hevc_pel_uni_w_pixels4_8_neon)
+endfunc
+
 function ff_hevc_put_hevc_pel_uni_w_pixels4_8_neon, export=1
         mov             w10, #-6
         sub             w10, w10, w5
@@ -1598,6 +1635,10 @@ function ff_hevc_put_hevc_pel_uni_w_pixels6_8_neon, export=1
         ret
 endfunc
 
+function ff_vvc_put_pel_uni_w_pixels8_8_neon, export=1
+        b               X(ff_hevc_put_hevc_pel_uni_w_pixels8_8_neon)
+endfunc
+
 function ff_hevc_put_hevc_pel_uni_w_pixels8_8_neon, export=1
         mov             w10, #-6
         sub             w10, w10, w5
@@ -1741,7 +1782,9 @@ function ff_hevc_put_hevc_pel_uni_w_pixels16_8_neon, export=1
         ret
 endfunc
 
-
+function ff_vvc_put_pel_uni_w_pixels16_8_neon, export=1
+        b               X(ff_hevc_put_hevc_pel_uni_w_pixels16_8_neon)
+endfunc
 
 function ff_hevc_put_hevc_pel_uni_w_pixels24_8_neon, export=1
         mov             w10, #-6
@@ -1803,6 +1846,9 @@ function ff_hevc_put_hevc_pel_uni_w_pixels32_8_neon, export=1
         ret
 endfunc
 
+function ff_vvc_put_pel_uni_w_pixels32_8_neon, export=1
+        b               X(ff_hevc_put_hevc_pel_uni_w_pixels32_8_neon)
+endfunc
 
 function ff_hevc_put_hevc_pel_uni_w_pixels48_8_neon, export=1
         mov             w10, #-6
@@ -1839,6 +1885,39 @@ function ff_hevc_put_hevc_pel_uni_w_pixels64_8_neon, export=1
         ret
 endfunc
 
+function ff_vvc_put_pel_uni_w_pixels64_8_neon, export=1
+        b               X(ff_hevc_put_hevc_pel_uni_w_pixels64_8_neon)
+endfunc
+
+function ff_vvc_put_pel_uni_w_pixels128_8_neon, export=1
+        mov             w10, #-6
+        sub             w10, w10, w5
+        dup             v30.8h, w6
+        dup             v31.4s, w10
+        dup             v29.4s, w7
+        sub             x1, x1, #64
+        sub             x3, x3, #64
+1:
+        mov             x11, x2
+        mov             x12, x0
+        ld1             {v0.16b, v1.16b, v2.16b, v3.16b}, [x2], #64
+        PEL_UNI_W_PIXEL_CALC v0, v4, v5, v16, v17, v18, v19
+        PEL_UNI_W_PIXEL_CALC v1, v6, v7, v20, v21, v22, v23
+        PEL_UNI_W_PIXEL_CALC v2, v4, v5, v16, v17, v18, v19
+        PEL_UNI_W_PIXEL_CALC v3, v6, v7, v20, v21, v22, v23
+        st1             {v0.16b, v1.16b, v2.16b, v3.16b}, [x0], #64
+
+        ld1             {v0.16b, v1.16b, v2.16b, v3.16b}, [x2], x3
+        subs            w4, w4, #1
+        PEL_UNI_W_PIXEL_CALC v0, v4, v5, v16, v17, v18, v19
+        PEL_UNI_W_PIXEL_CALC v1, v6, v7, v20, v21, v22, v23
+        PEL_UNI_W_PIXEL_CALC v2, v4, v5, v16, v17, v18, v19
+        PEL_UNI_W_PIXEL_CALC v3, v6, v7, v20, v21, v22, v23
+        st1             {v0.16b, v1.16b, v2.16b, v3.16b}, [x0], x1
+        b.ne            1b
+        ret
+endfunc
+
 .macro QPEL_UNI_W_V_HEADER
         ldur            x12, [sp, #8]          // my
         sub             x2, x2, x3, lsl #1
diff --git a/libavcodec/aarch64/vvc/Makefile b/libavcodec/aarch64/vvc/Makefile
index a5ad24dfc5..a1c1f03e27 100644
--- a/libavcodec/aarch64/vvc/Makefile
+++ b/libavcodec/aarch64/vvc/Makefile
@@ -3,5 +3,6 @@ clean::
 
 OBJS-$(CONFIG_VVC_DECODER)              += aarch64/vvc/dsp_init.o
 NEON-OBJS-$(CONFIG_VVC_DECODER)         += aarch64/vvc/alf.o \
+                                           aarch64/h26x/epel_neon.o \
                                            aarch64/h26x/qpel_neon.o \
                                            aarch64/h26x/sao_neon.o
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index ea6245d9a3..457be8c725 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -46,6 +46,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         return;
 
     if (bd == 8) {
+        c->inter.put[0][1][0][0] = ff_vvc_put_pel_pixels4_8_neon;
+        c->inter.put[0][2][0][0] = ff_vvc_put_pel_pixels8_8_neon;
+        c->inter.put[0][3][0][0] = ff_vvc_put_pel_pixels16_8_neon;
+        c->inter.put[0][4][0][0] = ff_vvc_put_pel_pixels32_8_neon;
+        c->inter.put[0][5][0][0] = ff_vvc_put_pel_pixels64_8_neon;
+        c->inter.put[0][6][0][0] = ff_vvc_put_pel_pixels128_8_neon;
+
         c->inter.put[0][1][0][1] = ff_vvc_put_qpel_h4_8_neon;
         c->inter.put[0][2][0][1] = ff_vvc_put_qpel_h8_8_neon;
         c->inter.put[0][3][0][1] = ff_vvc_put_qpel_h16_8_neon;
@@ -53,6 +60,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         c->inter.put[0][5][0][1] =
         c->inter.put[0][6][0][1] = ff_vvc_put_qpel_h32_8_neon;
 
+        c->inter.put_uni[0][1][0][0] = ff_vvc_put_pel_uni_pixels4_8_neon;
+        c->inter.put_uni[0][2][0][0] = ff_vvc_put_pel_uni_pixels8_8_neon;
+        c->inter.put_uni[0][3][0][0] = ff_vvc_put_pel_uni_pixels16_8_neon;
+        c->inter.put_uni[0][4][0][0] = ff_vvc_put_pel_uni_pixels32_8_neon;
+        c->inter.put_uni[0][5][0][0] = ff_vvc_put_pel_uni_pixels64_8_neon;
+        c->inter.put_uni[0][6][0][0] = ff_vvc_put_pel_uni_pixels128_8_neon;
+
         c->inter.put_uni[0][1][0][1] = ff_vvc_put_qpel_uni_h4_8_neon;
         c->inter.put_uni[0][2][0][1] = ff_vvc_put_qpel_uni_h8_8_neon;
         c->inter.put_uni[0][3][0][1] = ff_vvc_put_qpel_uni_h16_8_neon;
@@ -60,6 +74,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         c->inter.put_uni[0][5][0][1] =
         c->inter.put_uni[0][6][0][1] = ff_vvc_put_qpel_uni_h32_8_neon;
 
+        c->inter.put_uni_w[0][1][0][0] = ff_vvc_put_pel_uni_w_pixels4_8_neon;
+        c->inter.put_uni_w[0][2][0][0] = ff_vvc_put_pel_uni_w_pixels8_8_neon;
+        c->inter.put_uni_w[0][3][0][0] = ff_vvc_put_pel_uni_w_pixels16_8_neon;
+        c->inter.put_uni_w[0][4][0][0] = ff_vvc_put_pel_uni_w_pixels32_8_neon;
+        c->inter.put_uni_w[0][5][0][0] = ff_vvc_put_pel_uni_w_pixels64_8_neon;
+        c->inter.put_uni_w[0][6][0][0] = ff_vvc_put_pel_uni_w_pixels128_8_neon;
+
         for (int i = 0; i < FF_ARRAY_ELEMS(c->sao.band_filter); i++)
             c->sao.band_filter[i] = ff_h26x_sao_band_filter_8x8_8_neon;
         c->sao.edge_filter[0] = ff_vvc_sao_edge_filter_8x8_8_neon;

From patchwork Wed Sep 11 18:06:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51513
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 12 Sep 2024 02:06:09 +0800
X-OQ-MSGID: <20240911180618.28921-6-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
References: <20240911180618.28921-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 05/14] aarch64/vvc: Add put_qpel_hx i8mm
Reply-To: FFmpeg development discussions and patches
Cc: Zhao Zhili

From: Zhao Zhili

Benchmark on Android pixel 8 with -fno-vectorize

put_luma_h_8_4x4_c: 0.2 ( 1.00x)
put_luma_h_8_4x4_neon: 0.2 ( 1.00x)
put_luma_h_8_4x4_i8mm: 0.0 ( 0.00x)
put_luma_h_8_8x8_c: 1.5 ( 1.00x)
put_luma_h_8_8x8_neon: 0.5 ( 3.00x)
put_luma_h_8_8x8_i8mm: 0.5 ( 3.00x)
put_luma_h_8_16x16_c: 6.2 ( 1.00x)
put_luma_h_8_16x16_neon: 2.0 ( 3.12x)
put_luma_h_8_16x16_i8mm: 1.5 ( 4.17x)
put_luma_h_8_32x32_c: 25.5 ( 1.00x)
put_luma_h_8_32x32_neon: 9.0 ( 2.83x)
put_luma_h_8_32x32_i8mm: 6.8 ( 3.78x)
put_luma_h_8_64x64_c: 99.8 ( 1.00x)
put_luma_h_8_64x64_neon: 35.2 ( 2.83x)
put_luma_h_8_64x64_i8mm: 27.2 ( 3.66x)
put_luma_h_8_128x128_c: 422.0 ( 1.00x)
put_luma_h_8_128x128_neon: 138.5 ( 3.05x)
put_luma_h_8_128x128_i8mm: 109.2 ( 3.86x)
---
 libavcodec/aarch64/h26x/dsp.h       |  4 ++
 libavcodec/aarch64/h26x/qpel_neon.S | 68 ++++++++++++++++++++++++++---
 libavcodec/aarch64/vvc/dsp_init.c   |  9 ++++
 3 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index 076d01b477..323a253257 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -270,4 +270,8 @@ NEON8_FNPROTO_PARTIAL_6(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride,
         int height, int denom, int wx, int ox,
         const int8_t *hf, const int8_t *vf, int width),);
 
+NEON8_FNPROTO_PARTIAL_6(qpel_h, (int16_t * dst,
+        const uint8_t *_src, ptrdiff_t _srcstride, int height,
+        const int8_t *hf, const int8_t *vf, int width), _i8mm);
+
 #endif
diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S
index 47b3948f8b..1fa5a1dd0e 100644
--- a/libavcodec/aarch64/h26x/qpel_neon.S
+++ b/libavcodec/aarch64/h26x/qpel_neon.S
@@ -3516,6 +3516,17 @@ endfunc
         sub             x1, x1, #3
 .endm
 
+.macro VVC_QPEL_H_HEADER
+        ld1r            {v31.2d}, [x4]
+        sub             x1, x1, #3
+.endm
+
+function ff_vvc_put_qpel_h4_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h4_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3572,6 +3583,12 @@ function ff_hevc_put_hevc_qpel_h6_8_neon_i8mm, export=1
         ret
 endfunc
 
+function ff_vvc_put_qpel_h8_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h8_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3656,6 +3673,12 @@ function ff_hevc_put_hevc_qpel_h12_8_neon_i8mm, export=1
         ret
 endfunc
 
+function ff_vvc_put_qpel_h16_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h16_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3746,6 +3769,13 @@ function ff_hevc_put_hevc_qpel_h24_8_neon_i8mm, export=1
         ret
 endfunc
 
+function ff_vvc_put_qpel_h32_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        add             x15, x0, #32
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h32_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3881,10 +3911,7 @@ function ff_hevc_put_hevc_qpel_h48_8_neon_i8mm, export=1
         ret
 endfunc
 
-function ff_hevc_put_hevc_qpel_h64_8_neon_i8mm, export=1
-        QPEL_H_HEADER
-        sub             x2, x2, #64
-1:
+.macro put_qpel_h64_8_neon_i8mm
         ld1             {v16.16b, v17.16b, v18.16b, v19.16b}, [x1], #64
         ext             v1.16b, v16.16b, v17.16b, #1
         ext             v2.16b, v16.16b, v17.16b, #2
@@ -3975,11 +4002,42 @@ function ff_hevc_put_hevc_qpel_h64_8_neon_i8mm, export=1
         sqxtn2          v20.8h, v26.4s
         sqxtn           v21.4h, v23.4s
         sqxtn2          v21.8h, v27.4s
-        stp             q20, q21, [x0], #32
+        stp             q20, q21, [x0]
+        add             x0, x0, x10
+.endm
+
+function ff_vvc_put_qpel_h64_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #(VVC_MAX_PB_SIZE * 2 - 32 * 3)
+        sub             x2, x2, #64
+        b               1f
+endfunc
+
+function ff_hevc_put_hevc_qpel_h64_8_neon_i8mm, export=1
+        QPEL_H_HEADER
+        mov             x10, #32
+        sub             x2, x2, #64
+1:
+        put_qpel_h64_8_neon_i8mm
+        subs            w3, w3, #1
+        b.ne            1b
+        ret
+endfunc
+
+function ff_vvc_put_qpel_h128_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        sub             x11, x2, #128
+        mov             x10, #32
+        mov             x2, #0
+1:
+        put_qpel_h64_8_neon_i8mm
         subs            w3, w3, #1
+        put_qpel_h64_8_neon_i8mm
+        add             x1, x1, x11
         b.ne            1b
         ret
 endfunc
+
 
 DISABLE_I8MM
 #endif
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index 457be8c725..bcc7df8f6c 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -88,6 +88,15 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
             c->sao.edge_filter[i] = ff_vvc_sao_edge_filter_16x16_8_neon;
         c->alf.filter[LUMA] = alf_filter_luma_8_neon;
         c->alf.filter[CHROMA] = alf_filter_chroma_8_neon;
+
+        if (have_i8mm(cpu_flags)) {
+            c->inter.put[0][1][0][1] = ff_vvc_put_qpel_h4_8_neon_i8mm;
+            c->inter.put[0][2][0][1] = ff_vvc_put_qpel_h8_8_neon_i8mm;
+            c->inter.put[0][3][0][1] = ff_vvc_put_qpel_h16_8_neon_i8mm;
+            c->inter.put[0][4][0][1] = ff_vvc_put_qpel_h32_8_neon_i8mm;
+            c->inter.put[0][5][0][1] = ff_vvc_put_qpel_h64_8_neon_i8mm;
+            c->inter.put[0][6][0][1] = ff_vvc_put_qpel_h128_8_neon_i8mm;
+        }
     } else if (bd == 10) {
         c->alf.filter[LUMA] = alf_filter_luma_10_neon;
         c->alf.filter[CHROMA] = alf_filter_chroma_10_neon;

From patchwork Wed Sep 11 18:06:10 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51514
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 12 Sep 2024 02:06:10 +0800
X-OQ-MSGID: <20240911180618.28921-7-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
References: <20240911180618.28921-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 06/14] avcodec/hevc: ff_hevc_(qpel/epel)_filters are signed type
Reply-To: FFmpeg development discussions and patches
Cc: Zhao Zhili

From: Zhao Zhili

---
 libavcodec/hevc/dsp_template.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/hevc/dsp_template.c b/libavcodec/hevc/dsp_template.c
index aebccd1a0c..a0f79c2673 100644
--- a/libavcodec/hevc/dsp_template.c
+++ b/libavcodec/hevc/dsp_template.c
@@ -302,8 +302,8 @@ IDCT_DC(32)
 ////////////////////////////////////////////////////////////////////////////////
 #define
ff_hevc_pel_filters ff_hevc_qpel_filters #define DECL_HV_FILTER(f) \ - const uint8_t *hf = ff_hevc_ ## f ## _filters[mx]; \ - const uint8_t *vf = ff_hevc_ ## f ## _filters[my]; + const int8_t *hf = ff_hevc_ ## f ## _filters[mx]; \ + const int8_t *vf = ff_hevc_ ## f ## _filters[my]; #define FW_PUT(p, f, t) \ static void FUNC(put_hevc_## f)(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height, \ From patchwork Wed Sep 11 18:06:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhao Zhili X-Patchwork-Id: 51520 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:612c:14c:b0:48e:c0f8:d0de with SMTP id h12csp487666vqi; Wed, 11 Sep 2024 11:24:12 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUE2cRothEYWj4inK6aoEWqW9Rbln0ZRETgNTnB4C2PgAr9r7gjWjmkgbcIAzAqqLZtEahf1hEZODiD1pnYaYkZ@gmail.com X-Google-Smtp-Source: AGHT+IH/2IMrT8AP8aiFgX66I1wTbFE6zcdIc6fH2wAlTn01d50jmolwr0w+nDGsRR8yL4RmJl66 X-Received: by 2002:a2e:bd86:0:b0:2ef:2405:ff63 with SMTP id 38308e7fff4ca-2f787dbe555mr223141fa.5.1726079051993; Wed, 11 Sep 2024 11:24:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1726079051; cv=none; d=google.com; s=arc-20240605; b=FRY0sQXX6MgY1JqkcChBr1uuAvXIqgpWD4uJHGufBye3/WAZn/IScP7BcYILr8qYM+ Bk1nuNGjHBvhozFPz76clKy2VpaNGIVDJo6AiwVEUJxmf7ABZSi5jid8IvGnp7ENXZec 0+x6/XykbxPRhkjlBjOolieF9bE4lRQsO8HrdJOU+0bs1M+7A/V0RRftDQ4n95nCJhmf 1RNClvlZJvq5VKGumr9mIobBAtxMyAB0eVBKCANJKouU5ZKe2Cm2u2cvmMJ5bSRG+mTB O+UJdbNRuEO33VC5hIIPXO5Xypm0R08g9gILFF4gag4i8b2hm+9heBUTgbxngqUpjLnC JBNg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to:date :to:from:message-id:dkim-signature:delivered-to; bh=BOf0dyumV3REoTXKo0eNh1DPIV6pZ0ZDZP4ofJpkd80=; 
fh=HnHYuZ9XgUo86ZRXTLWWmQxhslYEI9B9taZ5X1DLFfc=; b=VRIPqKebOBib+zsnjUJrQIMISWJjqzBv508h9eFbzbgQqaGVD+iJKapG8Bzx1XRyZ6 yS5tjP/jpQdycgtlD7VmNEiTXPMF8WeJnEcc7y/VoNidOgPO1rL2K+gLPA+UCk/mJ4xD dxL83E1WCf2Or/lJI2tCaHpElS+8bfXOM2373RwQuK3HOd8ZwsbXbapMwlS0r4U/fjYu s9ds7iWMqauQDHAl7d8jGlThIQaFlzOrbWKt0iiuuR7hehVCfrthmJOoNAHHSAIGRzEI nA++BxAMZRwU20YwkkOu2sHnsupXNzZmochCVRw/f27t6cKNlzB98aHgQGKK3QSvobka BeRQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=gh94+5Zd; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id 38308e7fff4ca-2f75bfd2e42si30292711fa.63.2024.09.11.11.24.11; Wed, 11 Sep 2024 11:24:11 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=gh94+5Zd; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 2EED268E2D0; Wed, 11 Sep 2024 21:06:47 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from xmbghk7.mail.qq.com (xmbghk7.mail.qq.com [43.163.128.53]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id CCA4268E274 for ; Wed, 11 Sep 2024 21:06:32 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=foxmail.com; s=s201512; 
t=1726077984; bh=DOkzW7mR58PeyTD7dsAEsLBiBTjYHuzAeQjc0HUWa+M=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=gh94+5Zdmd1lz/wSOVrSOJKbKjezl89eA+7B4z3UFkxIt6RH9yHm10SXR6VuhQflP XQRgaOKcSI+5hC2iRRT6N+b4BSFKTlJnAxb6goyb5r/1Vf7vK2K2QKUVPAviDeYarX P2kxOhliOOLuLX+zElHUIYjolDEL+7F1Pvfwr+Go= Received: from ZHILIZHAO-MB1.tencent.com ([113.118.115.139]) by newxmesmtplogicsvrsza10-0.qq.com (NewEsmtp) with SMTP id 19397694; Thu, 12 Sep 2024 02:06:19 +0800 X-QQ-mid: xmsmtpt1726077983th9p3j26l Message-ID: X-QQ-XMAILINFO: OXtRplkh+jv9Z8b2GWglqLVpDS/654Jq5oVDI0Lm15fyJPLnTmPQa4aCKrTNfE p4oDoe/UxCPD4EiM8XchYv50FWpJ7UtFxEHPDIxi6MhRAE4oR8wt5m64HOXxcizDES9dARuJIE2E xgSzLaj1EowMNVKGhYkvHPTjoAe5wKEW3Dm9gY1m7vYMH+KZ0F5zQ22J83kP1Y8ZoygSnL9dtGfl WfneSRVz+XMj9l6KQPYxvhD2YS3h8E7WU34UAhzBL+Cx2Jl0vIIcqB0Pao3ripF/nUAuRxFwtXKJ xjRNuGiErnBG2Y6n26aFG60Rcgr3IZSc90b+7Vmr7xJOnvxUlHGA3B3PzXG0YwhG2ed/wAmKW9jY mzR/Fn+D618k64firKZT57YdhB4O9kS1lAlx1sopIIiiKNGHCmZMrtPCsTdu+r09RVp0yZo6mA/I 65pYmTvngIHYOl1o60zZxAm3XOtqfPiOBPVBZq9gqjqBIKpymd4/JaYHEhAUJshVOpSQb4LzCcBc 9pLZbAZ+zJa+tcToowotvgC9nOIsM39eeZ3nL1oBFqprXBVL/jPzGJ4CvfbwCCZKIv+tnmRaZnpA hn+WEY7ubsp4zulN7n1kuFQE7hVyOh8fsRybhDoOVbjrktSU9cjUQ9qQiqMXsKd80X+MttTGIT/4 fCZSks4m/ndHOAk2kdvWLMqHhgnckl/YVwbIQFNGtpVmNzRDDFd0bMd2E1Wx6b17tOyENoelEtm9 nrPpKU0/ZSpWPya2sKSuSM0JlVSvwvIZ2psB6CiV6rLaDaUx5JwbpsuEiFEufxw+g8Jy1U/QsQdk yJAtgWb7H2GdXm4J4sPXGGfc+LcILYB2AGsTFtpbjkhNK/lJIQJPXoch21e7SMGbjndLi3NzZUP6 8F/KFuh2hfT1wYnVxaa7RZcGpSE+ri0YR636IKbQQGB6FJI6v0xCI2DaC7R+A0tpcB2M/n/aPF/2 3hR1zk1CslzU1TLciE60e+k/LnVDhwTf//RjoXvubXltyIuI3gSPZ+cV6dGDWJNp/hNpeUeEU= X-QQ-XMRINFO: OD9hHCdaPRBwq3WW+NvGbIU= From: Zhao Zhili To: ffmpeg-devel@ffmpeg.org Date: Thu, 12 Sep 2024 02:06:11 +0800 X-OQ-MSGID: <20240911180618.28921-8-quinkblack@foxmail.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com> References: <20240911180618.28921-1-quinkblack@foxmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 07/14] aarch64/h26x: Remove 
duplicate b.eq instruction X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Zhao Zhili Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: S6+tx1vJjw84 From: Zhao Zhili b.eq is added by calc_all after each calc. --- libavcodec/aarch64/h26x/qpel_neon.S | 1 - 1 file changed, 1 deletion(-) diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S index 1fa5a1dd0e..417d43e365 100644 --- a/libavcodec/aarch64/h26x/qpel_neon.S +++ b/libavcodec/aarch64/h26x/qpel_neon.S @@ -754,7 +754,6 @@ function ff_hevc_put_hevc_qpel_v4_8_neon, export=1 calc_qpelb v24, \src0, \src1, \src2, \src3, \src4, \src5, \src6, \src7 st1 {v24.4h}, [x0], x9 subs w3, w3, #1 - b.eq 2f .endm 1: calc_all .purgem calc From patchwork Wed Sep 11 18:06:12 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhao Zhili X-Patchwork-Id: 51523 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:612c:14c:b0:48e:c0f8:d0de with SMTP id h12csp492796vqi; Wed, 11 Sep 2024 11:34:15 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUPKuQEd0dGdQenhZZcyetpwthxOr/PtlCzpTsNOwi1ZqdfKgifBTSVTpml6Zjub2SwnJs0WqUMJf6Ap5ILPwMw@gmail.com X-Google-Smtp-Source: AGHT+IEkjiCVQWaRa5jH2+qIb1/PHzGMwvaY19sAdiMGA9TeJ3q6s9OsNJZjFDp+vR1M+XFIud07 X-Received: by 2002:a05:6512:1248:b0:535:665f:cfc0 with SMTP id 2adb3069b0e04-53678fc22f2mr178945e87.32.1726079655244; Wed, 11 Sep 2024 11:34:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1726079655; cv=none; d=google.com; s=arc-20240605; b=NFexcJWL/dje7tjBh2gWXkyOI8bzknxLkNPo4REtWii7JKX8ungUxQDQYo19/puWbE 6MnaB7JYPlx6+pag+BjgvThct9IZ1ny3OSMWruWsTxy5uaKN11/ge1g0vrom+XCyzR2s jVioa+ijqBClQKO1SoyGzTBfI8LMgC5CQwtkI9tI7V2wpaxdV0HJlILDT3pQK6ftwF1R 
M2Mth+L+yRopjzPIlJlEfelmdxwtnPC7rJNoeS98vgGf6bjmqG1RPbkw59RrnbDvoBU+ uWj629sf9R+Xe2+kscTghBT9biVrCo1g51vtAtuPotSN5+++z+0dIBd5Y0oA8LZWFgy2 CzLg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to:date :to:from:message-id:dkim-signature:delivered-to; bh=PsyGSAk9O64Bn/gZ7BK+GS/ouZV8tn6kf3NtdjlSOrM=; fh=HnHYuZ9XgUo86ZRXTLWWmQxhslYEI9B9taZ5X1DLFfc=; b=QHgF/OqPFCBfFRJrR4dD6r92D0sN6cFwcenK29hJiy09YCnFW0ApMimUFMq/kQBy71 YCinzYnkdsNU4qy5SWSXoVOewmy0Yl0iGlfhuS/+EyBARInIAOoarAMPq0nP4v05617d t0mYC6FeFvOG1lBOIitP/G3srtvhF8yqd8Ky/UItKhiE5asRdggZimZ6lrXRu9yXPoYA FSB5hA4DQ5acIek1M3frmOD3j6CSW8qtnup3nMyUJsfSQVdM+BlinSPLtO3eTdPevC1M PcZz622ubG50/nfY3Ip2G6+fdn/vuWXNf/wXQdHMQg21V/FzCzYI2anW83wK8l7kiSP7 ACTg==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=czZusRJB; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id 2adb3069b0e04-5365f8ecaf2si3402766e87.329.2024.09.11.11.34.14; Wed, 11 Sep 2024 11:34:15 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=czZusRJB; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 43EBE68E2FF; Wed, 11 Sep 2024 21:06:55 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from xmbghk7.mail.qq.com (xmbghk7.mail.qq.com [43.163.128.43]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 9ECA368E22A for ; Wed, 11 Sep 2024 21:06:33 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=foxmail.com; s=s201512; t=1726077984; bh=jpgZAf+PykZUR8tL5wom6EwZ1QkWgWq/N1a+rLqXjjA=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=czZusRJBmn2dCORWEo365LXlEG4GHa2Lqf65Q5IT9EgISomMU4ZTX26tARex3zKqr YleI4kBZHKDRYnycTPbV/mOjQFfK7jqkwKjiMUZqF7dNeoM3/Rvd9FBMSnVqqAL6kP bT/S7SkmUDQuAQ5crTgbZWbdgykfj2zi4l/jNCBM= Received: from ZHILIZHAO-MB1.tencent.com ([113.118.115.139]) by newxmesmtplogicsvrsza10-0.qq.com (NewEsmtp) with SMTP id 19397694; Thu, 12 Sep 2024 02:06:19 +0800 X-QQ-mid: xmsmtpt1726077983tpxu2juq0 Message-ID: X-QQ-XMAILINFO: NvH2zBBgt3uTtbO9za2YWT/e2Nh9NfYNSbM1xAZ71NXkb+jEnsyzO3rBbcEe0z N3sjWUQO/Gi0ASwkcvvKUmoZCvG6afXvVeXKHCbUAeQ9yBG8d8NIKTpgvOAbBX62zlnXM+HZpl8w i3iNigFOf9Ps16z1r9qbOPWF13fTJW+Kf9dKA1qmyXTUOpHQ32V8X4W7JA0QAoSv1sXmL9FjysAm V0bSzJy3RlW4bu8H8hfZ4lQUuHeN06rUAPmiDa5ZRFGZ0F4ADArEzJ06vX0w/RelppX7sXu67c0K 
7CCrTYOYJUV5oed3Oqn1N1ZfjSTiJLkmzXf3hCIvGMiJgxhgAmtokf8+AXI9k9zEe1Uugggo5/O8 VM8GMLCf/b+b7Vaq21R/DicilaRQKNGSRrUSm9Dmr/0tBCHhzWa17DVKY0a7MGnXSQtHA6vN87/m Wkh7GdSm037x+MlLIAZ7/3tWE20jqCyq670FvU2p1ihgY7B53aaacndYfon7p5zvnGJB6ZZHvZ1v dwQdhnWtvzcOr6Mo/leFl7SjlPymzsoHe55Vo3fOnwnKZYiJrRIFTg/IG6HqO86mB80MsngdNVN4 kuBgneyq6YqSQhCFfs57qeA/rz7GHhGq0i42wLH8vo5G6UjsyhtWjaPLqLt4PLhw10hJ+3s3IHoA JG76hh16ORM04Y0cSxdNyNI+AMwC2B4iknZLew3SE1ybSsVdFGi6+nFy1MqqPoSTqYZgJSqgcSjm simcWAZvSlX/eZYugLci/uGdHw5fbkPw6c+IVkIXY+30ipi02CqcgdSYqDQXUVb47M3aI6nfP5DH DrFl6t2hbsD7qExXwCPCAVsDE9sPPunEGikCmDQvnTpHjQ27113WiX2JlALUJUQr6yQOOEmb/dmy oo24LxTS5AnvC24OWJ37Hz/EyJZY7kE5WcR3BXbvRqmSv/NSAOPPrI2NNIKKoficDQjG+gVg+eHh MSXJhbTDwCthhw96eQFbBuxcYr3izsXq+owvy4XPS5XYzF/fkw9Q== X-QQ-XMRINFO: NyFYKkN4Ny6FSmKK/uo/jdU= From: Zhao Zhili To: ffmpeg-devel@ffmpeg.org Date: Thu, 12 Sep 2024 02:06:12 +0800 X-OQ-MSGID: <20240911180618.28921-9-quinkblack@foxmail.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com> References: <20240911180618.28921-1-quinkblack@foxmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 08/14] aarch64/vvc: Add put_qpel_vx X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Zhao Zhili Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: Y0ezGBcj6zuG From: Zhao Zhili put_luma_v_8_4x4_c: 1.0 ( 1.00x) put_luma_v_8_4x4_neon: 0.0 ( 0.00x) put_luma_v_8_8x8_c: 3.5 ( 1.00x) put_luma_v_8_8x8_neon: 0.5 ( 7.00x) put_luma_v_8_16x16_c: 13.8 ( 1.00x) put_luma_v_8_16x16_neon: 1.2 (11.00x) put_luma_v_8_32x32_c: 54.2 ( 1.00x) put_luma_v_8_32x32_neon: 5.0 (10.85x) put_luma_v_8_64x64_c: 217.5 ( 1.00x) put_luma_v_8_64x64_neon: 18.8 (11.60x) put_luma_v_8_128x128_c: 886.2 ( 1.00x) put_luma_v_8_128x128_neon: 74.0 (11.98x) --- 
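Reviewer note, not part of the commit: a minimal scalar sketch of what the vertical 8-bit path appears to compute, for comparison against the NEON code in the diff. It is an illustration under stated assumptions, not the FFmpeg C reference. The function name and SKETCH_MAX_PB_SIZE are hypothetical; the value 128 is assumed to match VVC_MAX_PB_SIZE. As in the assembly, the 8 taps start three rows above the current row, the coefficients are treated as signed (the patch sign-extends them with sxtl instead of relying on the fixed sign pattern the HEVC vertical functions require), no shift is applied for 8-bit input, and the sum is narrowed with saturation (sqxtn).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: destination row stride in int16_t elements, taken to be the
 * same as VVC_MAX_PB_SIZE in the assembly (believed to be 128). */
#define SKETCH_MAX_PB_SIZE 128

/* Saturating narrow to int16_t, modelling what sqxtn does per lane. */
static int16_t sat16_sketch(int v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical scalar model of an 8-tap vertical luma filter for 8-bit
 * input. src must have 3 valid rows above and 4 valid rows below the
 * block, because the taps cover rows y-3 .. y+4. */
static void put_qpel_v_8_sketch(int16_t *dst, const uint8_t *src,
                                ptrdiff_t srcstride, int height,
                                const int8_t *vf, int width)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int i = 0; i < 8; i++)
                sum += vf[i] * src[x + (i - 3) * srcstride];
            dst[x] = sat16_sketch(sum); /* 8-bit input: no shift applied */
        }
        src += srcstride;
        dst += SKETCH_MAX_PB_SIZE;
    }
}
```

A model like this can be used to spot-check a single block by hand; the checkasm numbers above remain the authoritative comparison against the real C implementation.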
libavcodec/aarch64/h26x/dsp.h | 8 +++ libavcodec/aarch64/h26x/qpel_neon.S | 100 ++++++++++++++++++++++++++++ libavcodec/aarch64/vvc/dsp_init.c | 7 ++ 3 files changed, 115 insertions(+) diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h index 323a253257..881091f39a 100644 --- a/libavcodec/aarch64/h26x/dsp.h +++ b/libavcodec/aarch64/h26x/dsp.h @@ -274,4 +274,12 @@ NEON8_FNPROTO_PARTIAL_6(qpel_h, (int16_t * dst, const uint8_t *_src, ptrdiff_t _srcstride, int height, const int8_t *hf, const int8_t *vf, int width), _i8mm); +void ff_vvc_put_qpel_v4_8_neon(int16_t *dst, const uint8_t *_src, + ptrdiff_t _srcstride, int height, + const int8_t *hf, const int8_t *vf, int width); + +void ff_vvc_put_qpel_v8_8_neon(int16_t *dst, const uint8_t *_src, + ptrdiff_t _srcstride, int height, + const int8_t *hf, const int8_t *vf, int width); + #endif diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S index 417d43e365..a6a3b9549d 100644 --- a/libavcodec/aarch64/h26x/qpel_neon.S +++ b/libavcodec/aarch64/h26x/qpel_neon.S @@ -86,6 +86,11 @@ endconst sxtl v0.8h, v0.8b .endm +.macro vvc_load_qpel_filterh freg + ld1 {v0.8b}, [\freg] + sxtl v0.8h, v0.8b +.endm + .macro calc_qpelh dst, src0, src1, src2, src3, src4, src5, src6, src7, op, shift=6 smull \dst\().4s, \src0\().4h, v0.h[0] smlal \dst\().4s, \src1\().4h, v0.h[1] @@ -95,11 +100,15 @@ endconst smlal \dst\().4s, \src5\().4h, v0.h[5] smlal \dst\().4s, \src6\().4h, v0.h[6] smlal \dst\().4s, \src7\().4h, v0.h[7] +.ifc \op, sqxtn + sqxtn \dst\().4h, \dst\().4s +.else .ifc \op, sshr sshr \dst\().4s, \dst\().4s, \shift .else \op \dst\().4h, \dst\().4s, \shift .endif +.endif .endm .macro calc_qpelh2 dst, dstt, src0, src1, src2, src3, src4, src5, src6, src7, op, shift=6 @@ -111,11 +120,15 @@ endconst smlal2 \dstt\().4s, \src5\().8h, v0.h[5] smlal2 \dstt\().4s, \src6\().8h, v0.h[6] smlal2 \dstt\().4s, \src7\().8h, v0.h[7] +.ifc \op, sqxtn2 + sqxtn2 \dst\().8h, \dstt\().4s +.else .ifc \op, 
sshr sshr \dst\().4s, \dstt\().4s, \shift .else \op \dst\().8h, \dstt\().4s, \shift .endif +.endif .endm .macro calc_all @@ -1000,6 +1013,93 @@ function ff_hevc_put_hevc_qpel_v64_8_neon, export=1 ret endfunc +/* ff_hevc_put_hevc_qpel_vx require filter parameters be + * [-, +, -, +, +, -, +, -], + * vvc doesn't meet the requirement. + */ +function ff_vvc_put_qpel_v4_8_neon, export=1 + vvc_load_qpel_filterh x5 + sub x1, x1, x2, lsl #1 + mov x9, #(VVC_MAX_PB_SIZE * 2) + sub x1, x1, x2 + ldr s16, [x1] + ldr s17, [x1, x2] + add x1, x1, x2, lsl #1 + ldr s18, [x1] + ldr s19, [x1, x2] + uxtl v16.8h, v16.8b + uxtl v17.8h, v17.8b + add x1, x1, x2, lsl #1 + ldr s20, [x1] + ldr s21, [x1, x2] + uxtl v18.8h, v18.8b + uxtl v19.8h, v19.8b + add x1, x1, x2, lsl #1 + ldr s22, [x1] + add x1, x1, x2 + uxtl v20.8h, v20.8b + uxtl v21.8h, v21.8b + uxtl v22.8h, v22.8b +.macro calc tmp, src0, src1, src2, src3, src4, src5, src6, src7 + ld1 {\tmp\().s}[0], [x1], x2 + uxtl \tmp\().8h, \tmp\().8b + calc_qpelh v24, \src0, \src1, \src2, \src3, \src4, \src5, \src6, \src7, sqxtn + subs w3, w3, #1 + st1 {v24.4h}, [x0], x9 +.endm +1: + calc_all +.purgem calc +2: + ret +endfunc + +function ff_vvc_put_qpel_v8_8_neon, export=1 + vvc_load_qpel_filterh x5 + sub x1, x1, x2, lsl #1 + sub x1, x1, x2 + mov x9, #(VVC_MAX_PB_SIZE * 2) +0: + mov x8, x1 + ldr d16, [x8] + ldr d17, [x8, x2] + mov x10, x0 + mov w11, w3 + add x8, x8, x2, lsl #1 + ldr d18, [x8] + ldr d19, [x8, x2] + uxtl v16.8h, v16.8b + uxtl v17.8h, v17.8b + add x8, x8, x2, lsl #1 + ldr d20, [x8] + ldr d21, [x8, x2] + uxtl v18.8h, v18.8b + uxtl v19.8h, v19.8b + add x8, x8, x2, lsl #1 + ldr d22, [x8] + add x8, x8, x2 + uxtl v20.8h, v20.8b + uxtl v21.8h, v21.8b + uxtl v22.8h, v22.8b +.macro calc tmp, src0, src1, src2, src3, src4, src5, src6, src7 + ld1 {\tmp\().8b}, [x8], x2 + uxtl \tmp\().8h, \tmp\().8b + calc_qpelh v24, \src0, \src1, \src2, \src3, \src4, \src5, \src6, \src7, sqxtn + calc_qpelh2 v24, v25, \src0, \src1, \src2, \src3, \src4, \src5, 
\src6, \src7, sqxtn2 + subs w11, w11, #1 + st1 {v24.8h}, [x10], x9 +.endm +1: + calc_all +.purgem calc +2: + subs w6, w6, #8 + add x0, x0, #16 + add x1, x1, #8 + b.ne 0b + ret +endfunc + function ff_hevc_put_hevc_qpel_bi_v4_8_neon, export=1 load_qpel_filterb x7, x6 sub x2, x2, x3, lsl #1 diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c index bcc7df8f6c..ba3a49aa1a 100644 --- a/libavcodec/aarch64/vvc/dsp_init.c +++ b/libavcodec/aarch64/vvc/dsp_init.c @@ -60,6 +60,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd) c->inter.put[0][5][0][1] = c->inter.put[0][6][0][1] = ff_vvc_put_qpel_h32_8_neon; + c->inter.put[0][1][1][0] = ff_vvc_put_qpel_v4_8_neon; + c->inter.put[0][2][1][0] = + c->inter.put[0][3][1][0] = + c->inter.put[0][4][1][0] = + c->inter.put[0][5][1][0] = + c->inter.put[0][6][1][0] = ff_vvc_put_qpel_v8_8_neon; + c->inter.put_uni[0][1][0][0] = ff_vvc_put_pel_uni_pixels4_8_neon; c->inter.put_uni[0][2][0][0] = ff_vvc_put_pel_uni_pixels8_8_neon; c->inter.put_uni[0][3][0][0] = ff_vvc_put_pel_uni_pixels16_8_neon; From patchwork Wed Sep 11 18:06:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhao Zhili X-Patchwork-Id: 51515 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:612c:14c:b0:48e:c0f8:d0de with SMTP id h12csp482780vqi; Wed, 11 Sep 2024 11:14:16 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCX9Pp9M6HfFl2pRJcxV6h5/nBG+HImyzlObWiFItSmfiy4eOoPD3Elx2KuBqhVSyWUQPWwvGoHxqP4QFlVZ/saE@gmail.com X-Google-Smtp-Source: AGHT+IF3KMjdr3mVooTadkWJD5MUW0uHn7gXczuw8Ai+Y0qf5P3HzgXXbR87PLjvY2uNfliEv/fn X-Received: by 2002:a17:907:1b05:b0:a72:750d:ab08 with SMTP id a640c23a62f3a-a90294f3b59mr32928066b.14.1726078456605; Wed, 11 Sep 2024 11:14:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1726078456; cv=none; d=google.com; s=arc-20240605; b=XA67AAo3uht2iwSWeJqwliTu1LOBuey65lA2e/qqQqa7BVzt0FMOcGY1X5pYwjP0XX 
zVuMeZujor+gCfpRcxE75sqyurh6fnBLMvSFBkb+WTowRXSje3Df0QzQiY5jVMGTtqm8 QbkoU1w4jKJ43oROYcZf0Xozv73j585MMrSxp9Jf4E/i6fNItRUt0nTiSaoyTwTuoZYk CDVCPndZPOkJPEoV/hqmdtQjl6PHS8pVuPElYCZYG6qiiCshFD4WbrBxU3Q5k3CESAVp R0PI2GB3O8O8KHqL09Jh9qR6Wb3GWhHQ2ppQeEs2Q9M6ddc2ekBhytXDmMTCYtRy2q50 AGsg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to:date :to:from:message-id:dkim-signature:delivered-to; bh=HWZmJaTNDRuYC5LCHuOz1e9TqHXcStGw2gYQX6DL+3k=; fh=HnHYuZ9XgUo86ZRXTLWWmQxhslYEI9B9taZ5X1DLFfc=; b=cq9mIQZOv+NvXpDwI5fY4NlfI4uot05gZdYUpNdB+fBorW+Q9Qfq2bEvnltNlKQpRg 5tD4+qlrABlbFUA+gARcjXIvhWyaPlxiUEUQPXCRYtdIXJBoxZot86dyl5m6m6irWLw0 5fLMoSbW727TMjaKf+6NUIyKn9dfHSOJk33D7DLJxMBFkyD9+nQvGjFHTQfX37nO1PX8 wa9M/1RTluQpgW1l4p+rWPC0fXIm1KoMoqgP2tfdq68m4RNliFgBA/xs3EqQh4emtkkS EUUeDCNVAJzL+rYYWjMiC9HjWVxqmZpLppNIMloOtmli0ochR9V5biqmUi+gBnKptlrL AWcA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=frE8VZsL; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id a640c23a62f3a-a8d25d65367si730550966b.813.2024.09.11.11.14.16; Wed, 11 Sep 2024 11:14:16 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=frE8VZsL; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 6647168E2DD; Wed, 11 Sep 2024 21:06:49 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from out203-205-221-149.mail.qq.com (out203-205-221-149.mail.qq.com [203.205.221.149]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id C817D68E274 for ; Wed, 11 Sep 2024 21:06:33 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=foxmail.com; s=s201512; t=1726077985; bh=vhduru0a93vJSYI77SuXcWDrEbnyDdFlfukg8TcZjpM=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=frE8VZsLGe8bFg/Gv8kV+UcAZTd7ru/ZW3nvveqh09bcKONk5TzBYQBtFtrcnZdsH 8SEXfC+Y/5LW9cF85FS4p/sbSaZFJWpx4ejPaLZJVasY9Nt1h13GpGgtKOkxiXPwZ7 3N42Y8jzaOiJibr5AflaLmGdHOQ2tdnu6Y3TgILA= Received: from ZHILIZHAO-MB1.tencent.com ([113.118.115.139]) by newxmesmtplogicsvrsza10-0.qq.com (NewEsmtp) with SMTP id 19397694; Thu, 12 Sep 2024 02:06:19 +0800 X-QQ-mid: xmsmtpt1726077984trjxrxmce Message-ID: X-QQ-XMAILINFO: NZ3yMkVY2d1s4TwKLiWCuBRHQ1dtnzm3yBZjMpqPAwPQfIHck1oD9jcks0055c bTyU2h71kuFMPv3im2G+OFXHgrazd2xWY2HSCQBCyJLX9Nvf2vshFtdN0Z9r7GuFtv4pBCgI1c3l 3aS4XyUfrnB6MNhIXeBODA+C4z7XnUkXR3PeYcPng7bQ5uFiLURsWFg6pSpEplTMiicTr5q3WSkr +ZlcRYw2X5qcQRaa4j07tc45WjQrnauvEDsdx3PRZC0lw7SFWfT5b7Cld0P3B59vE3IOZuCIeFy/ 
KRcDi2GLLoDFfsLpvZBmkxDsOP0bxLlyu9jmEp8x/4TJ0ql1S/ghtbyqzmenA69uqzCt4DS2FSHX +ZJYwvZXmmgUULD+5sihaIZcZKQorB1wDv9fHuyxy7u8XlqaHyklr0I8voOg3gZBzYWnQDkiM8zo dZdEhnjbImQVayMOB3qU2Ap1/BVV0y11BTfmqPxm1pQj5QAYFNgnbzqEorFg7qYTQl4MPYxrFPje xe2D1WAt0HhfchJLAEJ4IQbpsny5J7kJiz3i+lqPrW9WdXVPWIEnmxKS+pKtOyQt4/oatzJA1/Ci D9XDhStkndcv3g/t5DhLzmL1hFLMHdytdYTFsydnxxg1QsoZrw0EaOvlT4Y5x+kAzx6GBp6QZjIX 2PPdB/LFBtjh6set2H7h1ZRRnt53ef9wWzzS2scoHcV/BVos4NazfqK0vwEhpL9Cw8lHzQNg4xIA M+DPG7yri9aE7azFx5QYQBdna6BAWBa107MywjvWO9iXDINBndiu3CYI2g79oIPznEsMPwQjc1/Q KmHch09YoM4IZKNJMkjRy+sHFbEMyQZv970+ilQuBcjRCUJY+t/Dp2blDFTFe2/iwpatKjcBQyIf JH+SMkOpfw1TJpp3S7Gs2vl1SzhwcWy8oxJ35vXHJ91PU+H14VtJe5+w6W0oaie+W/MJ6cIJLA9x T4blRUQWDlINy7tSYGsJl25a7IfU9Yi/cFFVn+qXEAYIYbUERYoZOw44YVueWW3EPwozADz+E= X-QQ-XMRINFO: MPJ6Tf5t3I/ycC2BItcBVIA= From: Zhao Zhili To: ffmpeg-devel@ffmpeg.org Date: Thu, 12 Sep 2024 02:06:13 +0800 X-OQ-MSGID: <20240911180618.28921-10-quinkblack@foxmail.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com> References: <20240911180618.28921-1-quinkblack@foxmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 09/14] aarch64/vvc: Add put_qpel_hv X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Zhao Zhili Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: ug4Yc8qaZLBc From: Zhao Zhili With Apple M1 (no i8mm): put_luma_hv_8_4x4_c: 2.2 ( 1.00x) put_luma_hv_8_4x4_neon: 0.8 ( 3.00x) put_luma_hv_8_8x8_c: 7.0 ( 1.00x) put_luma_hv_8_8x8_neon: 0.8 ( 9.33x) put_luma_hv_8_16x16_c: 22.8 ( 1.00x) put_luma_hv_8_16x16_neon: 2.5 ( 9.10x) put_luma_hv_8_32x32_c: 84.8 ( 1.00x) put_luma_hv_8_32x32_neon: 9.5 ( 8.92x) put_luma_hv_8_64x64_c: 333.0 ( 1.00x) put_luma_hv_8_64x64_neon: 35.5 ( 9.38x) put_luma_hv_8_128x128_c: 1294.5 
( 1.00x) put_luma_hv_8_128x128_neon: 137.8 ( 9.40x) With Pixel 8 Pro: put_luma_hv_8_4x4_c: 5.0 ( 1.00x) put_luma_hv_8_4x4_neon: 0.8 ( 6.67x) put_luma_hv_8_4x4_i8mm: 0.2 (20.00x) put_luma_hv_8_8x8_c: 13.2 ( 1.00x) put_luma_hv_8_8x8_neon: 1.2 (10.60x) put_luma_hv_8_8x8_i8mm: 1.2 (10.60x) put_luma_hv_8_16x16_c: 44.2 ( 1.00x) put_luma_hv_8_16x16_neon: 4.5 ( 9.83x) put_luma_hv_8_16x16_i8mm: 4.2 (10.41x) put_luma_hv_8_32x32_c: 160.8 ( 1.00x) put_luma_hv_8_32x32_neon: 17.5 ( 9.19x) put_luma_hv_8_32x32_i8mm: 16.0 (10.05x) put_luma_hv_8_64x64_c: 611.2 ( 1.00x) put_luma_hv_8_64x64_neon: 68.0 ( 8.99x) put_luma_hv_8_64x64_i8mm: 62.2 ( 9.82x) put_luma_hv_8_128x128_c: 2384.8 ( 1.00x) put_luma_hv_8_128x128_neon: 268.8 ( 8.87x) put_luma_hv_8_128x128_i8mm: 245.8 ( 9.70x) --- libavcodec/aarch64/h26x/dsp.h | 8 ++ libavcodec/aarch64/h26x/qpel_neon.S | 140 ++++++++++++++++++++++++++++ libavcodec/aarch64/vvc/dsp_init.c | 14 +++ 3 files changed, 162 insertions(+) diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h index 881091f39a..c54906dde2 100644 --- a/libavcodec/aarch64/h26x/dsp.h +++ b/libavcodec/aarch64/h26x/dsp.h @@ -282,4 +282,12 @@ void ff_vvc_put_qpel_v8_8_neon(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height, const int8_t *hf, const int8_t *vf, int width); +NEON8_FNPROTO_PARTIAL_6(qpel_hv, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, int height, + const int8_t *hf, const int8_t *vf, int width),); + +NEON8_FNPROTO_PARTIAL_6(qpel_hv, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, int height, + const int8_t *hf, const int8_t *vf, int width), _i8mm); + #endif diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S index a6a3b9549d..5c3f0263b6 100644 --- a/libavcodec/aarch64/h26x/qpel_neon.S +++ b/libavcodec/aarch64/h26x/qpel_neon.S @@ -4140,9 +4140,15 @@ endfunc DISABLE_I8MM #endif +function vvc_put_qpel_hv4_8_end_neon + vvc_load_qpel_filterh x5 + mov x7, #(VVC_MAX_PB_SIZE * 2) + b 1f 
+endfunc function hevc_put_hevc_qpel_hv4_8_end_neon load_qpel_filterh x5, x4 +1: ldr d16, [sp] ldr d17, [sp, x7] add sp, sp, x7, lsl #1 @@ -4194,9 +4200,16 @@ function hevc_put_hevc_qpel_hv6_8_end_neon ret endfunc +function vvc_put_qpel_hv8_8_end_neon + vvc_load_qpel_filterh x5 + mov x7, #(VVC_MAX_PB_SIZE * 2) + b 1f +endfunc + function hevc_put_hevc_qpel_hv8_8_end_neon mov x7, #128 load_qpel_filterh x5, x4 +1: ldr q16, [sp] ldr q17, [sp, x7] add sp, sp, x7, lsl #1 @@ -4247,9 +4260,16 @@ function hevc_put_hevc_qpel_hv12_8_end_neon ret endfunc +function vvc_put_qpel_hv16_8_end_neon + vvc_load_qpel_filterh x5 + mov x7, #(VVC_MAX_PB_SIZE * 2) + b 1f +endfunc + function hevc_put_hevc_qpel_hv16_8_end_neon mov x7, #128 load_qpel_filterh x5, x4 +1: ld1 {v16.8h, v17.8h}, [sp], x7 ld1 {v18.8h, v19.8h}, [sp], x7 ld1 {v20.8h, v21.8h}, [sp], x7 @@ -4272,6 +4292,12 @@ function hevc_put_hevc_qpel_hv16_8_end_neon ret endfunc +function vvc_put_qpel_hv32_8_end_neon + vvc_load_qpel_filterh x5 + mov x7, #(VVC_MAX_PB_SIZE * 2) + b 0f +endfunc + function hevc_put_hevc_qpel_hv32_8_end_neon mov x7, #128 load_qpel_filterh x5, x4 @@ -4325,6 +4351,25 @@ function ff_hevc_put_hevc_qpel_hv4_8_\suffix, export=1 b hevc_put_hevc_qpel_hv4_8_end_neon endfunc +function ff_vvc_put_qpel_hv4_8_\suffix, export=1 + add w10, w3, #8 + lsl x10, x10, #8 + mov x14, sp + sub sp, sp, x10 // tmp_array + stp x5, x30, [sp, #-48]! 
+ stp x0, x3, [sp, #16] + str x14, [sp, #32] + add x0, sp, #48 + sub x1, x1, x2, lsl #1 + add x3, x3, #7 + sub x1, x1, x2 + bl X(ff_vvc_put_qpel_h4_8_\suffix) + ldr x14, [sp, #32] + ldp x0, x3, [sp, #16] + ldp x5, x30, [sp], #48 + b vvc_put_qpel_hv4_8_end_neon +endfunc + function ff_hevc_put_hevc_qpel_hv6_8_\suffix, export=1 add w10, w3, #8 mov x7, #128 @@ -4364,6 +4409,25 @@ function ff_hevc_put_hevc_qpel_hv8_8_\suffix, export=1 b hevc_put_hevc_qpel_hv8_8_end_neon endfunc +function ff_vvc_put_qpel_hv8_8_\suffix, export=1 + add w10, w3, #8 + lsl x10, x10, #8 + sub x1, x1, x2, lsl #1 + mov x14, sp + sub sp, sp, x10 // tmp_array + stp x5, x30, [sp, #-48]! + stp x0, x3, [sp, #16] + str x14, [sp, #32] + add x0, sp, #48 + add x3, x3, #7 + sub x1, x1, x2 + bl X(ff_vvc_put_qpel_h8_8_\suffix) + ldr x14, [sp, #32] + ldp x0, x3, [sp, #16] + ldp x5, x30, [sp], #48 + b vvc_put_qpel_hv8_8_end_neon +endfunc + function ff_hevc_put_hevc_qpel_hv12_8_\suffix, export=1 add w10, w3, #8 lsl x10, x10, #7 @@ -4403,6 +4467,25 @@ function ff_hevc_put_hevc_qpel_hv16_8_\suffix, export=1 b hevc_put_hevc_qpel_hv16_8_end_neon endfunc +function ff_vvc_put_qpel_hv16_8_\suffix, export=1 + add w10, w3, #8 + lsl x10, x10, #8 + sub x1, x1, x2, lsl #1 + mov x14, sp + sub sp, sp, x10 // tmp_array + stp x5, x30, [sp, #-48]! + stp x0, x3, [sp, #16] + str x14, [sp, #32] + add x3, x3, #7 + add x0, sp, #48 + sub x1, x1, x2 + bl X(ff_vvc_put_qpel_h16_8_\suffix) + ldr x14, [sp, #32] + ldp x0, x3, [sp, #16] + ldp x5, x30, [sp], #48 + b vvc_put_qpel_hv16_8_end_neon +endfunc + function ff_hevc_put_hevc_qpel_hv24_8_\suffix, export=1 stp x4, x5, [sp, #-64]! stp x2, x3, [sp, #16] @@ -4439,6 +4522,26 @@ function ff_hevc_put_hevc_qpel_hv32_8_\suffix, export=1 b hevc_put_hevc_qpel_hv32_8_end_neon endfunc +function ff_vvc_put_qpel_hv32_8_\suffix, export=1 + add w10, w3, #8 + sub x1, x1, x2, lsl #1 + lsl x10, x10, #8 + sub x1, x1, x2 + mov x14, sp + sub sp, sp, x10 // tmp_array + stp x5, x30, [sp, #-48]! 
+ stp x0, x3, [sp, #16] + str x14, [sp, #32] + add x3, x3, #7 + add x0, sp, #48 + mov w6, #32 + bl X(ff_vvc_put_qpel_h32_8_\suffix) + ldr x14, [sp, #32] + ldp x0, x3, [sp, #16] + ldp x5, x30, [sp], #48 + b vvc_put_qpel_hv32_8_end_neon +endfunc + function ff_hevc_put_hevc_qpel_hv48_8_\suffix, export=1 stp x4, x5, [sp, #-64]! stp x2, x3, [sp, #16] @@ -4472,6 +4575,43 @@ function ff_hevc_put_hevc_qpel_hv64_8_\suffix, export=1 ldr x30, [sp], #16 ret endfunc + +function ff_vvc_put_qpel_hv64_8_\suffix, export=1 + stp x4, x5, [sp, #-64]! + stp x2, x3, [sp, #16] + stp x0, x1, [sp, #32] + str x30, [sp, #48] + mov x6, #32 + bl X(ff_vvc_put_qpel_hv32_8_\suffix) + ldp x0, x1, [sp, #32] + ldp x2, x3, [sp, #16] + ldp x4, x5, [sp], #48 + add x1, x1, #32 + add x0, x0, #64 + mov x6, #32 + bl X(ff_vvc_put_qpel_hv32_8_\suffix) + ldr x30, [sp], #16 + ret +endfunc + +function ff_vvc_put_qpel_hv128_8_\suffix, export=1 + stp x4, x5, [sp, #-64]! + stp x2, x3, [sp, #16] + stp x0, x1, [sp, #32] + str x30, [sp, #48] + mov x6, #64 + bl X(ff_vvc_put_qpel_hv64_8_\suffix) + ldp x0, x1, [sp, #32] + ldp x2, x3, [sp, #16] + ldp x4, x5, [sp], #48 + add x1, x1, #64 + add x0, x0, #128 + mov x6, #64 + bl X(ff_vvc_put_qpel_hv64_8_\suffix) + ldr x30, [sp], #16 + ret +endfunc + .endm qpel_hv neon diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c index ba3a49aa1a..934d918ffd 100644 --- a/libavcodec/aarch64/vvc/dsp_init.c +++ b/libavcodec/aarch64/vvc/dsp_init.c @@ -67,6 +67,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd) c->inter.put[0][5][1][0] = c->inter.put[0][6][1][0] = ff_vvc_put_qpel_v8_8_neon; + c->inter.put[0][1][1][1] = ff_vvc_put_qpel_hv4_8_neon; + c->inter.put[0][2][1][1] = ff_vvc_put_qpel_hv8_8_neon; + c->inter.put[0][3][1][1] = ff_vvc_put_qpel_hv16_8_neon; + c->inter.put[0][4][1][1] = ff_vvc_put_qpel_hv32_8_neon; + c->inter.put[0][5][1][1] = ff_vvc_put_qpel_hv64_8_neon; + c->inter.put[0][6][1][1] = ff_vvc_put_qpel_hv128_8_neon; + 
c->inter.put_uni[0][1][0][0] = ff_vvc_put_pel_uni_pixels4_8_neon; c->inter.put_uni[0][2][0][0] = ff_vvc_put_pel_uni_pixels8_8_neon; c->inter.put_uni[0][3][0][0] = ff_vvc_put_pel_uni_pixels16_8_neon; @@ -103,6 +110,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd) c->inter.put[0][4][0][1] = ff_vvc_put_qpel_h32_8_neon_i8mm; c->inter.put[0][5][0][1] = ff_vvc_put_qpel_h64_8_neon_i8mm; c->inter.put[0][6][0][1] = ff_vvc_put_qpel_h128_8_neon_i8mm; + + c->inter.put[0][1][1][1] = ff_vvc_put_qpel_hv4_8_neon_i8mm; + c->inter.put[0][2][1][1] = ff_vvc_put_qpel_hv8_8_neon_i8mm; + c->inter.put[0][3][1][1] = ff_vvc_put_qpel_hv16_8_neon_i8mm; + c->inter.put[0][4][1][1] = ff_vvc_put_qpel_hv32_8_neon_i8mm; + c->inter.put[0][5][1][1] = ff_vvc_put_qpel_hv64_8_neon_i8mm; + c->inter.put[0][6][1][1] = ff_vvc_put_qpel_hv128_8_neon_i8mm; } } else if (bd == 10) { c->alf.filter[LUMA] = alf_filter_luma_10_neon;

From patchwork Wed Sep 11 18:06:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51516
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 12 Sep 2024 02:06:14 +0800
X-OQ-MSGID: <20240911180618.28921-11-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
References: <20240911180618.28921-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 10/14] aarch64/vvc: Add sad
Cc: Zhao Zhili

From: Zhao Zhili

sad_8x16_c: 0.8 ( 1.00x)
sad_8x16_neon: 0.2 ( 3.00x)
sad_16x8_c: 0.5 ( 1.00x)
sad_16x8_neon: 0.2 ( 2.00x)
sad_16x16_c: 1.5 ( 1.00x)
sad_16x16_neon: 0.2 ( 6.00x)
---
 libavcodec/aarch64/vvc/Makefile | 1 +
 libavcodec/aarch64/vvc/dsp_init.c | 5 +++
 libavcodec/aarch64/vvc/sad.S | 75 +++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+)
 create mode 100644 libavcodec/aarch64/vvc/sad.S

diff --git
a/libavcodec/aarch64/vvc/Makefile b/libavcodec/aarch64/vvc/Makefile index a1c1f03e27..7ba13a2165 100644 --- a/libavcodec/aarch64/vvc/Makefile +++ b/libavcodec/aarch64/vvc/Makefile @@ -3,6 +3,7 @@ clean:: OBJS-$(CONFIG_VVC_DECODER) += aarch64/vvc/dsp_init.o NEON-OBJS-$(CONFIG_VVC_DECODER) += aarch64/vvc/alf.o \ + aarch64/vvc/sad.o \ aarch64/h26x/epel_neon.o \ aarch64/h26x/qpel_neon.o \ aarch64/h26x/sao_neon.o diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c index 934d918ffd..714d642634 100644 --- a/libavcodec/aarch64/vvc/dsp_init.c +++ b/libavcodec/aarch64/vvc/dsp_init.c @@ -39,6 +39,9 @@ #include "alf_template.c" #undef BIT_DEPTH +int ff_vvc_sad_neon(const int16_t *src0, const int16_t *src1, int dx, int dy, + const int block_w, const int block_h); + void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd) { int cpu_flags = av_get_cpu_flags(); @@ -125,4 +128,6 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd) c->alf.filter[LUMA] = alf_filter_luma_12_neon; c->alf.filter[CHROMA] = alf_filter_chroma_12_neon; } + + c->inter.sad = ff_vvc_sad_neon; } diff --git a/libavcodec/aarch64/vvc/sad.S b/libavcodec/aarch64/vvc/sad.S new file mode 100644 index 0000000000..beca876faf --- /dev/null +++ b/libavcodec/aarch64/vvc/sad.S @@ -0,0 +1,75 @@ +/* + * Copyright (c) 2024 Zhao Zhili + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/aarch64/asm.S" + +#define VVC_MAX_PB_SIZE 128 + +function ff_vvc_sad_neon, export=1 + src0 .req x0 + src1 .req x1 + dx .req w2 + dy .req w3 + block_w .req w4 + block_h .req w5 + + sub w7, dx, #4 + sub w8, dy, #4 + add w6, dx, dy, lsl #7 + add w7, w7, w8, lsl #7 + sxtw x6, w6 + sxtw x7, w7 + add src0, src0, x6, lsl #1 + sub src1, src1, x7, lsl #1 + + cmp block_w, #16 + movi v16.4s, #0 + b.ge 2f +1: + // block_w == 8 + ldr q0, [src0] + ldr q2, [src1] + subs block_h, block_h, #2 + sabal v16.4s, v0.4h, v2.4h + sabal2 v16.4s, v0.8h, v2.8h + + add src0, src0, #(2 * VVC_MAX_PB_SIZE * 2) + add src1, src1, #(2 * VVC_MAX_PB_SIZE * 2) + b.ne 1b + b 4f +2: + // block_w == 16, no block_w > 16 according the spec + movi v17.4s, #0 +3: + ldp q0, q1, [src0], #(2 * VVC_MAX_PB_SIZE * 2) + ldp q2, q3, [src1], #(2 * VVC_MAX_PB_SIZE * 2) + subs block_h, block_h, #2 + sabal v16.4s, v0.4h, v2.4h + sabal2 v16.4s, v0.8h, v2.8h + sabal v17.4s, v1.4h, v3.4h + sabal2 v17.4s, v1.8h, v3.8h + + b.ne 3b + add v16.4s, v16.4s, v17.4s +4: + addv s16, v16.4s + mov w0, v16.s[0] + ret +endfunc

From patchwork Wed Sep 11 18:06:15 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51521
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 12 Sep 2024 02:06:15 +0800
X-OQ-MSGID: <20240911180618.28921-12-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
References: <20240911180618.28921-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 11/14] aarch64/vvc: Add put_epel_h
Cc: Zhao Zhili

From: Zhao Zhili

put_chroma_h_8_4x4_c: 0.2 ( 1.00x)
put_chroma_h_8_4x4_neon: 0.2 ( 1.00x)
put_chroma_h_8_8x8_c: 0.8 ( 1.00x)
put_chroma_h_8_8x8_neon: 0.2 ( 3.00x)
put_chroma_h_8_16x16_c: 3.8 ( 1.00x)
put_chroma_h_8_16x16_neon: 0.8 ( 5.00x)
put_chroma_h_8_32x32_c: 12.5 ( 1.00x)
put_chroma_h_8_32x32_neon: 2.2 ( 5.56x)
put_chroma_h_8_64x64_c: 47.0 ( 1.00x)
put_chroma_h_8_64x64_neon: 8.8 ( 5.37x)
put_chroma_h_8_128x128_c: 200.2 ( 1.00x)
put_chroma_h_8_128x128_neon: 31.8 ( 6.31x) --- libavcodec/aarch64/h26x/dsp.h | 3 +++ libavcodec/aarch64/h26x/epel_neon.S | 30 +++++++++++++++++++++++++++++ libavcodec/aarch64/vvc/dsp_init.c | 7 +++++++ 3 files changed, 40 insertions(+) diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h index c54906dde2..6978b900fe 100644 --- a/libavcodec/aarch64/h26x/dsp.h +++ b/libavcodec/aarch64/h26x/dsp.h @@ -248,6 +248,9 @@ NEON8_FNPROTO_PARTIAL_4(qpel, (int16_t *dst, const uint8_t *_src, ptrdiff_t _src NEON8_FNPROTO_PARTIAL_4(qpel_uni, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, ptrdiff_t _srcstride, int height, const int8_t *hf, const int8_t *vf, int width),) +NEON8_FNPROTO_PARTIAL_4(epel, (int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height, + const int8_t *hf, const int8_t *vf, int width),) + #undef NEON8_FNPROTO_PARTIAL_6 #define NEON8_FNPROTO_PARTIAL_6(fn, args, ext) \ void ff_vvc_put_##fn##4_8_neon##ext args; \ diff --git a/libavcodec/aarch64/h26x/epel_neon.S b/libavcodec/aarch64/h26x/epel_neon.S index 8ca42a5c3a..80a0b66a52 100644 --- a/libavcodec/aarch64/h26x/epel_neon.S +++ b/libavcodec/aarch64/h26x/epel_neon.S @@ -1375,6 +1375,18 @@ endfunc mov x10, #(HEVC_MAX_PB_SIZE * 2) .endm +.macro VVC_EPEL_H_HEADER + ld1r {v30.4s}, [x4] + sub x1, x1, #1 + mov x10, #(VVC_MAX_PB_SIZE * 2) +.endm + +function ff_vvc_put_epel_h4_8_neon, export=1 + VVC_EPEL_H_HEADER + sxtl v0.8h, v30.8b + b 1f +endfunc + function ff_hevc_put_hevc_epel_h4_8_neon, export=1 EPEL_H_HEADER sxtl v0.8h, v30.8b @@ -1414,6 +1426,12 @@ function ff_hevc_put_hevc_epel_h6_8_neon, export=1 ret endfunc +function ff_vvc_put_epel_h8_8_neon, export=1 + VVC_EPEL_H_HEADER + sxtl v0.8h, v30.8b + b 1f +endfunc + function ff_hevc_put_hevc_epel_h8_8_neon, export=1 EPEL_H_HEADER sxtl v0.8h, v30.8b @@ -1461,6 +1479,12 @@ function ff_hevc_put_hevc_epel_h12_8_neon, export=1 ret endfunc +function ff_vvc_put_epel_h16_8_neon, export=1 + VVC_EPEL_H_HEADER + sxtl v0.8h, v30.8b + 
b 1f +endfunc + function ff_hevc_put_hevc_epel_h16_8_neon, export=1 EPEL_H_HEADER sxtl v0.8h, v30.8b @@ -1523,8 +1547,14 @@ function ff_hevc_put_hevc_epel_h24_8_neon, export=1 ret endfunc +function ff_vvc_put_epel_h32_8_neon, export=1 + VVC_EPEL_H_HEADER + b 0f +endfunc + function ff_hevc_put_hevc_epel_h32_8_neon, export=1 EPEL_H_HEADER +0: ld1 {v1.8b}, [x1], #8 sub x2, x2, w6, uxtw // decrement src stride mov w7, w6 // original width
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c index 714d642634..c8c13eb068 100644 --- a/libavcodec/aarch64/vvc/dsp_init.c +++ b/libavcodec/aarch64/vvc/dsp_init.c @@ -77,6 +77,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd) c->inter.put[0][5][1][1] = ff_vvc_put_qpel_hv64_8_neon; c->inter.put[0][6][1][1] = ff_vvc_put_qpel_hv128_8_neon; + c->inter.put[1][1][0][1] = ff_vvc_put_epel_h4_8_neon; + c->inter.put[1][2][0][1] = ff_vvc_put_epel_h8_8_neon; + c->inter.put[1][3][0][1] = ff_vvc_put_epel_h16_8_neon; + c->inter.put[1][4][0][1] = + c->inter.put[1][5][0][1] = + c->inter.put[1][6][0][1] = ff_vvc_put_epel_h32_8_neon; + c->inter.put_uni[0][1][0][0] = ff_vvc_put_pel_uni_pixels4_8_neon; c->inter.put_uni[0][2][0][0] = ff_vvc_put_pel_uni_pixels8_8_neon; c->inter.put_uni[0][3][0][0] = ff_vvc_put_pel_uni_pixels16_8_neon;

From patchwork Wed Sep 11 18:06:16 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51517
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 12 Sep 2024 02:06:16 +0800
X-OQ-MSGID: <20240911180618.28921-13-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
References: <20240911180618.28921-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 12/14] aarch64/vvc: Add put_epel_h i8mm
Cc: Zhao Zhili

From: Zhao Zhili

put_chroma_h_8_4x4_c: 0.4 ( 1.00x)
put_chroma_h_8_4x4_neon: 0.0 ( 0.00x)
put_chroma_h_8_4x4_i8mm: 0.1 ( 2.67x)
put_chroma_h_8_8x8_c: 1.6 ( 1.00x)
put_chroma_h_8_8x8_neon: 0.1 (11.00x)
put_chroma_h_8_8x8_i8mm: 0.1 (11.00x)
put_chroma_h_8_16x16_c: 6.9 ( 1.00x)
put_chroma_h_8_16x16_neon: 1.1 ( 6.00x)
put_chroma_h_8_16x16_i8mm: 0.7 (10.62x)
put_chroma_h_8_32x32_c: 27.6 ( 1.00x)
put_chroma_h_8_32x32_neon:
4.7 ( 5.95x) put_chroma_h_8_32x32_i8mm: 4.4 ( 6.28x) put_chroma_h_8_64x64_c: 116.2 ( 1.00x) put_chroma_h_8_64x64_neon: 19.1 ( 6.07x) put_chroma_h_8_64x64_i8mm: 17.1 ( 6.77x) put_chroma_h_8_128x128_c: 466.6 ( 1.00x) put_chroma_h_8_128x128_neon: 81.4 ( 5.73x) put_chroma_h_8_128x128_i8mm: 71.7 ( 6.51x) --- libavcodec/aarch64/h26x/dsp.h | 6 ++- libavcodec/aarch64/h26x/epel_neon.S | 60 ++++++++++++++++++++++++++--- libavcodec/aarch64/vvc/dsp_init.c | 7 ++++ 3 files changed, 66 insertions(+), 7 deletions(-) diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h index 6978b900fe..90a42d7108 100644 --- a/libavcodec/aarch64/h26x/dsp.h +++ b/libavcodec/aarch64/h26x/dsp.h @@ -273,7 +273,11 @@ NEON8_FNPROTO_PARTIAL_6(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride, int height, int denom, int wx, int ox, const int8_t *hf, const int8_t *vf, int width),); -NEON8_FNPROTO_PARTIAL_6(qpel_h, (int16_t * dst, +NEON8_FNPROTO_PARTIAL_6(qpel_h, (int16_t *dst, + const uint8_t *_src, ptrdiff_t _srcstride, int height, + const int8_t *hf, const int8_t *vf, int width), _i8mm); + +NEON8_FNPROTO_PARTIAL_6(epel_h, (int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height, const int8_t *hf, const int8_t *vf, int width), _i8mm); diff --git a/libavcodec/aarch64/h26x/epel_neon.S b/libavcodec/aarch64/h26x/epel_neon.S index 80a0b66a52..cad8f2a5f4 100644 --- a/libavcodec/aarch64/h26x/epel_neon.S +++ b/libavcodec/aarch64/h26x/epel_neon.S @@ -1910,6 +1910,12 @@ endfunc #if HAVE_I8MM ENABLE_I8MM + +function ff_vvc_put_epel_h4_8_neon_i8mm, export=1 + VVC_EPEL_H_HEADER + b 1f +endfunc + function ff_hevc_put_hevc_epel_h4_8_neon_i8mm, export=1 EPEL_H_HEADER 1: ld1 {v4.8b}, [x1], x2 @@ -1953,6 +1959,11 @@ function ff_hevc_put_hevc_epel_h6_8_neon_i8mm, export=1 ret endfunc +function ff_vvc_put_epel_h8_8_neon_i8mm, export=1 + VVC_EPEL_H_HEADER + b 1f +endfunc + function ff_hevc_put_hevc_epel_h8_8_neon_i8mm, export=1 EPEL_H_HEADER 1: ld1 {v4.16b}, [x1], x2 @@ -2003,6 +2014,11 
@@ function ff_hevc_put_hevc_epel_h12_8_neon_i8mm, export=1 ret endfunc +function ff_vvc_put_epel_h16_8_neon_i8mm, export=1 + VVC_EPEL_H_HEADER + b 1f +endfunc + function ff_hevc_put_hevc_epel_h16_8_neon_i8mm, export=1 EPEL_H_HEADER 1: ld1 {v0.16b, v1.16b}, [x1], x2 @@ -2077,6 +2093,11 @@ function ff_hevc_put_hevc_epel_h24_8_neon_i8mm, export=1 ret endfunc +function ff_vvc_put_epel_h32_8_neon_i8mm, export=1 + VVC_EPEL_H_HEADER + b 1f +endfunc + function ff_hevc_put_hevc_epel_h32_8_neon_i8mm, export=1 EPEL_H_HEADER 1: ld1 {v0.16b, v1.16b, v2.16b}, [x1], x2 @@ -2176,11 +2197,8 @@ function ff_hevc_put_hevc_epel_h48_8_neon_i8mm, export=1 ret endfunc -function ff_hevc_put_hevc_epel_h64_8_neon_i8mm, export=1 - EPEL_H_HEADER - sub x2, x2, #64 -1: ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x1], #64 - subs w3, w3, #1 // height +.macro put_epel_h64_8_neon_i8mm + ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x1], #64 ext v4.16b, v0.16b, v1.16b, #1 ext v5.16b, v0.16b, v1.16b, #2 ext v6.16b, v0.16b, v1.16b, #3 @@ -2243,7 +2261,37 @@ function ff_hevc_put_hevc_epel_h64_8_neon_i8mm, export=1 xtn2 v22.8h, v26.4s xtn v23.4h, v23.4s xtn2 v23.8h, v27.4s - st4 {v20.8h, v21.8h, v22.8h, v23.8h}, [x0], #64 + st4 {v20.8h, v21.8h, v22.8h, v23.8h}, [x0], x10 +.endm + +function ff_vvc_put_epel_h64_8_neon_i8mm, export=1 + VVC_EPEL_H_HEADER + mov x10, #(VVC_MAX_PB_SIZE * 2 - 64) + sub x2, x2, #64 + b 1f +endfunc + +function ff_hevc_put_hevc_epel_h64_8_neon_i8mm, export=1 + EPEL_H_HEADER + mov x10, #64 + sub x2, x2, #64 +1: + subs w3, w3, #1 // height + put_epel_h64_8_neon_i8mm + b.ne 1b + ret +endfunc + +function ff_vvc_put_epel_h128_8_neon_i8mm, export=1 + VVC_EPEL_H_HEADER + sub x11, x2, #128 + mov x10, #64 + mov x2, #0 +1: + put_epel_h64_8_neon_i8mm + subs w3, w3, #1 + put_epel_h64_8_neon_i8mm + add x1, x1, x11 b.ne 1b ret endfunc diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c index c8c13eb068..c947885145 100644 --- a/libavcodec/aarch64/vvc/dsp_init.c +++ 
b/libavcodec/aarch64/vvc/dsp_init.c @@ -127,6 +127,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd) c->inter.put[0][4][1][1] = ff_vvc_put_qpel_hv32_8_neon_i8mm; c->inter.put[0][5][1][1] = ff_vvc_put_qpel_hv64_8_neon_i8mm; c->inter.put[0][6][1][1] = ff_vvc_put_qpel_hv128_8_neon_i8mm; + + c->inter.put[1][1][0][1] = ff_vvc_put_epel_h4_8_neon_i8mm; + c->inter.put[1][2][0][1] = ff_vvc_put_epel_h8_8_neon_i8mm; + c->inter.put[1][3][0][1] = ff_vvc_put_epel_h16_8_neon_i8mm; + c->inter.put[1][4][0][1] = ff_vvc_put_epel_h32_8_neon_i8mm; + c->inter.put[1][5][0][1] = ff_vvc_put_epel_h64_8_neon_i8mm; + c->inter.put[1][6][0][1] = ff_vvc_put_epel_h128_8_neon_i8mm; } } else if (bd == 10) { c->alf.filter[LUMA] = alf_filter_luma_10_neon; From patchwork Wed Sep 11 18:06:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhao Zhili X-Patchwork-Id: 51522 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:612c:14c:b0:48e:c0f8:d0de with SMTP id h12csp492771vqi; Wed, 11 Sep 2024 11:34:13 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCXqorGqYZLKsWorVjquryCAmAwCKRonLlii2khBwhz+Qsi42DOkdXzJwsyaSLG8sxQ4UcroHKpSjUuMoS/NVewm@gmail.com X-Google-Smtp-Source: AGHT+IG+c2TyserSuUaKc1AGiFp4Wfilmf/iV2zKkiZhyR3r/ZQqQauC7ozN3aV7IljEdvX/KV1i X-Received: by 2002:a05:6512:b1e:b0:530:da96:a986 with SMTP id 2adb3069b0e04-53678feb0eemr182156e87.47.1726079652822; Wed, 11 Sep 2024 11:34:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1726079652; cv=none; d=google.com; s=arc-20240605; b=IWbywPz5BVz7187GM3wckeisc3DP8VCTOwACZmXJ9Znsx4qjsaopHUcEQgYpL6AskI Ukv6i0p+rhpaS/CeZwc7TVIblE79bWxA85e5xrPakUEVeDWxRXFm/Puxx31/I9fG5WIi FQmkQ8pV8ZHP2W1lAIdXGQvS3q/ljOKHg+hnA7camr7piLRkk2kgitDqGZyoWWLr3aax ObIf9aWxX0X2asN9NMbj042bWfaDBcqASbyLZvjH4Mxf7hXDkXS2H64s89zHdv9SIoqA 99aPFvYD5os+PuuZ+wd5RQ35wmfG4Ao6r8YINh6mhXdSnIoIdFkpp+HPhkixNMZClRZF eJjw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; 
s=arc-20240605; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to:date :to:from:message-id:dkim-signature:delivered-to; bh=aE+ukGchHS9zvw/nl9/HqR0bPAmi3pyuCEu9x4ghE7E=; fh=HnHYuZ9XgUo86ZRXTLWWmQxhslYEI9B9taZ5X1DLFfc=; b=lanVZXL5fVz13NlSgjf3QKY3r4eu9w8r96P6OPCTYbt3coYNUOnhvphq3FR0Hi7sGp XAxuwJIH9xIoc9Ft51WjP6Zc0pl6FG8+NwHK9AZCZK0ghqEObBSsupB9yfa0DMgbIpow yNt0xxufUFylnMqOj73XxgUlWBLCdf50GmdJVxjwxBLyyE4RG8lpmZD7mW+ry0m6+Bl2 FH/7T9Ea/SyaDSTvJ4edaeIgDclQDbC6FULgVXdNIO6majUfNNRkE+7XDhWKSLBZt7dD fLUVLBnxhXlnjuEniplw29/e+1tnR5pRBr47QD1XKBDDWkCvUW/09kN+NEJaK4gwrnUc bgfQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=T9Rk2qor; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id 4fb4d7f45d1cf-5c3ebdb750esi7390756a12.689.2024.09.11.11.34.12; Wed, 11 Sep 2024 11:34:12 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=T9Rk2qor; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id F09A368E2F3; Wed, 11 Sep 2024 21:06:52 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from out203-205-221-210.mail.qq.com (out203-205-221-210.mail.qq.com [203.205.221.210]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id E2AA568E29A for ; Wed, 11 Sep 2024 21:06:34 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=foxmail.com; s=s201512; t=1726077986; bh=Ya/qU53+pGW0K4U4NH9+RISmFt/4p6S26ee3giE/Wb0=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=T9Rk2qor39XgHo028LU+1PBDwyzL0RsSyaq37TB6eUWbqiho3thj0x09Zn59iJiV4 NeHC4gv6n+4anlzc5aPuHSTzaSCMurY+oZ4BU6/2JdiOUas6v3PGKcqDstjemJjPZn /W9p8W+TvpPwQLKKDaAJ39+Bmji3S7W2zATR6Pco= Received: from ZHILIZHAO-MB1.tencent.com ([113.118.115.139]) by newxmesmtplogicsvrsza10-0.qq.com (NewEsmtp) with SMTP id 19397694; Thu, 12 Sep 2024 02:06:19 +0800 X-QQ-mid: xmsmtpt1726077986tu8pv4obv Message-ID: X-QQ-XMAILINFO: OKKHiI6c9SH39yEiQZBt0jZ8j3eniNaOIzEYrtu6llGuX54tGMA1sFfo1mEPZh Ru0RKY01DsOuFK8/X87qsWq8bo6cGyCdXzgRV1vJ9GZBWNWtM+nhfwpKpN1HDBywCzluGU4Drzt2 s2oJWa0KPMYLmpcBBhx96d2IT9fi2+CMHesK0xWUQ97p7Uiyr6M4Zc/Pi1FzxgjNqVpc4dOjxIem ZJPdb7fDCcYiXp7JUd+e0Ra9MufwITbgE0LVS5O0KO4NSJs0w33gKhUsvHRCJRFwLNev9bkCnZ0a 
tLmtTDecv+oN5pemTnULkF3JnG+t4A3IKzz4vY5UzFRbec1qqthsxzKYCALxuO+jNchPPKZnFczB lEn5keBlNXIR20rgGl0u7jm3xSww0ANg1lD3CqO88HHoIO68JEJTq55mK8M2dE4QOyjAd5GamXAK 2TU5c4UW4SHlA8aEPplF3e1iQIRwXdYCE+GuZoRqf0rsGkouOPwNqlsvi3AjpPxRhJz0c4Q3x7KH CBV/G8a6VPIZ7h+8rMGE8DXs8kMJqdOT+vOrTpPJvI1vOhxPdh/8DizC+qG7pIQAFamMBHszufoP /AetyRYmPvVHg1xRDtdlZfTl/ZIQSTYe0RbucTYk4SbCT/1MWVRt6fh0P/vOcLVOUpGmmEtczOIY N6DOR1oVspQCS72QN7lG2zqjpN9DsrBSxaX/Z12tDYHNDljITT/QJXt5Hoi+ECyt/W8SEOqKYdLE gGqGYJisfO6KYdjQ7d0clSY6DU6dOIaUyU8JKMDe0dpDu1JNGay1Iw3wTIT8lriq+LNF5rXPIztN /uDTkdtEZY3MAldjZ4JDm2D0jn/IZ1KT4o7b2656pTDR5kEMCJbtA7UGV0HYmtO1U2qGJASOu3QQ JAh3pxs470SNJKWJP/t1R/C9DW2lZiNf4apsXpW66X/z8/zsISf89E7kyByeDPSsAPxMgNRltlZo aVv1Luob72bgwtxoNeivaVZBs5tRVyk1aCWoPuErY4x7dyb43AAy/3w4ZSkrFKHgX27d46VBY= X-QQ-XMRINFO: Mp0Kj//9VHAxr69bL5MkOOs= From: Zhao Zhili To: ffmpeg-devel@ffmpeg.org Date: Thu, 12 Sep 2024 02:06:17 +0800 X-OQ-MSGID: <20240911180618.28921-14-quinkblack@foxmail.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com> References: <20240911180618.28921-1-quinkblack@foxmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 13/14] aarch64/vvc: Add put_epel_hv X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Zhao Zhili Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: caHlLWGiWNUP From: Zhao Zhili On Apple M1: put_chroma_hv_8_4x4_c: 1.7 ( 1.00x) put_chroma_hv_8_4x4_neon: 0.2 ( 7.67x) put_chroma_hv_8_8x8_c: 5.5 ( 1.00x) put_chroma_hv_8_8x8_neon: 0.5 (11.53x) put_chroma_hv_8_16x16_c: 18.5 ( 1.00x) put_chroma_hv_8_16x16_neon: 1.5 (12.53x) put_chroma_hv_8_32x32_c: 72.5 ( 1.00x) put_chroma_hv_8_32x32_neon: 4.7 (15.34x) put_chroma_hv_8_64x64_c: 274.0 ( 1.00x) put_chroma_hv_8_64x64_neon: 18.5 (14.83x) 
put_chroma_hv_8_128x128_c: 1058.7 ( 1.00x) put_chroma_hv_8_128x128_neon: 75.2 (14.07x) On Android Pixel 8 Pro: put_chroma_hv_8_4x4_c: 1.2 ( 1.00x) put_chroma_hv_8_4x4_neon: 0.0 ( 0.00x) put_chroma_hv_8_4x4_i8mm: 0.2 ( 5.00x) put_chroma_hv_8_8x8_c: 4.0 ( 1.00x) put_chroma_hv_8_8x8_neon: 0.5 ( 8.00x) put_chroma_hv_8_8x8_i8mm: 0.5 ( 8.00x) put_chroma_hv_8_16x16_c: 15.2 ( 1.00x) put_chroma_hv_8_16x16_neon: 2.5 ( 6.10x) put_chroma_hv_8_16x16_i8mm: 2.2 ( 6.78x) put_chroma_hv_8_32x32_c: 61.0 ( 1.00x) put_chroma_hv_8_32x32_neon: 9.8 ( 6.26x) put_chroma_hv_8_32x32_i8mm: 8.5 ( 7.18x) put_chroma_hv_8_64x64_c: 229.5 ( 1.00x) put_chroma_hv_8_64x64_neon: 38.5 ( 5.96x) put_chroma_hv_8_64x64_i8mm: 34.0 ( 6.75x) put_chroma_hv_8_128x128_c: 919.8 ( 1.00x) put_chroma_hv_8_128x128_neon: 154.5 ( 5.95x) put_chroma_hv_8_128x128_i8mm: 140.0 ( 6.57x) --- libavcodec/aarch64/h26x/dsp.h | 8 ++ libavcodec/aarch64/h26x/epel_neon.S | 125 ++++++++++++++++++++++++++++ libavcodec/aarch64/vvc/dsp_init.c | 14 ++++ 3 files changed, 147 insertions(+) diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h index 90a42d7108..0fefb4d70f 100644 --- a/libavcodec/aarch64/h26x/dsp.h +++ b/libavcodec/aarch64/h26x/dsp.h @@ -297,4 +297,12 @@ NEON8_FNPROTO_PARTIAL_6(qpel_hv, (int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height, const int8_t *hf, const int8_t *vf, int width), _i8mm); +NEON8_FNPROTO_PARTIAL_6(epel_hv, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, int height, + const int8_t *hf, const int8_t *vf, int width),); + +NEON8_FNPROTO_PARTIAL_6(epel_hv, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, int height, + const int8_t *hf, const int8_t *vf, int width), _i8mm); + #endif diff --git a/libavcodec/aarch64/h26x/epel_neon.S b/libavcodec/aarch64/h26x/epel_neon.S index cad8f2a5f4..e44a448b1f 100644 --- a/libavcodec/aarch64/h26x/epel_neon.S +++ b/libavcodec/aarch64/h26x/epel_neon.S @@ -72,6 +72,11 @@ endconst sxtl v0.8h, v0.8b .endm +.macro 
vvc_load_epel_filterh freg + ld1 {v0.8b}, [\freg] + sxtl v0.8h, v0.8b +.endm + .macro calc_epelh dst, src0, src1, src2, src3 smull \dst\().4s, \src0\().4h, v0.h[0] smlal \dst\().4s, \src1\().4h, v0.h[1] @@ -2299,10 +2304,16 @@ endfunc DISABLE_I8MM #endif +function vvc_put_epel_hv4_8_end_neon + vvc_load_epel_filterh x5 + mov x10, #(VVC_MAX_PB_SIZE * 2) + b 0f +endfunc function hevc_put_hevc_epel_hv4_8_end_neon load_epel_filterh x5, x4 mov x10, #(HEVC_MAX_PB_SIZE * 2) +0: ldr d16, [sp] ldr d17, [sp, x10] add sp, sp, x10, lsl #1 @@ -2339,9 +2350,16 @@ function hevc_put_hevc_epel_hv6_8_end_neon 2: ret endfunc +function vvc_put_epel_hv8_8_end_neon + vvc_load_epel_filterh x5 + mov x10, #(VVC_MAX_PB_SIZE * 2) + b 0f +endfunc + function hevc_put_hevc_epel_hv8_8_end_neon load_epel_filterh x5, x4 mov x10, #(HEVC_MAX_PB_SIZE * 2) +0: ldr q16, [sp] ldr q17, [sp, x10] add sp, sp, x10, lsl #1 @@ -2379,9 +2397,16 @@ function hevc_put_hevc_epel_hv12_8_end_neon 2: ret endfunc +function vvc_put_epel_hv16_8_end_neon + vvc_load_epel_filterh x5 + mov x10, #(VVC_MAX_PB_SIZE * 2) + b 0f +endfunc + function hevc_put_hevc_epel_hv16_8_end_neon load_epel_filterh x5, x4 mov x10, #(HEVC_MAX_PB_SIZE * 2) +0: ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -2437,6 +2462,21 @@ function ff_hevc_put_hevc_epel_hv4_8_\suffix, export=1 b hevc_put_hevc_epel_hv4_8_end_neon endfunc +function ff_vvc_put_epel_hv4_8_\suffix, export=1 + add w10, w3, #3 + lsl x10, x10, #8 + sub sp, sp, x10 // tmp_array + stp x5, x30, [sp, #-32]! 
+ stp x0, x3, [sp, #16] + add x0, sp, #32 + sub x1, x1, x2 + add w3, w3, #3 + bl X(ff_vvc_put_epel_h4_8_\suffix) + ldp x0, x3, [sp, #16] + ldp x5, x30, [sp], #32 + b vvc_put_epel_hv4_8_end_neon +endfunc + function ff_hevc_put_hevc_epel_hv6_8_\suffix, export=1 add w10, w3, #3 lsl x10, x10, #7 @@ -2467,6 +2507,21 @@ function ff_hevc_put_hevc_epel_hv8_8_\suffix, export=1 b hevc_put_hevc_epel_hv8_8_end_neon endfunc +function ff_vvc_put_epel_hv8_8_\suffix, export=1 + add w10, w3, #3 + lsl x10, x10, #8 + sub sp, sp, x10 // tmp_array + stp x5, x30, [sp, #-32]! + stp x0, x3, [sp, #16] + add x0, sp, #32 + sub x1, x1, x2 + add w3, w3, #3 + bl X(ff_vvc_put_epel_h8_8_\suffix) + ldp x0, x3, [sp, #16] + ldp x5, x30, [sp], #32 + b vvc_put_epel_hv8_8_end_neon +endfunc + function ff_hevc_put_hevc_epel_hv12_8_\suffix, export=1 add w10, w3, #3 lsl x10, x10, #7 @@ -2497,6 +2552,21 @@ function ff_hevc_put_hevc_epel_hv16_8_\suffix, export=1 b hevc_put_hevc_epel_hv16_8_end_neon endfunc +function ff_vvc_put_epel_hv16_8_\suffix, export=1 + add w10, w3, #3 + lsl x10, x10, #8 + sub sp, sp, x10 // tmp_array + stp x5, x30, [sp, #-32]! + stp x0, x3, [sp, #16] + add x0, sp, #32 + sub x1, x1, x2 + add w3, w3, #3 + bl X(ff_vvc_put_epel_h16_8_\suffix) + ldp x0, x3, [sp, #16] + ldp x5, x30, [sp], #32 + b vvc_put_epel_hv16_8_end_neon +endfunc + function ff_hevc_put_hevc_epel_hv24_8_\suffix, export=1 add w10, w3, #3 lsl x10, x10, #7 @@ -2530,6 +2600,24 @@ function ff_hevc_put_hevc_epel_hv32_8_\suffix, export=1 ret endfunc +function ff_vvc_put_epel_hv32_8_\suffix, export=1 + stp x4, x5, [sp, #-64]! + stp x2, x3, [sp, #16] + stp x0, x1, [sp, #32] + str x30, [sp, #48] + mov x6, #16 + bl X(ff_vvc_put_epel_hv16_8_\suffix) + ldp x0, x1, [sp, #32] + ldp x2, x3, [sp, #16] + ldp x4, x5, [sp], #48 + add x0, x0, #32 + add x1, x1, #16 + mov x6, #16 + bl X(ff_vvc_put_epel_hv16_8_\suffix) + ldr x30, [sp], #16 + ret +endfunc + function ff_hevc_put_hevc_epel_hv48_8_\suffix, export=1 stp x4, x5, [sp, #-64]! 
stp x2, x3, [sp, #16] @@ -2579,6 +2667,43 @@ function ff_hevc_put_hevc_epel_hv64_8_\suffix, export=1 ldr x30, [sp], #16 ret endfunc + +function ff_vvc_put_epel_hv64_8_\suffix, export=1 + stp x4, x5, [sp, #-64]! + stp x2, x3, [sp, #16] + stp x0, x1, [sp, #32] + str x30, [sp, #48] + mov x6, #32 + bl X(ff_vvc_put_epel_hv32_8_\suffix) + ldp x0, x1, [sp, #32] + ldp x2, x3, [sp, #16] + ldp x4, x5, [sp], #48 + add x0, x0, #64 + add x1, x1, #32 + mov x6, #32 + bl X(ff_vvc_put_epel_hv32_8_\suffix) + ldr x30, [sp], #16 + ret +endfunc + +function ff_vvc_put_epel_hv128_8_\suffix, export=1 + stp x4, x5, [sp, #-64]! + stp x2, x3, [sp, #16] + stp x0, x1, [sp, #32] + str x30, [sp, #48] + mov x6, #64 + bl X(ff_vvc_put_epel_hv64_8_\suffix) + ldp x0, x1, [sp, #32] + ldp x2, x3, [sp, #16] + ldp x4, x5, [sp], #48 + add x0, x0, #128 + add x1, x1, #64 + mov x6, #64 + bl X(ff_vvc_put_epel_hv64_8_\suffix) + ldr x30, [sp], #16 + ret +endfunc + .endm epel_hv neon diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c index c947885145..4867491620 100644 --- a/libavcodec/aarch64/vvc/dsp_init.c +++ b/libavcodec/aarch64/vvc/dsp_init.c @@ -84,6 +84,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd) c->inter.put[1][5][0][1] = c->inter.put[1][6][0][1] = ff_vvc_put_epel_h32_8_neon; + c->inter.put[1][1][1][1] = ff_vvc_put_epel_hv4_8_neon; + c->inter.put[1][2][1][1] = ff_vvc_put_epel_hv8_8_neon; + c->inter.put[1][3][1][1] = ff_vvc_put_epel_hv16_8_neon; + c->inter.put[1][4][1][1] = ff_vvc_put_epel_hv32_8_neon; + c->inter.put[1][5][1][1] = ff_vvc_put_epel_hv64_8_neon; + c->inter.put[1][6][1][1] = ff_vvc_put_epel_hv128_8_neon; + c->inter.put_uni[0][1][0][0] = ff_vvc_put_pel_uni_pixels4_8_neon; c->inter.put_uni[0][2][0][0] = ff_vvc_put_pel_uni_pixels8_8_neon; c->inter.put_uni[0][3][0][0] = ff_vvc_put_pel_uni_pixels16_8_neon; @@ -134,6 +141,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd) c->inter.put[1][4][0][1] = 
ff_vvc_put_epel_h32_8_neon_i8mm; c->inter.put[1][5][0][1] = ff_vvc_put_epel_h64_8_neon_i8mm; c->inter.put[1][6][0][1] = ff_vvc_put_epel_h128_8_neon_i8mm; + + c->inter.put[1][1][1][1] = ff_vvc_put_epel_hv4_8_neon_i8mm; + c->inter.put[1][2][1][1] = ff_vvc_put_epel_hv8_8_neon_i8mm; + c->inter.put[1][3][1][1] = ff_vvc_put_epel_hv16_8_neon_i8mm; + c->inter.put[1][4][1][1] = ff_vvc_put_epel_hv32_8_neon_i8mm; + c->inter.put[1][5][1][1] = ff_vvc_put_epel_hv64_8_neon_i8mm; + c->inter.put[1][6][1][1] = ff_vvc_put_epel_hv128_8_neon_i8mm; } } else if (bd == 10) { c->alf.filter[LUMA] = alf_filter_luma_10_neon; From patchwork Wed Sep 11 18:06:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhao Zhili X-Patchwork-Id: 51518 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:612c:14c:b0:48e:c0f8:d0de with SMTP id h12csp485296vqi; Wed, 11 Sep 2024 11:19:19 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUQK4N/CI3AEn5TAhfeVx4AWbIbyl3Br2FAmm58tdA1rnm9kavmntwTHyfX4mukaZCsldbCSED9UEAViw+o4iqQ@gmail.com X-Google-Smtp-Source: AGHT+IGQTp/dNnShmxwCZQzr8QEJ1/LIQycFv4hIgBd2gC4QhJQ0jMdon6VTmzXhjZC5F6puBgmV X-Received: by 2002:a05:6402:254d:b0:5c2:6f74:782f with SMTP id 4fb4d7f45d1cf-5c413e4b6e0mr82994a12.5.1726078759014; Wed, 11 Sep 2024 11:19:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1726078759; cv=none; d=google.com; s=arc-20240605; b=jz667TunTaI5GN6N2qsKf51op61vVr9JsgYzo/1+dAhtboTfdYkRjyyGm/ZHP+5YGT zM8I03/MdBKDnAMHeO99or3BlX3WAeFznGV2CLmhjJpucZS36xyMFwrkLX/IsdyliZds 1TqvMnVhpByWVhyn5ta2KFEZB7kTLoEXdcTQWEs2SslMuNvE9T7Y97yOrSRlYtBrYURt OEEP0/IR2iMst/OBz6rujaHrF9NWsEqDK+MzWNYqLjXmu2r9VLNVlzoLXc3cGh/5PnmO MmTa5JbYyyFI7GleSRBU5m+KfnqRxgFw0J9+FJj565bShpFdM95Crz3Jd+TIEmSZDhoc /vOA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe 
:list-id:precedence:subject:mime-version:references:in-reply-to:date :to:from:message-id:dkim-signature:delivered-to; bh=EXIOa9zNFz0Qw4cWdVkahU8w5/FlXySOlEo6WhRgHVY=; fh=HnHYuZ9XgUo86ZRXTLWWmQxhslYEI9B9taZ5X1DLFfc=; b=huBThz6JcUJCQRgT2Qt45M1iomAiew80gbDTqFr/RYNg8lxEsiTDL6TZL8BhmFDvlQ y7UsedXxqBP4NDdpVIsrp+X0Lnc/Z6+FxrD3RCYuCcNjwrzw/+iuVTrG+ZeAYPXe4HDW wdM61uI5eJ5Zr37QmHRCIkxtBAstpDoFFi0BhYqy1tarEsb1RMRXld3EhyB5EtCg7/qv iVnrA47GheQzgAnZE3onb04u1GA5aGuqTRvSu5vC70NAEf1R4nO9UZMQRz7AEORZa3eK c9wjKNsmJq0U5QL+18ZWRH43rJB/O3aq6O7NNUPHG8kdHI92Ckem/7hMvw+Rvg77pSTG M7gA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b="QfGtSL/B"; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id 4fb4d7f45d1cf-5c3ebd8561asi6832346a12.222.2024.09.11.11.19.18; Wed, 11 Sep 2024 11:19:18 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b="QfGtSL/B"; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id EF7B268E2F9; Wed, 11 Sep 2024 21:06:53 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from out203-205-221-242.mail.qq.com (out203-205-221-242.mail.qq.com [203.205.221.242]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 590A568E26E for ; Wed, 11 Sep 2024 21:06:35 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=foxmail.com; s=s201512; t=1726077987; bh=51rhxSTfp/wwqoDgjOBxHekM+PPJLCar+M09KrV+avI=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=QfGtSL/BBA3w9XsTnGeNFc7SfQ/elG9CTbLU3nEhufdGdtAAMbor14vp3JHCf13Sh uBpX/AGBZBhUZlGFeupxFB6uRUTmNQzCqx4+d1kjUf0hNbTTh/DnFmlSKX5LDeeVaG uV+05JYBR2Xp0y27ktJ3Y9Fyu6Guzp+ddbqIOmFQ= Received: from ZHILIZHAO-MB1.tencent.com ([113.118.115.139]) by newxmesmtplogicsvrsza10-0.qq.com (NewEsmtp) with SMTP id 19397694; Thu, 12 Sep 2024 02:06:19 +0800 X-QQ-mid: xmsmtpt1726077986tr4sul490 Message-ID: X-QQ-XMAILINFO: Mdc3TkmnJyI/lpFxeXrX6vfJXOIC+E7pb5cGa0O0P1R/rhQPWM1ew+ebROyC75 EVPv9dIMVvF/LNZAfbP6SJF6sBMvH/wg1gInuqcLV2E8KwVbzxr2LJ9hrJft/dlovZcTI1veXTsi K1vjhyQKc9RCTJdmERItmpBbCTlbF9xYz1F2AOr16B4spVABB4ijjBrppD/vtsj786Nznv+k+zzV HU0r1DgTDS+ZFzJn6n88GLOxDyNCmjR7FkRzofwruOGwcAQldItYWC0WVNHE7yp6T8MrthUtrZx9 
jk/k+Zcl8T2E0sZXanAZA1FCvJAJhKqzJX2/XFK/lnmU5RR8MMfkv/xw7SrQFjdscyi8tMgMJRtB Ks288Bm8kS+jj98pKiV3QHPWy6cqmSyBsoh25ZL3BlNHm4zNvVrnfzgzMOnfarkGq9EN0P4bH8Pk sWjfe3MduK7K9tOU1E/jPW/etb6BMdS1wn04rFHmC0U5mXiJGSoBU8CjfOYdAJC4Y7Y5aRL76UYr tItFpXg+Iy6tDwXNDX5YlsqZvJt2j+R3N7YvlbytiZRLNJp3BllgqEQ6ftcfBK+K6WK6qgYTCUls EEc/aDfCcBH8+3ckaQUsfh+ziw86dY8QW66TbT6kThXXxuI9O2pN0lvQLjtEKaqEd7Mo8VSfV653 QKu04qBwZ+z+knEMvIVupyV5W8SHf75XJTqoZyXF2IxbCXy3W75tM5HDglxJiTBBQzpk4xZq4ebj W7KdbZrVHdjvgdb96sMf+V9NljsZq9SsOcl5PjytwhvqfkDjmsMRhJNbR4Q2iKISN8MmP3Tx+HO/ GcCevwxa+qz7l2okTOmK3Mh/DIaojbvm4fWiIfbzzgXdXp9vnDZd0IUauygNC1cyVvgGkG4OFEoH fURgC4KYYdm1oYDVEe8AZs57CSO7fovIndn4SbyL8BgACLMhTGh2GfSgtviJBtIoB1h7aUgUbQ7P +lmsbQiMM4osC6UY/K2x+sYaYqp6cCP+4bY30pgOB6dLJ3N7H3O+kTpN5VsUGykZeySMihBPuKXn +H9KEy9Bq6zIA9GIU3hhwvQ8c1jZmgs/oTkvS9pQ== X-QQ-XMRINFO: NS+P29fieYNw95Bth2bWPxk= From: Zhao Zhili To: ffmpeg-devel@ffmpeg.org Date: Thu, 12 Sep 2024 02:06:18 +0800 X-OQ-MSGID: <20240911180618.28921-15-quinkblack@foxmail.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com> References: <20240911180618.28921-1-quinkblack@foxmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 14/14] aarch64/vvc: Add avg X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Zhao Zhili Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: HZ8dQnrghPOT From: Zhao Zhili avg_8_2x2_c: 0.2 ( 1.00x) avg_8_2x2_neon: 0.2 ( 1.00x) avg_8_4x4_c: 0.2 ( 1.00x) avg_8_4x4_neon: 0.2 ( 1.00x) avg_8_8x8_c: 0.9 ( 1.00x) avg_8_8x8_neon: 0.2 ( 5.29x) avg_8_16x16_c: 3.7 ( 1.00x) avg_8_16x16_neon: 0.7 ( 5.44x) avg_8_32x32_c: 14.9 ( 1.00x) avg_8_32x32_neon: 1.7 ( 8.91x) avg_8_64x64_c: 59.7 ( 1.00x) avg_8_64x64_neon: 6.9 ( 8.62x) avg_8_128x128_c: 254.7 ( 1.00x) 
avg_8_128x128_neon:     26.9 ( 9.46x)
avg_10_2x2_c:            0.2 ( 1.00x)
avg_10_2x2_neon:         0.2 ( 1.00x)
avg_10_4x4_c:            0.2 ( 1.00x)
avg_10_4x4_neon:         0.2 ( 1.00x)
avg_10_8x8_c:            0.9 ( 1.00x)
avg_10_8x8_neon:         0.2 ( 5.29x)
avg_10_16x16_c:          3.4 ( 1.00x)
avg_10_16x16_neon:       0.4 ( 8.06x)
avg_10_32x32_c:         13.9 ( 1.00x)
avg_10_32x32_neon:       1.9 ( 7.23x)
avg_10_64x64_c:         54.2 ( 1.00x)
avg_10_64x64_neon:       8.4 ( 6.43x)
avg_10_128x128_c:      232.4 ( 1.00x)
avg_10_128x128_neon:    30.9 ( 7.52x)
avg_12_2x2_c:            0.0 ( 0.00x)
avg_12_2x2_neon:         0.2 ( 0.00x)
avg_12_4x4_c:            0.4 ( 1.00x)
avg_12_4x4_neon:         0.2 ( 2.43x)
avg_12_8x8_c:            0.7 ( 1.00x)
avg_12_8x8_neon:         0.2 ( 3.86x)
avg_12_16x16_c:          3.7 ( 1.00x)
avg_12_16x16_neon:       0.4 ( 8.65x)
avg_12_32x32_c:         13.7 ( 1.00x)
avg_12_32x32_neon:       2.2 ( 6.29x)
avg_12_64x64_c:         53.9 ( 1.00x)
avg_12_64x64_neon:       7.7 ( 7.03x)
avg_12_128x128_c:      270.9 ( 1.00x)
avg_12_128x128_neon:    30.4 ( 8.90x)
---
 libavcodec/aarch64/vvc/Makefile   |   1 +
 libavcodec/aarch64/vvc/dsp_init.c |  16 +++
 libavcodec/aarch64/vvc/inter.S    | 163 ++++++++++++++++++++++++++++++
 3 files changed, 180 insertions(+)
 create mode 100644 libavcodec/aarch64/vvc/inter.S

diff --git a/libavcodec/aarch64/vvc/Makefile b/libavcodec/aarch64/vvc/Makefile
index 7ba13a2165..ed80338969 100644
--- a/libavcodec/aarch64/vvc/Makefile
+++ b/libavcodec/aarch64/vvc/Makefile
@@ -3,6 +3,7 @@ clean::
 
 OBJS-$(CONFIG_VVC_DECODER)              += aarch64/vvc/dsp_init.o
 NEON-OBJS-$(CONFIG_VVC_DECODER)         += aarch64/vvc/alf.o \
+                                           aarch64/vvc/inter.o \
                                            aarch64/vvc/sad.o \
                                            aarch64/h26x/epel_neon.o \
                                            aarch64/h26x/qpel_neon.o \
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index 4867491620..ad767d17e2 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -42,6 +42,16 @@
 int ff_vvc_sad_neon(const int16_t *src0, const int16_t *src1, int dx, int dy,
                     const int block_w, const int block_h);
 
+void ff_vvc_avg_8_neon(uint8_t *dst, ptrdiff_t dst_stride,
+                       const int16_t *src0, const int16_t *src1, int width,
+                       int height);
+void ff_vvc_avg_10_neon(uint8_t *dst, ptrdiff_t dst_stride,
+                        const int16_t *src0, const int16_t *src1, int width,
+                        int height);
+void ff_vvc_avg_12_neon(uint8_t *dst, ptrdiff_t dst_stride,
+                        const int16_t *src0, const int16_t *src1, int width,
+                        int height);
+
 void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
 {
     int cpu_flags = av_get_cpu_flags();
@@ -112,6 +122,8 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
             c->inter.put_uni_w[0][5][0][0] = ff_vvc_put_pel_uni_w_pixels64_8_neon;
             c->inter.put_uni_w[0][6][0][0] = ff_vvc_put_pel_uni_w_pixels128_8_neon;
 
+            c->inter.avg = ff_vvc_avg_8_neon;
+
             for (int i = 0; i < FF_ARRAY_ELEMS(c->sao.band_filter); i++)
                 c->sao.band_filter[i] = ff_h26x_sao_band_filter_8x8_8_neon;
             c->sao.edge_filter[0] = ff_vvc_sao_edge_filter_8x8_8_neon;
@@ -150,9 +162,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
                 c->inter.put[1][6][1][1] = ff_vvc_put_epel_hv128_8_neon_i8mm;
             }
         } else if (bd == 10) {
+            c->inter.avg = ff_vvc_avg_10_neon;
+
             c->alf.filter[LUMA] = alf_filter_luma_10_neon;
             c->alf.filter[CHROMA] = alf_filter_chroma_10_neon;
         } else if (bd == 12) {
+            c->inter.avg = ff_vvc_avg_12_neon;
+
             c->alf.filter[LUMA] = alf_filter_luma_12_neon;
             c->alf.filter[CHROMA] = alf_filter_chroma_12_neon;
         }
diff --git a/libavcodec/aarch64/vvc/inter.S b/libavcodec/aarch64/vvc/inter.S
new file mode 100644
index 0000000000..2f69274b86
--- /dev/null
+++ b/libavcodec/aarch64/vvc/inter.S
@@ -0,0 +1,163 @@
+/*
+ * Copyright (c) 2024 Zhao Zhili
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+#define VVC_MAX_PB_SIZE 128
+
+.macro vvc_avg, bit_depth
+
+.macro vvc_avg_\bit_depth\()_2_4, tap
+.if \tap == 2
+        ldr             s0, [src0]
+        ldr             s2, [src1]
+.else
+        ldr             d0, [src0]
+        ldr             d2, [src1]
+.endif
+        saddl           v4.4s, v0.4h, v2.4h
+        add             v4.4s, v4.4s, v16.4s
+        sqshrn          v4.4h, v4.4s, #(15 - \bit_depth)
+.if \bit_depth == 8
+        sqxtun          v4.8b, v4.8h
+.if \tap == 2
+        str             h4, [dst]
+.else   // tap == 4
+        str             s4, [dst]
+.endif
+
+.else   // bit_depth > 8
+        smin            v4.4h, v4.4h, v17.4h
+        smax            v4.4h, v4.4h, v18.4h
+.if \tap == 2
+        str             s4, [dst]
+.else
+        str             d4, [dst]
+.endif
+.endif
+        add             src0, src0, x10
+        add             src1, src1, x10
+        add             dst, dst, dst_stride
+.endm
+
+function ff_vvc_avg_\bit_depth\()_neon, export=1
+        dst             .req x0
+        dst_stride      .req x1
+        src0            .req x2
+        src1            .req x3
+        width           .req w4
+        height          .req w5
+
+        mov             x10, #(VVC_MAX_PB_SIZE * 2)
+        cmp             width, #8
+.if \bit_depth == 8
+        movi            v16.4s, #64
+.else
+.if \bit_depth == 10
+        mov             w6, #1023
+        movi            v16.4s, #16
+.else
+        mov             w6, #4095
+        movi            v16.4s, #4
+.endif
+        movi            v18.8h, #0
+        dup             v17.8h, w6
+.endif
+        b.eq            8f
+        b.hi            16f
+        cmp             width, #4
+        b.eq            4f
+2:      // width == 2
+        subs            height, height, #1
+        vvc_avg_\bit_depth\()_2_4 2
+        b.ne            2b
+        b               32f
+4:      // width == 4
+        subs            height, height, #1
+        vvc_avg_\bit_depth\()_2_4 4
+        b.ne            4b
+        b               32f
+8:      // width == 8
+        ld1             {v0.8h}, [src0], x10
+        ld1             {v2.8h}, [src1], x10
+        saddl           v4.4s, v0.4h, v2.4h
+        saddl2          v5.4s, v0.8h, v2.8h
+        add             v4.4s, v4.4s, v16.4s
+        add             v5.4s, v5.4s, v16.4s
+        sqshrn          v4.4h, v4.4s, #(15 - \bit_depth)
+        sqshrn2         v4.8h, v5.4s, #(15 - \bit_depth)
+        subs            height, height, #1
+.if \bit_depth == 8
+        sqxtun          v4.8b, v4.8h
+        st1             {v4.8b}, [dst], dst_stride
+.else
+        smin            v4.8h, v4.8h, v17.8h
+        smax            v4.8h, v4.8h, v18.8h
+        st1             {v4.8h}, [dst], dst_stride
+.endif
+        b.ne            8b
+        b               32f
+16:     // width >= 16
+        mov             w6, width
+        mov             x7, src0
+        mov             x8, src1
+        mov             x9, dst
+17:
+        ldp             q0, q1, [x7], #32
+        ldp             q2, q3, [x8], #32
+        saddl           v4.4s, v0.4h, v2.4h
+        saddl2          v5.4s, v0.8h, v2.8h
+        saddl           v6.4s, v1.4h, v3.4h
+        saddl2          v7.4s, v1.8h, v3.8h
+        add             v4.4s, v4.4s, v16.4s
+        add             v5.4s, v5.4s, v16.4s
+        add             v6.4s, v6.4s, v16.4s
+        add             v7.4s, v7.4s, v16.4s
+        sqshrn          v4.4h, v4.4s, #(15 - \bit_depth)
+        sqshrn2         v4.8h, v5.4s, #(15 - \bit_depth)
+        sqshrn          v6.4h, v6.4s, #(15 - \bit_depth)
+        sqshrn2         v6.8h, v7.4s, #(15 - \bit_depth)
+        subs            w6, w6, #16
+.if \bit_depth == 8
+        sqxtun          v4.8b, v4.8h
+        sqxtun2         v4.16b, v6.8h
+        str             q4, [x9], #16
+.else
+        smin            v4.8h, v4.8h, v17.8h
+        smin            v6.8h, v6.8h, v17.8h
+        smax            v4.8h, v4.8h, v18.8h
+        smax            v6.8h, v6.8h, v18.8h
+        stp             q4, q6, [x9], #32
+.endif
+        b.ne            17b
+
+        subs            height, height, #1
+        add             src0, src0, x10
+        add             src1, src1, x10
+        add             dst, dst, dst_stride
+        b.ne            16b
+32:
+        ret
+endfunc
+.endm
+
+vvc_avg 8
+vvc_avg 10
+vvc_avg 12
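
For readers who want a scalar reference next to the NEON code, the following is a minimal C sketch of what ff_vvc_avg_<bd>_neon appears to compute, as read from the assembly in this patch: each output sample is (src0 + src1 + offset) >> shift, with shift = 15 - bit_depth and offset = 1 << (shift - 1), clipped to the pixel range. The names avg_ref, clip_pixel and MAX_PB_SIZE are made up for illustration and are not FFmpeg symbols. The sketch assumes both sources use a row stride of 128 int16_t samples (VVC_MAX_PB_SIZE), that dst_stride is in bytes, and that right-shifting a negative int is arithmetic, as it is on the usual compilers.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PB_SIZE 128 /* assumed to mirror VVC_MAX_PB_SIZE in inter.S */

/* Clamp v to [0, max]. */
static int clip_pixel(int v, int max)
{
    return v < 0 ? 0 : (v > max ? max : v);
}

/* Hypothetical scalar reference for the avg kernel (not an FFmpeg function). */
static void avg_ref(uint8_t *dst, ptrdiff_t dst_stride,
                    const int16_t *src0, const int16_t *src1,
                    int width, int height, int bit_depth)
{
    const int shift  = 15 - bit_depth;       /* 7, 5, 3 for 8/10/12 bit    */
    const int offset = 1 << (shift - 1);     /* 64, 16, 4: the v16 constant */
    const int max    = (1 << bit_depth) - 1; /* 255, 1023, 4095            */

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* Round, shift back to pixel precision, then clip. */
            int v = clip_pixel((src0[x] + src1[x] + offset) >> shift, max);
            if (bit_depth == 8)
                dst[x] = (uint8_t)v;
            else
                ((uint16_t *)dst)[x] = (uint16_t)v; /* 16-bit output samples */
        }
        src0 += MAX_PB_SIZE; /* source stride is in int16_t samples */
        src1 += MAX_PB_SIZE;
        dst  += dst_stride;  /* destination stride is in bytes */
    }
}
```

The rounding constants in the sketch line up with the movi immediates in the assembly (#64, #16, #4), and the 1023 and 4095 clip bounds match the values loaded into w6 for 10-bit and 12-bit.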