From patchwork Sat Sep 7 17:13:35 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51384
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 8 Sep 2024 01:13:35 +0800
X-OQ-MSGID: <20240907171340.55502-1-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
Subject: [FFmpeg-devel] [PATCH 1/6] aarch64/hevc: Simplify function prototypes by macro
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Zhao Zhili

From: Zhao Zhili

---
 libavcodec/aarch64/hevcdsp_init_aarch64.c | 66 +++++++----------------
 1 file changed, 18 insertions(+), 48 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index a90da0246e..26bbc8750f 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -92,54 +92,24 @@ void ff_hevc_idct_8x8_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_16x16_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_32x32_dc_10_neon(int16_t *coeffs);
 void ff_hevc_transform_luma_4x4_neon_8(int16_t *coeffs);
-void ff_hevc_put_hevc_qpel_h4_8_neon(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
-                                     intptr_t mx, intptr_t my, int width);
-void ff_hevc_put_hevc_qpel_h6_8_neon(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
-                                     intptr_t mx, intptr_t my, int width);
-void ff_hevc_put_hevc_qpel_h8_8_neon(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
-                                     intptr_t mx, intptr_t my, int width);
-void ff_hevc_put_hevc_qpel_h12_8_neon(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
-                                      intptr_t mx, intptr_t my, int width);
-void ff_hevc_put_hevc_qpel_h16_8_neon(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
-                                      intptr_t mx, intptr_t my, int width);
-void ff_hevc_put_hevc_qpel_h32_8_neon(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
-                                      intptr_t mx, intptr_t my, int width);
-void ff_hevc_put_hevc_qpel_uni_h4_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
-                                         ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my,
-                                         int width);
-void ff_hevc_put_hevc_qpel_uni_h6_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
-                                         ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my,
-                                         int width);
-void ff_hevc_put_hevc_qpel_uni_h8_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
-                                         ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my,
-                                         int width);
-void ff_hevc_put_hevc_qpel_uni_h12_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
-                                          ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t
-                                          my, int width);
-void ff_hevc_put_hevc_qpel_uni_h16_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
-                                          ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t
-                                          my, int width);
-void ff_hevc_put_hevc_qpel_uni_h32_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
-                                          ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t
-                                          my, int width);
-void ff_hevc_put_hevc_qpel_bi_h4_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
-                                        ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
-                                        mx, intptr_t my, int width);
-void ff_hevc_put_hevc_qpel_bi_h6_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
-                                        ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
-                                        mx, intptr_t my, int width);
-void ff_hevc_put_hevc_qpel_bi_h8_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
-                                        ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
-                                        mx, intptr_t my, int width);
-void ff_hevc_put_hevc_qpel_bi_h12_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
-                                         ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
-                                         mx, intptr_t my, int width);
-void ff_hevc_put_hevc_qpel_bi_h16_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
-                                         ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
-                                         mx, intptr_t my, int width);
-void ff_hevc_put_hevc_qpel_bi_h32_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
-                                         ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
-                                         mx, intptr_t my, int width);
+
+#define NEON8_FNPROTO_PARTIAL_6(fn, args, ext) \
+    void ff_hevc_put_hevc_##fn##_h4_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##_h6_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##_h8_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##_h12_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##_h16_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##_h32_8_neon##ext args;
+
+NEON8_FNPROTO_PARTIAL_6(qpel, (int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
+        intptr_t mx, intptr_t my, int width),)
+
+NEON8_FNPROTO_PARTIAL_6(qpel_uni, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+        ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width),)
+
+NEON8_FNPROTO_PARTIAL_6(qpel_bi, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+        ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
+        mx, intptr_t my, int width),)
 
 #define NEON8_FNPROTO(fn, args, ext) \
     void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \

From patchwork Sat Sep 7 17:13:36 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51388
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 8 Sep 2024 01:13:36 +0800
X-OQ-MSGID: <20240907171340.55502-2-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240907171340.55502-1-quinkblack@foxmail.com>
References: <20240907171340.55502-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH 2/6] aarch64/hevc: Move epel/qpel to h26x directory
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Zhao Zhili

From: Zhao Zhili

So vvc can reuse the implementation.
---
 libavcodec/aarch64/Makefile                   |   4 +-
 libavcodec/aarch64/h26x/dsp.h                 | 198 ++++++++++++++++++
 .../{hevcdsp_epel_neon.S => h26x/epel_neon.S} |   0
 .../{hevcdsp_qpel_neon.S => h26x/qpel_neon.S} |   0
 libavcodec/aarch64/hevcdsp_init_aarch64.c     | 197 -----------------
 5 files changed, 200 insertions(+), 199 deletions(-)
 rename libavcodec/aarch64/{hevcdsp_epel_neon.S => h26x/epel_neon.S} (100%)
 rename libavcodec/aarch64/{hevcdsp_qpel_neon.S => h26x/qpel_neon.S} (100%)

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index a01e665b55..9affb92789 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -71,6 +71,6 @@ NEON-OBJS-$(CONFIG_VP9_DECODER) += aarch64/vp9itxfm_16bpp_neon.o \
 NEON-OBJS-$(CONFIG_HEVC_DECODER) += aarch64/hevcdsp_deblock_neon.o \
                                     aarch64/hevcdsp_idct_neon.o \
                                     aarch64/hevcdsp_init_aarch64.o \
-                                    aarch64/hevcdsp_qpel_neon.o \
-                                    aarch64/hevcdsp_epel_neon.o \
+                                    aarch64/h26x/epel_neon.o \
+                                    aarch64/h26x/qpel_neon.o \
                                     aarch64/h26x/sao_neon.o
diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index d3f7a4dfe3..902286872d 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -37,4 +37,202 @@ void ff_vvc_sao_edge_filter_16x16_8_neon(uint8_t *dst, const uint8_t *src, ptrdi
                                          const int16_t *sao_offset_val, int eo, int width, int height);
 void ff_vvc_sao_edge_filter_8x8_8_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride_dst,
                                        const int16_t *sao_offset_val, int eo, int width, int height);
+
+#define NEON8_FNPROTO_PARTIAL_6(fn, args, ext) \
+    void ff_hevc_put_hevc_##fn##_h4_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##_h6_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##_h8_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##_h12_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##_h16_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##_h32_8_neon##ext args;
+
+NEON8_FNPROTO_PARTIAL_6(qpel, (int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
+        intptr_t mx, intptr_t my, int width),)
+
+NEON8_FNPROTO_PARTIAL_6(qpel_uni, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+        ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width),)
+
+NEON8_FNPROTO_PARTIAL_6(qpel_bi, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+        ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
+        mx, intptr_t my, int width),)
+
+#define NEON8_FNPROTO(fn, args, ext) \
+    void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##6_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##12_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##24_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##32_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##48_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##64_8_neon##ext args
+
+#define NEON8_FNPROTO_PARTIAL_4(fn, args, ext) \
+    void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##64_8_neon##ext args
+
+#define NEON8_FNPROTO_PARTIAL_5(fn, args, ext) \
+    void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##32_8_neon##ext args; \
+    void ff_hevc_put_hevc_##fn##64_8_neon##ext args
+
+NEON8_FNPROTO(pel_pixels, (int16_t *dst,
+        const uint8_t *src, ptrdiff_t srcstride,
+        int height, intptr_t mx, intptr_t my, int width),);
+
+NEON8_FNPROTO(pel_bi_pixels, (uint8_t *dst, ptrdiff_t dststride,
+        const uint8_t *_src, ptrdiff_t _srcstride, const int16_t *src2,
+        int height, intptr_t mx, intptr_t my, int width),);
+
+NEON8_FNPROTO(epel_bi_h, (uint8_t *dst, ptrdiff_t dststride,
+        const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2,
+        int height, intptr_t mx, intptr_t my, int width),);
+
+NEON8_FNPROTO(epel_bi_v, (uint8_t *dst, ptrdiff_t dststride,
+        const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2,
+        int height, intptr_t mx, intptr_t my, int width),);
+
+NEON8_FNPROTO(epel_bi_hv, (uint8_t *dst, ptrdiff_t dststride,
+        const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2,
+        int height, intptr_t mx, intptr_t my, int width),);
+
+NEON8_FNPROTO(epel_bi_hv, (uint8_t *dst, ptrdiff_t dststride,
+        const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2,
+        int height, intptr_t mx, intptr_t my, int width), _i8mm);
+
+NEON8_FNPROTO(epel_v, (int16_t *dst,
+        const uint8_t *src, ptrdiff_t srcstride,
+        int height, intptr_t mx, intptr_t my, int width),);
+
+NEON8_FNPROTO(pel_uni_pixels, (uint8_t *_dst, ptrdiff_t _dststride,
+        const uint8_t *_src, ptrdiff_t _srcstride,
+        int height, intptr_t mx, intptr_t my, int width),);
+
+NEON8_FNPROTO(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride,
+        const uint8_t *_src, ptrdiff_t _srcstride,
+        int height, int denom, int wx, int ox,
+        intptr_t mx, intptr_t my, int width),);
+
+NEON8_FNPROTO(epel_uni_v, (uint8_t *dst, ptrdiff_t dststride,
+        const uint8_t *src, ptrdiff_t srcstride,
+        int height, intptr_t mx, intptr_t my, int width),);
+
+NEON8_FNPROTO(epel_uni_hv, (uint8_t *dst, ptrdiff_t _dststride,
+        const uint8_t *src, ptrdiff_t srcstride,
+        int height, intptr_t mx, intptr_t my, int width),);
+
+NEON8_FNPROTO(epel_uni_hv, (uint8_t *dst, ptrdiff_t _dststride,
+        const uint8_t *src, ptrdiff_t srcstride,
+        int height, intptr_t mx, intptr_t my, int width), _i8mm);
+
+NEON8_FNPROTO(epel_uni_w_v, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO_PARTIAL_4(qpel_uni_w_v, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(epel_h, (int16_t *dst, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(epel_hv, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width), ); + +NEON8_FNPROTO(epel_h, (int16_t *dst, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(epel_hv, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(epel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(epel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(qpel_h, (int16_t *dst, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(qpel_v, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(qpel_hv, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(qpel_hv, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(qpel_uni_v, (uint8_t *dst, ptrdiff_t dststride, + 
const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(qpel_uni_hv, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(qpel_uni_hv, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, + int height, intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(qpel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(qpel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(epel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(epel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO_PARTIAL_5(qpel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO_PARTIAL_5(qpel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + intptr_t mx, intptr_t my, int width), _i8mm); + +NEON8_FNPROTO(qpel_bi_v, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, + int height, intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO(qpel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, + int height, intptr_t mx, intptr_t 
my, int width),); + +NEON8_FNPROTO(qpel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, + const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, + int height, intptr_t mx, intptr_t my, int width), _i8mm); + #endif diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/h26x/epel_neon.S similarity index 100% rename from libavcodec/aarch64/hevcdsp_epel_neon.S rename to libavcodec/aarch64/h26x/epel_neon.S diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S similarity index 100% rename from libavcodec/aarch64/hevcdsp_qpel_neon.S rename to libavcodec/aarch64/h26x/qpel_neon.S diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c index 26bbc8750f..386d7c59c8 100644 --- a/libavcodec/aarch64/hevcdsp_init_aarch64.c +++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c @@ -93,203 +93,6 @@ void ff_hevc_idct_16x16_dc_10_neon(int16_t *coeffs); void ff_hevc_idct_32x32_dc_10_neon(int16_t *coeffs); void ff_hevc_transform_luma_4x4_neon_8(int16_t *coeffs); -#define NEON8_FNPROTO_PARTIAL_6(fn, args, ext) \ - void ff_hevc_put_hevc_##fn##_h4_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##_h6_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##_h8_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##_h12_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##_h16_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##_h32_8_neon##ext args; - -NEON8_FNPROTO_PARTIAL_6(qpel, (int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height, - intptr_t mx, intptr_t my, int width),) - -NEON8_FNPROTO_PARTIAL_6(qpel_uni, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, - ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width),) - -NEON8_FNPROTO_PARTIAL_6(qpel_bi, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, - ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t - mx, intptr_t my, int width),) - -#define NEON8_FNPROTO(fn, args, ext) \ - void 
ff_hevc_put_hevc_##fn##4_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##6_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##12_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##24_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##32_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##48_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##64_8_neon##ext args - -#define NEON8_FNPROTO_PARTIAL_4(fn, args, ext) \ - void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##64_8_neon##ext args - -#define NEON8_FNPROTO_PARTIAL_5(fn, args, ext) \ - void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##32_8_neon##ext args; \ - void ff_hevc_put_hevc_##fn##64_8_neon##ext args - -NEON8_FNPROTO(pel_pixels, (int16_t *dst, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(pel_bi_pixels, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *_src, ptrdiff_t _srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_bi_h, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_bi_v, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, 
ptrdiff_t srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(epel_v, (int16_t *dst, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(pel_uni_pixels, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_uni_v, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_uni_hv, (uint8_t *dst, ptrdiff_t _dststride, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_uni_hv, (uint8_t *dst, ptrdiff_t _dststride, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(epel_uni_w_v, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO_PARTIAL_4(qpel_uni_w_v, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_h, (int16_t *dst, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_hv, (int16_t *dst, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width), ); - -NEON8_FNPROTO(epel_h, (int16_t *dst, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(epel_hv, (int16_t *dst, - const uint8_t *src, 
ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(epel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(qpel_h, (int16_t *dst, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(qpel_v, (int16_t *dst, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(qpel_hv, (int16_t *dst, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(qpel_hv, (int16_t *dst, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(qpel_uni_v, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(qpel_uni_hv, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(qpel_uni_hv, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(qpel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(qpel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(epel_uni_w_hv, (uint8_t *_dst, ptrdiff_t 
_dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(epel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO_PARTIAL_5(qpel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO_PARTIAL_5(qpel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, - const uint8_t *_src, ptrdiff_t _srcstride, - int height, int denom, int wx, int ox, - intptr_t mx, intptr_t my, int width), _i8mm); - -NEON8_FNPROTO(qpel_bi_v, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(qpel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width),); - -NEON8_FNPROTO(qpel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, - const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, - int height, intptr_t mx, intptr_t my, int width), _i8mm); - #define NEON8_FNASSIGN(member, v, h, fn, ext) \ member[1][v][h] = ff_hevc_put_hevc_##fn##4_8_neon##ext; \ member[2][v][h] = ff_hevc_put_hevc_##fn##6_8_neon##ext; \
From patchwork Sat Sep 7 17:13:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhao Zhili X-Patchwork-Id: 51389 From: Zhao Zhili To: ffmpeg-devel@ffmpeg.org Date: Sun, 8 Sep 2024 01:13:37 +0800 X-OQ-MSGID: <20240907171340.55502-3-quinkblack@foxmail.com> In-Reply-To: <20240907171340.55502-1-quinkblack@foxmail.com> References: <20240907171340.55502-1-quinkblack@foxmail.com> Subject: [FFmpeg-devel] [PATCH 3/6] aarch64/vvc: Add put_qpel_h_* and put_qpel_uni_h_* Cc: Zhao Zhili From: Zhao Zhili Just share hevc implementation.
checkasm --test=vvc_mc --benchmark:
put_luma_h_8_4x4_c: 0.2 ( 1.00x)
put_luma_h_8_4x4_neon: 0.2 ( 1.00x)
put_luma_h_8_8x8_c: 1.0 ( 1.00x)
put_luma_h_8_8x8_neon: 0.2 ( 4.33x)
put_luma_h_8_16x16_c: 3.2 ( 1.00x)
put_luma_h_8_16x16_neon: 1.2 ( 2.63x)
put_luma_h_8_32x32_c: 13.7 ( 1.00x)
put_luma_h_8_32x32_neon: 4.0 ( 3.45x)
put_luma_h_8_64x64_c: 48.2 ( 1.00x)
put_luma_h_8_64x64_neon: 15.7 ( 3.07x)
put_luma_h_8_128x128_c: 203.5 ( 1.00x)
put_luma_h_8_128x128_neon: 62.0 ( 3.28x)
put_uni_h_luma_8_4x4_c: 0.2 ( 1.00x)
put_uni_h_luma_8_4x4_neon: 0.2 ( 1.00x)
put_uni_h_luma_8_8x8_c: 1.5 ( 1.00x)
put_uni_h_luma_8_8x8_neon: 0.2 ( 6.56x)
put_uni_h_luma_8_16x16_c: 5.7 ( 1.00x)
put_uni_h_luma_8_16x16_neon: 1.2 ( 4.67x)
put_uni_h_luma_8_32x32_c: 24.0 ( 1.00x)
put_uni_h_luma_8_32x32_neon: 4.7 ( 5.07x)
put_uni_h_luma_8_64x64_c: 90.0 ( 1.00x)
put_uni_h_luma_8_64x64_neon: 17.0 ( 5.30x)
put_uni_h_luma_8_128x128_c: 357.7 ( 1.00x)
put_uni_h_luma_8_128x128_neon: 67.5 ( 5.30x)
---
 libavcodec/aarch64/h26x/dsp.h | 13 ++
 libavcodec/aarch64/h26x/qpel_neon.S | 202 ++++++++++++++++++++--------
 libavcodec/aarch64/vvc/Makefile | 1 +
 libavcodec/aarch64/vvc/dsp_init.c | 14 ++
 4 files changed, 171 insertions(+), 59 deletions(-)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h index 902286872d..f72746ce03 100644 --- a/libavcodec/aarch64/h26x/dsp.h +++ b/libavcodec/aarch64/h26x/dsp.h @@ -235,4 +235,17 @@ NEON8_FNPROTO(qpel_bi_hv, (uint8_t *dst, ptrdiff_t dststride, const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2, int height, intptr_t mx, intptr_t my, int width), _i8mm); +#undef NEON8_FNPROTO_PARTIAL_4 +#define NEON8_FNPROTO_PARTIAL_4(fn, args, ext) \ + void ff_vvc_put_##fn##_h4_8_neon##ext args; \ + void ff_vvc_put_##fn##_h8_8_neon##ext args; \ + void ff_vvc_put_##fn##_h16_8_neon##ext args; \ + void ff_vvc_put_##fn##_h32_8_neon##ext args; + +NEON8_FNPROTO_PARTIAL_4(qpel, (int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height, + const 
int8_t *vf, int width),) + +NEON8_FNPROTO_PARTIAL_4(qpel_uni, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, + ptrdiff_t _srcstride, int height, const int8_t *hf, const int8_t *vf, int width),) + #endif diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S index 8ddaa32b70..a05009c9d6 100644 --- a/libavcodec/aarch64/h26x/qpel_neon.S +++ b/libavcodec/aarch64/h26x/qpel_neon.S @@ -21,7 +21,8 @@ */ #include "libavutil/aarch64/asm.S" -#define MAX_PB_SIZE 64 +#define HEVC_MAX_PB_SIZE 64 +#define VVC_MAX_PB_SIZE 128 const qpel_filters, align=4 .byte 0, 0, 0, 0, 0, 0, 0, 0 @@ -44,6 +45,11 @@ endconst sxtl v0.8h, v0.8b .endm +.macro vvc_load_filter m + ld1 {v0.8b}, [\m] + sxtl v0.8h, v0.8b +.endm + .macro load_qpel_filterb freg, xreg movrel \xreg, qpel_filters_abs add \xreg, \xreg, \freg, lsl #3 @@ -212,22 +218,40 @@ function ff_hevc_put_hevc_h4_8_neon, export=0 endfunc .endif +.ifnc \type, qpel_bi +function ff_vvc_put_\type\()_h4_8_neon, export=1 + vvc_load_filter mx + sub src, src, #3 + mov mx, x30 +.ifc \type, qpel + mov dststride, #(VVC_MAX_PB_SIZE << 1) + lsl x13, srcstride, #1 // srcstridel + mov x14, #(VVC_MAX_PB_SIZE << 2) +.else + lsl x14, dststride, #1 // dststridel + lsl x13, srcstride, #1 // srcstridel +.endif + b 1f +endfunc +.endif // !qpel_bi + function ff_hevc_put_hevc_\type\()_h4_8_neon, export=1 load_filter mx .ifc \type, qpel_bi - mov x16, #(MAX_PB_SIZE << 2) // src2bstridel - add x15, x4, #(MAX_PB_SIZE << 1) // src2b + mov x16, #(HEVC_MAX_PB_SIZE << 2) // src2bstridel + add x15, x4, #(HEVC_MAX_PB_SIZE << 1) // src2b .endif sub src, src, #3 mov mx, x30 .ifc \type, qpel - mov dststride, #(MAX_PB_SIZE << 1) + mov dststride, #(HEVC_MAX_PB_SIZE << 1) lsl x13, srcstride, #1 // srcstridel - mov x14, #(MAX_PB_SIZE << 2) + mov x14, #(HEVC_MAX_PB_SIZE << 2) .else lsl x14, dststride, #1 // dststridel lsl x13, srcstride, #1 // srcstridel .endif +1: add x10, dst, dststride // dstb add x12, src, srcstride // srcb 0: ld1 
{v16.8b, v17.8b}, [src], x13 @@ -283,15 +307,15 @@ endfunc function ff_hevc_put_hevc_\type\()_h6_8_neon, export=1 load_filter mx .ifc \type, qpel_bi - mov x16, #(MAX_PB_SIZE << 2) // src2bstridel - add x15, x4, #(MAX_PB_SIZE << 1) // src2b + mov x16, #(HEVC_MAX_PB_SIZE << 2) // src2bstridel + add x15, x4, #(HEVC_MAX_PB_SIZE << 1) // src2b .endif sub src, src, #3 mov mx, x30 .ifc \type, qpel - mov dststride, #(MAX_PB_SIZE << 1) + mov dststride, #(HEVC_MAX_PB_SIZE << 1) lsl x13, srcstride, #1 // srcstridel - mov x14, #((MAX_PB_SIZE << 2) - 8) + mov x14, #((HEVC_MAX_PB_SIZE << 2) - 8) .else lsl x14, dststride, #1 // dststridel lsl x13, srcstride, #1 // srcstridel @@ -333,22 +357,40 @@ function ff_hevc_put_hevc_\type\()_h6_8_neon, export=1 ret mx endfunc +.ifnc \type, qpel_bi +function ff_vvc_put_\type\()_h8_8_neon, export=1 + vvc_load_filter mx + sub src, src, #3 + mov mx, x30 +.ifc \type, qpel + mov dststride, #(VVC_MAX_PB_SIZE << 1) + lsl x13, srcstride, #1 // srcstridel + mov x14, #(VVC_MAX_PB_SIZE << 2) +.else + lsl x14, dststride, #1 // dststridel + lsl x13, srcstride, #1 // srcstridel +.endif + b 1f +endfunc +.endif // !qpel_bi + function ff_hevc_put_hevc_\type\()_h8_8_neon, export=1 load_filter mx .ifc \type, qpel_bi - mov x16, #(MAX_PB_SIZE << 2) // src2bstridel - add x15, x4, #(MAX_PB_SIZE << 1) // src2b + mov x16, #(HEVC_MAX_PB_SIZE << 2) // src2bstridel + add x15, x4, #(HEVC_MAX_PB_SIZE << 1) // src2b .endif sub src, src, #3 mov mx, x30 .ifc \type, qpel - mov dststride, #(MAX_PB_SIZE << 1) + mov dststride, #(HEVC_MAX_PB_SIZE << 1) lsl x13, srcstride, #1 // srcstridel - mov x14, #(MAX_PB_SIZE << 2) + mov x14, #(HEVC_MAX_PB_SIZE << 2) .else lsl x14, dststride, #1 // dststridel lsl x13, srcstride, #1 // srcstridel .endif +1: add x10, dst, dststride // dstb add x12, src, srcstride // srcb 0: ld1 {v16.8b, v17.8b}, [src], x13 @@ -415,16 +457,16 @@ function ff_hevc_put_hevc_\type\()_h12_8_neon, export=1 sxtw height, heightw .ifc \type, qpel_bi ldrh w8, [sp] // 
width - mov x16, #(MAX_PB_SIZE << 2) // src2bstridel - lsl x17, height, #7 // src2b reset (height * (MAX_PB_SIZE << 1)) - add x15, x4, #(MAX_PB_SIZE << 1) // src2b + mov x16, #(HEVC_MAX_PB_SIZE << 2) // src2bstridel + lsl x17, height, #7 // src2b reset (height * (HEVC_MAX_PB_SIZE << 1)) + add x15, x4, #(HEVC_MAX_PB_SIZE << 1) // src2b .endif sub src, src, #3 mov mx, x30 .ifc \type, qpel - mov dststride, #(MAX_PB_SIZE << 1) + mov dststride, #(HEVC_MAX_PB_SIZE << 1) lsl x13, srcstride, #1 // srcstridel - mov x14, #((MAX_PB_SIZE << 2) - 16) + mov x14, #((HEVC_MAX_PB_SIZE << 2) - 16) .else lsl x14, dststride, #1 // dststridel lsl x13, srcstride, #1 // srcstridel @@ -497,25 +539,45 @@ function ff_hevc_put_hevc_\type\()_h12_8_neon, export=1 ret mx endfunc +.ifnc \type, qpel_bi +function ff_vvc_put_\type\()_h16_8_neon, export=1 + vvc_load_filter mx + sxtw height, heightw + mov mx, x30 + sub src, src, #3 + mov mx, x30 +.ifc \type, qpel + mov dststride, #(VVC_MAX_PB_SIZE << 1) + lsl x13, srcstride, #1 // srcstridel + mov x14, #(VVC_MAX_PB_SIZE << 2) +.else + lsl x14, dststride, #1 // dststridel + lsl x13, srcstride, #1 // srcstridel +.endif + b 0f +endfunc +.endif // !qpel_bi + function ff_hevc_put_hevc_\type\()_h16_8_neon, export=1 load_filter mx sxtw height, heightw mov mx, x30 .ifc \type, qpel_bi ldrh w8, [sp] // width - mov x16, #(MAX_PB_SIZE << 2) // src2bstridel - add x15, x4, #(MAX_PB_SIZE << 1) // src2b + mov x16, #(HEVC_MAX_PB_SIZE << 2) // src2bstridel + add x15, x4, #(HEVC_MAX_PB_SIZE << 1) // src2b .endif sub src, src, #3 mov mx, x30 .ifc \type, qpel - mov dststride, #(MAX_PB_SIZE << 1) + mov dststride, #(HEVC_MAX_PB_SIZE << 1) lsl x13, srcstride, #1 // srcstridel - mov x14, #(MAX_PB_SIZE << 2) + mov x14, #(HEVC_MAX_PB_SIZE << 2) .else lsl x14, dststride, #1 // dststridel lsl x13, srcstride, #1 // srcstridel .endif +0: add x10, dst, dststride // dstb add x12, src, srcstride // srcb @@ -555,29 +617,51 @@ function ff_hevc_put_hevc_\type\()_h16_8_neon, export=1 ret 
mx endfunc +.ifnc \type, qpel_bi +function ff_vvc_put_\type\()_h32_8_neon, export=1 + vvc_load_filter mx + sxtw height, heightw + mov mx, x30 + sub src, src, #3 + mov mx, x30 +.ifc \type, qpel + mov dststride, #(VVC_MAX_PB_SIZE << 1) + lsl x13, srcstride, #1 // srcstridel + mov x14, #(VVC_MAX_PB_SIZE << 2) + sub x14, x14, width, uxtw #1 +.else + lsl x14, dststride, #1 // dststridel + lsl x13, srcstride, #1 // srcstridel + sub x14, x14, width, uxtw +.endif + b 1f +endfunc +.endif // !qpel_bi + function ff_hevc_put_hevc_\type\()_h32_8_neon, export=1 load_filter mx sxtw height, heightw mov mx, x30 .ifc \type, qpel_bi ldrh w8, [sp] // width - mov x16, #(MAX_PB_SIZE << 2) // src2bstridel + mov x16, #(HEVC_MAX_PB_SIZE << 2) // src2bstridel lsl x17, x5, #7 // src2b reset - add x15, x4, #(MAX_PB_SIZE << 1) // src2b + add x15, x4, #(HEVC_MAX_PB_SIZE << 1) // src2b sub x16, x16, width, uxtw #1 .endif sub src, src, #3 mov mx, x30 .ifc \type, qpel - mov dststride, #(MAX_PB_SIZE << 1) + mov dststride, #(HEVC_MAX_PB_SIZE << 1) lsl x13, srcstride, #1 // srcstridel - mov x14, #(MAX_PB_SIZE << 2) + mov x14, #(HEVC_MAX_PB_SIZE << 2) sub x14, x14, width, uxtw #1 .else lsl x14, dststride, #1 // dststridel lsl x13, srcstride, #1 // srcstridel sub x14, x14, width, uxtw .endif +1: sub x13, x13, width, uxtw sub x13, x13, #8 add x10, dst, dststride // dstb @@ -651,7 +735,7 @@ put_hevc qpel_bi function ff_hevc_put_hevc_qpel_v4_8_neon, export=1 load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) sub x1, x1, x2 ldr s16, [x1] ldr s17, [x1, x2] @@ -680,7 +764,7 @@ endfunc function ff_hevc_put_hevc_qpel_v6_8_neon, export=1 load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 - mov x9, #(MAX_PB_SIZE * 2 - 8) + mov x9, #(HEVC_MAX_PB_SIZE * 2 - 8) sub x1, x1, x2 ldr d16, [x1] ldr d17, [x1, x2] @@ -709,7 +793,7 @@ endfunc function ff_hevc_put_hevc_qpel_v8_8_neon, export=1 load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 - mov x9, #(MAX_PB_SIZE * 
2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) sub x1, x1, x2 ldr d16, [x1] ldr d17, [x1, x2] @@ -737,7 +821,7 @@ endfunc function ff_hevc_put_hevc_qpel_v12_8_neon, export=1 load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 - mov x9, #(MAX_PB_SIZE * 2 - 16) + mov x9, #(HEVC_MAX_PB_SIZE * 2 - 16) sub x1, x1, x2 ldr q16, [x1] ldr q17, [x1, x2] @@ -768,7 +852,7 @@ endfunc function ff_hevc_put_hevc_qpel_v16_8_neon, export=1 load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) sub x1, x1, x2 ldr q16, [x1] ldr q17, [x1, x2] @@ -802,7 +886,7 @@ function ff_hevc_put_hevc_qpel_v24_8_neon, export=1 load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 sub x1, x1, x2 - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.16b, v17.16b}, [x1], x2 ld1 {v18.16b, v19.16b}, [x1], x2 ld1 {v20.16b, v21.16b}, [x1], x2 @@ -833,7 +917,7 @@ function ff_hevc_put_hevc_qpel_v32_8_neon, export=1 st1 {v8.8b-v11.8b}, [sp] load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) sub x1, x1, x2 ld1 {v16.16b, v17.16b}, [x1], x2 ld1 {v18.16b, v19.16b}, [x1], x2 @@ -883,7 +967,7 @@ function ff_hevc_put_hevc_qpel_v64_8_neon, export=1 load_qpel_filterb x5, x4 sub x1, x1, x2, lsl #1 sub x1, x1, x2 - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) 0: mov x8, x1 // src ld1 {v16.16b, v17.16b}, [x8], x2 mov w11, w3 // height @@ -921,7 +1005,7 @@ function ff_hevc_put_hevc_qpel_bi_v4_8_neon, export=1 load_qpel_filterb x7, x6 sub x2, x2, x3, lsl #1 sub x2, x2, x3 - mov x12, #(MAX_PB_SIZE * 2) + mov x12, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.s}[0], [x2], x3 ld1 {v17.s}[0], [x2], x3 ld1 {v18.s}[0], [x2], x3 @@ -951,7 +1035,7 @@ function ff_hevc_put_hevc_qpel_bi_v6_8_neon, export=1 ld1 {v16.8b}, [x2], x3 sub x1, x1, #4 ld1 {v17.8b}, [x2], x3 - mov x12, #(MAX_PB_SIZE * 2) + mov x12, #(HEVC_MAX_PB_SIZE * 2) ld1 {v18.8b}, [x2], x3 ld1 {v19.8b}, [x2], x3 ld1 {v20.8b}, [x2], x3 @@ -977,7 +1061,7 @@ 
function ff_hevc_put_hevc_qpel_bi_v8_8_neon, export=1 load_qpel_filterb x7, x6 sub x2, x2, x3, lsl #1 sub x2, x2, x3 - mov x12, #(MAX_PB_SIZE * 2) + mov x12, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8b}, [x2], x3 ld1 {v17.8b}, [x2], x3 ld1 {v18.8b}, [x2], x3 @@ -1006,7 +1090,7 @@ function ff_hevc_put_hevc_qpel_bi_v12_8_neon, export=1 sub x2, x2, x3 sub x1, x1, #8 ld1 {v16.16b}, [x2], x3 - mov x12, #(MAX_PB_SIZE * 2) + mov x12, #(HEVC_MAX_PB_SIZE * 2) ld1 {v17.16b}, [x2], x3 ld1 {v18.16b}, [x2], x3 ld1 {v19.16b}, [x2], x3 @@ -1037,7 +1121,7 @@ function ff_hevc_put_hevc_qpel_bi_v16_8_neon, export=1 load_qpel_filterb x7, x6 sub x2, x2, x3, lsl #1 sub x2, x2, x3 - mov x12, #(MAX_PB_SIZE * 2) + mov x12, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.16b}, [x2], x3 ld1 {v17.16b}, [x2], x3 ld1 {v18.16b}, [x2], x3 @@ -1092,7 +1176,7 @@ function ff_hevc_put_hevc_qpel_bi_v32_8_neon, export=1 sub x2, x2, x3 load_qpel_filterb x7, x6 ldr w6, [sp, #64] - mov x12, #(MAX_PB_SIZE * 2) + mov x12, #(HEVC_MAX_PB_SIZE * 2) 0: mov x8, x2 // src ld1 {v16.16b, v17.16b}, [x8], x3 mov w11, w5 // height @@ -2147,7 +2231,7 @@ function ff_hevc_put_hevc_qpel_uni_w_v64_8_neon, export=1 endfunc function hevc_put_hevc_qpel_uni_hv4_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 ldr d16, [sp] ldr d17, [sp, x9] @@ -2174,7 +2258,7 @@ function hevc_put_hevc_qpel_uni_hv4_8_end_neon endfunc function hevc_put_hevc_qpel_uni_hv6_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 sub x1, x1, #4 ldr q16, [sp] @@ -2204,7 +2288,7 @@ function hevc_put_hevc_qpel_uni_hv6_8_end_neon endfunc function hevc_put_hevc_qpel_uni_hv8_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 ldr q16, [sp] ldr q17, [sp, x9] @@ -2232,7 +2316,7 @@ function hevc_put_hevc_qpel_uni_hv8_8_end_neon endfunc function hevc_put_hevc_qpel_uni_hv12_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) 
load_qpel_filterh x6, x5 sub x1, x1, #8 ld1 {v16.8h, v17.8h}, [sp], x9 @@ -2260,7 +2344,7 @@ function hevc_put_hevc_qpel_uni_hv12_8_end_neon endfunc function hevc_put_hevc_qpel_uni_hv16_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 sub w12, w9, w7, lsl #1 0: mov x8, sp // src @@ -3355,7 +3439,7 @@ endfunc function ff_hevc_put_hevc_qpel_h4_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 + mov x10, #HEVC_MAX_PB_SIZE * 2 1: ld1 {v0.16b}, [x1], x2 ext v1.16b, v0.16b, v0.16b, #1 @@ -3378,7 +3462,7 @@ endfunc function ff_hevc_put_hevc_qpel_h6_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 + mov x10, #HEVC_MAX_PB_SIZE * 2 add x15, x0, #8 1: ld1 {v0.16b}, [x1], x2 @@ -3411,7 +3495,7 @@ endfunc function ff_hevc_put_hevc_qpel_h8_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 + mov x10, #HEVC_MAX_PB_SIZE * 2 1: ld1 {v0.16b}, [x1], x2 ext v1.16b, v0.16b, v0.16b, #1 @@ -3457,7 +3541,7 @@ endfunc function ff_hevc_put_hevc_qpel_h12_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 + mov x10, #HEVC_MAX_PB_SIZE * 2 add x15, x0, #16 1: ld1 {v16.16b, v17.16b}, [x1], x2 @@ -3495,7 +3579,7 @@ endfunc function ff_hevc_put_hevc_qpel_h16_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 + mov x10, #HEVC_MAX_PB_SIZE * 2 1: ld1 {v16.16b, v17.16b}, [x1], x2 ext v1.16b, v16.16b, v17.16b, #1 @@ -3533,7 +3617,7 @@ endfunc function ff_hevc_put_hevc_qpel_h24_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 + mov x10, #HEVC_MAX_PB_SIZE * 2 add x15, x0, #32 1: ld1 {v16.16b, v17.16b}, [x1], x2 @@ -3585,7 +3669,7 @@ endfunc function ff_hevc_put_hevc_qpel_h32_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 + mov x10, #HEVC_MAX_PB_SIZE * 2 add x15, x0, #32 1: ld1 {v16.16b, v17.16b, v18.16b}, [x1], x2 @@ -3642,7 +3726,7 @@ endfunc function ff_hevc_put_hevc_qpel_h48_8_neon_i8mm, export=1 QPEL_H_HEADER - mov x10, #MAX_PB_SIZE * 2 - 64 + mov x10, 
#HEVC_MAX_PB_SIZE * 2 - 64 1: ld1 {v16.16b, v17.16b, v18.16b, v19.16b}, [x1], x2 ext v1.16b, v16.16b, v17.16b, #1 @@ -4173,7 +4257,7 @@ DISABLE_I8MM stp x24, x25, [sp, #48] stp x26, x27, [sp, #64] mov x19, sp - mov x11, #(MAX_PB_SIZE*(MAX_PB_SIZE+8)*2) + mov x11, #(HEVC_MAX_PB_SIZE*(HEVC_MAX_PB_SIZE+8)*2) sub sp, sp, x11 mov x20, x0 mov x21, x1 @@ -4204,7 +4288,7 @@ DISABLE_I8MM add x9, x9, x23, lsl #3 ld1 {v0.8b}, [x9] sxtl v0.8h, v0.8b - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) dup v28.4s, w24 dup v29.4s, w25 dup v30.4s, w26 @@ -4591,7 +4675,7 @@ endfunc qpel_uni_w_hv neon function hevc_put_hevc_qpel_bi_hv4_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) load_qpel_filterh x7, x6 ld1 {v16.4h}, [sp], x9 ld1 {v17.4h}, [sp], x9 @@ -4617,7 +4701,7 @@ function hevc_put_hevc_qpel_bi_hv4_8_end_neon endfunc function hevc_put_hevc_qpel_bi_hv6_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) load_qpel_filterh x7, x6 sub x1, x1, #4 ld1 {v16.8h}, [sp], x9 @@ -4648,7 +4732,7 @@ function hevc_put_hevc_qpel_bi_hv6_8_end_neon endfunc function hevc_put_hevc_qpel_bi_hv8_8_end_neon - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) load_qpel_filterh x7, x6 ld1 {v16.8h}, [sp], x9 ld1 {v17.8h}, [sp], x9 @@ -4678,7 +4762,7 @@ endfunc function hevc_put_hevc_qpel_bi_hv16_8_end_neon load_qpel_filterh x7, x8 - mov x9, #(MAX_PB_SIZE * 2) + mov x9, #(HEVC_MAX_PB_SIZE * 2) mov x10, x6 0: mov x8, sp // src ld1 {v16.8h, v17.8h}, [x8], x9 diff --git a/libavcodec/aarch64/vvc/Makefile b/libavcodec/aarch64/vvc/Makefile index 54c49fea92..a5ad24dfc5 100644 --- a/libavcodec/aarch64/vvc/Makefile +++ b/libavcodec/aarch64/vvc/Makefile @@ -3,4 +3,5 @@ clean:: OBJS-$(CONFIG_VVC_DECODER) += aarch64/vvc/dsp_init.o NEON-OBJS-$(CONFIG_VVC_DECODER) += aarch64/vvc/alf.o \ + aarch64/h26x/qpel_neon.o \ aarch64/h26x/sao_neon.o diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c index 
0aac140a8f..ea6245d9a3 100644 --- a/libavcodec/aarch64/vvc/dsp_init.c +++ b/libavcodec/aarch64/vvc/dsp_init.c @@ -46,6 +46,20 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd) return; if (bd == 8) { + c->inter.put[0][1][0][1] = ff_vvc_put_qpel_h4_8_neon; + c->inter.put[0][2][0][1] = ff_vvc_put_qpel_h8_8_neon; + c->inter.put[0][3][0][1] = ff_vvc_put_qpel_h16_8_neon; + c->inter.put[0][4][0][1] = + c->inter.put[0][5][0][1] = + c->inter.put[0][6][0][1] = ff_vvc_put_qpel_h32_8_neon; + + c->inter.put_uni[0][1][0][1] = ff_vvc_put_qpel_uni_h4_8_neon; + c->inter.put_uni[0][2][0][1] = ff_vvc_put_qpel_uni_h8_8_neon; + c->inter.put_uni[0][3][0][1] = ff_vvc_put_qpel_uni_h16_8_neon; + c->inter.put_uni[0][4][0][1] = + c->inter.put_uni[0][5][0][1] = + c->inter.put_uni[0][6][0][1] = ff_vvc_put_qpel_uni_h32_8_neon; + for (int i = 0; i < FF_ARRAY_ELEMS(c->sao.band_filter); i++) c->sao.band_filter[i] = ff_h26x_sao_band_filter_8x8_8_neon; c->sao.edge_filter[0] = ff_vvc_sao_edge_filter_8x8_8_neon;
From patchwork Sat Sep 7 17:13:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhao Zhili X-Patchwork-Id: 51386 From: Zhao Zhili To: ffmpeg-devel@ffmpeg.org Date: Sun, 8 Sep 2024 01:13:38 +0800 X-OQ-MSGID: <20240907171340.55502-4-quinkblack@foxmail.com> In-Reply-To: <20240907171340.55502-1-quinkblack@foxmail.com> References: <20240907171340.55502-1-quinkblack@foxmail.com> Subject: [FFmpeg-devel] [PATCH 4/6] aarch64/vvc: Add put_pel/put_pel_uni/put_pel_uni_w Cc: Zhao Zhili From: Zhao Zhili put_luma_pixels_8_4x4_c: 0.2 ( 1.00x) put_luma_pixels_8_4x4_neon: 0.2 ( 1.00x) put_luma_pixels_8_8x8_c: 0.7 ( 1.00x) put_luma_pixels_8_8x8_neon: 0.2 ( 3.22x) put_luma_pixels_8_16x16_c: 2.2 ( 1.00x) put_luma_pixels_8_16x16_neon: 0.2 ( 9.89x) put_luma_pixels_8_32x32_c: 8.2 ( 1.00x) put_luma_pixels_8_32x32_neon: 1.2 ( 6.71x) put_luma_pixels_8_64x64_c: 33.7 ( 1.00x) 
put_luma_pixels_8_64x64_neon: 2.5 (13.63x) put_luma_pixels_8_128x128_c: 145.5 ( 1.00x) put_luma_pixels_8_128x128_neon: 10.2 (14.23x) put_uni_pixels_luma_8_4x4_c: 0.5 ( 1.00x) put_uni_pixels_luma_8_4x4_neon: 0.0 ( 0.00x) put_uni_pixels_luma_8_8x8_c: 0.5 ( 1.00x) put_uni_pixels_luma_8_8x8_neon: 0.2 ( 2.11x) put_uni_pixels_luma_8_16x16_c: 1.2 ( 1.00x) put_uni_pixels_luma_8_16x16_neon: 0.2 ( 5.44x) put_uni_pixels_luma_8_32x32_c: 3.0 ( 1.00x) put_uni_pixels_luma_8_32x32_neon: 0.5 ( 6.26x) put_uni_pixels_luma_8_64x64_c: 3.0 ( 1.00x) put_uni_pixels_luma_8_64x64_neon: 1.7 ( 1.72x) put_uni_pixels_luma_8_128x128_c: 6.5 ( 1.00x) put_uni_pixels_luma_8_128x128_neon: 6.5 ( 1.00x) --- libavcodec/aarch64/h26x/dsp.h | 22 ++++ libavcodec/aarch64/h26x/epel_neon.S | 193 +++++++++++++++++----------- libavcodec/aarch64/h26x/qpel_neon.S | 83 +++++++++++- libavcodec/aarch64/vvc/Makefile | 1 + libavcodec/aarch64/vvc/dsp_init.c | 21 +++ 5 files changed, 245 insertions(+), 75 deletions(-) diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h index f72746ce03..076d01b477 100644 --- a/libavcodec/aarch64/h26x/dsp.h +++ b/libavcodec/aarch64/h26x/dsp.h @@ -248,4 +248,26 @@ NEON8_FNPROTO_PARTIAL_4(qpel, (int16_t *dst, const uint8_t *_src, ptrdiff_t _src NEON8_FNPROTO_PARTIAL_4(qpel_uni, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, ptrdiff_t _srcstride, int height, const int8_t *hf, const int8_t *vf, int width),) +#undef NEON8_FNPROTO_PARTIAL_6 +#define NEON8_FNPROTO_PARTIAL_6(fn, args, ext) \ + void ff_vvc_put_##fn##4_8_neon##ext args; \ + void ff_vvc_put_##fn##8_8_neon##ext args; \ + void ff_vvc_put_##fn##16_8_neon##ext args; \ + void ff_vvc_put_##fn##32_8_neon##ext args; \ + void ff_vvc_put_##fn##64_8_neon##ext args; \ + void ff_vvc_put_##fn##128_8_neon##ext args + +NEON8_FNPROTO_PARTIAL_6(pel_pixels, (int16_t *dst, + const uint8_t *src, ptrdiff_t srcstride, int height, + const int8_t *hf, const int8_t *vf, int width),); + 
+NEON8_FNPROTO_PARTIAL_6(pel_uni_pixels, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, int height, + const int8_t *hf, const int8_t *vf, int width),); + +NEON8_FNPROTO_PARTIAL_6(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride, + const uint8_t *_src, ptrdiff_t _srcstride, + int height, int denom, int wx, int ox, + const int8_t *hf, const int8_t *vf, int width),); + #endif diff --git a/libavcodec/aarch64/h26x/epel_neon.S b/libavcodec/aarch64/h26x/epel_neon.S index 378b0f7fb2..729395f2f0 100644 --- a/libavcodec/aarch64/h26x/epel_neon.S +++ b/libavcodec/aarch64/h26x/epel_neon.S @@ -19,7 +19,8 @@ */ #include "libavutil/aarch64/asm.S" -#define MAX_PB_SIZE 64 +#define HEVC_MAX_PB_SIZE 64 +#define VVC_MAX_PB_SIZE 128 const epel_filters, align=4 .byte 0, 0, 0, 0 @@ -131,8 +132,13 @@ endconst b.ne 1b .endm +function ff_vvc_put_pel_pixels4_8_neon, export=1 + mov x7, #(VVC_MAX_PB_SIZE * 2) + b 1f +endfunc + function ff_hevc_put_hevc_pel_pixels4_8_neon, export=1 - mov x7, #(MAX_PB_SIZE * 2) + mov x7, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.s}[0], [x1], x2 ushll v4.8h, v0.8b, #6 subs w3, w3, #1 @@ -142,7 +148,7 @@ function ff_hevc_put_hevc_pel_pixels4_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_pixels6_8_neon, export=1 - mov x7, #(MAX_PB_SIZE * 2 - 8) + mov x7, #(HEVC_MAX_PB_SIZE * 2 - 8) 1: ld1 {v0.8b}, [x1], x2 ushll v4.8h, v0.8b, #6 st1 {v4.d}[0], [x0], #8 @@ -152,8 +158,13 @@ function ff_hevc_put_hevc_pel_pixels6_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_pixels8_8_neon, export=1 + mov x7, #(VVC_MAX_PB_SIZE * 2) + b 1f +endfunc + function ff_hevc_put_hevc_pel_pixels8_8_neon, export=1 - mov x7, #(MAX_PB_SIZE * 2) + mov x7, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.8b}, [x1], x2 ushll v4.8h, v0.8b, #6 subs w3, w3, #1 @@ -163,7 +174,7 @@ function ff_hevc_put_hevc_pel_pixels8_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_pixels12_8_neon, export=1 - mov x7, #(MAX_PB_SIZE * 2 - 16) + mov x7, #(HEVC_MAX_PB_SIZE * 2 - 
16) 1: ld1 {v0.8b, v1.8b}, [x1], x2 ushll v4.8h, v0.8b, #6 st1 {v4.8h}, [x0], #16 @@ -174,8 +185,13 @@ function ff_hevc_put_hevc_pel_pixels12_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_pixels16_8_neon, export=1 + mov x7, #(VVC_MAX_PB_SIZE * 2) + b 1f +endfunc + function ff_hevc_put_hevc_pel_pixels16_8_neon, export=1 - mov x7, #(MAX_PB_SIZE * 2) + mov x7, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.8b, v1.8b}, [x1], x2 ushll v4.8h, v0.8b, #6 ushll v5.8h, v1.8b, #6 @@ -186,7 +202,7 @@ function ff_hevc_put_hevc_pel_pixels16_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_pixels24_8_neon, export=1 - mov x7, #(MAX_PB_SIZE * 2) + mov x7, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.8b-v2.8b}, [x1], x2 ushll v4.8h, v0.8b, #6 ushll v5.8h, v1.8b, #6 @@ -197,8 +213,13 @@ function ff_hevc_put_hevc_pel_pixels24_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_pixels32_8_neon, export=1 + mov x7, #(VVC_MAX_PB_SIZE * 2) + b 1f +endfunc + function ff_hevc_put_hevc_pel_pixels32_8_neon, export=1 - mov x7, #(MAX_PB_SIZE * 2) + mov x7, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.8b-v3.8b}, [x1], x2 ushll v4.8h, v0.8b, #6 ushll v5.8h, v1.8b, #6 @@ -211,7 +232,7 @@ function ff_hevc_put_hevc_pel_pixels32_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_pixels48_8_neon, export=1 - mov x7, #(MAX_PB_SIZE) + mov x7, #(HEVC_MAX_PB_SIZE) 1: ld1 {v0.16b-v2.16b}, [x1], x2 ushll v4.8h, v0.8b, #6 ushll2 v5.8h, v0.16b, #6 @@ -226,26 +247,50 @@ function ff_hevc_put_hevc_pel_pixels48_8_neon, export=1 ret endfunc -function ff_hevc_put_hevc_pel_pixels64_8_neon, export=1 -1: ld1 {v0.16b-v3.16b}, [x1], x2 +.macro put_pel_pixels64_8_neon ushll v4.8h, v0.8b, #6 ushll2 v5.8h, v0.16b, #6 ushll v6.8h, v1.8b, #6 ushll2 v7.8h, v1.16b, #6 - st1 {v4.8h-v7.8h}, [x0], #(MAX_PB_SIZE) + st1 {v4.8h-v7.8h}, [x0], #64 ushll v16.8h, v2.8b, #6 ushll2 v17.8h, v2.16b, #6 ushll v18.8h, v3.8b, #6 ushll2 v19.8h, v3.16b, #6 - subs w3, w3, #1 - st1 {v16.8h-v19.8h}, [x0], #(MAX_PB_SIZE) - b.ne 1b + st1 {v16.8h-v19.8h}, [x0], x7 
+.endm + +function ff_vvc_put_pel_pixels64_8_neon, export=1 + mov x7, #(2 * VVC_MAX_PB_SIZE - 64) + b 1f +endfunc + +function ff_hevc_put_hevc_pel_pixels64_8_neon, export=1 + mov x7, #(HEVC_MAX_PB_SIZE) +1: + ld1 {v0.16b-v3.16b}, [x1], x2 + sub w3, w3, #1 + put_pel_pixels64_8_neon + cbnz w3, 1b ret endfunc +function ff_vvc_put_pel_pixels128_8_neon, export=1 + mov x7, #64 +1: + mov x6, x1 + ld1 {v0.16b-v3.16b}, [x6], #64 + add x1, x1, x2 + sub w3, w3, #1 + put_pel_pixels64_8_neon + ld1 {v0.16b-v3.16b}, [x6], #64 + put_pel_pixels64_8_neon + cbnz w3, 1b + ret +endfunc function ff_hevc_put_hevc_pel_bi_pixels4_8_neon, export=1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.s}[0], [x2], x3 // src ushll v16.8h, v0.8b, #6 ld1 {v20.4h}, [x4], x10 // src2 @@ -258,7 +303,7 @@ function ff_hevc_put_hevc_pel_bi_pixels4_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_bi_pixels6_8_neon, export=1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) sub x1, x1, #4 1: ld1 {v0.8b}, [x2], x3 ushll v16.8h, v0.8b, #6 @@ -273,7 +318,7 @@ function ff_hevc_put_hevc_pel_bi_pixels6_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_bi_pixels8_8_neon, export=1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.8b}, [x2], x3 // src ushll v16.8h, v0.8b, #6 ld1 {v20.8h}, [x4], x10 // src2 @@ -286,7 +331,7 @@ function ff_hevc_put_hevc_pel_bi_pixels8_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_bi_pixels12_8_neon, export=1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) sub x1, x1, #8 1: ld1 {v0.16b}, [x2], x3 ushll v16.8h, v0.8b, #6 @@ -304,7 +349,7 @@ function ff_hevc_put_hevc_pel_bi_pixels12_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_bi_pixels16_8_neon, export=1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.16b}, [x2], x3 // src ushll v16.8h, v0.8b, #6 ushll2 v17.8h, v0.16b, #6 @@ -320,7 +365,7 @@ function ff_hevc_put_hevc_pel_bi_pixels16_8_neon, 
export=1 endfunc function ff_hevc_put_hevc_pel_bi_pixels24_8_neon, export=1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.8b-v2.8b}, [x2], x3 // src ushll v16.8h, v0.8b, #6 ushll v17.8h, v1.8b, #6 @@ -339,7 +384,7 @@ function ff_hevc_put_hevc_pel_bi_pixels24_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_bi_pixels32_8_neon, export=1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v0.16b-v1.16b}, [x2], x3 // src ushll v16.8h, v0.8b, #6 ushll2 v17.8h, v0.16b, #6 @@ -361,7 +406,7 @@ function ff_hevc_put_hevc_pel_bi_pixels32_8_neon, export=1 endfunc function ff_hevc_put_hevc_pel_bi_pixels48_8_neon, export=1 - mov x10, #(MAX_PB_SIZE) + mov x10, #(HEVC_MAX_PB_SIZE) 1: ld1 {v0.16b-v2.16b}, [x2], x3 // src ushll v16.8h, v0.8b, #6 ushll2 v17.8h, v0.16b, #6 @@ -369,7 +414,7 @@ function ff_hevc_put_hevc_pel_bi_pixels48_8_neon, export=1 ushll2 v19.8h, v1.16b, #6 ushll v20.8h, v2.8b, #6 ushll2 v21.8h, v2.16b, #6 - ld1 {v24.8h-v27.8h}, [x4], #(MAX_PB_SIZE) // src2 + ld1 {v24.8h-v27.8h}, [x4], #(HEVC_MAX_PB_SIZE) // src2 sqadd v16.8h, v16.8h, v24.8h sqadd v17.8h, v17.8h, v25.8h sqadd v18.8h, v18.8h, v26.8h @@ -399,12 +444,12 @@ function ff_hevc_put_hevc_pel_bi_pixels64_8_neon, export=1 ushll2 v21.8h, v2.16b, #6 ushll v22.8h, v3.8b, #6 ushll2 v23.8h, v3.16b, #6 - ld1 {v24.8h, v25.8h, v26.8h, v27.8h}, [x4], #(MAX_PB_SIZE) // src2 + ld1 {v24.8h, v25.8h, v26.8h, v27.8h}, [x4], #(HEVC_MAX_PB_SIZE) // src2 sqadd v16.8h, v16.8h, v24.8h sqadd v17.8h, v17.8h, v25.8h sqadd v18.8h, v18.8h, v26.8h sqadd v19.8h, v19.8h, v27.8h - ld1 {v24.8h, v25.8h, v26.8h, v27.8h}, [x4], #(MAX_PB_SIZE) + ld1 {v24.8h, v25.8h, v26.8h, v27.8h}, [x4], #(HEVC_MAX_PB_SIZE) sqadd v20.8h, v20.8h, v24.8h sqadd v21.8h, v21.8h, v25.8h sqadd v22.8h, v22.8h, v26.8h @@ -427,7 +472,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_h4_8_neon, export=1 load_epel_filterb x6, x7 sub x2, x2, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 
{v4.8b}, [x2], x3 ext v5.8b, v4.8b, v4.8b, #1 ext v6.8b, v4.8b, v4.8b, #2 @@ -446,7 +491,7 @@ function ff_hevc_put_hevc_epel_bi_h6_8_neon, export=1 load_epel_filterb x6, x7 sub w1, w1, #4 sub x2, x2, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v24.16b}, [x2], x3 ext v26.16b, v24.16b, v24.16b, #1 ext v27.16b, v24.16b, v24.16b, #2 @@ -465,7 +510,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_h8_8_neon, export=1 load_epel_filterb x6, x7 sub x2, x2, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v24.16b}, [x2], x3 ext v26.16b, v24.16b, v24.16b, #1 ext v27.16b, v24.16b, v24.16b, #2 @@ -484,7 +529,7 @@ function ff_hevc_put_hevc_epel_bi_h12_8_neon, export=1 load_epel_filterb x6, x7 sub x1, x1, #8 sub x2, x2, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v24.16b}, [x2], x3 ext v26.16b, v24.16b, v24.16b, #1 ext v27.16b, v24.16b, v24.16b, #2 @@ -506,7 +551,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_h16_8_neon, export=1 load_epel_filterb x6, x7 sub x2, x2, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ldr q24, [x2] ldr s25, [x2, #16] add x2, x2, x3 @@ -529,7 +574,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_h24_8_neon, export=1 load_epel_filterb x6, x7 sub x2, x2, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ld1 {v24.16b, v25.16b}, [x2], x3 ext v26.16b, v24.16b, v25.16b, #1 ext v27.16b, v24.16b, v25.16b, #2 @@ -556,7 +601,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_h32_8_neon, export=1 load_epel_filterb x6, x7 sub x2, x2, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) 1: ldp q24, q25, [x2] ldr s26, [x2, #32] add x2, x2, x3 @@ -589,7 +634,7 @@ function ff_hevc_put_hevc_epel_bi_h48_8_neon, export=1 load_epel_filterb x6, x7 sub x2, x2, #1 mov x7, #24 - mov x10, #(MAX_PB_SIZE * 2 - 48) + mov x10, #(HEVC_MAX_PB_SIZE * 2 - 48) 1: ld1 {v24.16b, v25.16b, v26.16b}, [x2] ldr s27, [x2, #48] add x2, x2, x3 @@ -683,7 
+728,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_v4_8_neon, export=1 load_epel_filterb x7, x6 sub x2, x2, x3 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.s}[0], [x2], x3 ld1 {v17.s}[0], [x2], x3 ld1 {v18.s}[0], [x2], x3 @@ -705,7 +750,7 @@ function ff_hevc_put_hevc_epel_bi_v6_8_neon, export=1 load_epel_filterb x7, x6 sub x2, x2, x3 sub x1, x1, #4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8b}, [x2], x3 ld1 {v17.8b}, [x2], x3 ld1 {v18.8b}, [x2], x3 @@ -727,7 +772,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_v8_8_neon, export=1 load_epel_filterb x7, x6 sub x2, x2, x3 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8b}, [x2], x3 ld1 {v17.8b}, [x2], x3 ld1 {v18.8b}, [x2], x3 @@ -749,7 +794,7 @@ function ff_hevc_put_hevc_epel_bi_v12_8_neon, export=1 load_epel_filterb x7, x6 sub x1, x1, #8 sub x2, x2, x3 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.16b}, [x2], x3 ld1 {v17.16b}, [x2], x3 ld1 {v18.16b}, [x2], x3 @@ -774,7 +819,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_v16_8_neon, export=1 load_epel_filterb x7, x6 sub x2, x2, x3 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.16b}, [x2], x3 ld1 {v17.16b}, [x2], x3 ld1 {v18.16b}, [x2], x3 @@ -798,7 +843,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_v24_8_neon, export=1 load_epel_filterb x7, x6 sub x2, x2, x3 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8b, v17.8b, v18.8b}, [x2], x3 ld1 {v19.8b, v20.8b, v21.8b}, [x2], x3 ld1 {v22.8b, v23.8b, v24.8b}, [x2], x3 @@ -825,7 +870,7 @@ endfunc function ff_hevc_put_hevc_epel_bi_v32_8_neon, export=1 load_epel_filterb x7, x6 sub x2, x2, x3 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.16b, v17.16b}, [x2], x3 ld1 {v18.16b, v19.16b}, [x2], x3 ld1 {v20.16b, v21.16b}, [x2], x3 @@ -895,7 +940,7 @@ endfunc function ff_hevc_put_hevc_epel_v4_8_neon, export=1 load_epel_filterb x5, x4 sub 
x1, x1, x2 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ldr s16, [x1] ldr s17, [x1, x2] add x1, x1, x2, lsl #1 @@ -915,7 +960,7 @@ endfunc function ff_hevc_put_hevc_epel_v6_8_neon, export=1 load_epel_filterb x5, x4 sub x1, x1, x2 - mov x10, #(MAX_PB_SIZE * 2 - 8) + mov x10, #(HEVC_MAX_PB_SIZE * 2 - 8) ldr d16, [x1] ldr d17, [x1, x2] add x1, x1, x2, lsl #1 @@ -936,7 +981,7 @@ endfunc function ff_hevc_put_hevc_epel_v8_8_neon, export=1 load_epel_filterb x5, x4 sub x1, x1, x2 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ldr d16, [x1] ldr d17, [x1, x2] add x1, x1, x2, lsl #1 @@ -956,7 +1001,7 @@ endfunc function ff_hevc_put_hevc_epel_v12_8_neon, export=1 load_epel_filterb x5, x4 sub x1, x1, x2 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ldr q16, [x1] ldr q17, [x1, x2] add x1, x1, x2, lsl #1 @@ -980,7 +1025,7 @@ endfunc function ff_hevc_put_hevc_epel_v16_8_neon, export=1 load_epel_filterb x5, x4 sub x1, x1, x2 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ldr q16, [x1] ldr q17, [x1, x2] add x1, x1, x2, lsl #1 @@ -1002,7 +1047,7 @@ endfunc function ff_hevc_put_hevc_epel_v24_8_neon, export=1 load_epel_filterb x5, x4 sub x1, x1, x2 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8b, v17.8b, v18.8b}, [x1], x2 ld1 {v19.8b, v20.8b, v21.8b}, [x1], x2 ld1 {v22.8b, v23.8b, v24.8b}, [x1], x2 @@ -1025,7 +1070,7 @@ endfunc function ff_hevc_put_hevc_epel_v32_8_neon, export=1 load_epel_filterb x5, x4 sub x1, x1, x2 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.16b, v17.16b}, [x1], x2 ld1 {v18.16b, v19.16b}, [x1], x2 ld1 {v20.16b, v21.16b}, [x1], x2 @@ -1327,7 +1372,7 @@ endfunc add x5, x5, x4, lsl #2 ld1r {v30.4s}, [x5] sub x1, x1, #1 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) .endm function ff_hevc_put_hevc_epel_h4_8_neon, export=1 @@ -2179,7 +2224,7 @@ DISABLE_I8MM function hevc_put_hevc_epel_hv4_8_end_neon load_epel_filterh x5, 
x4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ldr d16, [sp] ldr d17, [sp, x10] add sp, sp, x10, lsl #1 @@ -2198,7 +2243,7 @@ endfunc function hevc_put_hevc_epel_hv6_8_end_neon load_epel_filterh x5, x4 mov x5, #120 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ldr q16, [sp] ldr q17, [sp, x10] add sp, sp, x10, lsl #1 @@ -2218,7 +2263,7 @@ endfunc function hevc_put_hevc_epel_hv8_8_end_neon load_epel_filterh x5, x4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ldr q16, [sp] ldr q17, [sp, x10] add sp, sp, x10, lsl #1 @@ -2238,7 +2283,7 @@ endfunc function hevc_put_hevc_epel_hv12_8_end_neon load_epel_filterh x5, x4 mov x5, #112 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -2258,7 +2303,7 @@ endfunc function hevc_put_hevc_epel_hv16_8_end_neon load_epel_filterh x5, x4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -2278,7 +2323,7 @@ endfunc function hevc_put_hevc_epel_hv24_8_end_neon load_epel_filterh x5, x4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h, v18.8h}, [sp], x10 ld1 {v19.8h, v20.8h, v21.8h}, [sp], x10 ld1 {v22.8h, v23.8h, v24.8h}, [sp], x10 @@ -2462,7 +2507,7 @@ epel_hv neon function hevc_put_hevc_epel_uni_hv4_8_end_neon load_epel_filterh x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.4h}, [sp], x10 ld1 {v17.4h}, [sp], x10 ld1 {v18.4h}, [sp], x10 @@ -2481,7 +2526,7 @@ endfunc function hevc_put_hevc_epel_uni_hv6_8_end_neon load_epel_filterh x6, x5 sub x1, x1, #4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h}, [sp], x10 ld1 {v17.8h}, [sp], x10 ld1 {v18.8h}, [sp], x10 @@ -2501,7 +2546,7 @@ endfunc function hevc_put_hevc_epel_uni_hv8_8_end_neon load_epel_filterh x6, x5 - 
mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h}, [sp], x10 ld1 {v17.8h}, [sp], x10 ld1 {v18.8h}, [sp], x10 @@ -2521,7 +2566,7 @@ endfunc function hevc_put_hevc_epel_uni_hv12_8_end_neon load_epel_filterh x6, x5 sub x1, x1, #8 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -2543,7 +2588,7 @@ endfunc function hevc_put_hevc_epel_uni_hv16_8_end_neon load_epel_filterh x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -2565,7 +2610,7 @@ endfunc function hevc_put_hevc_epel_uni_hv24_8_end_neon load_epel_filterh x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h, v18.8h}, [sp], x10 ld1 {v19.8h, v20.8h, v21.8h}, [sp], x10 ld1 {v22.8h, v23.8h, v24.8h}, [sp], x10 @@ -3223,7 +3268,7 @@ DISABLE_I8MM function hevc_put_hevc_epel_uni_w_hv4_8_end_neon load_epel_filterh x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.4h}, [sp], x10 ld1 {v17.4h}, [sp], x10 ld1 {v18.4h}, [sp], x10 @@ -3273,7 +3318,7 @@ endfunc function hevc_put_hevc_epel_uni_w_hv6_8_end_neon load_epel_filterh x6, x5 sub x1, x1, #4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h}, [sp], x10 ld1 {v17.8h}, [sp], x10 ld1 {v18.8h}, [sp], x10 @@ -3326,7 +3371,7 @@ endfunc function hevc_put_hevc_epel_uni_w_hv8_8_end_neon load_epel_filterh x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h}, [sp], x10 ld1 {v17.8h}, [sp], x10 ld1 {v18.8h}, [sp], x10 @@ -3376,7 +3421,7 @@ endfunc function hevc_put_hevc_epel_uni_w_hv12_8_end_neon load_epel_filterh x6, x5 sub x1, x1, #8 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ 
-3437,7 +3482,7 @@ endfunc function hevc_put_hevc_epel_uni_w_hv16_8_end_neon load_epel_filterh x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -3498,7 +3543,7 @@ endfunc function hevc_put_hevc_epel_uni_w_hv24_8_end_neon load_epel_filterh x6, x5 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h, v18.8h}, [sp], x10 ld1 {v19.8h, v20.8h, v21.8h}, [sp], x10 ld1 {v22.8h, v23.8h, v24.8h}, [sp], x10 @@ -3795,7 +3840,7 @@ epel_uni_w_hv neon function hevc_put_hevc_epel_bi_hv4_8_end_neon load_epel_filterh x7, x6 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.4h}, [sp], x10 ld1 {v17.4h}, [sp], x10 ld1 {v18.4h}, [sp], x10 @@ -3816,7 +3861,7 @@ endfunc function hevc_put_hevc_epel_bi_hv6_8_end_neon load_epel_filterh x7, x6 sub x1, x1, #4 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h}, [sp], x10 ld1 {v17.8h}, [sp], x10 ld1 {v18.8h}, [sp], x10 @@ -3838,7 +3883,7 @@ endfunc function hevc_put_hevc_epel_bi_hv8_8_end_neon load_epel_filterh x7, x6 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h}, [sp], x10 ld1 {v17.8h}, [sp], x10 ld1 {v18.8h}, [sp], x10 @@ -3860,7 +3905,7 @@ endfunc function hevc_put_hevc_epel_bi_hv12_8_end_neon load_epel_filterh x7, x6 sub x1, x1, #8 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -3885,7 +3930,7 @@ endfunc function hevc_put_hevc_epel_bi_hv16_8_end_neon load_epel_filterh x7, x6 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 ld1 {v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h}, [sp], x10 @@ -3910,7 +3955,7 @@ endfunc function hevc_put_hevc_epel_bi_hv24_8_end_neon load_epel_filterh x7, x6 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 
2) ld1 {v16.8h, v17.8h, v18.8h}, [sp], x10 ld1 {v19.8h, v20.8h, v21.8h}, [sp], x10 ld1 {v22.8h, v23.8h, v24.8h}, [sp], x10 @@ -3939,7 +3984,7 @@ endfunc function hevc_put_hevc_epel_bi_hv32_8_end_neon load_epel_filterh x7, x6 - mov x10, #(MAX_PB_SIZE * 2) + mov x10, #(HEVC_MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h, v18.8h, v19.8h}, [sp], x10 ld1 {v20.8h, v21.8h, v22.8h, v23.8h}, [sp], x10 ld1 {v24.8h, v25.8h, v26.8h, v27.8h}, [sp], x10 diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S index a05009c9d6..0585f03de9 100644 --- a/libavcodec/aarch64/h26x/qpel_neon.S +++ b/libavcodec/aarch64/h26x/qpel_neon.S @@ -1250,6 +1250,10 @@ function ff_hevc_put_hevc_qpel_bi_v64_8_neon, export=1 b X(ff_hevc_put_hevc_qpel_bi_v32_8_neon) endfunc +function ff_vvc_put_pel_uni_pixels4_8_neon, export=1 + b X(ff_hevc_put_hevc_pel_uni_pixels4_8_neon) +endfunc + function ff_hevc_put_hevc_pel_uni_pixels4_8_neon, export=1 1: ldr s0, [x2] @@ -1278,6 +1282,10 @@ function ff_hevc_put_hevc_pel_uni_pixels6_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_uni_pixels8_8_neon, export=1 + b X(ff_hevc_put_hevc_pel_uni_pixels8_8_neon) +endfunc + function ff_hevc_put_hevc_pel_uni_pixels8_8_neon, export=1 1: ldr d0, [x2] @@ -1306,6 +1314,10 @@ function ff_hevc_put_hevc_pel_uni_pixels12_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_uni_pixels16_8_neon, export=1 + b X(ff_hevc_put_hevc_pel_uni_pixels16_8_neon) +endfunc + function ff_hevc_put_hevc_pel_uni_pixels16_8_neon, export=1 1: ldr q0, [x2] @@ -1328,6 +1340,10 @@ function ff_hevc_put_hevc_pel_uni_pixels24_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_uni_pixels32_8_neon, export=1 + b X(ff_hevc_put_hevc_pel_uni_pixels32_8_neon) +endfunc + function ff_hevc_put_hevc_pel_uni_pixels32_8_neon, export=1 1: ld1 {v0.16b, v1.16b}, [x2], x3 @@ -1346,6 +1362,10 @@ function ff_hevc_put_hevc_pel_uni_pixels48_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_uni_pixels64_8_neon, export=1 + b 
X(ff_hevc_put_hevc_pel_uni_pixels64_8_neon) +endfunc + function ff_hevc_put_hevc_pel_uni_pixels64_8_neon, export=1 1: ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x2], x3 @@ -1355,6 +1375,21 @@ function ff_hevc_put_hevc_pel_uni_pixels64_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_uni_pixels128_8_neon, export=1 +1: + mov x5, x2 + mov x6, x0 + ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x5], #64 + ld1 {v4.16b, v5.16b, v6.16b, v7.16b}, [x5] + sub w4, w4, #1 + add x2, x2, x3 + add x0, x0, x1 + st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x6], #64 + st1 {v4.16b, v5.16b, v6.16b, v7.16b}, [x6] + cbnz w4, 1b + ret +endfunc + function ff_hevc_put_hevc_qpel_uni_v4_8_neon, export=1 load_qpel_filterb x6, x5 sub x2, x2, x3, lsl #1 @@ -1528,6 +1563,10 @@ function ff_hevc_put_hevc_qpel_uni_v64_8_neon, export=1 b X(ff_hevc_put_hevc_qpel_uni_v16_8_neon) endfunc +function ff_vvc_put_pel_uni_w_pixels4_8_neon, export=1 + b X(ff_hevc_put_hevc_pel_uni_w_pixels4_8_neon) +endfunc + function ff_hevc_put_hevc_pel_uni_w_pixels4_8_neon, export=1 mov w10, #-6 sub w10, w10, w5 @@ -1598,6 +1637,10 @@ function ff_hevc_put_hevc_pel_uni_w_pixels6_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_uni_w_pixels8_8_neon, export=1 + b X(ff_hevc_put_hevc_pel_uni_w_pixels8_8_neon) +endfunc + function ff_hevc_put_hevc_pel_uni_w_pixels8_8_neon, export=1 mov w10, #-6 sub w10, w10, w5 @@ -1741,7 +1784,9 @@ function ff_hevc_put_hevc_pel_uni_w_pixels16_8_neon, export=1 ret endfunc - +function ff_vvc_put_pel_uni_w_pixels16_8_neon, export=1 + b X(ff_hevc_put_hevc_pel_uni_w_pixels16_8_neon) +endfunc function ff_hevc_put_hevc_pel_uni_w_pixels24_8_neon, export=1 mov w10, #-6 @@ -1803,6 +1848,9 @@ function ff_hevc_put_hevc_pel_uni_w_pixels32_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_uni_w_pixels32_8_neon, export=1 + b X(ff_hevc_put_hevc_pel_uni_w_pixels32_8_neon) +endfunc function ff_hevc_put_hevc_pel_uni_w_pixels48_8_neon, export=1 mov w10, #-6 @@ -1839,6 +1887,39 @@ function 
ff_hevc_put_hevc_pel_uni_w_pixels64_8_neon, export=1 ret endfunc +function ff_vvc_put_pel_uni_w_pixels64_8_neon, export=1 + b X(ff_hevc_put_hevc_pel_uni_w_pixels64_8_neon) +endfunc + +function ff_vvc_put_pel_uni_w_pixels128_8_neon, export=1 + mov w10, #-6 + sub w10, w10, w5 + dup v30.8h, w6 + dup v31.4s, w10 + dup v29.4s, w7 +1: + mov x11, x2 + mov x12, x0 + ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x11], #64 + add x2, x2, x3 + add x0, x0, x1 + PEL_UNI_W_PIXEL_CALC v0, v4, v5, v16, v17, v18, v19 + PEL_UNI_W_PIXEL_CALC v1, v6, v7, v20, v21, v22, v23 + PEL_UNI_W_PIXEL_CALC v2, v4, v5, v16, v17, v18, v19 + PEL_UNI_W_PIXEL_CALC v3, v6, v7, v20, v21, v22, v23 + st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x12], #64 + + ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x11], #64 + sub w4, w4, #1 + PEL_UNI_W_PIXEL_CALC v0, v4, v5, v16, v17, v18, v19 + PEL_UNI_W_PIXEL_CALC v1, v6, v7, v20, v21, v22, v23 + PEL_UNI_W_PIXEL_CALC v2, v4, v5, v16, v17, v18, v19 + PEL_UNI_W_PIXEL_CALC v3, v6, v7, v20, v21, v22, v23 + st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x12], #64 + cbnz w4, 1b + ret +endfunc + .macro QPEL_UNI_W_V_HEADER ldur x12, [sp, #8] // my sub x2, x2, x3, lsl #1 diff --git a/libavcodec/aarch64/vvc/Makefile b/libavcodec/aarch64/vvc/Makefile index a5ad24dfc5..a1c1f03e27 100644 --- a/libavcodec/aarch64/vvc/Makefile +++ b/libavcodec/aarch64/vvc/Makefile @@ -3,5 +3,6 @@ clean:: OBJS-$(CONFIG_VVC_DECODER) += aarch64/vvc/dsp_init.o NEON-OBJS-$(CONFIG_VVC_DECODER) += aarch64/vvc/alf.o \ + aarch64/h26x/epel_neon.o \ aarch64/h26x/qpel_neon.o \ aarch64/h26x/sao_neon.o diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c index ea6245d9a3..457be8c725 100644 --- a/libavcodec/aarch64/vvc/dsp_init.c +++ b/libavcodec/aarch64/vvc/dsp_init.c @@ -46,6 +46,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd) return; if (bd == 8) { + c->inter.put[0][1][0][0] = ff_vvc_put_pel_pixels4_8_neon; + c->inter.put[0][2][0][0] = ff_vvc_put_pel_pixels8_8_neon; + 
c->inter.put[0][3][0][0] = ff_vvc_put_pel_pixels16_8_neon; + c->inter.put[0][4][0][0] = ff_vvc_put_pel_pixels32_8_neon; + c->inter.put[0][5][0][0] = ff_vvc_put_pel_pixels64_8_neon; + c->inter.put[0][6][0][0] = ff_vvc_put_pel_pixels128_8_neon; + c->inter.put[0][1][0][1] = ff_vvc_put_qpel_h4_8_neon; c->inter.put[0][2][0][1] = ff_vvc_put_qpel_h8_8_neon; c->inter.put[0][3][0][1] = ff_vvc_put_qpel_h16_8_neon; @@ -53,6 +60,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd) c->inter.put[0][5][0][1] = c->inter.put[0][6][0][1] = ff_vvc_put_qpel_h32_8_neon; + c->inter.put_uni[0][1][0][0] = ff_vvc_put_pel_uni_pixels4_8_neon; + c->inter.put_uni[0][2][0][0] = ff_vvc_put_pel_uni_pixels8_8_neon; + c->inter.put_uni[0][3][0][0] = ff_vvc_put_pel_uni_pixels16_8_neon; + c->inter.put_uni[0][4][0][0] = ff_vvc_put_pel_uni_pixels32_8_neon; + c->inter.put_uni[0][5][0][0] = ff_vvc_put_pel_uni_pixels64_8_neon; + c->inter.put_uni[0][6][0][0] = ff_vvc_put_pel_uni_pixels128_8_neon; + c->inter.put_uni[0][1][0][1] = ff_vvc_put_qpel_uni_h4_8_neon; c->inter.put_uni[0][2][0][1] = ff_vvc_put_qpel_uni_h8_8_neon; c->inter.put_uni[0][3][0][1] = ff_vvc_put_qpel_uni_h16_8_neon; @@ -60,6 +74,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd) c->inter.put_uni[0][5][0][1] = c->inter.put_uni[0][6][0][1] = ff_vvc_put_qpel_uni_h32_8_neon; + c->inter.put_uni_w[0][1][0][0] = ff_vvc_put_pel_uni_w_pixels4_8_neon; + c->inter.put_uni_w[0][2][0][0] = ff_vvc_put_pel_uni_w_pixels8_8_neon; + c->inter.put_uni_w[0][3][0][0] = ff_vvc_put_pel_uni_w_pixels16_8_neon; + c->inter.put_uni_w[0][4][0][0] = ff_vvc_put_pel_uni_w_pixels32_8_neon; + c->inter.put_uni_w[0][5][0][0] = ff_vvc_put_pel_uni_w_pixels64_8_neon; + c->inter.put_uni_w[0][6][0][0] = ff_vvc_put_pel_uni_w_pixels128_8_neon; + for (int i = 0; i < FF_ARRAY_ELEMS(c->sao.band_filter); i++) c->sao.band_filter[i] = ff_h26x_sao_band_filter_8x8_8_neon; c->sao.edge_filter[0] = ff_vvc_sao_edge_filter_8x8_8_neon; From patchwork Sat Sep 
7 17:13:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhao Zhili X-Patchwork-Id: 51385 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a59:9c4f:0:b0:48e:c0f8:d0de with SMTP id w15csp833507vqu; Sat, 7 Sep 2024 10:14:34 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCXGnruPVd9x1z9fDnQrKotWDhsCX5x9Cg0t2z0qtTBdmJJ+CXNSuWVb5HH5YvNzWGSJS1Gjvbtx2k2oL+J2FCW4@gmail.com X-Google-Smtp-Source: AGHT+IEv+cYzTioyOCH3wOWZtwTVE9kYppasUMLQH1J0c1oCdQTBqo3QIMZZtlb28pLOwlBwI96j X-Received: by 2002:a05:6402:40c9:b0:5c3:c42e:d60e with SMTP id 4fb4d7f45d1cf-5c3dc61cfb4mr1640728a12.0.1725729273903; Sat, 07 Sep 2024 10:14:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1725729273; cv=none; d=google.com; s=arc-20240605; b=b4NN5lvm506/ILJ99FBUbHfc0JjB7UN67XfMhMlbI5MVxeZ2jYKU/LnNyCvHarzN6R McAh5vD0vaENTuMjmMF6YG+jAt0mE2IuyskJJPbmy8MEyMjdDfwOogyxp5OFr6Pm/4DE TVdx7Iq4woYwmM4C4Ops7k9VJeVkqRQxBK8eJh/KMNRGddeLRzDfdCZBRL5gnudV4ZOl qpUH90tp06sLebawAoMnj9raQWqvU5lRNZQfx247mbnIggGXriDJFiH/ad/0hCbHrB8e ORSF7MORlemK/JC0Sj2Xj7JsOfYS2YiFB0mIekXDtiUu+P0djam0rxIu7si7F11tYLAh ezdg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to:date :to:from:message-id:dkim-signature:delivered-to; bh=DniQkQl17ynLnH8XStUq5QY2Q5GhlRCLB7gypWT7uBE=; fh=HnHYuZ9XgUo86ZRXTLWWmQxhslYEI9B9taZ5X1DLFfc=; b=Q1gnlfh0GGDNNWpeyGlUoJdBqtOv/vWoQcHBXG822jtBMXeJxfR8UBaZv8KEPda5VQ 5iDqDYn6/XSXmXZJ+F3CDNDyhS5eROCletmyfIL+qB5Ynktdgt0TTmbf6zB9WVDEkRle SGETk6XKsvjm2uhcZciZJjAE0uooyted/t/wsNZXsDI5XTlz0HHkd0a3hNZoB/sS4vOb pWthlkhGsu9WJqnZbyJJIsoiIanRKiS6vuM0osvNjRNQGVsUjQm17gAA0mQMTiX0FPbX xxECZpWPi/xmKh32uYgY0qHU6/dm2RKo51i6r0ODcGLV/2QGqQYD89qYn7Ly3MFSdEGM 9WnA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral 
(body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=hQRqwVj9; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id 4fb4d7f45d1cf-5c3ebd79eb4si1048453a12.255.2024.09.07.10.14.33; Sat, 07 Sep 2024 10:14:33 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=hQRqwVj9; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 53CAF68D92B; Sat, 7 Sep 2024 20:14:03 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from out203-205-221-240.mail.qq.com (out203-205-221-240.mail.qq.com [203.205.221.240]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 1B3FD68D7A9 for ; Sat, 7 Sep 2024 20:13:53 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=foxmail.com; s=s201512; t=1725729225; bh=BPuSZp86SXqDtkvFLR3vcL4wx5As692lScVR7k4lZl4=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=hQRqwVj9cIqYslMpUG0Zz0erjw+XyXJWjziNQzwhsg6N9gRtbI7gffmAT8QH54xkJ UxbX/NkMgZ4/ywQSzC8320RtooxLIOrvRoDsMu28ROBkerf2l1odfP8mrZ5tZ42a11 bTmaF8/sV+d/Zt1rJBjD741U2XizDHC+VQpooXc4= Received: from ZHILIZHAO-MB1.tencent.com ([121.34.200.168]) by newxmesmtplogicsvrszc5-2.qq.com (NewEsmtp) with SMTP id 36A3CCF4; Sun, 08 Sep 2024 01:13:42 +0800 X-QQ-mid: 
xmsmtpt1725729224t6n699snv Message-ID: X-QQ-XMAILINFO: MmPNY57tR1Xnz9xIJ2y+iWjz4nuB1hlFpl+nktAlui9vXyZpFE0kPngu/34kV0 zwh3SksLpgMdZZKljJ6UGttDjFy9EhmtUufYjJaCcBdBWkInMXt3x5tBA4qE/JbxDrDBifOZDvwq tqwvmolCaMWjf/kKHIXPPJAUJbQIL7KKtYoDL70klowudjfG4e2U1BiIu/Q3hBxU3yitMMhRSj6P 5+jbDx+Yf3BEO0JUggrJTeXF77vZsX1hR5wWkHiqdz7DK+kML8/DAwweksSe91ZFlhyCLm3yhYzj J8r+e9lQGt5jYE2aqf0cHMpcwWQrlSd/6Y/nK4ZxQ66eV2Y4USOgEYdyC8/fpwCvSEgEy8mTvp8d XZ+U/D7kwghwPaciIv5x/DnTvMAneBghJ/RGNWx+gUDn0/Qk9xFj5ohzS0RYlHd+Cd3M9LXlU5y1 jlzy7QUpo287NjjzWobfFHNLCdlcYts7gbjEbsxCMr0S7KbaA4KvpcaDtw27HdL2laPWPAYmEzNf l0c09AEZ2eaOffpAR2jLXSSuc3GOQl/Y6TJ892Kvpb7INKeTggnBuULC8o9oUvAmqQ41qhlwAVT5 tpy3qUfufKaQjws9LjIN0m1mBXJ7TekRRjrK1Ga1P9r6xekcwD1P61cDYRSasDLPz1Fr2RlpgwYv k+S8MsjJWNc25+ahL1a2fNAyO3djcW8HqlfoFsedwRxLjJY0hUAj9KTnGIHv+zNPDp6cDu1nEAAg VCE53y7Dx5BYtLTp4aLdzVKNNeqwJtYk3OA0GF6HNqGedsPgnjToNet+4KV8iRUhEZu4Ti0jHa7S o1iuPH9LnS8EVfo6jW4aT4tyrvc0EANZHCqBMJz4W30KaSgBTwtXMWzsw8Md7ekXEEHcPjxEiT9j ZuBOxBGjm73x+cgYhjNCCzroDgc4euKWTstGSCoRfD+SKFIreauZgJ6ova9FDcFYveqHqFAbxwox H4fyg3Zr81iZmyD8Gh3+yga8xxHOzc+mhOKG1TK+z6sqcX8CeQlEXcBl7LUPXV7McqzFiNR2PHPH DFGv0N+joSEJqkzP/tuXB61k1Y7X8= X-QQ-XMRINFO: NS+P29fieYNw95Bth2bWPxk= From: Zhao Zhili To: ffmpeg-devel@ffmpeg.org Date: Sun, 8 Sep 2024 01:13:39 +0800 X-OQ-MSGID: <20240907171340.55502-5-quinkblack@foxmail.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240907171340.55502-1-quinkblack@foxmail.com> References: <20240907171340.55502-1-quinkblack@foxmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 5/6] aarch64/vvc: Add put_qpel_hx i8mm X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Zhao Zhili Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: nyTsRtzycaoD From: Zhao Zhili Benchmark on Android pixel 8 with -fno-vectorize 
put_luma_h_8_4x4_c:           0.2 ( 1.00x)
put_luma_h_8_4x4_neon:        0.2 ( 1.00x)
put_luma_h_8_4x4_i8mm:        0.0 ( 0.00x)
put_luma_h_8_8x8_c:           1.5 ( 1.00x)
put_luma_h_8_8x8_neon:        0.5 ( 3.00x)
put_luma_h_8_8x8_i8mm:        0.5 ( 3.00x)
put_luma_h_8_16x16_c:         6.2 ( 1.00x)
put_luma_h_8_16x16_neon:      2.0 ( 3.12x)
put_luma_h_8_16x16_i8mm:      1.5 ( 4.17x)
put_luma_h_8_32x32_c:        25.5 ( 1.00x)
put_luma_h_8_32x32_neon:      9.0 ( 2.83x)
put_luma_h_8_32x32_i8mm:      6.8 ( 3.78x)
put_luma_h_8_64x64_c:        99.8 ( 1.00x)
put_luma_h_8_64x64_neon:     35.2 ( 2.83x)
put_luma_h_8_64x64_i8mm:     27.2 ( 3.66x)
put_luma_h_8_128x128_c:     422.0 ( 1.00x)
put_luma_h_8_128x128_neon:  138.5 ( 3.05x)
put_luma_h_8_128x128_i8mm:  109.2 ( 3.86x)
---
 libavcodec/aarch64/h26x/dsp.h       |  4 ++
 libavcodec/aarch64/h26x/qpel_neon.S | 68 ++++++++++++++++++++++++++---
 libavcodec/aarch64/vvc/dsp_init.c   |  9 ++++
 3 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index 076d01b477..323a253257 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -270,4 +270,8 @@ NEON8_FNPROTO_PARTIAL_6(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride,
         int height, int denom, int wx, int ox,
         const int8_t *hf, const int8_t *vf, int width),);

+NEON8_FNPROTO_PARTIAL_6(qpel_h, (int16_t * dst,
+        const uint8_t *_src, ptrdiff_t _srcstride, int height,
+        const int8_t *hf, const int8_t *vf, int width), _i8mm);
+
 #endif
diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S
index 0585f03de9..8a372a76be 100644
--- a/libavcodec/aarch64/h26x/qpel_neon.S
+++ b/libavcodec/aarch64/h26x/qpel_neon.S
@@ -3518,6 +3518,17 @@ endfunc
         sub             x1, x1, #3
 .endm

+.macro VVC_QPEL_H_HEADER
+        ld1r            {v31.2d}, [x4]
+        sub             x1, x1, #3
+.endm
+
+function ff_vvc_put_qpel_h4_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h4_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3574,6 +3585,12 @@ function ff_hevc_put_hevc_qpel_h6_8_neon_i8mm, export=1
         ret
 endfunc

+function ff_vvc_put_qpel_h8_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h8_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3658,6 +3675,12 @@ function ff_hevc_put_hevc_qpel_h12_8_neon_i8mm, export=1
         ret
 endfunc

+function ff_vvc_put_qpel_h16_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h16_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3748,6 +3771,13 @@ function ff_hevc_put_hevc_qpel_h24_8_neon_i8mm, export=1
         ret
 endfunc

+function ff_vvc_put_qpel_h32_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        add             x15, x0, #32
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h32_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3883,10 +3913,7 @@ function ff_hevc_put_hevc_qpel_h48_8_neon_i8mm, export=1
         ret
 endfunc

-function ff_hevc_put_hevc_qpel_h64_8_neon_i8mm, export=1
-        QPEL_H_HEADER
-        sub             x2, x2, #64
-1:
+.macro put_qpel_h64_8_neon_i8mm
         ld1             {v16.16b, v17.16b, v18.16b, v19.16b}, [x1], #64
         ext             v1.16b, v16.16b, v17.16b, #1
         ext             v2.16b, v16.16b, v17.16b, #2
@@ -3977,11 +4004,42 @@ function ff_hevc_put_hevc_qpel_h64_8_neon_i8mm, export=1
         sqxtn2          v20.8h, v26.4s
         sqxtn           v21.4h, v23.4s
         sqxtn2          v21.8h, v27.4s
-        stp             q20, q21, [x0], #32
+        stp             q20, q21, [x0]
+        add             x0, x0, x10
+.endm
+
+function ff_vvc_put_qpel_h64_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #(VVC_MAX_PB_SIZE * 2 - 32 * 3)
+        sub             x2, x2, #64
+        b               1f
+endfunc
+
+function ff_hevc_put_hevc_qpel_h64_8_neon_i8mm, export=1
+        QPEL_H_HEADER
+        mov             x10, #32
+        sub             x2, x2, #64
+1:
+        put_qpel_h64_8_neon_i8mm
         subs            w3, w3, #1
         b.ne            1b
         ret
 endfunc
+
+function ff_vvc_put_qpel_h128_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        sub             x11, x2, #128
+        mov             x10, #32
+        mov             x2, #0
+1:
+        put_qpel_h64_8_neon_i8mm
+        put_qpel_h64_8_neon_i8mm
+        sub             w3, w3, #1
+        add             x1, x1, x11
+        cbnz            w3, 1b
+        ret
+endfunc
+

 DISABLE_I8MM
 #endif
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index 457be8c725..bcc7df8f6c 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -88,6 +88,15 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
                 c->sao.edge_filter[i] = ff_vvc_sao_edge_filter_16x16_8_neon;
             c->alf.filter[LUMA] = alf_filter_luma_8_neon;
             c->alf.filter[CHROMA] = alf_filter_chroma_8_neon;
+
+            if (have_i8mm(cpu_flags)) {
+                c->inter.put[0][1][0][1] = ff_vvc_put_qpel_h4_8_neon_i8mm;
+                c->inter.put[0][2][0][1] = ff_vvc_put_qpel_h8_8_neon_i8mm;
+                c->inter.put[0][3][0][1] = ff_vvc_put_qpel_h16_8_neon_i8mm;
+                c->inter.put[0][4][0][1] = ff_vvc_put_qpel_h32_8_neon_i8mm;
+                c->inter.put[0][5][0][1] = ff_vvc_put_qpel_h64_8_neon_i8mm;
+                c->inter.put[0][6][0][1] = ff_vvc_put_qpel_h128_8_neon_i8mm;
+            }
         } else if (bd == 10) {
             c->alf.filter[LUMA] = alf_filter_luma_10_neon;
             c->alf.filter[CHROMA] = alf_filter_chroma_10_neon;

From patchwork Sat Sep 7 17:13:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhao Zhili X-Patchwork-Id: 51387 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a59:9c4f:0:b0:48e:c0f8:d0de with SMTP id w15csp833639vqu; Sat, 7 Sep 2024 10:14:57 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCVMu85Fwm9PacxGf5QusSuuApuKqFvXwnntWvN1fTG2vW08dOKsGMDgdVR/7MHeyRYkUJlsgc3FuMIuWy1mgRyT@gmail.com X-Google-Smtp-Source: AGHT+IFn+8Avea7TcJrO98DaP8rsBWvy2FQq0xW2YvtBKXCTC43Bq0HDeN71vEx3bDbQzkOSc0/0 X-Received: by 2002:a2e:a983:0:b0:2ef:2405:ff63 with SMTP id 38308e7fff4ca-2f751f3cb8fmr23879561fa.5.1725729297406; Sat, 07 Sep 2024 10:14:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1725729297; cv=none; d=google.com; s=arc-20240605; b=SNzbFQAaokl6r6TqSgKbwVIZAjwVvWyHKgEaSH9L657+UUxUn9ImeSsvFuzt6ud3ZB
IWuiLuVu2/Z2nGNIz2FhBvJ88rFqC4f7COBw/uybX6eWvXcn/Ac/hR00AW/j+DPu6lxS W/5tKVlzZ8+A2a/UBsvtj4Yl4DqB3iPFPjUBTFeu2ssq5ivctNU3jbcHslF9hDZn45wP dUEVAoRBSBMKLm34uwrr2tBKvo0/9Q5VoqxFFhhpPM2ojj3daYTEtRy7jrKsjhTO8vMw /2MTnekaIvBZkkjl1VLiIm1GUkZi2fYHRW8WxwCnKg9xwQfw5r39G/JcdTSbql0QYFSj BoYw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to:date :to:from:message-id:dkim-signature:delivered-to; bh=8dQxN7CNwzNwaX+23aLhtZ2PSqX2E/WNSUIzUjdM4qM=; fh=HnHYuZ9XgUo86ZRXTLWWmQxhslYEI9B9taZ5X1DLFfc=; b=aAOQ41pwK2kwqaWC9yZRBIBIZbXP0EjOvT5p7iiD+IloZnMyWj4bAoRhxFD1guD2sA icW9FkTvJLyJliJXNb4HgMhvFHaQqxGbGutJ9ocvKPnhM2oyPVfeosvQYqdHlta13q4M uVav/SQG7vcB4N/4Waz0dWkSJfgr9lcassmv8hl8TQYfKabgZqaF/bjx6czLL/GAsCwl QBE525fPRbUZKE5CfyoaKlfiJ54yk1c9ooo60+ayjpq8ot4HgZG7IJJ2/sd+HXBKTE0w c3VBIlCTogdXaHnixpNb3ytLCFVauOnZ4KEusQ+VRv56sal6/fQZQLyMMOVjCrkGvuTw 59jA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=gEJED3G4; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id 38308e7fff4ca-2f75c0700b2si4808851fa.285.2024.09.07.10.14.56; Sat, 07 Sep 2024 10:14:57 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@foxmail.com header.s=s201512 header.b=gEJED3G4; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=foxmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 9EFB068D964; Sat, 7 Sep 2024 20:14:05 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from out203-205-221-231.mail.qq.com (out203-205-221-231.mail.qq.com [203.205.221.231]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 807DC68D7C8 for ; Sat, 7 Sep 2024 20:13:54 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=foxmail.com; s=s201512; t=1725729225; bh=YTaxjzotbmztkWkMZEXlplKPjFur/rff4PgOEMSAxAk=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=gEJED3G4kHDDIy8SQ8+yxysI5rpbmmKv6uJ/sR4kUah8rqzgwft8HAO4qZQxP5t/S cjK3WDk5mNZskbuU1xNAzrwDSCMNJzExVIORwa2opRmyU87SZ3GqZ7tu6Oe3UCAD38 Dcgfn12CF25zEfl2lQVHoQw4ViZlAnx/vYWYnpTs= Received: from ZHILIZHAO-MB1.tencent.com ([121.34.200.168]) by newxmesmtplogicsvrszc5-2.qq.com (NewEsmtp) with SMTP id 36A3CCF4; Sun, 08 Sep 2024 01:13:42 +0800 X-QQ-mid: xmsmtpt1725729225t4f44p5f9 Message-ID: X-QQ-XMAILINFO: Mdc3TkmnJyI/0UxMrLg2cmfDaxeV0kMhPAZfNgKFLhOdguhCTpnA8L2+9zdYwC C+bO4ZQq/A4+iEVPnIthxjtiJDPbPFIIs62/LgR+PbumHxPYA3+KySJLxb450HvVCCZCy/z4Tuwj Mxn/TDUiGcTO+mtD2ahNOxCzaKoEASGSilkyz5nsXsQhxxTxoGxQ/h9B5xFzLhMPkN+TWOGEJOlk oGQanHi8pc93veoTzJ7N3uol02mJTtYdcshTA+Z1l7dNhjiHiwGd6EUVhjUf3yiNjWAmTXATVRcM 
Z2uSdImSatDMGU0PXmZaEYGbDA+U/4bwGQpC5sprzNp284M2xSqO3tZkLGC+fVvCoXd8xKY2FlF+ Y0cv7p3y7Ci2rsGxhonWkVLSPyPstKQFk5rAcW+nZctEn1e89ZkqIJKb24G2YoEp/WkWhPeRYjnu Z6GyeswCGY2SnMtvnxyoP7UUwbZvGWvwvScvjjyaq0+RQfmEkag2tpouohfymA1Lq2kTfQtNLhLk e3509hzMPsnDEBlt9dFbnPEJbEqmkKasc/KUGqiil2ha+zqqZ3oLuJWj1tjNwy3F+s5IzCPss+Lg BsaJOL5No1Hp5sUWuh3VA0iKtZrLorXSHaYGajBEZBDxRhhdObIO7ilLAtjItxB8THlaWRvoTHv4 3ebNrE/4KxQQBU7vUFLWjbYwKBKvFnhFARbjvJNyEgs4R9UCxuS2bE3r0XLTkBTn1tU/BFh+PeXM hMagsS+mESiENZIzjSFoVZFi5No4sRGCKT1o5hq20NYZ8W6KjLXAAI8yZaRZS0fJM5iIEZOBU845 tYP3oUu5TemdKb80vtvQmnXYl4kQiWB6MB/tAGCgK1hdXY21csSZIDw7k4uGuTaAykQyR4BT60// Y2e+65XcCNYENPe+feAUvbIBXMmgJU/Fq+lntE0GfrsSc5Y8P3StdMhIkeu17gi0USSypwABIW6d Bt5XAQa2cmSlEE+1XY6ogx4KvsbtPdmYiEPOKcyMY= X-QQ-XMRINFO: NI4Ajvh11aEj8Xl/2s1/T8w= From: Zhao Zhili To: ffmpeg-devel@ffmpeg.org Date: Sun, 8 Sep 2024 01:13:40 +0800 X-OQ-MSGID: <20240907171340.55502-6-quinkblack@foxmail.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240907171340.55502-1-quinkblack@foxmail.com> References: <20240907171340.55502-1-quinkblack@foxmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 6/6] avcodec/hevc: ff_hevc_(qpel/epel)_filters are signed type X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Zhao Zhili Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: CLJLZbXQ48Dg

From: Zhao Zhili

---
 libavcodec/hevc/dsp_template.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/hevc/dsp_template.c b/libavcodec/hevc/dsp_template.c
index aebccd1a0c..a0f79c2673 100644
--- a/libavcodec/hevc/dsp_template.c
+++ b/libavcodec/hevc/dsp_template.c
@@ -302,8 +302,8 @@ IDCT_DC(32)
 ////////////////////////////////////////////////////////////////////////////////
 #define ff_hevc_pel_filters ff_hevc_qpel_filters
 #define DECL_HV_FILTER(f) \
-    const uint8_t *hf = ff_hevc_ ## f ## _filters[mx]; \
-    const uint8_t *vf = ff_hevc_ ## f ## _filters[my];
+    const int8_t *hf = ff_hevc_ ## f ## _filters[mx]; \
+    const int8_t *vf = ff_hevc_ ## f ## _filters[my];

 #define FW_PUT(p, f, t) \
 static void FUNC(put_hevc_## f)(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height, \