From patchwork Wed Sep 11 18:06:15 2024
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51521
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 12 Sep 2024 02:06:15 +0800
Subject: [FFmpeg-devel] [PATCH v2 11/14] aarch64/vvc: Add put_epel_h
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
References: <20240911180618.28921-1-quinkblack@foxmail.com>

From: Zhao Zhili

put_chroma_h_8_4x4_c:           0.2 ( 1.00x)
put_chroma_h_8_4x4_neon:        0.2 ( 1.00x)
put_chroma_h_8_8x8_c:           0.8 ( 1.00x)
put_chroma_h_8_8x8_neon:        0.2 ( 3.00x)
put_chroma_h_8_16x16_c:         3.8 ( 1.00x)
put_chroma_h_8_16x16_neon:      0.8 ( 5.00x)
put_chroma_h_8_32x32_c:        12.5 ( 1.00x)
put_chroma_h_8_32x32_neon:      2.2 ( 5.56x)
put_chroma_h_8_64x64_c:        47.0 ( 1.00x)
put_chroma_h_8_64x64_neon:      8.8 ( 5.37x)
put_chroma_h_8_128x128_c:     200.2 ( 1.00x)
put_chroma_h_8_128x128_neon:   31.8 ( 6.31x)
---
 libavcodec/aarch64/h26x/dsp.h       |  3 +++
 libavcodec/aarch64/h26x/epel_neon.S | 30 +++++++++++++++++++++++++++++
 libavcodec/aarch64/vvc/dsp_init.c   |  7 +++++++
 3 files changed, 40 insertions(+)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index c54906dde2..6978b900fe 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -248,6 +248,9 @@ NEON8_FNPROTO_PARTIAL_4(qpel, (int16_t *dst, const uint8_t *_src, ptrdiff_t _src
 NEON8_FNPROTO_PARTIAL_4(qpel_uni, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
         ptrdiff_t _srcstride, int height, const int8_t *hf, const int8_t *vf, int width),)
 
+NEON8_FNPROTO_PARTIAL_4(epel, (int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
+        const int8_t *hf, const int8_t *vf, int width),)
+
 #undef NEON8_FNPROTO_PARTIAL_6
 #define NEON8_FNPROTO_PARTIAL_6(fn, args, ext) \
     void ff_vvc_put_##fn##4_8_neon##ext args; \
diff --git a/libavcodec/aarch64/h26x/epel_neon.S b/libavcodec/aarch64/h26x/epel_neon.S
index 8ca42a5c3a..80a0b66a52 100644
--- a/libavcodec/aarch64/h26x/epel_neon.S
+++ b/libavcodec/aarch64/h26x/epel_neon.S
@@ -1375,6 +1375,18 @@ endfunc
         mov             x10, #(HEVC_MAX_PB_SIZE * 2)
 .endm
 
+.macro VVC_EPEL_H_HEADER
+        ld1r            {v30.4s}, [x4]
+        sub             x1, x1, #1
+        mov             x10, #(VVC_MAX_PB_SIZE * 2)
+.endm
+
+function ff_vvc_put_epel_h4_8_neon, export=1
+        VVC_EPEL_H_HEADER
+        sxtl            v0.8h, v30.8b
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_epel_h4_8_neon, export=1
         EPEL_H_HEADER
         sxtl            v0.8h, v30.8b
@@ -1414,6 +1426,12 @@ function ff_hevc_put_hevc_epel_h6_8_neon, export=1
         ret
 endfunc
 
+function ff_vvc_put_epel_h8_8_neon, export=1
+        VVC_EPEL_H_HEADER
+        sxtl            v0.8h, v30.8b
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_epel_h8_8_neon, export=1
         EPEL_H_HEADER
         sxtl            v0.8h, v30.8b
@@ -1461,6 +1479,12 @@ function ff_hevc_put_hevc_epel_h12_8_neon, export=1
         ret
 endfunc
 
+function ff_vvc_put_epel_h16_8_neon, export=1
+        VVC_EPEL_H_HEADER
+        sxtl            v0.8h, v30.8b
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_epel_h16_8_neon, export=1
         EPEL_H_HEADER
         sxtl            v0.8h, v30.8b
@@ -1523,8 +1547,14 @@ function ff_hevc_put_hevc_epel_h24_8_neon, export=1
         ret
 endfunc
 
+function ff_vvc_put_epel_h32_8_neon, export=1
+        VVC_EPEL_H_HEADER
+        b               0f
+endfunc
+
 function ff_hevc_put_hevc_epel_h32_8_neon, export=1
         EPEL_H_HEADER
+0:
         ld1             {v1.8b}, [x1], #8
         sub             x2, x2, w6, uxtw        // decrement src stride
         mov             w7, w6                  // original width
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index 714d642634..c8c13eb068 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -77,6 +77,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         c->inter.put[0][5][1][1] = ff_vvc_put_qpel_hv64_8_neon;
         c->inter.put[0][6][1][1] = ff_vvc_put_qpel_hv128_8_neon;
 
+        c->inter.put[1][1][0][1] = ff_vvc_put_epel_h4_8_neon;
+        c->inter.put[1][2][0][1] = ff_vvc_put_epel_h8_8_neon;
+        c->inter.put[1][3][0][1] = ff_vvc_put_epel_h16_8_neon;
+        c->inter.put[1][4][0][1] =
+        c->inter.put[1][5][0][1] =
+        c->inter.put[1][6][0][1] = ff_vvc_put_epel_h32_8_neon;
+
         c->inter.put_uni[0][1][0][0] = ff_vvc_put_pel_uni_pixels4_8_neon;
         c->inter.put_uni[0][2][0][0] = ff_vvc_put_pel_uni_pixels8_8_neon;
         c->inter.put_uni[0][3][0][0] = ff_vvc_put_pel_uni_pixels16_8_neon;
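A scalar model may help when reviewing what the new entry points compute. The following is a minimal sketch, not FFmpeg's own C template: it assumes 8-bit input, a 4-tap filter `hf` applied to `src[x - 1]` through `src[x + 2]` (mirroring `sub x1, x1, #1` in `VVC_EPEL_H_HEADER`), no right shift at 8 bits, and an `int16_t` destination whose row stride is `VVC_MAX_PB_SIZE` elements (the header sets `x10` to `VVC_MAX_PB_SIZE * 2` bytes). The value 128 for that constant and the names `REF_MAX_PB_SIZE` and `epel_h_ref` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of VVC_MAX_PB_SIZE: row stride of the int16_t output in
 * elements (the asm advances dst by VVC_MAX_PB_SIZE * 2 bytes per row). */
#define REF_MAX_PB_SIZE 128

/* Hypothetical scalar model of ff_vvc_put_epel_h{4,8,16,32}_8_neon:
 * 8-bit source, 4-tap horizontal filter, int16_t intermediate output. */
static void epel_h_ref(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride,
                       int height, const int8_t *hf, int width)
{
    assert(height > 0 && width > 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            /* Taps cover src[x - 1] .. src[x + 2], hence "sub x1, x1, #1". */
            for (int k = 0; k < 4; k++)
                sum += hf[k] * src[x + k - 1];
            dst[x] = (int16_t)sum;   /* assumed: no shift for 8-bit input */
        }
        src += src_stride;
        dst += REF_MAX_PB_SIZE;
    }
}
```

With the symmetric taps `{ -4, 36, 36, -4 }` (sum 64) on a linear ramp 10, 20, 30, ..., each output is 64 times the midpoint of its two central pixels, for example 1600 for the pair 20, 30, which gives a quick sanity check against the NEON output.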
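The `dsp_init.c` hunk stores the new functions in `c->inter.put[1][idx][0][1]`. Judging from the neighbouring qpel assignments (`[1][1]` for the `hv` variants), the last two indices appear to select vertical and horizontal filtering, and `idx` runs from 1 for width 4 up to 6 for width 128. Indices 4 to 6 all point at the `h32` routine, whose body takes the width from `w6` (`mov w7, w6 // original width`) and loops across each row, so one function can serve 32-, 64- and 128-wide blocks. A hypothetical sketch of that mapping, with stand-in functions (`epel_h4` etc., `put_width_idx`, `put_chroma_h`) in place of the real NEON symbols and context:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*put_fn)(void);

/* Hypothetical stand-ins for the NEON entry points; each returns the
 * nominal width it was written for, so table entries can be told apart. */
static int epel_h4(void)  { return 4; }
static int epel_h8(void)  { return 8; }
static int epel_h16(void) { return 16; }
static int epel_h32(void) { return 32; }

/* Hypothetical width -> size index: 2 -> 0, 4 -> 1, 8 -> 2, ..., 128 -> 6. */
static int put_width_idx(int width)
{
    int idx = 0;

    assert(width >= 2 && (width & (width - 1)) == 0);  /* power of two */
    while ((2 << idx) < width)
        idx++;
    return idx;
}

/* Model of the chroma, horizontal-only slots c->inter.put[1][idx][0][1]. */
static put_fn put_chroma_h[7];

static void init_put_chroma_h(void)
{
    put_chroma_h[1] = epel_h4;
    put_chroma_h[2] = epel_h8;
    put_chroma_h[3] = epel_h16;
    put_chroma_h[4] =
    put_chroma_h[5] =
    put_chroma_h[6] = epel_h32;   /* one routine, width loop inside */
}
```

Index 0 (width 2) is not touched by this patch, so it stays NULL in the model.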
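The VVC entry points receive the filter taps through a pointer: under the AArch64 procedure call standard, the fifth argument of the `dsp.h` prototype, `const int8_t *hf`, arrives in `x4`, which `VVC_EPEL_H_HEADER` dereferences directly with `ld1r {v30.4s}, [x4]`. That load replicates the four 8-bit taps into every 32-bit lane, and `sxtl v0.8h, v30.8b` then sign-extends the low eight bytes, so on a little-endian target `v0` should hold the four taps twice as 16-bit values. The sketch below models the register contents with plain arrays indexed by lane; `load_taps_model` is a hypothetical name and the arrays only stand in for the vector registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two instructions that load the chroma taps:
 *   ld1r {v30.4s}, [x4]  -> the 4 bytes at hf replicated into all 4 words
 *   sxtl v0.8h, v30.8b   -> low 8 bytes sign-extended to 16-bit lanes
 * Arrays are indexed by lane number (little-endian lane order assumed). */
static void load_taps_model(int16_t v0[8], const int8_t hf[4])
{
    int8_t v30[16];

    for (int i = 0; i < 16; i++)
        v30[i] = hf[i % 4];   /* byte lane i of the replicated 32-bit word */
    for (int i = 0; i < 8; i++)
        v0[i] = v30[i];       /* int8_t -> int16_t conversion keeps the sign */
}
```

The model only describes register contents; the instructions that consume `v0` live in the shared HEVC body reached through `b 1f`, which the patch does not show.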