From patchwork Wed Sep 11 18:06:16 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51517
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Cc: Zhao Zhili
Date: Thu, 12 Sep 2024 02:06:16 +0800
X-OQ-MSGID: <20240911180618.28921-13-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
References: <20240911180618.28921-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 12/14] aarch64/vvc: Add put_epel_h i8mm
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

From: Zhao Zhili

put_chroma_h_8_4x4_c:           0.4 ( 1.00x)
put_chroma_h_8_4x4_neon:        0.0 ( 0.00x)
put_chroma_h_8_4x4_i8mm:        0.1 ( 2.67x)
put_chroma_h_8_8x8_c:           1.6 ( 1.00x)
put_chroma_h_8_8x8_neon:        0.1 (11.00x)
put_chroma_h_8_8x8_i8mm:        0.1 (11.00x)
put_chroma_h_8_16x16_c:         6.9 ( 1.00x)
put_chroma_h_8_16x16_neon:      1.1 ( 6.00x)
put_chroma_h_8_16x16_i8mm:      0.7 (10.62x)
put_chroma_h_8_32x32_c:        27.6 ( 1.00x)
put_chroma_h_8_32x32_neon:      4.7 ( 5.95x)
put_chroma_h_8_32x32_i8mm:      4.4 ( 6.28x)
put_chroma_h_8_64x64_c:       116.2 ( 1.00x)
put_chroma_h_8_64x64_neon:     19.1 ( 6.07x)
put_chroma_h_8_64x64_i8mm:     17.1 ( 6.77x)
put_chroma_h_8_128x128_c:     466.6 ( 1.00x)
put_chroma_h_8_128x128_neon:   81.4 ( 5.73x)
put_chroma_h_8_128x128_i8mm:   71.7 ( 6.51x)
---
 libavcodec/aarch64/h26x/dsp.h       |  6 ++-
 libavcodec/aarch64/h26x/epel_neon.S | 60 ++++++++++++++++++++++++++---
 libavcodec/aarch64/vvc/dsp_init.c   |  7 ++++
 3 files changed, 66 insertions(+), 7 deletions(-)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index 6978b900fe..90a42d7108 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -273,7 +273,11 @@ NEON8_FNPROTO_PARTIAL_6(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride,
         int height, int denom, int wx, int ox,
         const int8_t *hf, const int8_t *vf, int width),);
 
-NEON8_FNPROTO_PARTIAL_6(qpel_h, (int16_t * dst,
+NEON8_FNPROTO_PARTIAL_6(qpel_h, (int16_t *dst,
+        const uint8_t *_src, ptrdiff_t _srcstride, int height,
+        const int8_t *hf, const int8_t *vf, int width), _i8mm);
+
+NEON8_FNPROTO_PARTIAL_6(epel_h, (int16_t *dst,
         const uint8_t *_src, ptrdiff_t _srcstride, int height,
         const int8_t *hf, const int8_t *vf, int width), _i8mm);
 
diff --git a/libavcodec/aarch64/h26x/epel_neon.S b/libavcodec/aarch64/h26x/epel_neon.S
index 80a0b66a52..cad8f2a5f4 100644
--- a/libavcodec/aarch64/h26x/epel_neon.S
+++ b/libavcodec/aarch64/h26x/epel_neon.S
@@ -1910,6 +1910,12 @@ endfunc
 
 #if HAVE_I8MM
 ENABLE_I8MM
+
+function ff_vvc_put_epel_h4_8_neon_i8mm, export=1
+        VVC_EPEL_H_HEADER
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_epel_h4_8_neon_i8mm, export=1
         EPEL_H_HEADER
 1:      ld1             {v4.8b}, [x1], x2
@@ -1953,6 +1959,11 @@ function ff_hevc_put_hevc_epel_h6_8_neon_i8mm, export=1
         ret
 endfunc
 
+function ff_vvc_put_epel_h8_8_neon_i8mm, export=1
+        VVC_EPEL_H_HEADER
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_epel_h8_8_neon_i8mm, export=1
         EPEL_H_HEADER
 1:      ld1             {v4.16b}, [x1], x2
@@ -2003,6 +2014,11 @@ function ff_hevc_put_hevc_epel_h12_8_neon_i8mm, export=1
         ret
 endfunc
 
+function ff_vvc_put_epel_h16_8_neon_i8mm, export=1
+        VVC_EPEL_H_HEADER
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_epel_h16_8_neon_i8mm, export=1
         EPEL_H_HEADER
 1:      ld1             {v0.16b, v1.16b}, [x1], x2
@@ -2077,6 +2093,11 @@ function ff_hevc_put_hevc_epel_h24_8_neon_i8mm, export=1
         ret
 endfunc
 
+function ff_vvc_put_epel_h32_8_neon_i8mm, export=1
+        VVC_EPEL_H_HEADER
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_epel_h32_8_neon_i8mm, export=1
         EPEL_H_HEADER
 1:      ld1             {v0.16b, v1.16b, v2.16b}, [x1], x2
@@ -2176,11 +2197,8 @@ function ff_hevc_put_hevc_epel_h48_8_neon_i8mm, export=1
         ret
 endfunc
 
-function ff_hevc_put_hevc_epel_h64_8_neon_i8mm, export=1
-        EPEL_H_HEADER
-        sub             x2, x2, #64
-1:      ld1             {v0.16b, v1.16b, v2.16b, v3.16b}, [x1], #64
-        subs            w3, w3, #1   // height
+.macro put_epel_h64_8_neon_i8mm
+        ld1             {v0.16b, v1.16b, v2.16b, v3.16b}, [x1], #64
         ext             v4.16b, v0.16b, v1.16b, #1
         ext             v5.16b, v0.16b, v1.16b, #2
         ext             v6.16b, v0.16b, v1.16b, #3
@@ -2243,7 +2261,37 @@ function ff_hevc_put_hevc_epel_h64_8_neon_i8mm, export=1
         xtn2            v22.8h, v26.4s
         xtn             v23.4h, v23.4s
         xtn2            v23.8h, v27.4s
-        st4             {v20.8h, v21.8h, v22.8h, v23.8h}, [x0], #64
+        st4             {v20.8h, v21.8h, v22.8h, v23.8h}, [x0], x10
+.endm
+
+function ff_vvc_put_epel_h64_8_neon_i8mm, export=1
+        VVC_EPEL_H_HEADER
+        mov             x10, #(VVC_MAX_PB_SIZE * 2 - 64)
+        sub             x2, x2, #64
+        b               1f
+endfunc
+
+function ff_hevc_put_hevc_epel_h64_8_neon_i8mm, export=1
+        EPEL_H_HEADER
+        mov             x10, #64
+        sub             x2, x2, #64
+1:
+        subs            w3, w3, #1   // height
+        put_epel_h64_8_neon_i8mm
+        b.ne            1b
+        ret
+endfunc
+
+function ff_vvc_put_epel_h128_8_neon_i8mm, export=1
+        VVC_EPEL_H_HEADER
+        sub             x11, x2, #128
+        mov             x10, #64
+        mov             x2, #0
+1:
+        put_epel_h64_8_neon_i8mm
+        subs            w3, w3, #1
+        put_epel_h64_8_neon_i8mm
+        add             x1, x1, x11
         b.ne            1b
         ret
 endfunc
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index c8c13eb068..c947885145 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -127,6 +127,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
             c->inter.put[0][4][1][1] = ff_vvc_put_qpel_hv32_8_neon_i8mm;
             c->inter.put[0][5][1][1] = ff_vvc_put_qpel_hv64_8_neon_i8mm;
             c->inter.put[0][6][1][1] = ff_vvc_put_qpel_hv128_8_neon_i8mm;
+
+            c->inter.put[1][1][0][1] = ff_vvc_put_epel_h4_8_neon_i8mm;
+            c->inter.put[1][2][0][1] = ff_vvc_put_epel_h8_8_neon_i8mm;
+            c->inter.put[1][3][0][1] = ff_vvc_put_epel_h16_8_neon_i8mm;
+            c->inter.put[1][4][0][1] = ff_vvc_put_epel_h32_8_neon_i8mm;
+            c->inter.put[1][5][0][1] = ff_vvc_put_epel_h64_8_neon_i8mm;
+            c->inter.put[1][6][0][1] = ff_vvc_put_epel_h128_8_neon_i8mm;
         }
     } else if (bd == 10) {
         c->alf.filter[LUMA] = alf_filter_luma_10_neon;
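
Reviewer note: the x10 values in the h64/h128 functions can be sanity-checked with a small scalar sketch. This is not the FFmpeg C reference, only an illustration under stated assumptions: put_epel_h_ref and h64_final_post_increment are hypothetical names, VVC_MAX_PB_SIZE is assumed to be 128 samples (HEVC's MAX_PB_SIZE is assumed to be 64), the 4 taps are assumed to cover src[x - 1] .. src[x + 2], and the store macro is assumed to have advanced x0 by 64 bytes before its final st4 (that earlier store is not visible in the hunk).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 128 samples per intermediate row, i.e. 256 bytes of int16_t. */
#define VVC_MAX_PB_SIZE 128

/* Hypothetical scalar model of an 8-bit horizontal 4-tap chroma filter.
 * The caller must provide one readable sample left of src and two to the
 * right of the last column. No shift is applied for 8-bit input. */
static void put_epel_h_ref(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride,
                           int height, const int8_t *hf, int width)
{
    assert(width <= VVC_MAX_PB_SIZE);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)(hf[0] * src[x - 1] + hf[1] * src[x] +
                               hf[2] * src[x + 1] + hf[3] * src[x + 2]);
        src += src_stride;
        dst += VVC_MAX_PB_SIZE; /* destination stride does not depend on width */
    }
}

/* Post-increment needed on the final st4 of a 64-sample row so that the
 * destination pointer lands on the next row, assuming 64 bytes were
 * already consumed by the earlier store. */
static int h64_final_post_increment(int row_bytes)
{
    return row_bytes - 64;
}
```

Under these assumptions, a 256-byte VVC row gives 192, which matches mov x10, #(VVC_MAX_PB_SIZE * 2 - 64), and a 128-byte HEVC row gives 64, which matches mov x10, #64. In the h128 case the macro runs twice per row, so a post-increment of 64 on each invocation covers the full 256 bytes.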