From patchwork Wed Sep 11 18:06:17 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51522
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 12 Sep 2024 02:06:17 +0800
X-OQ-MSGID: <20240911180618.28921-14-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
References: <20240911180618.28921-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 13/14] aarch64/vvc: Add put_epel_hv
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Zhao Zhili

From: Zhao Zhili

On Apple M1:
put_chroma_hv_8_4x4_c:                                   1.7 ( 1.00x)
put_chroma_hv_8_4x4_neon:                                0.2 ( 7.67x)
put_chroma_hv_8_8x8_c:                                   5.5 ( 1.00x)
put_chroma_hv_8_8x8_neon:                                0.5 (11.53x)
put_chroma_hv_8_16x16_c:                                18.5 ( 1.00x)
put_chroma_hv_8_16x16_neon:                              1.5 (12.53x)
put_chroma_hv_8_32x32_c:                                72.5 ( 1.00x)
put_chroma_hv_8_32x32_neon:                              4.7 (15.34x)
put_chroma_hv_8_64x64_c:                               274.0 ( 1.00x)
put_chroma_hv_8_64x64_neon:                             18.5 (14.83x)
put_chroma_hv_8_128x128_c:                            1058.7 ( 1.00x)
put_chroma_hv_8_128x128_neon:                           75.2 (14.07x)

On Android Pixel 8 Pro:
put_chroma_hv_8_4x4_c:                                   1.2 ( 1.00x)
put_chroma_hv_8_4x4_neon:                                0.0 ( 0.00x)
put_chroma_hv_8_4x4_i8mm:                                0.2 ( 5.00x)
put_chroma_hv_8_8x8_c:                                   4.0 ( 1.00x)
put_chroma_hv_8_8x8_neon:                                0.5 ( 8.00x)
put_chroma_hv_8_8x8_i8mm:                                0.5 ( 8.00x)
put_chroma_hv_8_16x16_c:                                15.2 ( 1.00x)
put_chroma_hv_8_16x16_neon:                              2.5 ( 6.10x)
put_chroma_hv_8_16x16_i8mm:                              2.2 ( 6.78x)
put_chroma_hv_8_32x32_c:                                61.0 ( 1.00x)
put_chroma_hv_8_32x32_neon:                              9.8 ( 6.26x)
put_chroma_hv_8_32x32_i8mm:                              8.5 ( 7.18x)
put_chroma_hv_8_64x64_c:                               229.5 ( 1.00x)
put_chroma_hv_8_64x64_neon:                             38.5 ( 5.96x)
put_chroma_hv_8_64x64_i8mm:                             34.0 ( 6.75x)
put_chroma_hv_8_128x128_c:                             919.8 ( 1.00x)
put_chroma_hv_8_128x128_neon:                          154.5 ( 5.95x)
put_chroma_hv_8_128x128_i8mm:                          140.0 ( 6.57x)
---
 libavcodec/aarch64/h26x/dsp.h       |   8 ++
 libavcodec/aarch64/h26x/epel_neon.S | 125 ++++++++++++++++++++++++++++
 libavcodec/aarch64/vvc/dsp_init.c   |  14 ++++
 3 files changed, 147 insertions(+)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index 90a42d7108..0fefb4d70f 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -297,4 +297,12 @@ NEON8_FNPROTO_PARTIAL_6(qpel_hv, (int16_t *dst,
         const uint8_t *src, ptrdiff_t srcstride, int height,
         const int8_t *hf, const int8_t *vf, int width), _i8mm);
 
+NEON8_FNPROTO_PARTIAL_6(epel_hv, (int16_t *dst,
+        const uint8_t *src, ptrdiff_t srcstride, int height,
+        const int8_t *hf, const int8_t *vf, int width),);
+
+NEON8_FNPROTO_PARTIAL_6(epel_hv, (int16_t *dst,
+        const uint8_t *src, ptrdiff_t srcstride, int height,
+        const int8_t *hf, const int8_t *vf, int width), _i8mm);
+
 #endif
diff --git a/libavcodec/aarch64/h26x/epel_neon.S b/libavcodec/aarch64/h26x/epel_neon.S
index cad8f2a5f4..e44a448b1f 100644
--- a/libavcodec/aarch64/h26x/epel_neon.S
+++ b/libavcodec/aarch64/h26x/epel_neon.S
@@ -72,6 +72,11 @@ endconst
         sxtl            v0.8h, v0.8b
 .endm
 
+.macro vvc_load_epel_filterh freg
+        ld1             {v0.8b}, [\freg]
+        sxtl            v0.8h, v0.8b
+.endm
+
 .macro calc_epelh dst, src0, src1, src2, src3
         smull           \dst\().4s, \src0\().4h, v0.h[0]
         smlal           \dst\().4s, \src1\().4h, v0.h[1]
@@ -2299,10 +2304,16 @@ endfunc
 DISABLE_I8MM
 #endif
 
+function vvc_put_epel_hv4_8_end_neon
+        vvc_load_epel_filterh x5
+        mov             x10, #(VVC_MAX_PB_SIZE * 2)
+        b               0f
+endfunc
 
 function hevc_put_hevc_epel_hv4_8_end_neon
         load_epel_filterh x5, x4
         mov             x10, #(HEVC_MAX_PB_SIZE * 2)
+0:
         ldr             d16, [sp]
         ldr             d17, [sp, x10]
         add             sp, sp, x10, lsl #1
@@ -2339,9 +2350,16 @@ function hevc_put_hevc_epel_hv6_8_end_neon
 2:      ret
 endfunc
 
+function vvc_put_epel_hv8_8_end_neon
+        vvc_load_epel_filterh x5
+        mov             x10, #(VVC_MAX_PB_SIZE * 2)
+        b               0f
+endfunc
+
 function hevc_put_hevc_epel_hv8_8_end_neon
         load_epel_filterh x5, x4
         mov             x10, #(HEVC_MAX_PB_SIZE * 2)
+0:
         ldr             q16, [sp]
         ldr             q17, [sp, x10]
         add             sp, sp, x10, lsl #1
@@ -2379,9 +2397,16 @@ function hevc_put_hevc_epel_hv12_8_end_neon
 2:      ret
 endfunc
 
+function vvc_put_epel_hv16_8_end_neon
+        vvc_load_epel_filterh x5
+        mov             x10, #(VVC_MAX_PB_SIZE * 2)
+        b               0f
+endfunc
+
 function hevc_put_hevc_epel_hv16_8_end_neon
         load_epel_filterh x5, x4
         mov             x10, #(HEVC_MAX_PB_SIZE * 2)
+0:
         ld1             {v16.8h, v17.8h}, [sp], x10
         ld1             {v18.8h, v19.8h}, [sp], x10
         ld1             {v20.8h, v21.8h}, [sp], x10
@@ -2437,6 +2462,21 @@ function ff_hevc_put_hevc_epel_hv4_8_\suffix, export=1
         b               hevc_put_hevc_epel_hv4_8_end_neon
 endfunc
 
+function ff_vvc_put_epel_hv4_8_\suffix, export=1
+        add             w10, w3, #3
+        lsl             x10, x10, #8
+        sub             sp, sp, x10     // tmp_array
+        stp             x5, x30, [sp, #-32]!
+        stp             x0, x3, [sp, #16]
+        add             x0, sp, #32
+        sub             x1, x1, x2
+        add             w3, w3, #3
+        bl              X(ff_vvc_put_epel_h4_8_\suffix)
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #32
+        b               vvc_put_epel_hv4_8_end_neon
+endfunc
+
 function ff_hevc_put_hevc_epel_hv6_8_\suffix, export=1
         add             w10, w3, #3
         lsl             x10, x10, #7
@@ -2467,6 +2507,21 @@ function ff_hevc_put_hevc_epel_hv8_8_\suffix, export=1
         b               hevc_put_hevc_epel_hv8_8_end_neon
 endfunc
 
+function ff_vvc_put_epel_hv8_8_\suffix, export=1
+        add             w10, w3, #3
+        lsl             x10, x10, #8
+        sub             sp, sp, x10     // tmp_array
+        stp             x5, x30, [sp, #-32]!
+        stp             x0, x3, [sp, #16]
+        add             x0, sp, #32
+        sub             x1, x1, x2
+        add             w3, w3, #3
+        bl              X(ff_vvc_put_epel_h8_8_\suffix)
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #32
+        b               vvc_put_epel_hv8_8_end_neon
+endfunc
+
 function ff_hevc_put_hevc_epel_hv12_8_\suffix, export=1
         add             w10, w3, #3
         lsl             x10, x10, #7
@@ -2497,6 +2552,21 @@ function ff_hevc_put_hevc_epel_hv16_8_\suffix, export=1
         b               hevc_put_hevc_epel_hv16_8_end_neon
 endfunc
 
+function ff_vvc_put_epel_hv16_8_\suffix, export=1
+        add             w10, w3, #3
+        lsl             x10, x10, #8
+        sub             sp, sp, x10     // tmp_array
+        stp             x5, x30, [sp, #-32]!
+        stp             x0, x3, [sp, #16]
+        add             x0, sp, #32
+        sub             x1, x1, x2
+        add             w3, w3, #3
+        bl              X(ff_vvc_put_epel_h16_8_\suffix)
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #32
+        b               vvc_put_epel_hv16_8_end_neon
+endfunc
+
 function ff_hevc_put_hevc_epel_hv24_8_\suffix, export=1
         add             w10, w3, #3
         lsl             x10, x10, #7
@@ -2530,6 +2600,24 @@ function ff_hevc_put_hevc_epel_hv32_8_\suffix, export=1
         ret
 endfunc
 
+function ff_vvc_put_epel_hv32_8_\suffix, export=1
+        stp             x4, x5, [sp, #-64]!
+        stp             x2, x3, [sp, #16]
+        stp             x0, x1, [sp, #32]
+        str             x30, [sp, #48]
+        mov             x6, #16
+        bl              X(ff_vvc_put_epel_hv16_8_\suffix)
+        ldp             x0, x1, [sp, #32]
+        ldp             x2, x3, [sp, #16]
+        ldp             x4, x5, [sp], #48
+        add             x0, x0, #32
+        add             x1, x1, #16
+        mov             x6, #16
+        bl              X(ff_vvc_put_epel_hv16_8_\suffix)
+        ldr             x30, [sp], #16
+        ret
+endfunc
+
 function ff_hevc_put_hevc_epel_hv48_8_\suffix, export=1
         stp             x4, x5, [sp, #-64]!
         stp             x2, x3, [sp, #16]
@@ -2579,6 +2667,43 @@ function ff_hevc_put_hevc_epel_hv64_8_\suffix, export=1
         ldr             x30, [sp], #16
         ret
 endfunc
+
+function ff_vvc_put_epel_hv64_8_\suffix, export=1
+        stp             x4, x5, [sp, #-64]!
+        stp             x2, x3, [sp, #16]
+        stp             x0, x1, [sp, #32]
+        str             x30, [sp, #48]
+        mov             x6, #32
+        bl              X(ff_vvc_put_epel_hv32_8_\suffix)
+        ldp             x0, x1, [sp, #32]
+        ldp             x2, x3, [sp, #16]
+        ldp             x4, x5, [sp], #48
+        add             x0, x0, #64
+        add             x1, x1, #32
+        mov             x6, #32
+        bl              X(ff_vvc_put_epel_hv32_8_\suffix)
+        ldr             x30, [sp], #16
+        ret
+endfunc
+
+function ff_vvc_put_epel_hv128_8_\suffix, export=1
+        stp             x4, x5, [sp, #-64]!
+        stp             x2, x3, [sp, #16]
+        stp             x0, x1, [sp, #32]
+        str             x30, [sp, #48]
+        mov             x6, #64
+        bl              X(ff_vvc_put_epel_hv64_8_\suffix)
+        ldp             x0, x1, [sp, #32]
+        ldp             x2, x3, [sp, #16]
+        ldp             x4, x5, [sp], #48
+        add             x0, x0, #128
+        add             x1, x1, #64
+        mov             x6, #64
+        bl              X(ff_vvc_put_epel_hv64_8_\suffix)
+        ldr             x30, [sp], #16
+        ret
+endfunc
+
 .endm
 
 epel_hv neon
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index c947885145..4867491620 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -84,6 +84,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         c->inter.put[1][5][0][1] =
         c->inter.put[1][6][0][1] = ff_vvc_put_epel_h32_8_neon;
 
+        c->inter.put[1][1][1][1] = ff_vvc_put_epel_hv4_8_neon;
+        c->inter.put[1][2][1][1] = ff_vvc_put_epel_hv8_8_neon;
+        c->inter.put[1][3][1][1] = ff_vvc_put_epel_hv16_8_neon;
+        c->inter.put[1][4][1][1] = ff_vvc_put_epel_hv32_8_neon;
+        c->inter.put[1][5][1][1] = ff_vvc_put_epel_hv64_8_neon;
+        c->inter.put[1][6][1][1] = ff_vvc_put_epel_hv128_8_neon;
+
         c->inter.put_uni[0][1][0][0] = ff_vvc_put_pel_uni_pixels4_8_neon;
         c->inter.put_uni[0][2][0][0] = ff_vvc_put_pel_uni_pixels8_8_neon;
         c->inter.put_uni[0][3][0][0] = ff_vvc_put_pel_uni_pixels16_8_neon;
@@ -134,6 +141,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
             c->inter.put[1][4][0][1] = ff_vvc_put_epel_h32_8_neon_i8mm;
             c->inter.put[1][5][0][1] = ff_vvc_put_epel_h64_8_neon_i8mm;
             c->inter.put[1][6][0][1] = ff_vvc_put_epel_h128_8_neon_i8mm;
+
+            c->inter.put[1][1][1][1] = ff_vvc_put_epel_hv4_8_neon_i8mm;
+            c->inter.put[1][2][1][1] = ff_vvc_put_epel_hv8_8_neon_i8mm;
+            c->inter.put[1][3][1][1] = ff_vvc_put_epel_hv16_8_neon_i8mm;
+            c->inter.put[1][4][1][1] = ff_vvc_put_epel_hv32_8_neon_i8mm;
+            c->inter.put[1][5][1][1] = ff_vvc_put_epel_hv64_8_neon_i8mm;
+            c->inter.put[1][6][1][1] = ff_vvc_put_epel_hv128_8_neon_i8mm;
         }
     } else if (bd == 10) {
         c->alf.filter[LUMA] = alf_filter_luma_10_neon;
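
Reviewer note: for readers following the assembly, here is a scalar C sketch of the data flow the new hv entry points appear to implement. A horizontal 4-tap pass runs over height + 3 rows, starting one row above src, into a temporary array whose rows are VVC_MAX_PB_SIZE int16_t apart. A vertical 4-tap pass then reads that array. This is only an illustration. The name put_chroma_hv_sketch, the MAX_PB_SIZE value of 128 (inferred from the `lsl #8` row pitch) and the final `>> 6` are assumptions, not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PB_SIZE 128 /* assumed value of VVC_MAX_PB_SIZE (256-byte rows) */
#define EPEL_TAPS   4   /* chroma (epel) filters have 4 taps */
#define EPEL_BEFORE 1   /* extra rows needed above the block */
#define EPEL_EXTRA  3   /* extra rows in total (1 above, 2 below) */

/* Hypothetical scalar model of an 8-bit put_epel_hv; same argument order as
 * the prototypes added to dsp.h. Not the FFmpeg reference implementation. */
static void put_chroma_hv_sketch(int16_t *dst, const uint8_t *src,
                                 ptrdiff_t srcstride, int height,
                                 const int8_t *hf, const int8_t *vf, int width)
{
    int16_t tmp_array[(MAX_PB_SIZE + EPEL_EXTRA) * MAX_PB_SIZE];
    int16_t *tmp = tmp_array;

    /* Pass 1: horizontal filter over height + 3 rows, starting one row up
     * (mirrors "sub x1, x1, x2" and "add w3, w3, #3" in the assembly). */
    src -= EPEL_BEFORE * srcstride;
    for (int y = 0; y < height + EPEL_EXTRA; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int k = 0; k < EPEL_TAPS; k++)
                sum += hf[k] * src[x + k - 1];
            tmp[x] = (int16_t)sum;
        }
        src += srcstride;
        tmp += MAX_PB_SIZE;
    }

    /* Pass 2: vertical filter over the intermediate rows. The shift by 6 is
     * an assumption about the normalisation; it relies on arithmetic right
     * shift for negative sums. */
    tmp = tmp_array;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int k = 0; k < EPEL_TAPS; k++)
                sum += vf[k] * tmp[x + k * MAX_PB_SIZE];
            dst[x] = (int16_t)(sum >> 6);
        }
        tmp += MAX_PB_SIZE;
        dst += MAX_PB_SIZE;
    }
}
```

With the pass-through taps {0, 64, 0, 0} in both directions, each output of this sketch is the source pixel scaled by 64, which makes a quick sanity check when comparing against a SIMD version.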