From patchwork Wed Sep 11 18:06:13 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51515
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 12 Sep 2024 02:06:13 +0800
X-OQ-MSGID: <20240911180618.28921-10-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
References: <20240911180618.28921-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 09/14] aarch64/vvc: Add put_qpel_hv
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Zhao Zhili

From: Zhao Zhili

With Apple M1 (no i8mm):
put_luma_hv_8_4x4_c:                                     2.2 ( 1.00x)
put_luma_hv_8_4x4_neon:                                  0.8 ( 3.00x)
put_luma_hv_8_8x8_c:                                     7.0 ( 1.00x)
put_luma_hv_8_8x8_neon:                                  0.8 ( 9.33x)
put_luma_hv_8_16x16_c:                                  22.8 ( 1.00x)
put_luma_hv_8_16x16_neon:                                2.5 ( 9.10x)
put_luma_hv_8_32x32_c:                                  84.8 ( 1.00x)
put_luma_hv_8_32x32_neon:                                9.5 ( 8.92x)
put_luma_hv_8_64x64_c:                                 333.0 ( 1.00x)
put_luma_hv_8_64x64_neon:                               35.5 ( 9.38x)
put_luma_hv_8_128x128_c:                              1294.5 ( 1.00x)
put_luma_hv_8_128x128_neon:                            137.8 ( 9.40x)

With Pixel 8 Pro:
put_luma_hv_8_4x4_c:                                     5.0 ( 1.00x)
put_luma_hv_8_4x4_neon:                                  0.8 ( 6.67x)
put_luma_hv_8_4x4_i8mm:                                  0.2 (20.00x)
put_luma_hv_8_8x8_c:                                    13.2 ( 1.00x)
put_luma_hv_8_8x8_neon:                                  1.2 (10.60x)
put_luma_hv_8_8x8_i8mm:                                  1.2 (10.60x)
put_luma_hv_8_16x16_c:                                  44.2 ( 1.00x)
put_luma_hv_8_16x16_neon:                                4.5 ( 9.83x)
put_luma_hv_8_16x16_i8mm:                                4.2 (10.41x)
put_luma_hv_8_32x32_c:                                 160.8 ( 1.00x)
put_luma_hv_8_32x32_neon:                               17.5 ( 9.19x)
put_luma_hv_8_32x32_i8mm:                               16.0 (10.05x)
put_luma_hv_8_64x64_c:                                 611.2 ( 1.00x)
put_luma_hv_8_64x64_neon:                               68.0 ( 8.99x)
put_luma_hv_8_64x64_i8mm:                               62.2 ( 9.82x)
put_luma_hv_8_128x128_c:                              2384.8 ( 1.00x)
put_luma_hv_8_128x128_neon:                            268.8 ( 8.87x)
put_luma_hv_8_128x128_i8mm:                            245.8 ( 9.70x)
---
 libavcodec/aarch64/h26x/dsp.h       |   8 ++
 libavcodec/aarch64/h26x/qpel_neon.S | 140 ++++++++++++++++++++++++++++
 libavcodec/aarch64/vvc/dsp_init.c   |  14 +++
 3 files changed, 162 insertions(+)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index 881091f39a..c54906dde2 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -282,4 +282,12 @@ void ff_vvc_put_qpel_v8_8_neon(int16_t *dst, const uint8_t *_src,
                                ptrdiff_t _srcstride, int height,
                                const int8_t *hf, const int8_t *vf, int width);
 
+NEON8_FNPROTO_PARTIAL_6(qpel_hv, (int16_t *dst,
+        const uint8_t *src, ptrdiff_t srcstride, int height,
+        const int8_t *hf, const int8_t *vf, int width),);
+
+NEON8_FNPROTO_PARTIAL_6(qpel_hv, (int16_t *dst,
+        const uint8_t *src, ptrdiff_t srcstride, int height,
+        const int8_t *hf, const int8_t *vf, int width), _i8mm);
+
 #endif
diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S
index a6a3b9549d..5c3f0263b6 100644
--- a/libavcodec/aarch64/h26x/qpel_neon.S
+++ b/libavcodec/aarch64/h26x/qpel_neon.S
@@ -4140,9 +4140,15 @@ endfunc
 
 DISABLE_I8MM
 #endif
+function vvc_put_qpel_hv4_8_end_neon
+        vvc_load_qpel_filterh x5
+        mov             x7, #(VVC_MAX_PB_SIZE * 2)
+        b               1f
+endfunc
 
 function hevc_put_hevc_qpel_hv4_8_end_neon
         load_qpel_filterh x5, x4
+1:
         ldr             d16, [sp]
         ldr             d17, [sp, x7]
         add             sp, sp, x7, lsl #1
@@ -4194,9 +4200,16 @@ function hevc_put_hevc_qpel_hv6_8_end_neon
         ret
 endfunc
 
+function vvc_put_qpel_hv8_8_end_neon
+        vvc_load_qpel_filterh x5
+        mov             x7, #(VVC_MAX_PB_SIZE * 2)
+        b               1f
+endfunc
+
 function hevc_put_hevc_qpel_hv8_8_end_neon
         mov             x7, #128
         load_qpel_filterh x5, x4
+1:
         ldr             q16, [sp]
         ldr             q17, [sp, x7]
         add             sp, sp, x7, lsl #1
@@ -4247,9 +4260,16 @@ function hevc_put_hevc_qpel_hv12_8_end_neon
         ret
 endfunc
 
+function vvc_put_qpel_hv16_8_end_neon
+        vvc_load_qpel_filterh x5
+        mov             x7, #(VVC_MAX_PB_SIZE * 2)
+        b               1f
+endfunc
+
 function hevc_put_hevc_qpel_hv16_8_end_neon
         mov             x7, #128
         load_qpel_filterh x5, x4
+1:
         ld1             {v16.8h, v17.8h}, [sp], x7
         ld1             {v18.8h, v19.8h}, [sp], x7
         ld1             {v20.8h, v21.8h}, [sp], x7
@@ -4272,6 +4292,12 @@ function hevc_put_hevc_qpel_hv16_8_end_neon
         ret
 endfunc
 
+function vvc_put_qpel_hv32_8_end_neon
+        vvc_load_qpel_filterh x5
+        mov             x7, #(VVC_MAX_PB_SIZE * 2)
+        b               0f
+endfunc
+
 function hevc_put_hevc_qpel_hv32_8_end_neon
         mov             x7, #128
         load_qpel_filterh x5, x4
@@ -4325,6 +4351,25 @@ function ff_hevc_put_hevc_qpel_hv4_8_\suffix, export=1
         b               hevc_put_hevc_qpel_hv4_8_end_neon
 endfunc
 
+function ff_vvc_put_qpel_hv4_8_\suffix, export=1
+        add             w10, w3, #8
+        lsl             x10, x10, #8
+        mov             x14, sp
+        sub             sp, sp, x10     // tmp_array
+        stp             x5, x30, [sp, #-48]!
+        stp             x0, x3, [sp, #16]
+        str             x14, [sp, #32]
+        add             x0, sp, #48
+        sub             x1, x1, x2, lsl #1
+        add             x3, x3, #7
+        sub             x1, x1, x2
+        bl              X(ff_vvc_put_qpel_h4_8_\suffix)
+        ldr             x14, [sp, #32]
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #48
+        b               vvc_put_qpel_hv4_8_end_neon
+endfunc
+
 function ff_hevc_put_hevc_qpel_hv6_8_\suffix, export=1
         add             w10, w3, #8
         mov             x7, #128
@@ -4364,6 +4409,25 @@ function ff_hevc_put_hevc_qpel_hv8_8_\suffix, export=1
         b               hevc_put_hevc_qpel_hv8_8_end_neon
 endfunc
 
+function ff_vvc_put_qpel_hv8_8_\suffix, export=1
+        add             w10, w3, #8
+        lsl             x10, x10, #8
+        sub             x1, x1, x2, lsl #1
+        mov             x14, sp
+        sub             sp, sp, x10     // tmp_array
+        stp             x5, x30, [sp, #-48]!
+        stp             x0, x3, [sp, #16]
+        str             x14, [sp, #32]
+        add             x0, sp, #48
+        add             x3, x3, #7
+        sub             x1, x1, x2
+        bl              X(ff_vvc_put_qpel_h8_8_\suffix)
+        ldr             x14, [sp, #32]
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #48
+        b               vvc_put_qpel_hv8_8_end_neon
+endfunc
+
 function ff_hevc_put_hevc_qpel_hv12_8_\suffix, export=1
         add             w10, w3, #8
         lsl             x10, x10, #7
@@ -4403,6 +4467,25 @@ function ff_hevc_put_hevc_qpel_hv16_8_\suffix, export=1
         b               hevc_put_hevc_qpel_hv16_8_end_neon
 endfunc
 
+function ff_vvc_put_qpel_hv16_8_\suffix, export=1
+        add             w10, w3, #8
+        lsl             x10, x10, #8
+        sub             x1, x1, x2, lsl #1
+        mov             x14, sp
+        sub             sp, sp, x10     // tmp_array
+        stp             x5, x30, [sp, #-48]!
+        stp             x0, x3, [sp, #16]
+        str             x14, [sp, #32]
+        add             x3, x3, #7
+        add             x0, sp, #48
+        sub             x1, x1, x2
+        bl              X(ff_vvc_put_qpel_h16_8_\suffix)
+        ldr             x14, [sp, #32]
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #48
+        b               vvc_put_qpel_hv16_8_end_neon
+endfunc
+
 function ff_hevc_put_hevc_qpel_hv24_8_\suffix, export=1
         stp             x4, x5, [sp, #-64]!
         stp             x2, x3, [sp, #16]
@@ -4439,6 +4522,26 @@ function ff_hevc_put_hevc_qpel_hv32_8_\suffix, export=1
         b               hevc_put_hevc_qpel_hv32_8_end_neon
 endfunc
 
+function ff_vvc_put_qpel_hv32_8_\suffix, export=1
+        add             w10, w3, #8
+        sub             x1, x1, x2, lsl #1
+        lsl             x10, x10, #8
+        sub             x1, x1, x2
+        mov             x14, sp
+        sub             sp, sp, x10     // tmp_array
+        stp             x5, x30, [sp, #-48]!
+        stp             x0, x3, [sp, #16]
+        str             x14, [sp, #32]
+        add             x3, x3, #7
+        add             x0, sp, #48
+        mov             w6, #32
+        bl              X(ff_vvc_put_qpel_h32_8_\suffix)
+        ldr             x14, [sp, #32]
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #48
+        b               vvc_put_qpel_hv32_8_end_neon
+endfunc
+
 function ff_hevc_put_hevc_qpel_hv48_8_\suffix, export=1
         stp             x4, x5, [sp, #-64]!
         stp             x2, x3, [sp, #16]
@@ -4472,6 +4575,43 @@ function ff_hevc_put_hevc_qpel_hv64_8_\suffix, export=1
         ldr             x30, [sp], #16
         ret
 endfunc
+
+function ff_vvc_put_qpel_hv64_8_\suffix, export=1
+        stp             x4, x5, [sp, #-64]!
+        stp             x2, x3, [sp, #16]
+        stp             x0, x1, [sp, #32]
+        str             x30, [sp, #48]
+        mov             x6, #32
+        bl              X(ff_vvc_put_qpel_hv32_8_\suffix)
+        ldp             x0, x1, [sp, #32]
+        ldp             x2, x3, [sp, #16]
+        ldp             x4, x5, [sp], #48
+        add             x1, x1, #32
+        add             x0, x0, #64
+        mov             x6, #32
+        bl              X(ff_vvc_put_qpel_hv32_8_\suffix)
+        ldr             x30, [sp], #16
+        ret
+endfunc
+
+function ff_vvc_put_qpel_hv128_8_\suffix, export=1
+        stp             x4, x5, [sp, #-64]!
+        stp             x2, x3, [sp, #16]
+        stp             x0, x1, [sp, #32]
+        str             x30, [sp, #48]
+        mov             x6, #64
+        bl              X(ff_vvc_put_qpel_hv64_8_\suffix)
+        ldp             x0, x1, [sp, #32]
+        ldp             x2, x3, [sp, #16]
+        ldp             x4, x5, [sp], #48
+        add             x1, x1, #64
+        add             x0, x0, #128
+        mov             x6, #64
+        bl              X(ff_vvc_put_qpel_hv64_8_\suffix)
+        ldr             x30, [sp], #16
+        ret
+endfunc
+
 .endm
 
 qpel_hv neon
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index ba3a49aa1a..934d918ffd 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -67,6 +67,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         c->inter.put[0][5][1][0] =
         c->inter.put[0][6][1][0] = ff_vvc_put_qpel_v8_8_neon;
 
+        c->inter.put[0][1][1][1] = ff_vvc_put_qpel_hv4_8_neon;
+        c->inter.put[0][2][1][1] = ff_vvc_put_qpel_hv8_8_neon;
+        c->inter.put[0][3][1][1] = ff_vvc_put_qpel_hv16_8_neon;
+        c->inter.put[0][4][1][1] = ff_vvc_put_qpel_hv32_8_neon;
+        c->inter.put[0][5][1][1] = ff_vvc_put_qpel_hv64_8_neon;
+        c->inter.put[0][6][1][1] = ff_vvc_put_qpel_hv128_8_neon;
+
         c->inter.put_uni[0][1][0][0] = ff_vvc_put_pel_uni_pixels4_8_neon;
         c->inter.put_uni[0][2][0][0] = ff_vvc_put_pel_uni_pixels8_8_neon;
         c->inter.put_uni[0][3][0][0] = ff_vvc_put_pel_uni_pixels16_8_neon;
@@ -103,6 +110,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
             c->inter.put[0][4][0][1] = ff_vvc_put_qpel_h32_8_neon_i8mm;
             c->inter.put[0][5][0][1] = ff_vvc_put_qpel_h64_8_neon_i8mm;
             c->inter.put[0][6][0][1] = ff_vvc_put_qpel_h128_8_neon_i8mm;
+
+            c->inter.put[0][1][1][1] = ff_vvc_put_qpel_hv4_8_neon_i8mm;
+            c->inter.put[0][2][1][1] = ff_vvc_put_qpel_hv8_8_neon_i8mm;
+            c->inter.put[0][3][1][1] = ff_vvc_put_qpel_hv16_8_neon_i8mm;
+            c->inter.put[0][4][1][1] = ff_vvc_put_qpel_hv32_8_neon_i8mm;
+            c->inter.put[0][5][1][1] = ff_vvc_put_qpel_hv64_8_neon_i8mm;
+            c->inter.put[0][6][1][1] = ff_vvc_put_qpel_hv128_8_neon_i8mm;
         }
     } else if (bd == 10) {
         c->alf.filter[LUMA] = alf_filter_luma_10_neon;