From patchwork Wed Sep 11 18:06:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51513
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Cc: Zhao Zhili
Date: Thu, 12 Sep 2024 02:06:09 +0800
X-OQ-MSGID: <20240911180618.28921-6-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
References: <20240911180618.28921-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 05/14] aarch64/vvc: Add put_qpel_hx i8mm
List-Id: FFmpeg development discussions and patches

From: Zhao Zhili

Benchmark on Android pixel 8
with -fno-vectorize

put_luma_h_8_4x4_c:          0.2 ( 1.00x)
put_luma_h_8_4x4_neon:       0.2 ( 1.00x)
put_luma_h_8_4x4_i8mm:       0.0 ( 0.00x)
put_luma_h_8_8x8_c:          1.5 ( 1.00x)
put_luma_h_8_8x8_neon:       0.5 ( 3.00x)
put_luma_h_8_8x8_i8mm:       0.5 ( 3.00x)
put_luma_h_8_16x16_c:        6.2 ( 1.00x)
put_luma_h_8_16x16_neon:     2.0 ( 3.12x)
put_luma_h_8_16x16_i8mm:     1.5 ( 4.17x)
put_luma_h_8_32x32_c:       25.5 ( 1.00x)
put_luma_h_8_32x32_neon:     9.0 ( 2.83x)
put_luma_h_8_32x32_i8mm:     6.8 ( 3.78x)
put_luma_h_8_64x64_c:       99.8 ( 1.00x)
put_luma_h_8_64x64_neon:    35.2 ( 2.83x)
put_luma_h_8_64x64_i8mm:    27.2 ( 3.66x)
put_luma_h_8_128x128_c:    422.0 ( 1.00x)
put_luma_h_8_128x128_neon: 138.5 ( 3.05x)
put_luma_h_8_128x128_i8mm: 109.2 ( 3.86x)
---
 libavcodec/aarch64/h26x/dsp.h       |  4 ++
 libavcodec/aarch64/h26x/qpel_neon.S | 68 ++++++++++++++++++++++++++---
 libavcodec/aarch64/vvc/dsp_init.c   |  9 ++++
 3 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index 076d01b477..323a253257 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -270,4 +270,8 @@ NEON8_FNPROTO_PARTIAL_6(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride,
         int height, int denom, int wx, int ox,
         const int8_t *hf, const int8_t *vf, int width),);
 
+NEON8_FNPROTO_PARTIAL_6(qpel_h, (int16_t * dst,
+        const uint8_t *_src, ptrdiff_t _srcstride, int height,
+        const int8_t *hf, const int8_t *vf, int width), _i8mm);
+
 #endif
diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S
index 47b3948f8b..1fa5a1dd0e 100644
--- a/libavcodec/aarch64/h26x/qpel_neon.S
+++ b/libavcodec/aarch64/h26x/qpel_neon.S
@@ -3516,6 +3516,17 @@ endfunc
         sub             x1, x1, #3
 .endm
 
+.macro VVC_QPEL_H_HEADER
+        ld1r            {v31.2d}, [x4]
+        sub             x1, x1, #3
+.endm
+
+function ff_vvc_put_qpel_h4_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h4_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3572,6 +3583,12 @@ function ff_hevc_put_hevc_qpel_h6_8_neon_i8mm, export=1
         ret
 endfunc
 
+function ff_vvc_put_qpel_h8_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h8_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3656,6 +3673,12 @@ function ff_hevc_put_hevc_qpel_h12_8_neon_i8mm, export=1
         ret
 endfunc
 
+function ff_vvc_put_qpel_h16_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h16_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3746,6 +3769,13 @@ function ff_hevc_put_hevc_qpel_h24_8_neon_i8mm, export=1
         ret
 endfunc
 
+function ff_vvc_put_qpel_h32_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        add             x15, x0, #32
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h32_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3881,10 +3911,7 @@ function ff_hevc_put_hevc_qpel_h48_8_neon_i8mm, export=1
         ret
 endfunc
 
-function ff_hevc_put_hevc_qpel_h64_8_neon_i8mm, export=1
-        QPEL_H_HEADER
-        sub             x2, x2, #64
-1:
+.macro put_qpel_h64_8_neon_i8mm
         ld1             {v16.16b, v17.16b, v18.16b, v19.16b}, [x1], #64
         ext             v1.16b, v16.16b, v17.16b, #1
         ext             v2.16b, v16.16b, v17.16b, #2
@@ -3975,11 +4002,42 @@ function ff_hevc_put_hevc_qpel_h64_8_neon_i8mm, export=1
         sqxtn2          v20.8h, v26.4s
         sqxtn           v21.4h, v23.4s
         sqxtn2          v21.8h, v27.4s
-        stp             q20, q21, [x0], #32
+        stp             q20, q21, [x0]
+        add             x0, x0, x10
+.endm
+
+function ff_vvc_put_qpel_h64_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #(VVC_MAX_PB_SIZE * 2 - 32 * 3)
+        sub             x2, x2, #64
+        b               1f
+endfunc
+
+function ff_hevc_put_hevc_qpel_h64_8_neon_i8mm, export=1
+        QPEL_H_HEADER
+        mov             x10, #32
+        sub             x2, x2, #64
+1:
+        put_qpel_h64_8_neon_i8mm
+        subs            w3, w3, #1
+        b.ne            1b
+        ret
+endfunc
+
+function ff_vvc_put_qpel_h128_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        sub             x11, x2, #128
+        mov             x10, #32
+        mov             x2, #0
+1:
+        put_qpel_h64_8_neon_i8mm
         subs            w3, w3, #1
+        put_qpel_h64_8_neon_i8mm
+        add             x1, x1, x11
         b.ne            1b
         ret
 endfunc
+
 
 DISABLE_I8MM
 #endif
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index 457be8c725..bcc7df8f6c 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -88,6 +88,15 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
                 c->sao.edge_filter[i] = ff_vvc_sao_edge_filter_16x16_8_neon;
             c->alf.filter[LUMA] = alf_filter_luma_8_neon;
             c->alf.filter[CHROMA] = alf_filter_chroma_8_neon;
+
+            if (have_i8mm(cpu_flags)) {
+                c->inter.put[0][1][0][1] = ff_vvc_put_qpel_h4_8_neon_i8mm;
+                c->inter.put[0][2][0][1] = ff_vvc_put_qpel_h8_8_neon_i8mm;
+                c->inter.put[0][3][0][1] = ff_vvc_put_qpel_h16_8_neon_i8mm;
+                c->inter.put[0][4][0][1] = ff_vvc_put_qpel_h32_8_neon_i8mm;
+                c->inter.put[0][5][0][1] = ff_vvc_put_qpel_h64_8_neon_i8mm;
+                c->inter.put[0][6][0][1] = ff_vvc_put_qpel_h128_8_neon_i8mm;
+            }
         } else if (bd == 10) {
             c->alf.filter[LUMA] = alf_filter_luma_10_neon;
             c->alf.filter[CHROMA] = alf_filter_chroma_10_neon;
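
Reviewer note (not part of the patch): for readers following the
assembly, a scalar C sketch of what the put_qpel_h functions appear to
compute may help. It is only an illustration under stated assumptions,
not FFmpeg's C reference. The name put_luma_h_8_sketch and the constant
SKETCH_MAX_PB_SIZE are hypothetical. The value 128 for VVC_MAX_PB_SIZE
and the absence of a shift for 8-bit input are assumptions. The tap
window src[x - 3] .. src[x + 4] mirrors `sub x1, x1, #3`, the filter is
read directly from hf as `ld1r {v31.2d}, [x4]` does, and the unused vf
argument of the real prototype is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: destination rows are VVC_MAX_PB_SIZE int16_t apart and
 * VVC_MAX_PB_SIZE is 128 (the asm uses VVC_MAX_PB_SIZE * 2 as a byte stride). */
#define SKETCH_MAX_PB_SIZE 128

/* Hypothetical scalar sketch of the 8-bit horizontal 8-tap filter. */
static void put_luma_h_8_sketch(int16_t *dst, const uint8_t *src,
                                ptrdiff_t srcstride, int height,
                                const int8_t *hf, int width)
{
    assert(width > 0 && width <= SKETCH_MAX_PB_SIZE);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            /* Unsigned pixels times signed taps, the pairing usdot provides. */
            for (int i = 0; i < 8; i++)
                sum += hf[i] * src[x + i - 3];
            dst[x] = (int16_t)sum; /* assumed: no shift for 8-bit input */
        }
        src += srcstride;          /* source stride is in bytes */
        dst += SKETCH_MAX_PB_SIZE; /* fixed destination stride */
    }
}
```

The caller must provide 3 readable samples to the left and 4 to the
right of each row, as the assembly also reads them.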