From patchwork Mon Mar 25 15:02:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 47444
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Cc: Logan Lyu, "J . Dekker"
Date: Mon, 25 Mar 2024 17:02:40 +0200
Message-Id: <20240325150243.59058-19-martin@martin.st>
In-Reply-To: <20240325150243.59058-1-martin@martin.st>
References: <20240325150243.59058-1-martin@martin.st>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
Subject: [FFmpeg-devel] [PATCH 18/21] aarch64: hevc: Produce plain neon
 versions of qpel_hv

As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.

With storage allocated for h+8 rows, incrementing the stack
pointer as before no longer ends up at the right spot at the end.
Store the intended final stack pointer value in register x14,
which we save on the stack across the call to the qpel_h function.

AWS Graviton 3:
put_hevc_qpel_hv4_8_c: 386.0
put_hevc_qpel_hv4_8_neon: 125.7
put_hevc_qpel_hv4_8_i8mm: 83.2
put_hevc_qpel_hv6_8_c: 749.0
put_hevc_qpel_hv6_8_neon: 207.0
put_hevc_qpel_hv6_8_i8mm: 166.0
put_hevc_qpel_hv8_8_c: 1305.2
put_hevc_qpel_hv8_8_neon: 216.5
put_hevc_qpel_hv8_8_i8mm: 213.0
put_hevc_qpel_hv12_8_c: 2570.5
put_hevc_qpel_hv12_8_neon: 480.0
put_hevc_qpel_hv12_8_i8mm: 398.2
put_hevc_qpel_hv16_8_c: 4158.7
put_hevc_qpel_hv16_8_neon: 659.7
put_hevc_qpel_hv16_8_i8mm: 593.5
put_hevc_qpel_hv24_8_c: 8626.7
put_hevc_qpel_hv24_8_neon: 1653.5
put_hevc_qpel_hv24_8_i8mm: 1398.7
put_hevc_qpel_hv32_8_c: 14646.0
put_hevc_qpel_hv32_8_neon: 2566.2
put_hevc_qpel_hv32_8_i8mm: 2287.5
put_hevc_qpel_hv48_8_c: 31072.5
put_hevc_qpel_hv48_8_neon: 6228.5
put_hevc_qpel_hv48_8_i8mm: 5291.0
put_hevc_qpel_hv64_8_c: 53847.2
put_hevc_qpel_hv64_8_neon: 9856.7
put_hevc_qpel_hv64_8_i8mm: 8831.0
---
 libavcodec/aarch64/hevcdsp_init_aarch64.c |   6 +
 libavcodec/aarch64/hevcdsp_qpel_neon.S    | 166 +++++++++++++---------
 2 files changed, 104 insertions(+), 68 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index ea0d26c019..105c26017b 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -265,6 +265,10 @@ NEON8_FNPROTO(qpel_v, (int16_t *dst,
         const uint8_t *src, ptrdiff_t srcstride,
         int height, intptr_t mx, intptr_t my, int width),);
 
+NEON8_FNPROTO(qpel_hv, (int16_t *dst,
+        const uint8_t *src, ptrdiff_t srcstride,
+        int height, intptr_t mx, intptr_t my, int width),);
+
 NEON8_FNPROTO(qpel_hv, (int16_t *dst,
         const uint8_t *src, ptrdiff_t srcstride,
         int height, intptr_t mx, intptr_t my, int width), _i8mm);
@@ -436,6 +440,8 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
 
     NEON8_FNASSIGN_SHARED_32(c->put_hevc_qpel_uni_w, 0, 1, qpel_uni_w_h,);
 
+    NEON8_FNASSIGN(c->put_hevc_qpel, 1, 1, qpel_hv,);
+
     if (have_i8mm(cpu_flags)) {
         NEON8_FNASSIGN(c->put_hevc_epel, 0, 1, epel_h, _i8mm);
         NEON8_FNASSIGN(c->put_hevc_epel, 1, 1, epel_hv, _i8mm);
diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S
index ad568e415b..7bffb991a7 100644
--- a/libavcodec/aarch64/hevcdsp_qpel_neon.S
+++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S
@@ -3804,7 +3804,8 @@ function hevc_put_hevc_qpel_hv4_8_end_neon
 .endm
 1:      calc_all
 .purgem calc
-2:      ret
+2:      mov             sp, x14
+        ret
 endfunc
 
 function hevc_put_hevc_qpel_hv6_8_end_neon
@@ -3831,7 +3832,8 @@ function hevc_put_hevc_qpel_hv6_8_end_neon
 .endm
 1:      calc_all
 .purgem calc
-2:      ret
+2:      mov             sp, x14
+        ret
 endfunc
 
 function hevc_put_hevc_qpel_hv8_8_end_neon
@@ -3857,7 +3859,8 @@ function hevc_put_hevc_qpel_hv8_8_end_neon
 .endm
 1:      calc_all
 .purgem calc
-2:      ret
+2:      mov             sp, x14
+        ret
 endfunc
 
 function hevc_put_hevc_qpel_hv12_8_end_neon
@@ -3882,7 +3885,8 @@ function hevc_put_hevc_qpel_hv12_8_end_neon
 .endm
 1:      calc_all2
 .purgem calc
-2:      ret
+2:      mov             sp, x14
+        ret
 endfunc
 
 function hevc_put_hevc_qpel_hv16_8_end_neon
@@ -3906,7 +3910,8 @@ function hevc_put_hevc_qpel_hv16_8_end_neon
 .endm
 1:      calc_all2
 .purgem calc
-2:      ret
+2:      mov             sp, x14
+        ret
 endfunc
 
 function hevc_put_hevc_qpel_hv32_8_end_neon
@@ -3937,162 +3942,187 @@ function hevc_put_hevc_qpel_hv32_8_end_neon
         add             sp, sp, #32
         subs            w6, w6, #16
         b.hi            0b
-        add             w10, w3, #6
-        add             sp, sp, #64 // discard rest of first line
-        lsl             x10, x10, #7
-        add             sp, sp, x10 // tmp_array without first line
+        mov             sp, x14
         ret
 endfunc
 
-#if HAVE_I8MM
-ENABLE_I8MM
-function ff_hevc_put_hevc_qpel_hv4_8_neon_i8mm, export=1
-        add             w10, w3, #7
+.macro qpel_hv suffix
+function ff_hevc_put_hevc_qpel_hv4_8_\suffix, export=1
+        add             w10, w3, #8
         mov             x7, #128
         lsl             x10, x10, #7
+        mov             x14, sp
         sub             sp, sp, x10 // tmp_array
-        stp             x5, x30, [sp, #-32]!
-        stp             x0, x3, [sp, #16]
-        add             x0, sp, #32
+        stp             x5, x30, [sp, #-48]!
+        stp             x0, x3, [sp, #16]
+        str             x14, [sp, #32]
+        add             x0, sp, #48
         sub             x1, x1, x2, lsl #1
         add             x3, x3, #7
         sub             x1, x1, x2
-        bl              X(ff_hevc_put_hevc_qpel_h4_8_neon_i8mm)
-        ldp             x0, x3, [sp, #16]
-        ldp             x5, x30, [sp], #32
+        bl              X(ff_hevc_put_hevc_qpel_h4_8_\suffix)
+        ldr             x14, [sp, #32]
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #48
         b               hevc_put_hevc_qpel_hv4_8_end_neon
 endfunc
 
-function ff_hevc_put_hevc_qpel_hv6_8_neon_i8mm, export=1
-        add             w10, w3, #7
+function ff_hevc_put_hevc_qpel_hv6_8_\suffix, export=1
+        add             w10, w3, #8
         mov             x7, #128
         lsl             x10, x10, #7
+        mov             x14, sp
         sub             sp, sp, x10 // tmp_array
-        stp             x5, x30, [sp, #-32]!
-        stp             x0, x3, [sp, #16]
-        add             x0, sp, #32
+        stp             x5, x30, [sp, #-48]!
+        stp             x0, x3, [sp, #16]
+        str             x14, [sp, #32]
+        add             x0, sp, #48
         sub             x1, x1, x2, lsl #1
         add             x3, x3, #7
         sub             x1, x1, x2
-        bl              X(ff_hevc_put_hevc_qpel_h6_8_neon_i8mm)
-        ldp             x0, x3, [sp, #16]
-        ldp             x5, x30, [sp], #32
+        bl              X(ff_hevc_put_hevc_qpel_h6_8_\suffix)
+        ldr             x14, [sp, #32]
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #48
         b               hevc_put_hevc_qpel_hv6_8_end_neon
 endfunc
 
-function ff_hevc_put_hevc_qpel_hv8_8_neon_i8mm, export=1
-        add             w10, w3, #7
+function ff_hevc_put_hevc_qpel_hv8_8_\suffix, export=1
+        add             w10, w3, #8
         lsl             x10, x10, #7
         sub             x1, x1, x2, lsl #1
+        mov             x14, sp
         sub             sp, sp, x10 // tmp_array
-        stp             x5, x30, [sp, #-32]!
-        stp             x0, x3, [sp, #16]
-        add             x0, sp, #32
+        stp             x5, x30, [sp, #-48]!
+        stp             x0, x3, [sp, #16]
+        str             x14, [sp, #32]
+        add             x0, sp, #48
         add             x3, x3, #7
         sub             x1, x1, x2
-        bl              X(ff_hevc_put_hevc_qpel_h8_8_neon_i8mm)
-        ldp             x0, x3, [sp, #16]
-        ldp             x5, x30, [sp], #32
+        bl              X(ff_hevc_put_hevc_qpel_h8_8_\suffix)
+        ldr             x14, [sp, #32]
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #48
         b               hevc_put_hevc_qpel_hv8_8_end_neon
 endfunc
 
-function ff_hevc_put_hevc_qpel_hv12_8_neon_i8mm, export=1
-        add             w10, w3, #7
+function ff_hevc_put_hevc_qpel_hv12_8_\suffix, export=1
+        add             w10, w3, #8
         lsl             x10, x10, #7
         sub             x1, x1, x2, lsl #1
+        mov             x14, sp
         sub             sp, sp, x10 // tmp_array
-        stp             x5, x30, [sp, #-32]!
-        stp             x0, x3, [sp, #16]
-        add             x0, sp, #32
+        stp             x5, x30, [sp, #-48]!
+        stp             x0, x3, [sp, #16]
+        str             x14, [sp, #32]
+        add             x0, sp, #48
         add             x3, x3, #7
         sub             x1, x1, x2
-        bl              X(ff_hevc_put_hevc_qpel_h12_8_neon_i8mm)
-        ldp             x0, x3, [sp, #16]
-        ldp             x5, x30, [sp], #32
+        mov             w6, #12
+        bl              X(ff_hevc_put_hevc_qpel_h12_8_\suffix)
+        ldr             x14, [sp, #32]
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #48
         b               hevc_put_hevc_qpel_hv12_8_end_neon
 endfunc
 
-function ff_hevc_put_hevc_qpel_hv16_8_neon_i8mm, export=1
-        add             w10, w3, #7
+function ff_hevc_put_hevc_qpel_hv16_8_\suffix, export=1
+        add             w10, w3, #8
         lsl             x10, x10, #7
         sub             x1, x1, x2, lsl #1
+        mov             x14, sp
         sub             sp, sp, x10 // tmp_array
-        stp             x5, x30, [sp, #-32]!
-        stp             x0, x3, [sp, #16]
+        stp             x5, x30, [sp, #-48]!
+        stp             x0, x3, [sp, #16]
+        str             x14, [sp, #32]
         add             x3, x3, #7
-        add             x0, sp, #32
+        add             x0, sp, #48
         sub             x1, x1, x2
-        bl              X(ff_hevc_put_hevc_qpel_h16_8_neon_i8mm)
-        ldp             x0, x3, [sp, #16]
-        ldp             x5, x30, [sp], #32
+        bl              X(ff_hevc_put_hevc_qpel_h16_8_\suffix)
+        ldr             x14, [sp, #32]
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #48
         b               hevc_put_hevc_qpel_hv16_8_end_neon
 endfunc
 
-function ff_hevc_put_hevc_qpel_hv24_8_neon_i8mm, export=1
+function ff_hevc_put_hevc_qpel_hv24_8_\suffix, export=1
         stp             x4, x5, [sp, #-64]!
         stp             x2, x3, [sp, #16]
         stp             x0, x1, [sp, #32]
         str             x30, [sp, #48]
-        bl              X(ff_hevc_put_hevc_qpel_hv12_8_neon_i8mm)
+        bl              X(ff_hevc_put_hevc_qpel_hv12_8_\suffix)
         ldp             x0, x1, [sp, #32]
         ldp             x2, x3, [sp, #16]
         ldp             x4, x5, [sp], #48
         add             x1, x1, #12
         add             x0, x0, #24
-        bl              X(ff_hevc_put_hevc_qpel_hv12_8_neon_i8mm)
+        bl              X(ff_hevc_put_hevc_qpel_hv12_8_\suffix)
         ldr             x30, [sp], #16
         ret
 endfunc
 
-function ff_hevc_put_hevc_qpel_hv32_8_neon_i8mm, export=1
-        add             w10, w3, #7
+function ff_hevc_put_hevc_qpel_hv32_8_\suffix, export=1
+        add             w10, w3, #8
         sub             x1, x1, x2, lsl #1
         lsl             x10, x10, #7
         sub             x1, x1, x2
+        mov             x14, sp
         sub             sp, sp, x10 // tmp_array
-        stp             x5, x30, [sp, #-32]!
-        stp             x0, x3, [sp, #16]
+        stp             x5, x30, [sp, #-48]!
+        stp             x0, x3, [sp, #16]
+        str             x14, [sp, #32]
         add             x3, x3, #7
-        add             x0, sp, #32
-        bl              X(ff_hevc_put_hevc_qpel_h32_8_neon_i8mm)
-        ldp             x0, x3, [sp, #16]
-        ldp             x5, x30, [sp], #32
+        add             x0, sp, #48
+        mov             w6, #32
+        bl              X(ff_hevc_put_hevc_qpel_h32_8_\suffix)
+        ldr             x14, [sp, #32]
+        ldp             x0, x3, [sp, #16]
+        ldp             x5, x30, [sp], #48
         b               hevc_put_hevc_qpel_hv32_8_end_neon
 endfunc
 
-function ff_hevc_put_hevc_qpel_hv48_8_neon_i8mm, export=1
+function ff_hevc_put_hevc_qpel_hv48_8_\suffix, export=1
         stp             x4, x5, [sp, #-64]!
         stp             x2, x3, [sp, #16]
         stp             x0, x1, [sp, #32]
         str             x30, [sp, #48]
-        bl              X(ff_hevc_put_hevc_qpel_hv24_8_neon_i8mm)
+        bl              X(ff_hevc_put_hevc_qpel_hv24_8_\suffix)
         ldp             x0, x1, [sp, #32]
         ldp             x2, x3, [sp, #16]
         ldp             x4, x5, [sp], #48
         add             x1, x1, #24
         add             x0, x0, #48
-        bl              X(ff_hevc_put_hevc_qpel_hv24_8_neon_i8mm)
+        bl              X(ff_hevc_put_hevc_qpel_hv24_8_\suffix)
         ldr             x30, [sp], #16
         ret
 endfunc
 
-function ff_hevc_put_hevc_qpel_hv64_8_neon_i8mm, export=1
+function ff_hevc_put_hevc_qpel_hv64_8_\suffix, export=1
         stp             x4, x5, [sp, #-64]!
         stp             x2, x3, [sp, #16]
         stp             x0, x1, [sp, #32]
         str             x30, [sp, #48]
         mov             x6, #32
-        bl              X(ff_hevc_put_hevc_qpel_hv32_8_neon_i8mm)
+        bl              X(ff_hevc_put_hevc_qpel_hv32_8_\suffix)
         ldp             x0, x1, [sp, #32]
         ldp             x2, x3, [sp, #16]
         ldp             x4, x5, [sp], #48
         add             x1, x1, #32
         add             x0, x0, #64
         mov             x6, #32
-        bl              X(ff_hevc_put_hevc_qpel_hv32_8_neon_i8mm)
+        bl              X(ff_hevc_put_hevc_qpel_hv32_8_\suffix)
         ldr             x30, [sp], #16
         ret
 endfunc
+.endm
+
+qpel_hv neon
+
+#if HAVE_I8MM
+ENABLE_I8MM
+
+qpel_hv neon_i8mm
+
 DISABLE_I8MM
 #endif
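
A note on the arithmetic behind the h+8 allocation, as a minimal sketch rather than code from the patch. It assumes the plain neon qpel_h pass always writes rows in pairs (the commit message only says it processes two rows at a time) and that the tmp_array row stride is 128 bytes, as implied by `lsl x10, x10, #7`. The helper names `rows_written_in_pairs` and `tmp_array_bytes` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Row stride of tmp_array in bytes; mirrors "lsl x10, x10, #7" in the patch. */
#define TMP_STRIDE_BYTES 128
/* Extra rows the vertical 8-tap filter needs; mirrors "add x3, x3, #7". */
#define QPEL_EXTRA_ROWS 7

/*
 * Rows actually written by a horizontal pass that emits rows in pairs
 * (assumption: an odd request is rounded up to the next even count).
 */
static int rows_written_in_pairs(int rows_requested)
{
    return (rows_requested + 1) & ~1;
}

/* Size of the temporary array as allocated by the patch: (height + 8) rows. */
static size_t tmp_array_bytes(int height)
{
    return (size_t)(height + 8) * TMP_STRIDE_BYTES;
}

/* True if the rows written by the paired horizontal pass fit in tmp_array. */
static int tmp_array_is_large_enough(int height)
{
    int written = rows_written_in_pairs(height + QPEL_EXTRA_ROWS);
    return (size_t)written * TMP_STRIDE_BYTES <= tmp_array_bytes(height);
}
```

For a block height of 4, the horizontal pass is asked for 11 rows and, under the pairing assumption, writes 12, which is exactly what (4 + 8) * 128 = 1536 bytes holds. The previous h+7 sizing (1408 bytes) would have been one row short.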