From patchwork Mon Mar 25 15:02:37 2024
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 47441
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Cc: Logan Lyu, "J . Dekker"
Date: Mon, 25 Mar 2024 17:02:37 +0200
Message-Id: <20240325150243.59058-16-martin@martin.st>
In-Reply-To: <20240325150243.59058-1-martin@martin.st>
References: <20240325150243.59058-1-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 15/21] aarch64: hevc: Split the qpel_*_hv functions into two parts

---
 libavcodec/aarch64/hevcdsp_qpel_neon.S | 94 +++++++++++++++++++++++---
 1 file changed, 86 insertions(+), 8 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S
index fba063186c..c04e8dbea8 100644
--- a/libavcodec/aarch64/hevcdsp_qpel_neon.S
+++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S
@@ -2166,6 +2166,10 @@ function ff_hevc_put_hevc_qpel_uni_hv4_8_neon_i8mm, export=1
         ldp             x4, x6, [sp, #16]
         ldp             x0, x1, [sp, #32]
         ldr             x30, [sp], #48
+        b               hevc_put_hevc_qpel_uni_hv4_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_uni_hv4_8_end_neon
         mov             x9, #(MAX_PB_SIZE * 2)
         load_qpel_filterh x6, x5
         ldr             d16, [sp]
@@ -2208,6 +2212,10 @@ function ff_hevc_put_hevc_qpel_uni_hv6_8_neon_i8mm, export=1
         ldp             x4, x6, [sp, #16]
         ldp             x0, x1, [sp, #32]
         ldr             x30, [sp], #48
+        b               hevc_put_hevc_qpel_uni_hv6_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_uni_hv6_8_end_neon
         mov             x9, #(MAX_PB_SIZE * 2)
         load_qpel_filterh x6, x5
         sub             x1, x1, #4
@@ -2253,6 +2261,10 @@ function ff_hevc_put_hevc_qpel_uni_hv8_8_neon_i8mm, export=1
         ldp             x4, x6, [sp, #16]
         ldp             x0, x1, [sp, #32]
         ldr             x30, [sp], #48
+        b               hevc_put_hevc_qpel_uni_hv8_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_uni_hv8_8_end_neon
         mov             x9, #(MAX_PB_SIZE * 2)
         load_qpel_filterh x6, x5
         ldr             q16, [sp]
@@ -2296,6 +2308,10 @@ function ff_hevc_put_hevc_qpel_uni_hv12_8_neon_i8mm, export=1
         ldp             x4, x6, [sp, #16]
         ldp             x0, x1, [sp, #32]
         ldp             x7, x30, [sp], #48
+        b               hevc_put_hevc_qpel_uni_hv12_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_uni_hv12_8_end_neon
         mov             x9, #(MAX_PB_SIZE * 2)
         load_qpel_filterh x6, x5
         sub             x1, x1, #8
@@ -2339,7 +2355,10 @@ function ff_hevc_put_hevc_qpel_uni_hv16_8_neon_i8mm, export=1
         ldp             x4, x6, [sp, #16]
         ldp             x0, x1, [sp, #32]
         ldp             x7, x30, [sp], #48
-.Lqpel_uni_hv16_loop:
+        b               hevc_put_hevc_qpel_uni_hv16_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_uni_hv16_8_end_neon
         mov             x9, #(MAX_PB_SIZE * 2)
         load_qpel_filterh x6, x5
         sub             w12, w9, w7, lsl #1
@@ -2414,7 +2433,7 @@ function ff_hevc_put_hevc_qpel_uni_hv32_8_neon_i8mm, export=1
         ldp             x4, x6, [sp, #16]
         ldp             x0, x1, [sp, #32]
         ldp             x7, x30, [sp], #48
-        b               .Lqpel_uni_hv16_loop
+        b               hevc_put_hevc_qpel_uni_hv16_8_end_neon
 endfunc

 function ff_hevc_put_hevc_qpel_uni_hv48_8_neon_i8mm, export=1
@@ -2434,7 +2453,7 @@ function ff_hevc_put_hevc_qpel_uni_hv48_8_neon_i8mm, export=1
         ldp             x4, x6, [sp, #16]
         ldp             x0, x1, [sp, #32]
         ldp             x7, x30, [sp], #48
-        b               .Lqpel_uni_hv16_loop
+        b               hevc_put_hevc_qpel_uni_hv16_8_end_neon
 endfunc

 function ff_hevc_put_hevc_qpel_uni_hv64_8_neon_i8mm, export=1
@@ -2454,7 +2473,7 @@ function ff_hevc_put_hevc_qpel_uni_hv64_8_neon_i8mm, export=1
         ldp             x4, x6, [sp, #16]
         ldp             x0, x1, [sp, #32]
         ldp             x7, x30, [sp], #48
-        b               .Lqpel_uni_hv16_loop
+        b               hevc_put_hevc_qpel_uni_hv16_8_end_neon
 endfunc
 DISABLE_I8MM
 #endif
@@ -3776,6 +3795,10 @@ function ff_hevc_put_hevc_qpel_hv4_8_neon_i8mm, export=1
         bl              X(ff_hevc_put_hevc_qpel_h4_8_neon_i8mm)
         ldp             x0, x3, [sp, #16]
         ldp             x5, x30, [sp], #32
+        b               hevc_put_hevc_qpel_hv4_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_hv4_8_end_neon
         load_qpel_filterh x5, x4
         ldr             d16, [sp]
         ldr             d17, [sp, x7]
@@ -3813,6 +3836,10 @@ function ff_hevc_put_hevc_qpel_hv6_8_neon_i8mm, export=1
         bl              X(ff_hevc_put_hevc_qpel_h6_8_neon_i8mm)
         ldp             x0, x3, [sp, #16]
         ldp             x5, x30, [sp], #32
+        b               hevc_put_hevc_qpel_hv6_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_hv6_8_end_neon
         mov             x8, #120
         load_qpel_filterh x5, x4
         ldr             q16, [sp]
@@ -3852,6 +3879,10 @@ function ff_hevc_put_hevc_qpel_hv8_8_neon_i8mm, export=1
         bl              X(ff_hevc_put_hevc_qpel_h8_8_neon_i8mm)
         ldp             x0, x3, [sp, #16]
         ldp             x5, x30, [sp], #32
+        b               hevc_put_hevc_qpel_hv8_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_hv8_8_end_neon
         mov             x7, #128
         load_qpel_filterh x5, x4
         ldr             q16, [sp]
@@ -3890,6 +3921,10 @@ function ff_hevc_put_hevc_qpel_hv12_8_neon_i8mm, export=1
         bl              X(ff_hevc_put_hevc_qpel_h12_8_neon_i8mm)
         ldp             x0, x3, [sp, #16]
         ldp             x5, x30, [sp], #32
+        b               hevc_put_hevc_qpel_hv12_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_hv12_8_end_neon
         mov             x7, #128
         load_qpel_filterh x5, x4
         mov             x8, #112
@@ -3927,6 +3962,10 @@ function ff_hevc_put_hevc_qpel_hv16_8_neon_i8mm, export=1
         bl              X(ff_hevc_put_hevc_qpel_h16_8_neon_i8mm)
         ldp             x0, x3, [sp, #16]
         ldp             x5, x30, [sp], #32
+        b               hevc_put_hevc_qpel_hv16_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_hv16_8_end_neon
         mov             x7, #128
         load_qpel_filterh x5, x4
         ld1             {v16.8h, v17.8h}, [sp], x7
@@ -3979,6 +4018,10 @@ function ff_hevc_put_hevc_qpel_hv32_8_neon_i8mm, export=1
         bl              X(ff_hevc_put_hevc_qpel_h32_8_neon_i8mm)
         ldp             x0, x3, [sp, #16]
         ldp             x5, x30, [sp], #32
+        b               hevc_put_hevc_qpel_hv32_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_hv32_8_end_neon
         mov             x7, #128
         load_qpel_filterh x5, x4
 0:      mov             x8, sp // src
@@ -4127,6 +4170,10 @@ endfunc

 function ff_hevc_put_hevc_qpel_uni_w_hv4_8_neon_i8mm, export=1
         QPEL_UNI_W_HV_HEADER 4
+        b               hevc_put_hevc_qpel_uni_w_hv4_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_uni_w_hv4_8_end_neon
         ldr             d16, [sp]
         ldr             d17, [sp, x10]
         add             sp, sp, x10, lsl #1
@@ -4217,6 +4264,10 @@ endfunc

 function ff_hevc_put_hevc_qpel_uni_w_hv8_8_neon_i8mm, export=1
         QPEL_UNI_W_HV_HEADER 8
+        b               hevc_put_hevc_qpel_uni_w_hv8_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_uni_w_hv8_8_end_neon
         ldr             q16, [sp]
         ldr             q17, [sp, x10]
         add             sp, sp, x10, lsl #1
@@ -4327,6 +4378,10 @@ endfunc

 function ff_hevc_put_hevc_qpel_uni_w_hv16_8_neon_i8mm, export=1
         QPEL_UNI_W_HV_HEADER 16
+        b               hevc_put_hevc_qpel_uni_w_hv16_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_uni_w_hv16_8_end_neon
         ldp             q16, q1, [sp]
         add             sp, sp, x10
         ldp             q17, q2, [sp]
@@ -4430,6 +4485,10 @@ endfunc

 function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1
         QPEL_UNI_W_HV_HEADER 32
+        b               hevc_put_hevc_qpel_uni_w_hv32_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_uni_w_hv32_8_end_neon
         mov             x11, sp
         mov             w12, w22
         mov             x13, x20
@@ -4543,6 +4602,10 @@ endfunc

 function ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm, export=1
         QPEL_UNI_W_HV_HEADER 64
+        b               hevc_put_hevc_qpel_uni_w_hv64_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_uni_w_hv64_8_end_neon
         mov             x11, sp
         mov             w12, w22
         mov             x13, x20
@@ -4671,6 +4734,10 @@ function ff_hevc_put_hevc_qpel_bi_hv4_8_neon_i8mm, export=1
         ldp             x4, x5, [sp, #16]
         ldp             x0, x1, [sp, #32]
         ldp             x7, x30, [sp], #48
+        b               hevc_put_hevc_qpel_bi_hv4_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_bi_hv4_8_end_neon
         mov             x9, #(MAX_PB_SIZE * 2)
         load_qpel_filterh x7, x6
         ld1             {v16.4h}, [sp], x9
@@ -4712,6 +4779,10 @@ function ff_hevc_put_hevc_qpel_bi_hv6_8_neon_i8mm, export=1
         ldp             x4, x5, [sp, #16]
         ldp             x0, x1, [sp, #32]
         ldp             x7, x30, [sp], #48
+        b               hevc_put_hevc_qpel_bi_hv6_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_bi_hv6_8_end_neon
         mov             x9, #(MAX_PB_SIZE * 2)
         load_qpel_filterh x7, x6
         sub             x1, x1, #4
@@ -4758,6 +4829,10 @@ function ff_hevc_put_hevc_qpel_bi_hv8_8_neon_i8mm, export=1
         ldp             x4, x5, [sp, #16]
         ldp             x0, x1, [sp, #32]
         ldp             x7, x30, [sp], #48
+        b               hevc_put_hevc_qpel_bi_hv8_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_bi_hv8_8_end_neon
         mov             x9, #(MAX_PB_SIZE * 2)
         load_qpel_filterh x7, x6
         ld1             {v16.8h}, [sp], x9
@@ -4822,7 +4897,10 @@ function ff_hevc_put_hevc_qpel_bi_hv16_8_neon_i8mm, export=1
         ldp             x0, x1, [sp, #32]
         ldp             x7, x30, [sp], #48
         mov             x6, #16 // width
-.Lqpel_bi_hv16_loop:
+        b               hevc_put_hevc_qpel_bi_hv16_8_end_neon
+endfunc
+
+function hevc_put_hevc_qpel_bi_hv16_8_end_neon
         load_qpel_filterh x7, x8
         mov             x9, #(MAX_PB_SIZE * 2)
         mov             x10, x6
@@ -4908,7 +4986,7 @@ function ff_hevc_put_hevc_qpel_bi_hv32_8_neon_i8mm, export=1
         ldp             x0, x1, [sp, #32]
         ldp             x7, x30, [sp], #48
         mov             x6, #32 // width
-        b               .Lqpel_bi_hv16_loop
+        b               hevc_put_hevc_qpel_bi_hv16_8_end_neon
 endfunc

 function ff_hevc_put_hevc_qpel_bi_hv48_8_neon_i8mm, export=1
@@ -4929,7 +5007,7 @@ function ff_hevc_put_hevc_qpel_bi_hv48_8_neon_i8mm, export=1
         ldp             x0, x1, [sp, #32]
         ldp             x7, x30, [sp], #48
         mov             x6, #48 // width
-        b               .Lqpel_bi_hv16_loop
+        b               hevc_put_hevc_qpel_bi_hv16_8_end_neon
 endfunc

 function ff_hevc_put_hevc_qpel_bi_hv64_8_neon_i8mm, export=1
@@ -4950,7 +5028,7 @@ function ff_hevc_put_hevc_qpel_bi_hv64_8_neon_i8mm, export=1
         ldp             x0, x1, [sp, #32]
         ldp             x7, x30, [sp], #48
         mov             x6, #64 // width
-        b               .Lqpel_bi_hv16_loop
+        b               hevc_put_hevc_qpel_bi_hv16_8_end_neon
 endfunc
 DISABLE_I8MM