From patchwork Tue Oct 17 11:45:59 2023
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 44279
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Cc: jdek@itanimul.li
Date: Tue, 17 Oct 2023 14:45:59 +0300
Message-Id: <20231017114601.1374712-4-martin@martin.st>
In-Reply-To: <20231017114601.1374712-1-martin@martin.st>
References: <20231017114601.1374712-1-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 4/5] aarch64: Manually tweak vertical
 alignment/indentation in tx_float_neon.S

Favour left aligned columns over right aligned columns.

In principle either style should be ok, but some of the cases
easily lead to incorrect indentation in the surrounding code (see
a couple of cases fixed up in the preceding patch), and show up in
automatic indentation correction attempts.
--- libavutil/aarch64/tx_float_neon.S | 120 +++++++++++++++--------------- 1 file changed, 60 insertions(+), 60 deletions(-) diff --git a/libavutil/aarch64/tx_float_neon.S b/libavutil/aarch64/tx_float_neon.S index 9916ad4142..30ffa2a1d4 100644 --- a/libavutil/aarch64/tx_float_neon.S +++ b/libavutil/aarch64/tx_float_neon.S @@ -733,12 +733,12 @@ FFT16_FN ns_float, 1 add x11, x1, x21, lsl #1 add x12, x1, x22 - ldp q0, q1, [x1, #((0 + \part)*32 + \off)] - ldp q4, q5, [x1, #((2 + \part)*32 + \off)] - ldp q2, q3, [x10, #((0 + \part)*32 + \off)] - ldp q6, q7, [x10, #((2 + \part)*32 + \off)] + ldp q0, q1, [x1, #((0 + \part)*32 + \off)] + ldp q4, q5, [x1, #((2 + \part)*32 + \off)] + ldp q2, q3, [x10, #((0 + \part)*32 + \off)] + ldp q6, q7, [x10, #((2 + \part)*32 + \off)] - ldp q8, q9, [x11, #((0 + \part)*32 + \off)] + ldp q8, q9, [x11, #((0 + \part)*32 + \off)] ldp q10, q11, [x11, #((2 + \part)*32 + \off)] ldp q12, q13, [x12, #((0 + \part)*32 + \off)] ldp q14, q15, [x12, #((2 + \part)*32 + \off)] @@ -747,12 +747,12 @@ FFT16_FN ns_float, 1 v8, v9, v10, v11, v12, v13, v14, v15, \ x7, x8, x9, 0 - stp q0, q1, [x1, #((0 + \part)*32 + \off)] - stp q4, q5, [x1, #((2 + \part)*32 + \off)] - stp q2, q3, [x10, #((0 + \part)*32 + \off)] - stp q6, q7, [x10, #((2 + \part)*32 + \off)] + stp q0, q1, [x1, #((0 + \part)*32 + \off)] + stp q4, q5, [x1, #((2 + \part)*32 + \off)] + stp q2, q3, [x10, #((0 + \part)*32 + \off)] + stp q6, q7, [x10, #((2 + \part)*32 + \off)] - stp q8, q9, [x11, #((0 + \part)*32 + \off)] + stp q8, q9, [x11, #((0 + \part)*32 + \off)] stp q12, q13, [x11, #((2 + \part)*32 + \off)] stp q10, q11, [x12, #((0 + \part)*32 + \off)] stp q14, q15, [x12, #((2 + \part)*32 + \off)] @@ -775,12 +775,12 @@ FFT16_FN ns_float, 1 add x12, x15, #((\part)*32 + \off) add x13, x16, #((\part)*32 + \off) - ldp q0, q1, [x10] - ldp q4, q5, [x10, #(2*32)] - ldp q2, q3, [x11] - ldp q6, q7, [x11, #(2*32)] + ldp q0, q1, [x10] + ldp q4, q5, [x10, #(2*32)] + ldp q2, q3, [x11] + ldp q6, q7, [x11, 
#(2*32)] - ldp q8, q9, [x12] + ldp q8, q9, [x12] ldp q10, q11, [x12, #(2*32)] ldp q12, q13, [x13] ldp q14, q15, [x13, #(2*32)] @@ -800,10 +800,10 @@ FFT16_FN ns_float, 1 zip1 v22.2d, v3.2d, v7.2d zip2 v23.2d, v3.2d, v7.2d - ldp q0, q1, [x10, #(1*32)] - ldp q4, q5, [x10, #(3*32)] - ldp q2, q3, [x11, #(1*32)] - ldp q6, q7, [x11, #(3*32)] + ldp q0, q1, [x10, #(1*32)] + ldp q4, q5, [x10, #(3*32)] + ldp q2, q3, [x11, #(1*32)] + ldp q6, q7, [x11, #(3*32)] st1 { v16.4s, v17.4s, v18.4s, v19.4s }, [x10], #64 st1 { v20.4s, v21.4s, v22.4s, v23.4s }, [x11], #64 @@ -817,7 +817,7 @@ FFT16_FN ns_float, 1 zip1 v26.2d, v11.2d, v15.2d zip2 v27.2d, v11.2d, v15.2d - ldp q8, q9, [x12, #(1*32)] + ldp q8, q9, [x12, #(1*32)] ldp q10, q11, [x12, #(3*32)] ldp q12, q13, [x13, #(1*32)] ldp q14, q15, [x13, #(3*32)] @@ -875,9 +875,9 @@ function ff_tx_fft32_\name\()_neon, export=1 SETUP_SR_RECOMB 32, x7, x8, x9 SETUP_LUT \no_perm - LOAD_INPUT 0, 1, 2, 3, x2, \no_perm - LOAD_INPUT 4, 5, 6, 7, x2, \no_perm - LOAD_INPUT 8, 9, 10, 11, x2, \no_perm + LOAD_INPUT 0, 1, 2, 3, x2, \no_perm + LOAD_INPUT 4, 5, 6, 7, x2, \no_perm + LOAD_INPUT 8, 9, 10, 11, x2, \no_perm LOAD_INPUT 12, 13, 14, 15, x2, \no_perm FFT8_X2 v8, v9, v10, v11, v12, v13, v14, v15 @@ -982,37 +982,37 @@ function ff_tx_fft_sr_\name\()_neon, export=1 32: SETUP_SR_RECOMB 32, x7, x8, x9 - LOAD_INPUT 0, 1, 2, 3, x2, \no_perm - LOAD_INPUT 4, 6, 5, 7, x2, \no_perm, 1 - LOAD_INPUT 8, 9, 10, 11, x2, \no_perm + LOAD_INPUT 0, 1, 2, 3, x2, \no_perm + LOAD_INPUT 4, 6, 5, 7, x2, \no_perm, 1 + LOAD_INPUT 8, 9, 10, 11, x2, \no_perm LOAD_INPUT 12, 13, 14, 15, x2, \no_perm FFT8_X2 v8, v9, v10, v11, v12, v13, v14, v15 FFT16 v0, v1, v2, v3, v4, v6, v5, v7 - SR_COMBINE v0, v1, v2, v3, v4, v6, v5, v7, \ - v8, v9, v10, v11, v12, v13, v14, v15, \ - x7, x8, x9, 0 + SR_COMBINE v0, v1, v2, v3, v4, v6, v5, v7, \ + v8, v9, v10, v11, v12, v13, v14, v15, \ + x7, x8, x9, 0 - stp q2, q3, [x1, #32*1] - stp q6, q7, [x1, #32*3] + stp q2, q3, [x1, #32*1] + stp q6, q7, [x1, 
#32*3] stp q10, q11, [x1, #32*5] stp q14, q15, [x1, #32*7] cmp w20, #32 b.gt 64f - stp q0, q1, [x1, #32*0] - stp q4, q5, [x1, #32*2] - stp q8, q9, [x1, #32*4] + stp q0, q1, [x1, #32*0] + stp q4, q5, [x1, #32*2] + stp q8, q9, [x1, #32*4] stp q12, q13, [x1, #32*6] ret 64: SETUP_SR_RECOMB 64, x7, x8, x9 - LOAD_INPUT 2, 3, 10, 11, x2, \no_perm, 1 - LOAD_INPUT 6, 14, 7, 15, x2, \no_perm, 1 + LOAD_INPUT 2, 3, 10, 11, x2, \no_perm, 1 + LOAD_INPUT 6, 14, 7, 15, x2, \no_perm, 1 FFT16 v2, v3, v10, v11, v6, v14, v7, v15 @@ -1033,38 +1033,38 @@ function ff_tx_fft_sr_\name\()_neon, export=1 // TODO: investigate doing the 2 combines like in deinterleave // TODO: experiment with spilling to gprs and converting to HALF or full - SR_COMBINE_LITE v0, v1, v8, v9, \ - v2, v3, v16, v17, \ + SR_COMBINE_LITE v0, v1, v8, v9, \ + v2, v3, v16, v17, \ v24, v25, v26, v27, \ v28, v29, v30, 0 - stp q0, q1, [x1, #32* 0] - stp q8, q9, [x1, #32* 4] - stp q2, q3, [x1, #32* 8] + stp q0, q1, [x1, #32* 0] + stp q8, q9, [x1, #32* 4] + stp q2, q3, [x1, #32* 8] stp q16, q17, [x1, #32*12] - SR_COMBINE_HALF v4, v5, v12, v13, \ - v6, v7, v20, v21, \ + SR_COMBINE_HALF v4, v5, v12, v13, \ + v6, v7, v20, v21, \ v24, v25, v26, v27, \ v28, v29, v30, v0, v1, v8, 1 - stp q4, q20, [x1, #32* 2] + stp q4, q20, [x1, #32* 2] stp q12, q21, [x1, #32* 6] - stp q6, q5, [x1, #32*10] - stp q7, q13, [x1, #32*14] + stp q6, q5, [x1, #32*10] + stp q7, q13, [x1, #32*14] - ldp q2, q3, [x1, #32*1] - ldp q6, q7, [x1, #32*3] + ldp q2, q3, [x1, #32*1] + ldp q6, q7, [x1, #32*3] ldp q12, q13, [x1, #32*5] ldp q16, q17, [x1, #32*7] - SR_COMBINE v2, v3, v12, v13, v6, v16, v7, v17, \ + SR_COMBINE v2, v3, v12, v13, v6, v16, v7, v17, \ v10, v11, v14, v15, v18, v19, v22, v23, \ - x7, x8, x9, 0, \ + x7, x8, x9, 0, \ v24, v25, v26, v27, v28, v29, v30, v8, v0, v1, v4, v5 - stp q2, q3, [x1, #32* 1] - stp q6, q7, [x1, #32* 3] + stp q2, q3, [x1, #32* 1] + stp q6, q7, [x1, #32* 3] stp q12, q13, [x1, #32* 5] stp q16, q17, [x1, #32* 7] @@ -1198,13 
+1198,13 @@ SR_TRANSFORM_DEF 131072 mov x10, v23.d[0] mov x11, v23.d[1] - SR_COMBINE_LITE v0, v1, v8, v9, \ - v2, v3, v16, v17, \ + SR_COMBINE_LITE v0, v1, v8, v9, \ + v2, v3, v16, v17, \ v24, v25, v26, v27, \ v28, v29, v30, 0 - SR_COMBINE_HALF v4, v5, v12, v13, \ - v6, v7, v20, v21, \ + SR_COMBINE_HALF v4, v5, v12, v13, \ + v6, v7, v20, v21, \ v24, v25, v26, v27, \ v28, v29, v30, v23, v24, v26, 1 @@ -1236,7 +1236,7 @@ SR_TRANSFORM_DEF 131072 zip2 v3.2d, v17.2d, v13.2d // stp is faster by a little on A53, but this is faster on M1s (theory) - ldp q8, q9, [x1, #32*1] + ldp q8, q9, [x1, #32*1] ldp q12, q13, [x1, #32*5] st1 { v23.4s, v24.4s, v25.4s, v26.4s }, [x12], #64 // 32* 0...1 @@ -1247,12 +1247,12 @@ SR_TRANSFORM_DEF 131072 mov v23.d[0], x10 mov v23.d[1], x11 - ldp q6, q7, [x1, #32*3] + ldp q6, q7, [x1, #32*3] ldp q16, q17, [x1, #32*7] - SR_COMBINE v8, v9, v12, v13, v6, v16, v7, v17, \ + SR_COMBINE v8, v9, v12, v13, v6, v16, v7, v17, \ v10, v11, v14, v15, v18, v19, v22, v23, \ - x7, x8, x9, 0, \ + x7, x8, x9, 0, \ v24, v25, v26, v27, v28, v29, v30, v4, v0, v1, v5, v20 zip1 v0.2d, v8.2d, v6.2d