From patchwork Mon Mar 25 15:02:24 2024
Content-Type: text/plain; charset="utf-8"
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 47429
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Cc: Logan Lyu, "J . Dekker"
Date: Mon, 25 Mar 2024 17:02:24 +0200
Message-Id: <20240325150243.59058-3-martin@martin.st>
In-Reply-To: <20240325150243.59058-1-martin@martin.st>
References: <20240325150243.59058-1-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 02/21] aarch64: hevc: Don't iterate with sp in
 ff_hevc_put_hevc_qpel_uni_w_hv32/64_8_neon_i8mm

Many of the routines within hevcdsp_epel_neon and hevcdsp_qpel_neon
store temporary buffers on the stack. When consuming them, many of
these functions use the stack pointer itself as an incrementing
pointer for reading the data (instead of copying it to another
register), which is rather unusual. Technically, this is fine as
long as the pointer remains properly aligned.

However, in the case of ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm,
after incrementing sp while reading the data within one 16 pixel wide
stripe, the function would then reset the stack pointer back to a
lower value to read the next 16 pixel wide stripe, expecting the data
there to remain untouched.

This can't be assumed; data on the stack below the stack pointer
can be clobbered (e.g. by a signal handler). Some OS ABIs allow for
a little margin that won't be touched, aka a red zone, but not all
do. The ones that do guarantee 16 or 128 bytes, not 9 KB.

Convert this function to use a separate pointer register to iterate
through the data, retaining the stack pointer to point at the bottom
of the data we require to remain untouched.
---
 libavcodec/aarch64/hevcdsp_qpel_neon.S | 130 +++++++++++++------------
 1 file changed, 66 insertions(+), 64 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S
index 9be29cafe2..815d897094 100644
--- a/libavcodec/aarch64/hevcdsp_qpel_neon.S
+++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S
@@ -3981,24 +3981,25 @@ function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1
         mov             x11, sp
         mov             w12, w22
         mov             x13, x20
+        mov             x14, sp
 3:
-        ldp             q16, q1, [sp]
-        add             sp, sp, x10
-        ldp             q17, q2, [sp]
-        add             sp, sp, x10
-        ldp             q18, q3, [sp]
-        add             sp, sp, x10
-        ldp             q19, q4, [sp]
-        add             sp, sp, x10
-        ldp             q20, q5, [sp]
-        add             sp, sp, x10
-        ldp             q21, q6, [sp]
-        add             sp, sp, x10
-        ldp             q22, q7, [sp]
-        add             sp, sp, x10
+        ldp             q16, q1, [x11]
+        add             x11, x11, x10
+        ldp             q17, q2, [x11]
+        add             x11, x11, x10
+        ldp             q18, q3, [x11]
+        add             x11, x11, x10
+        ldp             q19, q4, [x11]
+        add             x11, x11, x10
+        ldp             q20, q5, [x11]
+        add             x11, x11, x10
+        ldp             q21, q6, [x11]
+        add             x11, x11, x10
+        ldp             q22, q7, [x11]
+        add             x11, x11, x10
 1:
-        ldp             q23, q31, [sp]
-        add             sp, sp, x10
+        ldp             q23, q31, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v16, v17, v18, v19, v20, v21, v22, v23
         QPEL_FILTER_H2  v25, v16, v17, v18, v19, v20, v21, v22, v23
         QPEL_FILTER_H   v26, v1, v2, v3, v4, v5, v6, v7, v31
@@ -4007,8 +4008,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1
         subs            w22, w22, #1
         b.eq            2f
 
-        ldp             q16, q1, [sp]
-        add             sp, sp, x10
+        ldp             q16, q1, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v17, v18, v19, v20, v21, v22, v23, v16
         QPEL_FILTER_H2  v25, v17, v18, v19, v20, v21, v22, v23, v16
         QPEL_FILTER_H   v26, v2, v3, v4, v5, v6, v7, v31, v1
@@ -4017,8 +4018,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1
         subs            w22, w22, #1
         b.eq            2f
 
-        ldp             q17, q2, [sp]
-        add             sp, sp, x10
+        ldp             q17, q2, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v18, v19, v20, v21, v22, v23, v16, v17
         QPEL_FILTER_H2  v25, v18, v19, v20, v21, v22, v23, v16, v17
         QPEL_FILTER_H   v26, v3, v4, v5, v6, v7, v31, v1, v2
@@ -4027,8 +4028,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1
         subs            w22, w22, #1
         b.eq            2f
 
-        ldp             q18, q3, [sp]
-        add             sp, sp, x10
+        ldp             q18, q3, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v19, v20, v21, v22, v23, v16, v17, v18
         QPEL_FILTER_H2  v25, v19, v20, v21, v22, v23, v16, v17, v18
         QPEL_FILTER_H   v26, v4, v5, v6, v7, v31, v1, v2, v3
@@ -4037,8 +4038,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1
         subs            w22, w22, #1
         b.eq            2f
 
-        ldp             q19, q4, [sp]
-        add             sp, sp, x10
+        ldp             q19, q4, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v20, v21, v22, v23, v16, v17, v18, v19
         QPEL_FILTER_H2  v25, v20, v21, v22, v23, v16, v17, v18, v19
         QPEL_FILTER_H   v26, v5, v6, v7, v31, v1, v2, v3, v4
@@ -4047,8 +4048,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1
         subs            w22, w22, #1
         b.eq            2f
 
-        ldp             q20, q5, [sp]
-        add             sp, sp, x10
+        ldp             q20, q5, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v21, v22, v23, v16, v17, v18, v19, v20
         QPEL_FILTER_H2  v25, v21, v22, v23, v16, v17, v18, v19, v20
         QPEL_FILTER_H   v26, v6, v7, v31, v1, v2, v3, v4, v5
@@ -4057,8 +4058,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1
         subs            w22, w22, #1
         b.eq            2f
 
-        ldp             q21, q6, [sp]
-        add             sp, sp, x10
+        ldp             q21, q6, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v22, v23, v16, v17, v18, v19, v20, v21
         QPEL_FILTER_H2  v25, v22, v23, v16, v17, v18, v19, v20, v21
         QPEL_FILTER_H   v26, v7, v31, v1, v2, v3, v4, v5, v6
@@ -4067,8 +4068,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1
         subs            w22, w22, #1
         b.eq            2f
 
-        ldp             q22, q7, [sp]
-        add             sp, sp, x10
+        ldp             q22, q7, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v23, v16, v17, v18, v19, v20, v21, v22
         QPEL_FILTER_H2  v25, v23, v16, v17, v18, v19, v20, v21, v22
         QPEL_FILTER_H   v26, v31, v1, v2, v3, v4, v5, v6, v7
@@ -4078,10 +4079,10 @@ function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1
         b.hi            1b
 2:
         subs            w27, w27, #16
-        add             sp, x11, #32
+        add             x11, x14, #32
         add             x20, x13, #16
         mov             w22, w12
-        mov             x11, sp
+        mov             x14, x11
         mov             x13, x20
         b.hi            3b
         QPEL_UNI_W_HV_END
@@ -4093,24 +4094,25 @@ function ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm, export=1
         mov             x11, sp
         mov             w12, w22
         mov             x13, x20
+        mov             x14, sp
 3:
-        ldp             q16, q1, [sp]
-        add             sp, sp, x10
-        ldp             q17, q2, [sp]
-        add             sp, sp, x10
-        ldp             q18, q3, [sp]
-        add             sp, sp, x10
-        ldp             q19, q4, [sp]
-        add             sp, sp, x10
-        ldp             q20, q5, [sp]
-        add             sp, sp, x10
-        ldp             q21, q6, [sp]
-        add             sp, sp, x10
-        ldp             q22, q7, [sp]
-        add             sp, sp, x10
+        ldp             q16, q1, [x11]
+        add             x11, x11, x10
+        ldp             q17, q2, [x11]
+        add             x11, x11, x10
+        ldp             q18, q3, [x11]
+        add             x11, x11, x10
+        ldp             q19, q4, [x11]
+        add             x11, x11, x10
+        ldp             q20, q5, [x11]
+        add             x11, x11, x10
+        ldp             q21, q6, [x11]
+        add             x11, x11, x10
+        ldp             q22, q7, [x11]
+        add             x11, x11, x10
 1:
-        ldp             q23, q31, [sp]
-        add             sp, sp, x10
+        ldp             q23, q31, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v16, v17, v18, v19, v20, v21, v22, v23
         QPEL_FILTER_H2  v25, v16, v17, v18, v19, v20, v21, v22, v23
         QPEL_FILTER_H   v26, v1, v2, v3, v4, v5, v6, v7, v31
@@ -4119,8 +4121,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm, export=1
         subs            w22, w22, #1
         b.eq            2f
 
-        ldp             q16, q1, [sp]
-        add             sp, sp, x10
+        ldp             q16, q1, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v17, v18, v19, v20, v21, v22, v23, v16
         QPEL_FILTER_H2  v25, v17, v18, v19, v20, v21, v22, v23, v16
         QPEL_FILTER_H   v26, v2, v3, v4, v5, v6, v7, v31, v1
@@ -4129,8 +4131,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm, export=1
         subs            w22, w22, #1
         b.eq            2f
 
-        ldp             q17, q2, [sp]
-        add             sp, sp, x10
+        ldp             q17, q2, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v18, v19, v20, v21, v22, v23, v16, v17
         QPEL_FILTER_H2  v25, v18, v19, v20, v21, v22, v23, v16, v17
         QPEL_FILTER_H   v26, v3, v4, v5, v6, v7, v31, v1, v2
@@ -4139,8 +4141,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm, export=1
         subs            w22, w22, #1
         b.eq            2f
 
-        ldp             q18, q3, [sp]
-        add             sp, sp, x10
+        ldp             q18, q3, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v19, v20, v21, v22, v23, v16, v17, v18
         QPEL_FILTER_H2  v25, v19, v20, v21, v22, v23, v16, v17, v18
         QPEL_FILTER_H   v26, v4, v5, v6, v7, v31, v1, v2, v3
@@ -4149,8 +4151,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm, export=1
         subs            w22, w22, #1
         b.eq            2f
 
-        ldp             q19, q4, [sp]
-        add             sp, sp, x10
+        ldp             q19, q4, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v20, v21, v22, v23, v16, v17, v18, v19
         QPEL_FILTER_H2  v25, v20, v21, v22, v23, v16, v17, v18, v19
         QPEL_FILTER_H   v26, v5, v6, v7, v31, v1, v2, v3, v4
@@ -4159,8 +4161,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm, export=1
         subs            w22, w22, #1
         b.eq            2f
 
-        ldp             q20, q5, [sp]
-        add             sp, sp, x10
+        ldp             q20, q5, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v21, v22, v23, v16, v17, v18, v19, v20
         QPEL_FILTER_H2  v25, v21, v22, v23, v16, v17, v18, v19, v20
         QPEL_FILTER_H   v26, v6, v7, v31, v1, v2, v3, v4, v5
@@ -4169,8 +4171,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm, export=1
         subs            w22, w22, #1
         b.eq            2f
 
-        ldp             q21, q6, [sp]
-        add             sp, sp, x10
+        ldp             q21, q6, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v22, v23, v16, v17, v18, v19, v20, v21
         QPEL_FILTER_H2  v25, v22, v23, v16, v17, v18, v19, v20, v21
         QPEL_FILTER_H   v26, v7, v31, v1, v2, v3, v4, v5, v6
@@ -4179,8 +4181,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm, export=1
         subs            w22, w22, #1
         b.eq            2f
 
-        ldp             q22, q7, [sp]
-        add             sp, sp, x10
+        ldp             q22, q7, [x11]
+        add             x11, x11, x10
         QPEL_FILTER_H   v24, v23, v16, v17, v18, v19, v20, v21, v22
         QPEL_FILTER_H2  v25, v23, v16, v17, v18, v19, v20, v21, v22
         QPEL_FILTER_H   v26, v31, v1, v2, v3, v4, v5, v6, v7
@@ -4190,10 +4192,10 @@ function ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm, export=1
         b.hi            1b
 2:
         subs            w27, w27, #16
-        add             sp, x11, #32
+        add             x11, x14, #32
         add             x20, x13, #16
         mov             w22, w12
-        mov             x11, sp
+        mov             x14, x11
         mov             x13, x20
         b.hi            3b
         QPEL_UNI_W_HV_END
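
Note for readers: the register roles after this change can be modelled
in C roughly as below. This is only an illustrative sketch, not FFmpeg
code. The function name sum_stripes, the STRIPE_BYTES constant and the
byte summation standing in for the filter macros are all assumptions
made for the example; only the pointer bookkeeping (x11 as the moving
read pointer, x14 as the start of the current stripe, sp left alone)
mirrors the diff above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: each stripe is read 32 bytes per row,
 * matching the "#32" step used in the patch. */
enum { STRIPE_BYTES = 32 };

/* Illustrative model only, not FFmpeg code: sums every byte of the
 * first `stripes` stripes of a temporary buffer stored row by row.
 * The summation is a stand-in for the real filtering work. */
static long sum_stripes(const uint8_t *base, size_t stride,
                        int rows, int stripes)
{
    const uint8_t *stripe = base; /* role of x14: start of current stripe */
    const uint8_t *cur    = base; /* role of x11: moving read pointer */
    long sum = 0;

    /* All requested stripes must fit within one row of the buffer. */
    assert((size_t)stripes * STRIPE_BYTES <= stride);

    for (int s = 0; s < stripes; s++) {
        for (int r = 0; r < rows; r++) {
            for (int i = 0; i < STRIPE_BYTES; i++)
                sum += cur[i];
            cur += stride;              /* add x11, x11, x10 */
        }
        cur    = stripe + STRIPE_BYTES; /* add x11, x14, #32 */
        stripe = cur;                   /* mov x14, x11 */
    }

    /* `base` (the role of sp) is never modified, so the whole buffer
     * stays at or above it for the entire walk. */
    return sum;
}
```

Calling sum_stripes(tmp, 128, 4, 2) on a 4 row buffer with a 128 byte
stride visits bytes 0..63 of every row, one 32 byte stripe at a time.
The point of the structure is that rewinding to the next stripe only
ever changes the two working pointers, never the base.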