From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Cc: Logan Lyu, "J . Dekker"
Date: Tue, 12 Mar 2024 15:12:26 +0200
Message-Id: <20240312131229.1551-1-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm

The first 32 elements of each row were correct, while the last 16
were scrambled.

This went unnoticed because the checkasm test erroneously checked
only half of the output (for 8-bit functions), and apparently none
of the samples in "fate-hevc" exercise this specific function.
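The commit message does not show the faulty comparison in checkasm. As a hypothetical illustration only, assume the compared length was derived from the 1-byte size of an 8-bit input sample rather than the 2-byte int16_t output element. Under that assumption, each 48-sample row is checked only over its first 24 samples, so scrambling confined to columns 32 to 47 can never be detected. `row_matches`, `make_rows` and the pair-swap permutation are made-up names and data, not checkasm code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { WIDTH = 48 }; /* row width of the h48 function */

/* Hypothetical helper (not checkasm code): compare one row of int16_t
 * output, with the compared length given as WIDTH * elem_size bytes. */
static int row_matches(const int16_t *ref, const int16_t *out, size_t elem_size)
{
    return memcmp(ref, out, WIDTH * elem_size) == 0;
}

/* Build a reference row and a copy whose last 16 samples (columns 32..47)
 * are permuted; the permutation (swapping neighbouring pairs) is arbitrary. */
static void make_rows(int16_t ref[WIDTH], int16_t out[WIDTH])
{
    for (int i = 0; i < WIDTH; i++)
        ref[i] = out[i] = (int16_t)(i * 3);
    for (int i = 32; i < WIDTH; i += 2) {
        int16_t t = out[i];
        out[i] = out[i + 1];
        out[i + 1] = t;
    }
}
```

With `elem_size` set to 1, `memcmp` covers 48 bytes (24 samples), and the rows compare equal even though columns 32 to 47 differ. With `sizeof(int16_t)`, it covers the full 96-byte row and reports the mismatch.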
---
 libavcodec/aarch64/hevcdsp_epel_neon.S | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S
index 2dafa09337..d3f0a26f79 100644
--- a/libavcodec/aarch64/hevcdsp_epel_neon.S
+++ b/libavcodec/aarch64/hevcdsp_epel_neon.S
@@ -1572,6 +1572,7 @@ function ff_hevc_put_hevc_epel_h48_8_neon_i8mm, export=1
         xtn2            v22.8h, v26.4s
         xtn             v23.4h, v23.4s
         xtn2            v23.8h, v27.4s
+        add             x7, x0, #64
         st4             {v20.8h, v21.8h, v22.8h, v23.8h}, [x0], x10
         ext             v4.16b, v2.16b, v3.16b, #1
         ext             v5.16b, v2.16b, v3.16b, #2
@@ -1584,11 +1585,14 @@ function ff_hevc_put_hevc_epel_h48_8_neon_i8mm, export=1
         usdot           v21.4s, v4.16b, v30.16b
         usdot           v22.4s, v5.16b, v30.16b
         usdot           v23.4s, v6.16b, v30.16b
-        xtn             v20.4h, v20.4s
-        xtn2            v20.8h, v22.4s
-        xtn             v21.4h, v21.4s
-        xtn2            v21.8h, v23.4s
-        add             x7, x0, #64
+        zip1            v24.4s, v20.4s, v22.4s
+        zip2            v25.4s, v20.4s, v22.4s
+        zip1            v26.4s, v21.4s, v23.4s
+        zip2            v27.4s, v21.4s, v23.4s
+        xtn             v20.4h, v24.4s
+        xtn2            v20.8h, v25.4s
+        xtn             v21.4h, v26.4s
+        xtn2            v21.8h, v27.4s
         st2             {v20.8h, v21.8h}, [x7]
         b.ne            1b
         ret
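The second hunk's reordering can be checked with a scalar model of the lane movements. This is a sketch, not FFmpeg code, and it rests on one assumption: after the `usdot` instructions, lane i of v20, v21, v22 and v23 holds the filtered sample for column 4*i, 4*i+1, 4*i+2 and 4*i+3 of the final 16-column block. That layout follows from the `ext #1/#2/#3` source offsets. All helper names are made up.

Presumably the first 32 columns were unaffected because `st4` interleaves four registers, which already matches this stride-4 layout. The last 16 columns only fill two 8h registers, so they are stored with `st2`, and the lanes must be zipped before narrowing. Separately, the first hunk moves `add x7, x0, #64` ahead of the `st4` that post-increments x0 by x10 (presumably the destination row stride). As a result, x7 is derived from the current row's x0 instead of the incremented one. The model below covers only the lane ordering.

```c
#include <stdint.h>
#include <string.h>

/* Modelled usdot results: v[r][i] holds the column 4*i + r (assumption above). */
static void usdot_lanes(int32_t v[4][4])
{
    for (int r = 0; r < 4; r++)
        for (int i = 0; i < 4; i++)
            v[r][i] = 4 * i + r;
}

/* xtn + xtn2: narrow lo into 16-bit lanes 0-3 and hi into lanes 4-7. */
static void xtn_pair(int16_t d[8], const int32_t lo[4], const int32_t hi[4])
{
    for (int i = 0; i < 4; i++) {
        d[i] = (int16_t)lo[i];
        d[i + 4] = (int16_t)hi[i];
    }
}

/* zip1/zip2 on .4s: interleave the low (zip1) or high (zip2) halves of a and b. */
static void zip1_4s(int32_t d[4], const int32_t a[4], const int32_t b[4])
{
    d[0] = a[0]; d[1] = b[0]; d[2] = a[1]; d[3] = b[1];
}

static void zip2_4s(int32_t d[4], const int32_t a[4], const int32_t b[4])
{
    d[0] = a[2]; d[1] = b[2]; d[2] = a[3]; d[3] = b[3];
}

/* st2 {a.8h, b.8h}: store a[0], b[0], a[1], b[1], ... */
static void st2_8h(int16_t out[16], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 8; i++) {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* Before the patch: narrow v20/v22 and v21/v23 directly, then st2. */
static void old_sequence(int16_t out[16])
{
    int32_t v[4][4];
    int16_t a[8], b[8];
    usdot_lanes(v);
    xtn_pair(a, v[0], v[2]);
    xtn_pair(b, v[1], v[3]);
    st2_8h(out, a, b);
}

/* After the patch: zip1/zip2 into v24-v27, narrow, then st2. */
static void new_sequence(int16_t out[16])
{
    int32_t v[4][4], z[4][4];
    int16_t a[8], b[8];
    usdot_lanes(v);
    zip1_4s(z[0], v[0], v[2]);
    zip2_4s(z[1], v[0], v[2]);
    zip1_4s(z[2], v[1], v[3]);
    zip2_4s(z[3], v[1], v[3]);
    xtn_pair(a, z[0], z[1]);
    xtn_pair(b, z[2], z[3]);
    st2_8h(out, a, b);
}
```

`old_sequence()` stores the columns in the order 0, 1, 4, 5, 8, 9, 12, 13, 2, 3, 6, 7, 10, 11, 14, 15. `new_sequence()` stores 0 through 15 in order.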