[FFmpeg-devel] aarch64: hevc_idct: Fix overflows in idct_dc

Message ID	20210517095537.318311-1-martin@martin.st
State	Accepted
Commit	f27e3ccf06ee19935d160164ca4a02f28cfc2a27

Headers

Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
 designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
From: =?utf-8?q?Martin_Storsj=C3=B6?= <martin@martin.st>
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 17 May 2021 12:55:37 +0300
Message-Id: <20210517095537.318311-1-martin@martin.st>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH] aarch64: hevc_idct: Fix overflows in idct_dc
Precedence: list
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Josh Dekker <josh@itanimul.li>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

Series

[FFmpeg-devel] aarch64: hevc_idct: Fix overflows in idct_dc

Checks

Context	Check	Description
andriy/x86_make	success	Make finished
andriy/x86_make_fate	success	Make fate finished
andriy/PPC64_make	success	Make finished
andriy/PPC64_make_fate	success	Make fate finished

Commit Message

Martin Storsjö May 17, 2021, 9:55 a.m. UTC

This is marginally slower, but correct for all input values.
The previous implementation failed with certain input seeds, e.g.
"checkasm --test=hevc_idct 98".
---
 libavcodec/aarch64/hevcdsp_idct_neon.S | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

Comments

Martin Storsjö May 21, 2021, 9:05 p.m. UTC | #1

On Mon, 17 May 2021, Martin Storsjö wrote:

> This is marginally slower, but correct for all input values.
> The previous implementation failed with certain input seeds, e.g.
> "checkasm --test=hevc_idct 98".
> ---
> libavcodec/aarch64/hevcdsp_idct_neon.S | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)

OKd by Josh on irc (yesterday), will push soon.

// Martin

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 28c11e632c..0869431294 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -573,14 +573,13 @@  idct_16x16 10
 // void ff_hevc_idct_NxN_dc_DEPTH_neon(int16_t *coeffs)
 .macro idct_dc size, bitdepth
 function ff_hevc_idct_\size\()x\size\()_dc_\bitdepth\()_neon, export=1
-        movi          v1.8h,  #((1 << (14 - \bitdepth))+1)
         ld1r         {v4.8h}, [x0]
-        add           v4.8h,  v4.8h,  v1.8h
-        sshr          v0.8h,  v4.8h,  #(15 - \bitdepth)
-        sshr          v1.8h,  v4.8h,  #(15 - \bitdepth)
+        srshr         v4.8h,  v4.8h,  #1
+        srshr         v0.8h,  v4.8h,  #(14 - \bitdepth)
+        srshr         v1.8h,  v4.8h,  #(14 - \bitdepth)
 .if \size > 4
-        sshr          v2.8h,  v4.8h,  #(15 - \bitdepth)
-        sshr          v3.8h,  v4.8h,  #(15 - \bitdepth)
+        srshr         v2.8h,  v4.8h,  #(14 - \bitdepth)
+        srshr         v3.8h,  v4.8h,  #(14 - \bitdepth)
 .if \size > 16 /* dc 32x32 */
         mov              x2,  #4
 1:
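
The arithmetic change in the hunk above can be checked with a small scalar model. The sketch below is not part of the patch: the function names (asr, wrap16, ref_dc, old_dc, new_dc, count_mismatches) are made up for illustration, and ref_dc assumes the C reference computes (((c + 1) >> 1) + add) >> shift in int with shift = 14 - bitdepth. It models the removed add as a wrapping 16-bit addition and srshr as a rounding shift that does not overflow internally.

```c
#include <stdint.h>

/* Arithmetic (floor) right shift, written so that it does not rely on
 * the implementation-defined behaviour of >> on negative ints. */
static int asr(int v, int n)
{
    return v >= 0 ? v >> n : -((-v + (1 << n) - 1) >> n);
}

/* Wrap an int into the int16_t range, as a 16-bit lane addition does. */
static int wrap16(int v)
{
    if (v > INT16_MAX) v -= 65536;
    if (v < INT16_MIN) v += 65536;
    return v;
}

/* Assumed scalar reference: (((c + 1) >> 1) + add) >> shift, in int. */
static int ref_dc(int16_t c, int bitdepth)
{
    int shift = 14 - bitdepth;
    int add   = 1 << (shift - 1);
    return asr(asr(c + 1, 1) + add, shift);
}

/* Model of the removed sequence: movi + add (wraps at 16 bits) + sshr. */
int old_dc(int16_t c, int bitdepth)
{
    int bias = (1 << (14 - bitdepth)) + 1;
    return asr(wrap16(c + bias), 15 - bitdepth);
}

/* Model of the new sequence: srshr #1, then srshr #(14 - bitdepth).
 * The rounding constant is added in wider precision before shifting. */
int new_dc(int16_t c, int bitdepth)
{
    int shift = 14 - bitdepth;
    int half  = wrap16(asr(c + 1, 1));
    return wrap16(asr(half + (1 << (shift - 1)), shift));
}

/* Count the int16_t inputs for which a model disagrees with ref_dc. */
int count_mismatches(int (*dc)(int16_t, int), int bitdepth)
{
    int bad = 0;
    for (int c = INT16_MIN; c <= INT16_MAX; c++)
        if (dc((int16_t)c, bitdepth) != ref_dc((int16_t)c, bitdepth))
            bad++;
    return bad;
}
```

Under these assumptions, count_mismatches(old_dc, 8) is 65 (inputs 32703..32767, where adding the bias of 65 wraps) and count_mismatches(old_dc, 10) is 17, while count_mismatches(new_dc, ...) is 0 for both bit depths. For example, old_dc(32767, 8) yields -256 where ref_dc yields 256.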
