From patchwork Wed May 25 08:59:53 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 35918
From: "J.
 Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 25 May 2022 10:59:53 +0200
Message-Id: <20220525085953.39388-1-jdek@itanimul.li>
X-Mailer: git-send-email 2.32.0 (Apple Git-132)
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH] lavc/aarch64: hevc_sao reschedule slightly
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: B89UWBilWC5W

Signed-off-by: J. Dekker
---
 libavcodec/aarch64/hevcdsp_sao_neon.S | 30 +++++++++++++++------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_sao_neon.S b/libavcodec/aarch64/hevcdsp_sao_neon.S
index efd8112af4..39056d76ee 100644
--- a/libavcodec/aarch64/hevcdsp_sao_neon.S
+++ b/libavcodec/aarch64/hevcdsp_sao_neon.S
@@ -3,7 +3,7 @@
  *
  * AArch64 NEON optimised SAO functions for HEVC decoding
  *
- * Copyright (c) 2020 Josh Dekker
+ * Copyright (c) 2022 J. Dekker
  *
  * This file is part of FFmpeg.
  *
@@ -24,6 +24,10 @@
 
 #include "libavutil/aarch64/asm.S"
 
+#define MAX_PB_SIZE 64
+#define AV_INPUT_BUFFER_PADDING_SIZE 64
+#define SAO_STRIDE (2*MAX_PB_SIZE + AV_INPUT_BUFFER_PADDING_SIZE)
+
 // void sao_band_filter(uint8_t *_dst, uint8_t *_src,
 //                      ptrdiff_t stride_dst, ptrdiff_t stride_src,
 //                      int16_t *sao_offset_val, int sao_left_class,
@@ -56,6 +60,7 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         // |xDE#xAD|xCA#xFE|xBE#xEF|xFE#xED|....
         // +----------------------------------->
         //    i-0     i-1     i-2     i-3
+        subs            w8, w8, #8
         ld1             {v2.8b}, [x1], #8 // dst[x] = av_clip_pixel(src[x] + offset_table[src[x] >> shift]);
         uxtl            v0.8h, v2.8b // load src[x]
         ushr            v2.8h, v0.8h, #3 // >> BIT_DEPTH - 3
@@ -66,7 +71,7 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         add             v1.8h, v0.8h, v2.8h // src[x] + table
         sqxtun          v4.8b, v1.8h // clip + narrow
         st1             {v4.8b}, [x0], #8 // store
-        subs            w8, w8, #8 // done 8 pixels
+        // done 8 pixels
         bne             2b
         subs            w7, w7, #1 // finished line, prep. new
         add             x0, x0, x2 // dst += stride_dst
@@ -75,12 +80,11 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         ret
 endfunc
 
-// ASSUMES STRIDE_SRC = 192
 .Lsao_edge_pos:
 .word 1 // horizontal
-.word 192 // vertical
-.word 192 + 1 // 45 degree
-.word 192 - 1 // 135 degree
+.word SAO_STRIDE // vertical
+.word SAO_STRIDE + 1 // 45 degree
+.word SAO_STRIDE - 1 // 135 degree
 
 // ff_hevc_sao_edge_filter_16x16_8_neon(char *dst, char *src, ptrdiff stride_dst,
 //                                      int16 *sao_offset_val, int eo, int width, int height)
@@ -98,7 +102,7 @@ function ff_hevc_sao_edge_filter_16x16_8_neon, export=1
         uzp2            v1.16b, v3.16b, v3.16b // sao_offset_val -> upper
         uzp1            v0.16b, v3.16b, v3.16b // sao_offset_val -> lower
         movi            v2.16b, #2
-        mov             x15, #192
+        mov             x15, #SAO_STRIDE
         // strides between end of line and next src/dst
         sub             x15, x15, x5 // stride_src - width
         sub             x16, x2, x5 // stride_dst - width
@@ -108,6 +112,7 @@ function ff_hevc_sao_edge_filter_16x16_8_neon, export=1
         sub             x12, x11, x4 // src_a (prev) = src - sao_edge_pos
         add             x13, x11, x4 // src_b (next) = src + sao_edge_pos
 2:      // process 16 bytes
+        subs            x14, x14, #16
         ld1             {v3.16b}, [x11], #16 // load src
         ld1             {v4.16b}, [x12], #16 // load src_a (prev)
         ld1             {v5.16b}, [x13], #16 // load src_b (next)
@@ -130,12 +135,12 @@ function ff_hevc_sao_edge_filter_16x16_8_neon, export=1
         sqxtun          v3.8b, v20.8h
         sqxtun2         v3.16b, v21.8h
         st1             {v3.16b}, [x0], #16
-        subs            x14, x14, #16 // filtered 16 bytes
+        // filtered 16 bytes
         b.ne            2b // do we have width to filter?
         // no width to filter, setup next line
+        subs            w6, w6, #1 // filtered line
         add             x11, x11, x15 // stride src to next line
         add             x0, x0, x16 // stride dst to next line
-        subs            w6, w6, #1 // filtered line
         b.ne            1b // do we have lines to process?
         // no lines to filter
         ret
@@ -156,12 +161,12 @@ function ff_hevc_sao_edge_filter_8x8_8_neon, export=1
         movi            v2.16b, #2
         add             x16, x0, x2
         lsl             x2, x2, #1
-        mov             x15, #192
+        mov             x15, #SAO_STRIDE
         mov             x8, x1
         sub             x9, x1, x4
         add             x10, x1, x4
-        lsr             w17, w6, #1
-1:      ld1             {v3.d}[0], [ x8], x15
+1:      subs            w6, w6, #2
+        ld1             {v3.d}[0], [ x8], x15
         ld1             {v4.d}[0], [ x9], x15
         ld1             {v5.d}[0], [x10], x15
         ld1             {v3.d}[1], [ x8], x15
@@ -187,7 +192,6 @@ function ff_hevc_sao_edge_filter_8x8_8_neon, export=1
         sqxtun          v7.8b, v21.8h
         st1             {v6.8b}, [ x0], x2
         st1             {v7.8b}, [x16], x2
-        subs            x17, x17, #1
         b.ne            1b
         ret
 endfunc
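
Note on the constant: the SAO_STRIDE macro added by the patch should evaluate to the same 192 that the removed "ASSUMES STRIDE_SRC = 192" comment and the literal .word/mov operands used, so the .Lsao_edge_pos table keeps its values. A minimal C sketch of that arithmetic follows. The three #defines are copied from the patch; the sao_edge_pos array and the sao_edge_offset() helper are hypothetical names used only for illustration and are not part of FFmpeg.

```c
#include <assert.h>

/* Values copied from the #defines added by the patch. */
#define MAX_PB_SIZE 64
#define AV_INPUT_BUFFER_PADDING_SIZE 64
#define SAO_STRIDE (2*MAX_PB_SIZE + AV_INPUT_BUFFER_PADDING_SIZE)

/* Compile-time check that the macro matches the literal it replaces. */
static_assert(SAO_STRIDE == 192, "SAO_STRIDE must equal the old hard-coded stride");

/* Hypothetical C mirror of the .Lsao_edge_pos table, indexed by eo. */
static const int sao_edge_pos[4] = {
    1,              /* horizontal */
    SAO_STRIDE,     /* vertical   */
    SAO_STRIDE + 1, /* 45 degree  */
    SAO_STRIDE - 1, /* 135 degree */
};

/* Hypothetical helper: byte distance from src to its "next" neighbour (src_b).
 * The assembly uses the negated value for the "previous" neighbour (src_a). */
static int sao_edge_offset(int eo)
{
    return sao_edge_pos[eo & 3];
}
```

With these definitions the four offsets are 1, 192, 193 and 191, matching the values the table held before the change.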