From patchwork Thu Oct 7 14:30:57 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 30974
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 7 Oct 2021 16:30:57 +0200
Message-Id: <20211007143057.17870-4-jdek@itanimul.li>
X-Mailer: git-send-email 2.30.1 (Apple Git-130)
In-Reply-To: <20211007143057.17870-1-jdek@itanimul.li>
References: <20211007143057.17870-1-jdek@itanimul.li>
Subject: [FFmpeg-devel] [PATCH 4/4] lavc/aarch64: clean-up sao band 8x8 function formatting
List-Id: FFmpeg development discussions and patches

Signed-off-by: J. Dekker
---
 libavcodec/aarch64/hevcdsp_sao_neon.S | 103 +++++++++++---------------
 1 file changed, 44 insertions(+), 59 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_sao_neon.S b/libavcodec/aarch64/hevcdsp_sao_neon.S
index 263747149f..c2519da7f5 100644
--- a/libavcodec/aarch64/hevcdsp_sao_neon.S
+++ b/libavcodec/aarch64/hevcdsp_sao_neon.S
@@ -3,7 +3,7 @@
  *
  * AArch64 NEON optimised SAO functions for HEVC decoding
  *
- * Copyright (c) 2020 Josh Dekker
+ * Copyright (c) 2020-2021 J. Dekker
  *
  * This file is part of FFmpeg.
  *
@@ -29,64 +29,49 @@
 // int16_t *sao_offset_val, int sao_left_class,
 // int width, int height)
 function ff_hevc_sao_band_filter_8x8_8_neon, export=1
-        sub sp, sp, #64
-        stp xzr, xzr, [sp]
-        stp xzr, xzr, [sp, #16]
-        stp xzr, xzr, [sp, #32]
-        stp xzr, xzr, [sp, #48]
-        mov w8, #4
-        sxtw x6, w6
-0:
-        ldrsh x9, [x4, x8, lsl #1] // x9 = sao_offset_val[k+1]
-        subs w8, w8, #1
-        add w10, w8, w5 // x10 = k + sao_left_class
-        and w10, w10, #0x1F
-        strh w9, [sp, x10, lsl #1]
-        bne 0b
-        ld1 {v16.16b-v19.16b}, [sp], #64
-        movi v20.8h, #1
-        sub x2, x2, x6 // stride_dst - width
-        sub x3, x3, x6 // stride_src - width
-1:      // beginning of line
-        mov x8, x6
-2:
-        // Simple layout for accessing 16bit values
-        // with 8bit LUT.
-        //
-        //    00  01  02  03  04  05  06  07
-        //   +----------------------------------->
-        //   |xDE#xAD|xCA#xFE|xBE#xEF|xFE#xED|....
-        //   +----------------------------------->
-        //      i-0     i-1     i-2     i-3
-        // dst[x] = av_clip_pixel(src[x] + offset_table[src[x] >> shift]);
-        ld1 {v2.8b}, [x1], #8
-        // load src[x]
-        uxtl v0.8h, v2.8b
-        // >> shift
-        ushr v2.8h, v0.8h, #3 // BIT_DEPTH - 3
-        // x2 (access lower short)
-        shl v1.8h, v2.8h, #1 // low (x2, accessing short)
-        // +1 access upper short
-        add v3.8h, v1.8h, v20.8h
-        // shift insert index to upper byte
-        sli v1.8h, v3.8h, #8
-        // table
-        tbx v2.16b, {v16.16b-v19.16b}, v1.16b
-        // src[x] + table
-        add v1.8h, v0.8h, v2.8h
-        // clip + narrow
-        sqxtun v4.8b, v1.8h
-        // store
-        st1 {v4.8b}, [x0], #8
-        // done 8 pixels
-        subs w8, w8, #8
-        bne 2b
-        // finished line
-        subs w7, w7, #1
-        add x0, x0, x2 // dst += stride_dst
-        add x1, x1, x3 // src += stride_src
-        bne 1b
-        ret
+        sub sp, sp, #64
+        stp xzr, xzr, [sp]
+        stp xzr, xzr, [sp, #16]
+        stp xzr, xzr, [sp, #32]
+        stp xzr, xzr, [sp, #48]
+        mov w8, #4
+        sxtw x6, w6
+0:      ldrsh x9, [x4, x8, lsl #1] // sao_offset_val[k+1]
+        subs w8, w8, #1
+        add w10, w8, w5 // k + sao_left_class
+        and w10, w10, #0x1F
+        strh w9, [sp, x10, lsl #1]
+        bne 0b
+        ld1 {v16.16b-v19.16b}, [sp], #64
+        movi v20.8h, #1
+        sub x2, x2, x6 // stride_dst - width
+        sub x3, x3, x6 // stride_src - width
+1:      mov x8, x6 // beginning of line
+2:      // Simple layout for accessing 16bit values
+        // with 8bit LUT.
+        //
+        //    00  01  02  03  04  05  06  07
+        //   +----------------------------------->
+        //   |xDE#xAD|xCA#xFE|xBE#xEF|xFE#xED|....
+        //   +----------------------------------->
+        //      i-0     i-1     i-2     i-3
+        ld1 {v2.8b}, [x1], #8 // dst[x] = av_clip_pixel(src[x] + offset_table[src[x] >> shift]);
+        uxtl v0.8h, v2.8b // load src[x]
+        ushr v2.8h, v0.8h, #3 // >> BIT_DEPTH - 3
+        shl v1.8h, v2.8h, #1 // low (x2, accessing short)
+        add v3.8h, v1.8h, v20.8h // +1 access upper short
+        sli v1.8h, v3.8h, #8 // shift insert index to upper byte
+        tbx v2.16b, {v16.16b-v19.16b}, v1.16b // table
+        add v1.8h, v0.8h, v2.8h // src[x] + table
+        sqxtun v4.8b, v1.8h // clip + narrow
+        st1 {v4.8b}, [x0], #8 // store
+        subs w8, w8, #8 // done 8 pixels
+        bne 2b
+        subs w7, w7, #1 // finished line, prep. new
+        add x0, x0, x2 // dst += stride_dst
+        add x1, x1, x3 // src += stride_src
+        bne 1b
+        ret
 endfunc
 
 // ASSUMES STRIDE_SRC = 192