From patchwork Wed Nov 17 04:56:13 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 31451
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 17 Nov 2021 05:56:13 +0100
Message-Id: <20211117045614.55251-5-jdek@itanimul.li>
In-Reply-To: <20211117045614.55251-1-jdek@itanimul.li>
References: <20211117045614.55251-1-jdek@itanimul.li>
Subject: [FFmpeg-devel] [PATCH 5/6] lavc/aarch64: add hevc sao band 8x8 tiling

--bench on AWS Graviton:

hevc_sao_band_8x8_8_c: 317.5
hevc_sao_band_8x8_8_neon: 97.5
hevc_sao_band_16x16_8_c: 1115.0
hevc_sao_band_16x16_8_neon: 322.7
hevc_sao_band_32x32_8_c: 4599.2
hevc_sao_band_32x32_8_neon: 1246.2
hevc_sao_band_48x48_8_c: 10021.7
hevc_sao_band_48x48_8_neon: 2740.5
hevc_sao_band_64x64_8_c: 17635.0
hevc_sao_band_64x64_8_neon: 4875.7

Signed-off-by: J. Dekker
---
 libavcodec/aarch64/hevcdsp_init_aarch64.c | 6 +++++-
 libavcodec/aarch64/hevcdsp_sao_neon.S     | 9 ++++++---
 2 files changed, 11 insertions(+), 4 deletions(-)

No change since previous patch which was ACK'd, just want to push this set together.
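
For readers comparing the --bench figures above, the C-to-NEON ratio per block size can be worked out with a small helper. This is only an illustrative sketch: the struct and function names (bench_row, speedup, print_speedups) are made up for this note and are not part of FFmpeg or checkasm, and the numbers are copied verbatim from the benchmark listing (lower is better).

```c
#include <assert.h>
#include <stdio.h>

/* One pair of rows from the --bench output quoted above (lower is better). */
struct bench_row {
    const char *block;  /* block size label */
    double c;           /* hevc_sao_band_<size>_8_c */
    double neon;        /* hevc_sao_band_<size>_8_neon */
};

static const struct bench_row rows[] = {
    { "8x8",     317.5,   97.5 },
    { "16x16",  1115.0,  322.7 },
    { "32x32",  4599.2, 1246.2 },
    { "48x48", 10021.7, 2740.5 },
    { "64x64", 17635.0, 4875.7 },
};

/* Ratio of the C figure to the NEON figure; above 1.0 means NEON is faster. */
static double speedup(const struct bench_row *r)
{
    assert(r->neon > 0.0);
    return r->c / r->neon;
}

/* Print one line per block size. */
static void print_speedups(void)
{
    for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
        printf("%-6s %.2fx\n", rows[i].block, speedup(&rows[i]));
}
```

Calling print_speedups() gives ratios between roughly 3.2x and 3.7x across the five sizes.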
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index b93cec9e44..2002530266 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -77,7 +77,11 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->idct_dc[1] = ff_hevc_idct_8x8_dc_8_neon;
         c->idct_dc[2] = ff_hevc_idct_16x16_dc_8_neon;
         c->idct_dc[3] = ff_hevc_idct_32x32_dc_8_neon;
-        c->sao_band_filter[0] = ff_hevc_sao_band_filter_8x8_8_neon;
+        c->sao_band_filter[0] =
+        c->sao_band_filter[1] =
+        c->sao_band_filter[2] =
+        c->sao_band_filter[3] =
+        c->sao_band_filter[4] = ff_hevc_sao_band_filter_8x8_8_neon;
         c->sao_edge_filter[0] = ff_hevc_sao_edge_filter_8x8_8_neon;
         c->sao_edge_filter[1] =
         c->sao_edge_filter[2] =
diff --git a/libavcodec/aarch64/hevcdsp_sao_neon.S b/libavcodec/aarch64/hevcdsp_sao_neon.S
index e844cc8980..82b234aa47 100644
--- a/libavcodec/aarch64/hevcdsp_sao_neon.S
+++ b/libavcodec/aarch64/hevcdsp_sao_neon.S
@@ -35,6 +35,7 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         stp xzr, xzr, [sp, #32]
         stp xzr, xzr, [sp, #48]
         mov w8, #4
+        sxtw x6, w6
 0:
         ldrsh x9, [x4, x8, lsl #1] // x9 = sao_offset_val[k+1]
         subs w8, w8, #1
@@ -44,8 +45,10 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         bne 0b
         ld1 {v16.16b-v19.16b}, [sp], #64
         movi v20.8h, #1
+        sub x2, x2, x6 // stride_dst - width
+        sub x3, x3, x6 // stride_src - width
 1: // beginning of line
-        mov w8, w6
+        mov x8, x6
 2:
         // Simple layout for accessing 16bit values
         // with 8bit LUT.
@@ -56,7 +59,7 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         // +----------------------------------->
         // i-0 i-1 i-2 i-3
         // dst[x] = av_clip_pixel(src[x] + offset_table[src[x] >> shift]);
-        ld1 {v2.8b}, [x1]
+        ld1 {v2.8b}, [x1], #8
         // load src[x]
         uxtl v0.8h, v2.8b
         // >> shift
@@ -74,7 +77,7 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         // clip + narrow
         sqxtun v4.8b, v1.8h
         // store
-        st1 {v4.8b}, [x0]
+        st1 {v4.8b}, [x0], #8
         // done 8 pixels
         subs w8, w8, #8
         bne 2b
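
For review purposes, the pointer arithmetic introduced by the hunks above can be modelled in scalar C. The following is a rough sketch, not the FFmpeg implementation: all function names are hypothetical, it covers 8-bit pixels only (so shift is 8 - 5 = 3), and it assumes width is a multiple of 8, as the 8-pixel inner step implies. It mirrors the patch by widening width (sxtw), subtracting it from both strides once, advancing src and dst by 8 per tile (the post-indexed ld1/st1), and adding the reduced stride at the end of each row (the row-end step is assumed here, since it lies outside the hunks shown).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clamp to the 8-bit pixel range, like av_clip_pixel() at bit depth 8. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Fill the 32-entry band table from the four signalled offsets. */
static void build_offset_table(int table[32], const int16_t *sao_offset_val,
                               int sao_left_class)
{
    memset(table, 0, 32 * sizeof(table[0]));
    for (int k = 0; k < 4; k++)
        table[(k + sao_left_class) & 31] = sao_offset_val[k + 1];
}

/* Plain per-pixel loop, used here only as a comparison baseline. */
static void sao_band_ref(uint8_t *dst, const uint8_t *src,
                         ptrdiff_t stride_dst, ptrdiff_t stride_src,
                         const int16_t *sao_offset_val, int sao_left_class,
                         int width, int height)
{
    int table[32];
    build_offset_table(table, sao_offset_val, sao_left_class);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            dst[y * stride_dst + x] =
                clip_u8(src[y * stride_src + x] + table[src[y * stride_src + x] >> 3]);
}

/* Hypothetical scalar model of the tiled loop after this patch. */
static void sao_band_tiled_model(uint8_t *dst, const uint8_t *src,
                                 ptrdiff_t stride_dst, ptrdiff_t stride_src,
                                 const int16_t *sao_offset_val, int sao_left_class,
                                 int width, int height)
{
    int table[32];
    ptrdiff_t w = width;             /* sxtw x6, w6 */

    assert(width > 0 && width % 8 == 0);
    build_offset_table(table, sao_offset_val, sao_left_class);
    stride_dst -= w;                 /* sub x2, x2, x6 */
    stride_src -= w;                 /* sub x3, x3, x6 */

    for (int y = 0; y < height; y++) {
        for (ptrdiff_t left = w; left > 0; left -= 8) {
            for (int i = 0; i < 8; i++)
                dst[i] = clip_u8(src[i] + table[src[i] >> 3]);
            src += 8;                /* ld1 {v2.8b}, [x1], #8 */
            dst += 8;                /* st1 {v4.8b}, [x0], #8 */
        }
        dst += stride_dst;           /* assumed: skip the row padding */
        src += stride_src;
    }
}

/* Returns 0 when both loops agree on a 16x8 block with padded strides. */
static int sao_band_selfcheck(void)
{
    enum { W = 16, H = 8, STRIDE_SRC = 24, STRIDE_DST = 32 };
    uint8_t src[H * STRIDE_SRC], a[H * STRIDE_DST], b[H * STRIDE_DST];
    const int16_t offsets[5] = { 0, 3, -2, 7, -200 };

    for (int i = 0; i < H * STRIDE_SRC; i++)
        src[i] = (uint8_t)(i * 37 + 11);
    memset(a, 0xAA, sizeof(a));
    memset(b, 0xAA, sizeof(b));

    sao_band_ref(a, src, STRIDE_DST, STRIDE_SRC, offsets, 30, W, H);
    sao_band_tiled_model(b, src, STRIDE_DST, STRIDE_SRC, offsets, 30, W, H);
    return memcmp(a, b, sizeof(a)) != 0;
}
```

Because the strides are reduced by width up front, the row-end step only has to skip the padding, which is why one 8-pixel kernel can serve all five sao_band_filter slots.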