From patchwork Thu Apr 28 13:42:09 2022
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 35471
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 28 Apr 2022 15:42:09 +0200
Message-Id: <20220428134211.4786-1-jdek@itanimul.li>
X-Mailer: git-send-email 2.32.0 (Apple Git-132)
Subject: [FFmpeg-devel] [PATCH 1/3] lavc/aarch64: fix hevc sao band filter

The SAO band filter can be called with widths that are not a multiple
of 8; round the width up to the nearest multiple of 8 to account for
this. The inner loop now also advances src and dst by 8 bytes per
iteration, with the row strides reduced by the rounded width, so rows
wider than 8 pixels are filtered correctly and the function can be
enabled for all block sizes.

Signed-off-by: J. Dekker
---
 libavcodec/aarch64/hevcdsp_init_aarch64.c | 10 +++++-----
 libavcodec/aarch64/hevcdsp_sao_neon.S     |  8 ++++++--
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 1e40be740c..c8963e6104 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -75,11 +75,11 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->idct_dc[1]                  = ff_hevc_idct_8x8_dc_8_neon;
         c->idct_dc[2]                  = ff_hevc_idct_16x16_dc_8_neon;
         c->idct_dc[3]                  = ff_hevc_idct_32x32_dc_8_neon;
-        // This function is disabled, as it doesn't handle widths that aren't
-        // an even multiple of 8 correctly. fate-hevc doesn't exercise that
-        // for the current size, but if enabled for bigger sizes, the cases
-        // of non-multiple of 8 seem to arise.
-//        c->sao_band_filter[0]          = ff_hevc_sao_band_filter_8x8_8_neon;
+        c->sao_band_filter[0]          =
+        c->sao_band_filter[1]          =
+        c->sao_band_filter[2]          =
+        c->sao_band_filter[3]          =
+        c->sao_band_filter[4]          = ff_hevc_sao_band_filter_8x8_8_neon;
     }
     if (bit_depth == 10) {
         c->add_residual[0]             = ff_hevc_add_residual_4x4_10_neon;
diff --git a/libavcodec/aarch64/hevcdsp_sao_neon.S b/libavcodec/aarch64/hevcdsp_sao_neon.S
index d523bf584d..e07e0cea2d 100644
--- a/libavcodec/aarch64/hevcdsp_sao_neon.S
+++ b/libavcodec/aarch64/hevcdsp_sao_neon.S
@@ -41,7 +41,11 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         and             w10, w10, #0x1F
         strh            w9, [sp, x10, lsl #1]
         bne             0b
+        add             w6, w6, #7
+        bic             w6, w6, #7
         ld1             {v16.16b-v19.16b}, [sp], #64
+        sub             x2, x2, x6
+        sub             x3, x3, x6
         movi            v20.8h, #1
 1:      mov             w8, w6                  // beginning of line
 2:      // Simple layout for accessing 16bit values
@@ -52,7 +56,7 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         // |xDE#xAD|xCA#xFE|xBE#xEF|xFE#xED|....
         // +----------------------------------->
         //    i-0     i-1     i-2     i-3
-        ld1             {v2.8b}, [x1]           // dst[x] = av_clip_pixel(src[x] + offset_table[src[x] >> shift]);
+        ld1             {v2.8b}, [x1], #8       // dst[x] = av_clip_pixel(src[x] + offset_table[src[x] >> shift]);
         uxtl            v0.8h, v2.8b            // load src[x]
         ushr            v2.8h, v0.8h, #3        // >> BIT_DEPTH - 3
         shl             v1.8h, v2.8h, #1        // low (x2, accessing short)
@@ -61,7 +65,7 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         tbx             v2.16b, {v16.16b-v19.16b}, v1.16b // table
         add             v1.8h, v0.8h, v2.8h     // src[x] + table
         sqxtun          v4.8b, v1.8h            // clip + narrow
-        st1             {v4.8b}, [x0]           // store
+        st1             {v4.8b}, [x0], #8       // store
         subs            w8, w8, #8              // done 8 pixels
         bne             2b
         subs            w7, w7, #1              // finished line, prep. new

From patchwork Thu Apr 28 13:42:10 2022
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 35472
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 28 Apr 2022 15:42:10 +0200
Message-Id: <20220428134211.4786-2-jdek@itanimul.li>
X-Mailer: git-send-email 2.32.0 (Apple Git-132)
In-Reply-To: <20220428134211.4786-1-jdek@itanimul.li>
References: <20220428134211.4786-1-jdek@itanimul.li>
Subject: [FFmpeg-devel] [PATCH 2/3] lavc/aarch64: add hevc sao edge 16x16

bench on AWS Graviton:
hevc_sao_edge_16x16_8_c: 1857.0
hevc_sao_edge_16x16_8_neon: 211.0
hevc_sao_edge_32x32_8_c: 7802.2
hevc_sao_edge_32x32_8_neon: 808.2
hevc_sao_edge_48x48_8_c: 16764.2
hevc_sao_edge_48x48_8_neon: 1796.5
hevc_sao_edge_64x64_8_c: 32647.5
hevc_sao_edge_64x64_8_neon: 3118.5

Signed-off-by: J. Dekker
---
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  8 ++-
 libavcodec/aarch64/hevcdsp_sao_neon.S     | 66 +++++++++++++++++++++++
 2 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index c8963e6104..df521bb083 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -57,8 +57,8 @@ void ff_hevc_sao_band_filter_8x8_8_neon(uint8_t *_dst, uint8_t *_src,
                                   ptrdiff_t stride_dst, ptrdiff_t stride_src,
                                   int16_t *sao_offset_val, int sao_left_class,
                                   int width, int height);
-
-
+void ff_hevc_sao_edge_filter_16x16_8_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride_dst,
+                                          int16_t *sao_offset_val, int eo, int width, int height);
 
 av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
 {
@@ -80,6 +80,10 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->sao_band_filter[2]          =
         c->sao_band_filter[3]          =
         c->sao_band_filter[4]          = ff_hevc_sao_band_filter_8x8_8_neon;
+        c->sao_edge_filter[1]          =
+        c->sao_edge_filter[2]          =
+        c->sao_edge_filter[3]          =
+        c->sao_edge_filter[4]          = ff_hevc_sao_edge_filter_16x16_8_neon;
     }
     if (bit_depth == 10) {
         c->add_residual[0]             = ff_hevc_add_residual_4x4_10_neon;
diff --git a/libavcodec/aarch64/hevcdsp_sao_neon.S b/libavcodec/aarch64/hevcdsp_sao_neon.S
index e07e0cea2d..0315c479df 100644
--- a/libavcodec/aarch64/hevcdsp_sao_neon.S
+++ b/libavcodec/aarch64/hevcdsp_sao_neon.S
@@ -74,3 +74,69 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         bne             1b
         ret
 endfunc
+
+// ASSUMES STRIDE_SRC = 192
+.Lsao_edge_pos:
+.word 1                                         // horizontal
+.word 192                                       // vertical
+.word 192 + 1                                   // 45 degree
+.word 192 - 1                                   // 135 degree
+
+// ff_hevc_sao_edge_filter_16x16_8_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride_dst,
+//                                      int16_t *sao_offset_val, int eo, int width, int height)
+function ff_hevc_sao_edge_filter_16x16_8_neon, export=1
+        adr             x7, .Lsao_edge_pos
+        ld1             {v3.8h}, [x3]           // load sao_offset_val
+        add             w5, w5, #0xF
+        bic             w5, w5, #0xF
+        ldr             w4, [x7, w4, uxtw #2]   // w4 = sao_edge_pos[eo]
+        mov             v3.h[7], v3.h[0]        // reorder to [1,2,0,3,4]
+        mov             v3.h[0], v3.h[1]
+        mov             v3.h[1], v3.h[2]
+        mov             v3.h[2], v3.h[7]
+        // split 16bit values into two tables
+        uzp2            v1.16b, v3.16b, v3.16b  // sao_offset_val -> upper
+        uzp1            v0.16b, v3.16b, v3.16b  // sao_offset_val -> lower
+        movi            v2.16b, #2
+        mov             x15, #192
+        // strides between end of line and next src/dst
+        sub             x15, x15, x5            // stride_src - width
+        sub             x16, x2, x5             // stride_dst - width
+        mov             x11, x1                 // copy base src
+1:      // new line
+        mov             x14, x5                 // copy width
+        sub             x12, x11, x4            // src_a (prev) = src - sao_edge_pos
+        add             x13, x11, x4            // src_b (next) = src + sao_edge_pos
+2:      // process 16 bytes
+        ld1             {v3.16b}, [x11], #16    // load src
+        ld1             {v4.16b}, [x12], #16    // load src_a (prev)
+        ld1             {v5.16b}, [x13], #16    // load src_b (next)
+        cmhi            v16.16b, v4.16b, v3.16b // (prev > cur)
+        cmhi            v17.16b, v3.16b, v4.16b // (cur > prev)
+        cmhi            v18.16b, v5.16b, v3.16b // (next > cur)
+        cmhi            v19.16b, v3.16b, v5.16b // (cur > next)
+        sub             v20.16b, v16.16b, v17.16b // diff0 = CMP(cur, prev) = (cur > prev) - (cur < prev)
+        sub             v21.16b, v18.16b, v19.16b // diff1 = CMP(cur, next) = (cur > next) - (cur < next)
+        add             v20.16b, v20.16b, v21.16b // diff = diff0 + diff1
+        add             v20.16b, v20.16b, v2.16b  // offset_val = diff + 2
+        tbl             v16.16b, {v0.16b}, v20.16b // low bytes of the offsets
+        tbl             v17.16b, {v1.16b}, v20.16b // high bytes of the offsets
+        uxtl            v20.8h, v3.8b           // src[0:7]
+        uxtl2           v21.8h, v3.16b          // src[8:15]
+        zip1            v18.16b, v16.16b, v17.16b // 16-bit offsets for src[0:7]
+        zip2            v19.16b, v16.16b, v17.16b // 16-bit offsets for src[8:15]
+        sqadd           v20.8h, v18.8h, v20.8h  // + sao_offset_val
+        sqadd           v21.8h, v19.8h, v21.8h
+        sqxtun          v3.8b, v20.8h
+        sqxtun2         v3.16b, v21.8h
+        st1             {v3.16b}, [x0], #16
+        subs            x14, x14, #16           // filtered 16 bytes
+        b.ne            2b                      // do we have width to filter?
+        // no width to filter, set up next line
+        add             x11, x11, x15           // stride src to next line
+        add             x0, x0, x16             // stride dst to next line
+        subs            w6, w6, #1              // filtered line
+        b.ne            1b                      // do we have lines to process?
+        // no lines to filter
+        ret
+endfunc

From patchwork Thu Apr 28 13:42:11 2022
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 35473
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 28 Apr 2022 15:42:11 +0200
Message-Id: <20220428134211.4786-3-jdek@itanimul.li>
X-Mailer: git-send-email 2.32.0 (Apple Git-132)
In-Reply-To: <20220428134211.4786-1-jdek@itanimul.li>
References: <20220428134211.4786-1-jdek@itanimul.li>
Subject: [FFmpeg-devel] [PATCH 3/3] lavc/aarch64: add hevc sao edge 8x8

bench on AWS Graviton:
hevc_sao_edge_8x8_8_c: 516.0
hevc_sao_edge_8x8_8_neon: 81.0

Signed-off-by: J. Dekker
---
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  3 ++
 libavcodec/aarch64/hevcdsp_sao_neon.S     | 51 +++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index df521bb083..2002530266 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -59,6 +59,8 @@ void ff_hevc_sao_band_filter_8x8_8_neon(uint8_t *_dst, uint8_t *_src,
                                   int width, int height);
 void ff_hevc_sao_edge_filter_16x16_8_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride_dst,
                                           int16_t *sao_offset_val, int eo, int width, int height);
+void ff_hevc_sao_edge_filter_8x8_8_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride_dst,
+                                        int16_t *sao_offset_val, int eo, int width, int height);
 
 av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
 {
@@ -80,6 +82,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->sao_band_filter[2]          =
         c->sao_band_filter[3]          =
         c->sao_band_filter[4]          = ff_hevc_sao_band_filter_8x8_8_neon;
+        c->sao_edge_filter[0]          = ff_hevc_sao_edge_filter_8x8_8_neon;
         c->sao_edge_filter[1]          =
         c->sao_edge_filter[2]          =
         c->sao_edge_filter[3]          =
diff --git a/libavcodec/aarch64/hevcdsp_sao_neon.S b/libavcodec/aarch64/hevcdsp_sao_neon.S
index 0315c479df..efd8112af4 100644
--- a/libavcodec/aarch64/hevcdsp_sao_neon.S
+++ b/libavcodec/aarch64/hevcdsp_sao_neon.S
@@ -140,3 +140,54 @@ function ff_hevc_sao_edge_filter_16x16_8_neon, export=1
         // no lines to filter
         ret
 endfunc
+
+// ff_hevc_sao_edge_filter_8x8_8_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride_dst,
+//                                    int16_t *sao_offset_val, int eo, int width, int height)
+function ff_hevc_sao_edge_filter_8x8_8_neon, export=1
+        adr             x7, .Lsao_edge_pos
+        ldr             w4, [x7, w4, uxtw #2]   // w4 = sao_edge_pos[eo]
+        ld1             {v3.8h}, [x3]           // load sao_offset_val
+        mov             v3.h[7], v3.h[0]        // reorder to [1,2,0,3,4]
+        mov             v3.h[0], v3.h[1]
+        mov             v3.h[1], v3.h[2]
+        mov             v3.h[2], v3.h[7]
+        uzp2            v1.16b, v3.16b, v3.16b  // sao_offset_val -> upper
+        uzp1            v0.16b, v3.16b, v3.16b  // sao_offset_val -> lower
+        movi            v2.16b, #2
+        add             x16, x0, x2             // dst of the second row
+        lsl             x2, x2, #1              // dst advances two rows per iteration
+        mov             x15, #192               // stride_src
+        mov             x8, x1                  // src
+        sub             x9, x1, x4              // src_a (prev) = src - sao_edge_pos
+        add             x10, x1, x4             // src_b (next) = src + sao_edge_pos
+        lsr             w17, w6, #1             // height / 2 iterations, two rows each
+1:      ld1             {v3.d}[0], [ x8], x15   // row 0 -> lanes 0-7
+        ld1             {v4.d}[0], [ x9], x15
+        ld1             {v5.d}[0], [x10], x15
+        ld1             {v3.d}[1], [ x8], x15   // row 1 -> lanes 8-15
+        ld1             {v4.d}[1], [ x9], x15
+        ld1             {v5.d}[1], [x10], x15
+        cmhi            v16.16b, v4.16b, v3.16b // (prev > cur)
+        cmhi            v17.16b, v3.16b, v4.16b // (cur > prev)
+        cmhi            v18.16b, v5.16b, v3.16b // (next > cur)
+        cmhi            v19.16b, v3.16b, v5.16b // (cur > next)
+        sub             v20.16b, v16.16b, v17.16b // diff0 = CMP(cur, prev)
+        sub             v21.16b, v18.16b, v19.16b // diff1 = CMP(cur, next)
+        add             v20.16b, v20.16b, v21.16b // diff = diff0 + diff1
+        add             v20.16b, v20.16b, v2.16b  // offset_val = diff + 2
+        tbl             v16.16b, {v0.16b}, v20.16b // low bytes of the offsets
+        tbl             v17.16b, {v1.16b}, v20.16b // high bytes of the offsets
+        uxtl            v20.8h, v3.8b           // row 0 pixels
+        uxtl2           v21.8h, v3.16b          // row 1 pixels
+        zip1            v18.16b, v16.16b, v17.16b // 16-bit offsets for row 0
+        zip2            v19.16b, v16.16b, v17.16b // 16-bit offsets for row 1
+        sqadd           v20.8h, v18.8h, v20.8h  // + sao_offset_val
+        sqadd           v21.8h, v19.8h, v21.8h
+        sqxtun          v6.8b, v20.8h           // clip + narrow row 0
+        sqxtun          v7.8b, v21.8h           // clip + narrow row 1
+        st1             {v6.8b}, [ x0], x2
+        st1             {v7.8b}, [x16], x2
+        subs            x17, x17, #1            // two rows done
+        b.ne            1b
+        ret
+endfunc
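
For reviewing patch 1/3, a scalar C model of what the patched band filter loop computes may help. This is a sketch under stated assumptions, not FFmpeg code. It handles 8-bit samples only, and the helper names (`round_up8`, `clip_u8`, `sao_band_filter_model`) are hypothetical. Like the assembly, it rounds `width` up to a multiple of 8, fills a 32-entry table whose index wraps with `& 31` (the `and w10, w10, #0x1F` in the hunk context), and advances src/dst by 8 per vector and then by the reduced row strides. As a result it can write up to 7 pixels past `width` in each row, and the callers' buffers are assumed to tolerate that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same effect as "add w6, w6, #7; bic w6, w6, #7": round up to a multiple of 8. */
static int round_up8(int width)
{
    return (width + 7) & ~7;
}

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar model of the patched 8-bit NEON band filter. */
static void sao_band_filter_model(uint8_t *dst, const uint8_t *src,
                                  ptrdiff_t stride_dst, ptrdiff_t stride_src,
                                  const int16_t *sao_offset_val, int sao_left_class,
                                  int width, int height)
{
    int16_t offset_table[32] = { 0 };
    int w;

    assert(width > 0 && height > 0);
    w = round_up8(width);

    /* Four consecutive bands starting at sao_left_class, wrapping modulo 32. */
    for (int k = 0; k < 4; k++)
        offset_table[(k + sao_left_class) & 31] = sao_offset_val[k + 1];

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < w; x += 8) {
            /* One 8-pixel vector; the band index is src >> 3 for 8-bit samples. */
            for (int i = 0; i < 8; i++)
                dst[i] = clip_u8(src[i] + offset_table[src[i] >> 3]);
            src += 8; /* ld1 {v2.8b}, [x1], #8 */
            dst += 8; /* st1 {v4.8b}, [x0], #8 */
        }
        /* Row strides were reduced by the rounded width up front. */
        src += stride_src - w;
        dst += stride_dst - w;
    }
}
```

With `width = 3`, the model filters 8 pixels of each row and leaves pixels 8 to 15 of a 16-byte row untouched.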
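
Along the same lines, here is a scalar model of the edge filters in patches 2/3 and 3/3. It assumes 8-bit samples and the fixed source stride of 192 bytes noted above `.Lsao_edge_pos`, and the names are hypothetical. The neighbour offsets follow that table (`a = src - pos`, `b = src + pos`). The offsets are reordered to [1, 2, 0, 3, 4] so that `diff + 2` indexes them directly, which is what the `mov v3.h[...]` shuffle does. For clarity the model filters exactly `width` pixels. The NEON versions differ: the 16x16 one rounds the width up to 16, and the 8x8 one always processes 8 pixels per row, two rows at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAO_STRIDE_SRC 192 /* "ASSUMES STRIDE_SRC = 192" in the patch */

/* Neighbour offset per eo class, as in .Lsao_edge_pos: a = src - pos, b = src + pos. */
static const int sao_edge_pos[4] = {
    1,                  /* horizontal */
    SAO_STRIDE_SRC,     /* vertical   */
    SAO_STRIDE_SRC + 1, /* 45 degree  */
    SAO_STRIDE_SRC - 1, /* 135 degree */
};

/* Scalar CMP: 1 if a > b, 0 if equal, -1 if a < b. */
static int cmp_u8(uint8_t a, uint8_t b)
{
    return (a > b) - (a < b);
}

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar model of the 8-bit NEON edge filters. */
static void sao_edge_filter_model(uint8_t *dst, const uint8_t *src, ptrdiff_t stride_dst,
                                  const int16_t *sao_offset_val, int eo,
                                  int width, int height)
{
    /* Offsets reordered to [1, 2, 0, 3, 4], so diff + 2 indexes them directly. */
    const int16_t table[5] = {
        sao_offset_val[1], sao_offset_val[2], sao_offset_val[0],
        sao_offset_val[3], sao_offset_val[4],
    };
    int pos;

    assert(eo >= 0 && eo < 4);
    pos = sao_edge_pos[eo];

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int diff = cmp_u8(src[x], src[x - pos]) + cmp_u8(src[x], src[x + pos]);
            dst[x] = clip_u8(src[x] + table[diff + 2]);
        }
        src += SAO_STRIDE_SRC; /* fixed source stride */
        dst += stride_dst;
    }
}
```

A local minimum (diff = -2) picks `sao_offset_val[1]`, and a local maximum (diff = 2) picks `sao_offset_val[4]`.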
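
The `cmhi`/`sub` pairs in both edge filters compute the sign comparison without branches. `cmhi` sets a lane to all ones (-1 as a signed byte) when its first operand is higher. Subtracting the `cur > prev` mask from the `prev > cur` mask therefore yields CMP(cur, prev) in {-1, 0, 1}. The sketch below emulates one lane in C, with hypothetical function names, and checks it against the scalar definition for every pair of 8-bit inputs.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of CMHI (unsigned compare higher): all ones, i.e. -1 as int8_t, if a > b. */
static int8_t cmhi_lane(uint8_t a, uint8_t b)
{
    return a > b ? (int8_t)-1 : 0;
}

/* The cmhi/cmhi/sub sequence: mask(prev > cur) - mask(cur > prev). */
static int8_t neon_cmp_lane(uint8_t cur, uint8_t prev)
{
    return (int8_t)(cmhi_lane(prev, cur) - cmhi_lane(cur, prev));
}

/* Scalar reference: 1 if a > b, 0 if equal, -1 if a < b. */
static int cmp_ref(int a, int b)
{
    return (a > b) - (a < b);
}

/* Exhaustively compare the emulation with the reference for all 8-bit pairs. */
static void check_all_pairs(void)
{
    for (int cur = 0; cur < 256; cur++)
        for (int prev = 0; prev < 256; prev++)
            assert(neon_cmp_lane((uint8_t)cur, (uint8_t)prev) == cmp_ref(cur, prev));
}
```

Adding 2 to the sum of the two comparisons then gives an index in 0..4, which is always in range for the 16-byte `tbl` lookups.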
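
`tbl` looks up bytes, but the SAO offsets are 16-bit. This is why both edge filters split the reordered offsets with `uzp1` (even bytes) and `uzp2` (odd bytes), do two lookups with the same index vector, and re-interleave the results with `zip1`/`zip2` before the saturating add. In register lane order, halfword lane i occupies byte 2i (low) and byte 2i + 1 (high), so the even bytes are the low halves. The minimal C sketch below models that round trip. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Split 16-bit offsets into a low-byte table (like uzp1, even bytes) and a
 * high-byte table (like uzp2, odd bytes). */
static void split_offsets(const int16_t *vals, int n, uint8_t *lo, uint8_t *hi)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        uint16_t u = (uint16_t)vals[i];
        lo[i] = (uint8_t)(u & 0xFF);
        hi[i] = (uint8_t)(u >> 8);
    }
}

/* Two byte lookups with the same index (two tbl), recombined into one
 * 16-bit lane (zip1/zip2 interleave the low byte, then the high byte). */
static int16_t lookup_offset(const uint8_t *lo, const uint8_t *hi, int idx)
{
    uint16_t u = (uint16_t)(lo[idx] | (hi[idx] << 8));
    return (int16_t)u;
}
```

Negative offsets survive the round trip because the high byte carries the sign (for example, -3 splits into 0xFD and 0xFF).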