From patchwork Wed Mar 8 10:01:00 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Martin_Storsj=C3=B6?=
X-Patchwork-Id: 2830
From: =?UTF-8?q?Martin=20Storsj=C3=B6?=
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 8 Mar 2017 12:01:00 +0200
Message-Id: <1488967274-8143-20-git-send-email-martin@martin.st>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1488967274-8143-1-git-send-email-martin@martin.st>
References: <1488967274-8143-1-git-send-email-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 20/34] arm/aarch64: vp9lpf: Calculate !hev directly
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

Previously
we first calculated hev, and then negated it.

Since we were able to schedule the negation in the middle
of another calculation, we don't see any gain in all cases.

Before:                       Cortex A7      A8      A9     A53  A53/AArch64
vp9_loop_filter_v_4_8_neon:       147.0   129.0   115.8    89.0         88.7
vp9_loop_filter_v_8_8_neon:       242.0   198.5   174.7   140.0        136.7
vp9_loop_filter_v_16_8_neon:      500.0   419.5   382.7   293.0        275.7
vp9_loop_filter_v_16_16_neon:     971.2   825.5   731.5   579.0        453.0
After:
vp9_loop_filter_v_4_8_neon:       143.0   127.7   114.8    88.0         87.7
vp9_loop_filter_v_8_8_neon:       241.0   197.2   173.7   140.0        136.7
vp9_loop_filter_v_16_8_neon:      497.0   419.5   379.7   293.0        275.7
vp9_loop_filter_v_16_16_neon:     965.2   818.7   731.4   579.0        452.0

This is cherrypicked from libav commit
e1f9de86f454861b69b199ad801adc2ec6c3b220.
---
 libavcodec/aarch64/vp9lpf_neon.S | 5 ++---
 libavcodec/arm/vp9lpf_neon.S     | 5 ++---
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/libavcodec/aarch64/vp9lpf_neon.S b/libavcodec/aarch64/vp9lpf_neon.S
index 55e1964..7fe2c88 100644
--- a/libavcodec/aarch64/vp9lpf_neon.S
+++ b/libavcodec/aarch64/vp9lpf_neon.S
@@ -292,7 +292,7 @@
 .if \mix != 0
         sxtl            v1.8h,  v1.8b
 .endif
-        cmhi            v5\sz,  v5\sz,  v3\sz // hev
+        cmhs            v5\sz,  v3\sz,  v5\sz // !hev
 .if \wd == 8
         // If a 4/8 or 8/4 mix is used, clear the relevant half of v6
 .if \mix != 0
@@ -306,11 +306,10 @@
 .elseif \wd == 8
         bic             v4\sz,  v4\sz,  v6\sz // fm && !flat8in
 .endif
-        mvn             v5\sz,  v5\sz         // !hev
+        and             v5\sz,  v5\sz,  v4\sz // !hev && fm && !flat8in
 .if \wd == 16
         and             v7\sz,  v7\sz,  v6\sz // flat8out && flat8in && fm
 .endif
-        and             v5\sz,  v5\sz,  v4\sz // !hev && fm && !flat8in
 
         mul_sz          \tmp3\().8h, \tmp4\().8h,  \tmp3\().8h, \tmp4\().8h,  \tmp5\().8h,  \tmp5\().8h, \sz // 3 * (q0 - p0)
         bic             \tmp1\sz,  \tmp1\sz,  v5\sz // if (!hev) av_clip_int8 = 0
diff --git a/libavcodec/arm/vp9lpf_neon.S b/libavcodec/arm/vp9lpf_neon.S
index e96f4db..2761956 100644
--- a/libavcodec/arm/vp9lpf_neon.S
+++ b/libavcodec/arm/vp9lpf_neon.S
@@ -141,7 +141,7 @@
 .if \wd == 8
         vcle.u8         d6,  d6,  d0 @ flat8in
 .endif
-        vcgt.u8         d5,  d5,  d3 @ hev
+        vcle.u8         d5,  d5,  d3 @ !hev
 .if \wd == 8
         vand            d6,  d6,  d4 @ flat8in && fm
 .endif
@@ -151,11 +151,10 @@
 .elseif \wd == 8
         vbic            d4,  d4,  d6 @ fm && !flat8in
 .endif
-        vmvn            d5,  d5      @ !hev
+        vand            d5,  d5,  d4 @ !hev && fm && !flat8in
 .if \wd == 16
         vand            d7,  d7,  d6 @ flat8out && flat8in && fm
 .endif
-        vand            d5,  d5,  d4 @ !hev && fm && !flat8in
 
         vmul.s16        \tmpq2,  \tmpq2, \tmpq3 @ 3 * (q0 - p0)
         vbic            \tmp1,   \tmp1,  d5 @ if (!hev) av_clip_int8 = 0
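
Editorial note, not part of the patch: the change relies on the identity
!(a > b) == (a <= b) == (b >= a) for unsigned lanes, which is why the
vcgt.u8 + vmvn pair can become a single vcle.u8 (or cmhs with swapped
operands on AArch64). The C sketch below models one 8-bit lane and checks
the identity exhaustively. The function names are hypothetical, and it
assumes only that the compare instructions produce an all-ones or
all-zeros mask per lane; it is an illustration, not the FFmpeg code.

```c
#include <stdint.h>

/* Old sequence for one 8-bit lane: unsigned compare-greater-than
 * (vcgt.u8 / cmhi) followed by a bitwise NOT (vmvn / mvn). */
static uint8_t not_hev_via_negation(uint8_t a, uint8_t b)
{
    uint8_t hev = (a > b) ? 0xFF : 0x00; /* mask: all ones when a > b */
    return (uint8_t)~hev;                /* invert the mask */
}

/* New sequence: a single unsigned compare. vcle.u8 computes a <= b;
 * cmhs with the operands swapped computes b >= a, the same predicate. */
static uint8_t not_hev_direct(uint8_t a, uint8_t b)
{
    return (b >= a) ? 0xFF : 0x00;
}

/* Compare both forms over every pair of 8-bit inputs and return the
 * number of pairs on which they disagree. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (not_hev_via_negation((uint8_t)a, (uint8_t)b) !=
                not_hev_direct((uint8_t)a, (uint8_t)b))
                mismatches++;
    return mismatches;
}
```

Calling count_mismatches() from a small test program should return 0,
since the two forms compute the same mask for all 65536 input pairs.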