From patchwork Wed Mar 8 10:01:02 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 2821
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 8 Mar 2017 12:01:02 +0200
Message-Id: <1488967274-8143-22-git-send-email-martin@martin.st>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1488967274-8143-1-git-send-email-martin@martin.st>
References: <1488967274-8143-1-git-send-email-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 22/34] arm: vp9lpf: Interleave the start of flat8in into the calculation above

This adds lots of extra .ifs, but speeds it up by a couple cycles,
by avoiding stalls.

This is cherrypicked from libav commit
e18c39005ad1dbb178b336f691da1de91afd434e.
---
 libavcodec/arm/vp9lpf_neon.S | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libavcodec/arm/vp9lpf_neon.S b/libavcodec/arm/vp9lpf_neon.S
index 3d289e5..b90c536 100644
--- a/libavcodec/arm/vp9lpf_neon.S
+++ b/libavcodec/arm/vp9lpf_neon.S
@@ -182,16 +182,20 @@
         vmovl.u8        q0,  d22                @ p1
         vmovl.u8        q1,  d25                @ q1
+.if \wd >= 8
+        vmov            r2,  r3,  d6
+.endif
         vaddw.s8        q0,  q0,  \tmp3         @ p1 + f
         vsubw.s8        q1,  q1,  \tmp3         @ q1 - f
+.if \wd >= 8
+        orrs            r2,  r2,  r3
+.endif
         vqmovun.s16     d0,  q0                 @ out p1
         vqmovun.s16     d2,  q1                 @ out q1
         vbit            d22, d0,  d5            @ if (!hev && fm && !flat8in)
         vbit            d25, d2,  d5
 .if \wd >= 8
-        vmov            r2,  r3,  d6
-        orrs            r2,  r2,  r3
         @ If no pixels need flat8in, jump to flat8out
         @ (or to a writeout of the inner 4 pixels, for wd=8)
         beq             6f