From patchwork Wed Mar 8 10:01:03 2017
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 2808
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 8 Mar 2017 12:01:03 +0200
Message-Id: <1488967274-8143-23-git-send-email-martin@martin.st>
In-Reply-To: <1488967274-8143-1-git-send-email-martin@martin.st>
References: <1488967274-8143-1-git-send-email-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 23/34] aarch64: vp9lpf: Interleave the start of flat8in into the calculation above

This adds lots of extra .ifs, but speeds it up by a couple cycles,
by avoiding stalls.

This is cherrypicked from libav commit
b0806088d3b27044145b20421da8d39089ae0c6a.
---
 libavcodec/aarch64/vp9lpf_neon.S | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/libavcodec/aarch64/vp9lpf_neon.S b/libavcodec/aarch64/vp9lpf_neon.S
index 7fe2c88..cd3e26c 100644
--- a/libavcodec/aarch64/vp9lpf_neon.S
+++ b/libavcodec/aarch64/vp9lpf_neon.S
@@ -338,20 +338,28 @@
 
         uxtl_sz         v0.8h,  v1.8h,  v22, \sz                    // p1
         uxtl_sz         v2.8h,  v3.8h,  v25, \sz                    // q1
+.if \wd >= 8
+        mov             x5,  v6.d[0]
+.ifc \sz, .16b
+        mov             x6,  v6.d[1]
+.endif
+.endif
         saddw_sz        v0.8h,  v1.8h,  v0.8h,  v1.8h,  \tmp3, \sz // p1 + f
         ssubw_sz        v2.8h,  v3.8h,  v2.8h,  v3.8h,  \tmp3, \sz // q1 - f
         sqxtun_sz       v0,  v0.8h,  v1.8h,  \sz                    // out p1
         sqxtun_sz       v2,  v2.8h,  v3.8h,  \sz                    // out q1
+.if \wd >= 8
+.ifc \sz, .16b
+        adds            x5,  x5,  x6
+.endif
+.endif
         bit             v22\sz, v0\sz,  v5\sz                       // if (!hev && fm && !flat8in)
         bit             v25\sz, v2\sz,  v5\sz
 
         // If no pixels need flat8in, jump to flat8out
         // (or to a writeout of the inner 4 pixels, for wd=8)
 .if \wd >= 8
-        mov             x5,  v6.d[0]
 .ifc \sz, .16b
-        mov             x6,  v6.d[1]
-        adds            x5,  x5,  x6
         b.eq            6f
 .else
         cbz             x5,  6f
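
For readers less familiar with the idiom, the test that this patch moves earlier (mov/mov/adds followed by b.eq) can be pictured in C roughly as below. This is only an illustrative sketch, not code from FFmpeg or libav: the function names are hypothetical, and it assumes every byte of the flat8in mask in v6 is either 0x00 or 0xFF, as the output of a NEON compare would be.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an FFmpeg function): returns 1 when no lane of a
 * 16-byte flat8in-style mask is set, 0 otherwise.
 * Assumption: every mask byte is 0x00 or 0xFF. */
int flat8in_mask_is_empty(const uint8_t mask[16])
{
    uint64_t lo, hi;
    int i;

    /* Enforce the stated assumption about the mask contents. */
    for (i = 0; i < 16; i++)
        assert(mask[i] == 0x00 || mask[i] == 0xFF);

    memcpy(&lo, mask, 8);        /* like: mov  x5, v6.d[0] */
    memcpy(&hi, mask + 8, 8);    /* like: mov  x6, v6.d[1] */

    /* like: adds x5, x5, x6 ; b.eq 6f
     * With 0x00/0xFF bytes the wrapped 64-bit sum is zero only when both
     * halves are zero: the lowest byte pair must be 0x00 + 0x00 (0xFF + 0xFF
     * ends in 0xFE), which produces no carry, and the same argument then
     * applies to each higher byte pair. */
    return (uint64_t)(lo + hi) == 0;
}

/* Straightforward reference: OR all sixteen bytes together. */
int flat8in_mask_is_empty_ref(const uint8_t mask[16])
{
    uint8_t any = 0;
    int i;

    for (i = 0; i < 16; i++)
        any |= mask[i];
    return any == 0;
}
```

The C version says nothing about scheduling, which is the actual point of the patch: the two moves and the add are issued between unrelated vector instructions, presumably so that the flags are already available when b.eq is reached.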