From patchwork Wed Mar 8 10:01:07 2017
X-Patchwork-Id: 2810
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 8 Mar 2017 12:01:07 +0200
Message-Id: <1488967274-8143-27-git-send-email-martin@martin.st>
In-Reply-To: <1488967274-8143-1-git-send-email-martin@martin.st>
References: <1488967274-8143-1-git-send-email-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 27/34] aarch64: vp9lpf: Use dup+rev16+uzp1 instead of dup+lsr+dup+trn1

This is one cycle faster in total, and three instructions fewer.

Before:
vp9_loop_filter_mix2_v_44_16_neon:   123.2
After:
vp9_loop_filter_mix2_v_44_16_neon:   122.2

This is cherrypicked from libav commit
3bf9c48320f25f3d5557485b0202f22ae60748b0.
---
 libavcodec/aarch64/vp9lpf_neon.S | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/libavcodec/aarch64/vp9lpf_neon.S b/libavcodec/aarch64/vp9lpf_neon.S
index a9eea7f..0878763 100644
--- a/libavcodec/aarch64/vp9lpf_neon.S
+++ b/libavcodec/aarch64/vp9lpf_neon.S
@@ -162,18 +162,15 @@
         dup             v2\sz,  w3                  // I
         dup             v3\sz,  w4                  // H
 .else
-        dup             v0.8b,  w2                  // E
-        dup             v2.8b,  w3                  // I
-        dup             v3.8b,  w4                  // H
-        lsr             w5,     w2,  #8
-        lsr             w6,     w3,  #8
-        lsr             w7,     w4,  #8
-        dup             v1.8b,  w5                  // E
-        dup             v4.8b,  w6                  // I
-        dup             v5.8b,  w7                  // H
-        trn1            v0.2d,  v0.2d,  v1.2d
-        trn1            v2.2d,  v2.2d,  v4.2d
-        trn1            v3.2d,  v3.2d,  v5.2d
+        dup             v0.8h,  w2                  // E
+        dup             v2.8h,  w3                  // I
+        dup             v3.8h,  w4                  // H
+        rev16           v1.16b, v0.16b              // E
+        rev16           v4.16b, v2.16b              // I
+        rev16           v5.16b, v3.16b              // H
+        uzp1            v0.16b, v0.16b, v1.16b
+        uzp1            v2.16b, v2.16b, v4.16b
+        uzp1            v3.16b, v3.16b, v5.16b
 .endif
 
         uabd            v4\sz,  v20\sz, v21\sz      // abs(p3 - p2)
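
The equivalence of the two sequences can be checked with a scalar model. The following is only a sketch and is not part of the patch: the `vec128` type and all helper names are hypothetical, and each helper models just the lane behaviour of the instruction named in its comment, assuming little-endian lane numbering (byte lane 2*i is the low byte of halfword lane i). It treats the low two bytes of the input register as the two packed per-half thresholds that the `.else` branch spreads over the low and high 8 byte lanes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit vector register, viewed as 16 byte lanes. */
typedef struct { uint8_t b[16]; } vec128;

/* dup Vd.8b, Wn: low byte of w in lanes 0..7, upper 64 bits cleared. */
static vec128 dup_8b(uint32_t w) {
    vec128 v;
    memset(v.b, 0, sizeof v.b);
    memset(v.b, (uint8_t)w, 8);
    return v;
}

/* dup Vd.8h, Wn: low 16 bits of w in each of the 8 halfword lanes. */
static vec128 dup_8h(uint32_t w) {
    vec128 v;
    for (int i = 0; i < 8; i++) {
        v.b[2 * i]     = (uint8_t)w;
        v.b[2 * i + 1] = (uint8_t)(w >> 8);
    }
    return v;
}

/* trn1 Vd.2d, Vn.2d, Vm.2d: low 64 bits of n, then low 64 bits of m. */
static vec128 trn1_2d(vec128 n, vec128 m) {
    vec128 v;
    memcpy(v.b, n.b, 8);
    memcpy(v.b + 8, m.b, 8);
    return v;
}

/* rev16 Vd.16b, Vn.16b: swap the two bytes inside every halfword. */
static vec128 rev16_16b(vec128 n) {
    vec128 v;
    for (int i = 0; i < 16; i += 2) {
        v.b[i]     = n.b[i + 1];
        v.b[i + 1] = n.b[i];
    }
    return v;
}

/* uzp1 Vd.16b, Vn.16b, Vm.16b: even byte lanes of n, then even lanes of m. */
static vec128 uzp1_16b(vec128 n, vec128 m) {
    vec128 v;
    for (int i = 0; i < 8; i++) {
        v.b[i]     = n.b[2 * i];
        v.b[8 + i] = m.b[2 * i];
    }
    return v;
}

/* Removed sequence: dup + lsr + dup + trn1. */
static vec128 broadcast_old(uint32_t w) {
    vec128 lo = dup_8b(w);
    vec128 hi = dup_8b(w >> 8);   /* models lsr #8 followed by dup */
    return trn1_2d(lo, hi);
}

/* Added sequence: dup + rev16 + uzp1. */
static vec128 broadcast_new(uint32_t w) {
    vec128 v = dup_8h(w);
    vec128 r = rev16_16b(v);
    return uzp1_16b(v, r);
}

/* Counts the 16-bit inputs for which the two models disagree. */
static unsigned count_mismatches(void) {
    unsigned bad = 0;
    for (uint32_t w = 0; w <= 0xFFFF; w++) {
        vec128 a = broadcast_old(w), b = broadcast_new(w);
        if (memcmp(a.b, b.b, sizeof a.b) != 0)
            bad++;
    }
    return bad;
}

/* Aborts if any input produces different results in the two models. */
static void self_check(void) {
    assert(count_mismatches() == 0);
}
```

Calling `self_check()` from a test harness exercises all 65536 combinations of the two packed bytes. The model only covers the data movement; it says nothing about the cycle counts quoted above, which come from the benchmark in the commit message.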