From patchwork Wed Mar 8 10:00:57 2017
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 2806
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 8 Mar 2017 12:00:57 +0200
Message-Id: <1488967274-8143-17-git-send-email-martin@martin.st>
In-Reply-To: <1488967274-8143-1-git-send-email-martin@martin.st>
References: <1488967274-8143-1-git-send-email-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 17/34] aarch64: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter

No measured speedup on a Cortex A53, but other cores might benefit.

This is cherrypicked from libav commit
388e0d2515bc6bbc9d0c9af1d230bd16cf945fe7.
---
 libavcodec/aarch64/vp9mc_neon.S | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/libavcodec/aarch64/vp9mc_neon.S b/libavcodec/aarch64/vp9mc_neon.S
index 9403911..82a0f53 100644
--- a/libavcodec/aarch64/vp9mc_neon.S
+++ b/libavcodec/aarch64/vp9mc_neon.S
@@ -202,9 +202,12 @@ endfunc
         ext             v23.16b, \src5\().16b, \src6\().16b, #(2*\offset)
         mla             \dst2\().8h, v21.8h, v0.h[\offset]
         mla             \dst4\().8h, v23.8h, v0.h[\offset]
-.else
+.elseif \size == 8
         mla             \dst1\().8h, v20.8h, v0.h[\offset]
         mla             \dst3\().8h, v22.8h, v0.h[\offset]
+.else
+        mla             \dst1\().4h, v20.4h, v0.h[\offset]
+        mla             \dst3\().4h, v22.4h, v0.h[\offset]
 .endif
 .endm
 // The same as above, but don't accumulate straight into the
@@ -219,16 +222,24 @@ endfunc
         ext             v23.16b, \src5\().16b, \src6\().16b, #(2*\offset)
         mul             v21.8h, v21.8h, v0.h[\offset]
         mul             v23.8h, v23.8h, v0.h[\offset]
-.else
+.elseif \size == 8
         mul             v20.8h, v20.8h, v0.h[\offset]
         mul             v22.8h, v22.8h, v0.h[\offset]
+.else
+        mul             v20.4h, v20.4h, v0.h[\offset]
+        mul             v22.4h, v22.4h, v0.h[\offset]
 .endif
+.if \size == 4
+        sqadd           \dst1\().4h, \dst1\().4h, v20.4h
+        sqadd           \dst3\().4h, \dst3\().4h, v22.4h
+.else
         sqadd           \dst1\().8h, \dst1\().8h, v20.8h
         sqadd           \dst3\().8h, \dst3\().8h, v22.8h
 .if \size >= 16
         sqadd           \dst2\().8h, \dst2\().8h, v21.8h
         sqadd           \dst4\().8h, \dst4\().8h, v23.8h
 .endif
+.endif
 .endm