From patchwork Wed Mar 8 10:01:14 2017
X-Patchwork-Id: 2816
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 8 Mar 2017 12:01:14 +0200
Message-Id: <1488967274-8143-34-git-send-email-martin@martin.st>
In-Reply-To: <1488967274-8143-1-git-send-email-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 34/34] aarch64: vp9itxfm: Reorder iadst16 coeffs

This
matches the order used in the 16 bpp version. In that version, the
coefficients are ordered so that they are accessed in the same order as
they are declared, which makes it easier to load only half of them at a
time. With the new order, the first four dmbutterfly_l calls below only
use v0 (the first eight coefficients) and the next four only use v1.

This makes the 8 bpp version match the 16 bpp version more closely.

This is cherry-picked from libav commit b8f66c0838b4c645227f23a35b4d54373da4c60a.
---
 libavcodec/aarch64/vp9itxfm_neon.S | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libavcodec/aarch64/vp9itxfm_neon.S b/libavcodec/aarch64/vp9itxfm_neon.S
index 31c6e3c..2c3c002 100644
--- a/libavcodec/aarch64/vp9itxfm_neon.S
+++ b/libavcodec/aarch64/vp9itxfm_neon.S
@@ -37,8 +37,8 @@ idct_coeffs:
 endconst
 
 const iadst16_coeffs, align=4
-        .short          16364, 804, 15893, 3981, 14811, 7005, 13160, 9760
-        .short          11003, 12140, 8423, 14053, 5520, 15426, 2404, 16207
+        .short          16364, 804, 15893, 3981, 11003, 12140, 8423, 14053
+        .short          14811, 7005, 13160, 9760, 5520, 15426, 2404, 16207
 endconst
 
 // out1 = ((in1 + in2) * v0[0] + (1 << 13)) >> 14
@@ -628,19 +628,19 @@ function iadst16
         ld1             {v0.8h,v1.8h}, [x11]
         dmbutterfly_l   v6, v7, v4, v5, v31, v16, v0.h[1], v0.h[0] // v6,v7 = t1, v4,v5 = t0
-        dmbutterfly_l   v10, v11, v8, v9, v23, v24, v1.h[1], v1.h[0] // v10,v11 = t9, v8,v9 = t8
+        dmbutterfly_l   v10, v11, v8, v9, v23, v24, v0.h[5], v0.h[4] // v10,v11 = t9, v8,v9 = t8
         dbutterfly_n    v31, v24, v6, v7, v10, v11, v12, v13, v10, v11 // v31 = t1a, v24 = t9a
         dmbutterfly_l   v14, v15, v12, v13, v29, v18, v0.h[3], v0.h[2] // v14,v15 = t3, v12,v13 = t2
         dbutterfly_n    v16, v23, v4, v5, v8, v9, v6, v7, v8, v9 // v16 = t0a, v23 = t8a
-        dmbutterfly_l   v6, v7, v4, v5, v21, v26, v1.h[3], v1.h[2] // v6,v7 = t11, v4,v5 = t10
+        dmbutterfly_l   v6, v7, v4, v5, v21, v26, v0.h[7], v0.h[6] // v6,v7 = t11, v4,v5 = t10
         dbutterfly_n    v29, v26, v14, v15, v6, v7, v8, v9, v6, v7 // v29 = t3a, v26 = t11a
-        dmbutterfly_l   v10, v11, v8, v9, v27, v20, v0.h[5], v0.h[4] // v10,v11 = t5, v8,v9 = t4
+        dmbutterfly_l   v10, v11, v8, v9, v27, v20, v1.h[1], v1.h[0] // v10,v11 = t5, v8,v9 = t4
         dbutterfly_n    v18, v21, v12, v13, v4, v5, v6, v7, v4, v5 // v18 = t2a, v21 = t10a
         dmbutterfly_l   v14, v15, v12, v13, v19, v28, v1.h[5], v1.h[4] // v14,v15 = t13, v12,v13 = t12
         dbutterfly_n    v20, v28, v10, v11, v14, v15, v4, v5, v14, v15 // v20 = t5a, v28 = t13a
-        dmbutterfly_l   v6, v7, v4, v5, v25, v22, v0.h[7], v0.h[6] // v6,v7 = t7, v4,v5 = t6
+        dmbutterfly_l   v6, v7, v4, v5, v25, v22, v1.h[3], v1.h[2] // v6,v7 = t7, v4,v5 = t6
         dbutterfly_n    v27, v19, v8, v9, v12, v13, v10, v11, v12, v13 // v27 = t4a, v19 = t12a
         dmbutterfly_l   v10, v11, v8, v9, v17, v30, v1.h[7], v1.h[6] // v10,v11 = t15, v8,v9 = t14