From patchwork Wed Mar 8 10:01:13 2017
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 2818
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 8 Mar 2017 12:01:13 +0200
Message-Id: <1488967274-8143-33-git-send-email-martin@martin.st>
In-Reply-To: <1488967274-8143-1-git-send-email-martin@martin.st>
References: <1488967274-8143-1-git-send-email-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 33/34] arm: vp9itxfm: Reorder iadst16 coeffs

This matches the
order they are in the 16 bpp version.

There they are in this order, to make sure we access them in the
same order they are declared, easing loading only half of the
coefficients at a time.

This makes the 8 bpp version match the 16 bpp version better.

This is cherrypicked from libav commit
08074c092d8c97d71c5986e5325e97ffc956119d.
---
 libavcodec/arm/vp9itxfm_neon.S | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libavcodec/arm/vp9itxfm_neon.S b/libavcodec/arm/vp9itxfm_neon.S
index 05e31e6..ebbbda9 100644
--- a/libavcodec/arm/vp9itxfm_neon.S
+++ b/libavcodec/arm/vp9itxfm_neon.S
@@ -37,8 +37,8 @@ idct_coeffs:
 endconst
 
 const iadst16_coeffs, align=4
-        .short 16364, 804, 15893, 3981, 14811, 7005, 13160, 9760
-        .short 11003, 12140, 8423, 14053, 5520, 15426, 2404, 16207
+        .short 16364, 804, 15893, 3981, 11003, 12140, 8423, 14053
+        .short 14811, 7005, 13160, 9760, 5520, 15426, 2404, 16207
 endconst
 
 @ Do four 4x4 transposes, using q registers for the subtransposes that don't
@@ -678,19 +678,19 @@ function iadst16
         vld1.16 {q0-q1}, [r12,:128]
 
         mbutterfly_l q3, q2, d31, d16, d0[1], d0[0] @ q3 = t1, q2 = t0
-        mbutterfly_l q5, q4, d23, d24, d2[1], d2[0] @ q5 = t9, q4 = t8
+        mbutterfly_l q5, q4, d23, d24, d1[1], d1[0] @ q5 = t9, q4 = t8
         butterfly_n d31, d24, q3, q5, q6, q5 @ d31 = t1a, d24 = t9a
         mbutterfly_l q7, q6, d29, d18, d0[3], d0[2] @ q7 = t3, q6 = t2
         butterfly_n d16, d23, q2, q4, q3, q4 @ d16 = t0a, d23 = t8a
-        mbutterfly_l q3, q2, d21, d26, d2[3], d2[2] @ q3 = t11, q2 = t10
+        mbutterfly_l q3, q2, d21, d26, d1[3], d1[2] @ q3 = t11, q2 = t10
         butterfly_n d29, d26, q7, q3, q4, q3 @ d29 = t3a, d26 = t11a
-        mbutterfly_l q5, q4, d27, d20, d1[1], d1[0] @ q5 = t5, q4 = t4
+        mbutterfly_l q5, q4, d27, d20, d2[1], d2[0] @ q5 = t5, q4 = t4
         butterfly_n d18, d21, q6, q2, q3, q2 @ d18 = t2a, d21 = t10a
         mbutterfly_l q7, q6, d19, d28, d3[1], d3[0] @ q7 = t13, q6 = t12
         butterfly_n d20, d28, q5, q7, q2, q7 @ d20 = t5a, d28 = t13a
-        mbutterfly_l q3, q2, d25, d22, d1[3], d1[2] @ q3 = t7, q2 = t6
+        mbutterfly_l q3, q2, d25, d22, d2[3], d2[2] @ q3 = t7, q2 = t6
         butterfly_n d27, d19, q4, q6, q5, q6 @ d27 = t4a, d19 = t12a
         mbutterfly_l q5, q4, d17, d30, d3[3], d3[2] @ q5 = t15, q4 = t14