From patchwork Thu Jun 22 10:30:54 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?b?Q2zDqW1lbnQgQsWTc2No?=
X-Patchwork-Id: 4074
Delivered-To: ffmpegpatchwork@gmail.com
Received: by 10.103.22.4 with SMTP id 4csp2307293vsw;
        Thu, 22 Jun 2017 03:31:09 -0700 (PDT)
X-Received: by 10.223.169.138 with SMTP id b10mr1498344wrd.29.1498127469812;
        Thu, 22 Jun 2017 03:31:09 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1498127469; cv=none;
        d=google.com; s=arc-20160816;
        b=QVMdt52trH5mGlqXb3/lT9Qof4VKGPBN6U5MnlB3dPgVcAk9MOUUfe+If7zD+pWUDr
         /a8e+763OFbkVTCzxxzyl10ANbOG/IP/lY7Sofrp3vaztBtvfqnRz1yZxxMw1A64Jv5q
         EWXZnjFNwlxywss5dub0ZfbPqeaBjNh9zpV/9CMjOSxmB7JFfUfs4hyeMJKJOMuRspWj
         RGz4GymJfvwSPhuGHCsfQWWs8wxkGrKStQcvyT+Bw7JdzKADFpSq4Uari9/prYiM88XT
         fkmmAkSr+H//GZ9yKAgV4utX+4OSwuRYd+HJ6dgtUfUmLX1jf6YppmxBTvFt2dHkPUfn
         3/+w==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
        h=sender:errors-to:content-transfer-encoding:cc:reply-to
         :list-subscribe:list-help:list-post:list-archive:list-unsubscribe
         :list-id:precedence:subject:mime-version:message-id:date:to:from
         :delivered-to:arc-authentication-results;
        bh=WrWhQkCIYw7pUIYZANtze+yqu6BmjxUX1SapGCsu0kE=;
        b=ogDQJeEZViBcslVG7aUII88rkru2n5Ro7d8YC+V7qLKOf7ulSskM7akfQaQN+XYX9S
         hZgWm/RsNgDxZMEejZhd2/BCtaaHMsEYouGRO2u//hbr8dCknWxRTZP4UYRG2IfTLqnF
         zDJuyw/itCZuj0rAQ7FHt5G+GVubtawSwaCfh9RXNbc75j1+MhYy1OC+hdpJYErfi2if
         48y3yDjbK43NatiTPBIwE6d7h6kvDSRSTfOpzjb82+eOIarZhU7uHvNim7TTabCngYjn
         uOxVT6Ha/JQn9zdKtj2wUaDnaGdQxBNItwynckSkYOb9FrODIdI5H+W3NC26DskFGog1
         9iVQ==
ARC-Authentication-Results: i=1; mx.google.com;
        spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org.
        [79.124.17.100]) by mx.google.com with ESMTP id j14si527353wre.45.2017.06.22.03.31.09;
        Thu, 22 Jun 2017 03:31:09 -0700 (PDT)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
        spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Received: from [127.0.1.1] (localhost [127.0.0.1])
        by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id BA09668A3B5;
        Thu, 22 Jun 2017 13:31:06 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from golem.pkh.me (LStLambert-657-1-117-164.w92-154.abo.wanadoo.fr [92.154.28.164])
        by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 4FBA468A320
        for ; Thu, 22 Jun 2017 13:31:00 +0300 (EEST)
Received: from localhost (golem.pkh.me [local])
        by golem.pkh.me (OpenSMTPD) with ESMTPA id 02ffb561;
        Thu, 22 Jun 2017 10:30:59 +0000 (UTC)
From: =?UTF-8?q?Cl=C3=A9ment=20B=C5=93sch?=
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 22 Jun 2017 12:30:54 +0200
Message-Id: <20170622103058.6855-1-u@pkh.me>
X-Mailer: git-send-email 2.13.0
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH 1/5] lavc/arm: fix lack of precision in ff_ps_stereo_interpolate_neon
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Cc: =?UTF-8?q?Cl=C3=A9ment=20B=C5=93sch?=
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

From: Clément Bœsch

The code originally pre-multiplied the steps by 2, causing the running sum of
the h factors to drift because of the lack of precision. This quickly causes
an inaccuracy > 0.01.
I tried various approaches, such as multiplying by 2.0 (instead of adding the
value to itself), without success. I'm unable to benchmark the impact of this
change; feel free to compare.

This commit fixes the incoming aacpsdsp tests.

Following is an alternative simplified function (matching the incoming AArch64
code) that may be used:

    function ff_ps_stereo_interpolate_neon, export=1
            vld1.32         {q0}, [r2]
            vld1.32         {q1}, [r3]
            ldr             r12, [sp]
            vmov.f32        q8, q0
            vmov.f32        q9, q1
            vzip.32         q8, q0
            vzip.32         q9, q1
    1:
            vld1.32         {d4}, [r0,:64]
            vld1.32         {d6}, [r1,:64]
            vadd.f32        q8, q8, q9
            vadd.f32        q0, q0, q1
            vmov.f32        d5, d4
            vmov.f32        d7, d6
            vmul.f32        q2, q2, q8
            vmla.f32        q2, q3, q0
            vst1.32         {d4}, [r0,:64]!
            vst1.32         {d5}, [r1,:64]!
            subs            r12, r12, #1
            bgt             1b
            bx              lr
    endfunc
---
 libavcodec/arm/aacpsdsp_neon.S | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/libavcodec/arm/aacpsdsp_neon.S b/libavcodec/arm/aacpsdsp_neon.S
index a93bbfea9c..3b1bed2aa7 100644
--- a/libavcodec/arm/aacpsdsp_neon.S
+++ b/libavcodec/arm/aacpsdsp_neon.S
@@ -232,12 +232,11 @@ endfunc
 function ff_ps_stereo_interpolate_neon, export=1
         vld1.32         {q0}, [r2]
         vld1.32         {q14}, [r3]
-        vadd.f32        q15, q14, q14
         mov             r2, r0
         mov             r3, r1
         ldr             r12, [sp]
         vadd.f32        q1, q0, q14
-        vadd.f32        q0, q0, q15
+        vadd.f32        q0, q1, q14
         vld1.32         {q2}, [r0,:64]!
         vld1.32         {q3}, [r1,:64]!
         subs            r12, r12, #1
@@ -251,8 +250,10 @@ function ff_ps_stereo_interpolate_neon, export=1
         vmla.f32        d17, d7, d1[0]
         vmla.f32        d18, d6, d3[1]
         vmla.f32        d19, d7, d1[1]
-        vadd.f32        q1, q1, q15
-        vadd.f32        q0, q0, q15
+        vadd.f32        q1, q1, q14
+        vadd.f32        q0, q0, q14
+        vadd.f32        q1, q1, q14
+        vadd.f32        q0, q0, q14
         vld1.32         {q2}, [r0,:64]!
         vld1.32         {q3}, [r1,:64]!
         vst1.32         {q8}, [r2,:64]!
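
For readers who want to reproduce the rounding difference off-target, here is a minimal C sketch of the three accumulation orders for a single h factor. It is not FFmpeg code: the function names are hypothetical, it models one lane only, and it assumes plain IEEE-754 single precision (no excess precision, no fast-math reassociation). The register names in the comments only indicate which instruction in the diff each line loosely corresponds to.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model: the running factor gains one `step` per sample. */
static void accumulate_per_sample(float h, float step, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h += step;
        out[i] = h;
    }
}

/* Loose model of the old loop: two samples per iteration, advanced by a
 * pre-doubled step. Each value is rounded once per 2*step instead of once
 * per step, so it can differ from the per-sample sum. */
static void accumulate_predoubled(float h, float step, float *out, size_t n)
{
    assert(n % 2 == 0);
    float step2 = step + step;   /* vadd.f32 q15, q14, q14 */
    float even  = h + step;      /* vadd.f32 q1, q0, q14   */
    float odd   = h + step2;     /* vadd.f32 q0, q0, q15   */
    for (size_t i = 0; i < n; i += 2) {
        out[i]     = even;
        out[i + 1] = odd;
        even += step2;
        odd  += step2;
    }
}

/* Loose model of the patched loop: still two samples per iteration, but
 * every advance is two separate additions of `step`. */
static void accumulate_step_twice(float h, float step, float *out, size_t n)
{
    assert(n % 2 == 0);
    float even = h + step;       /* vadd.f32 q1, q0, q14 */
    float odd  = even + step;    /* vadd.f32 q0, q1, q14 */
    for (size_t i = 0; i < n; i += 2) {
        out[i]     = even;
        out[i + 1] = odd;
        even += step; even += step;
        odd  += step; odd  += step;
    }
}

/* Number of positions where two sequences are not exactly equal. */
static size_t count_mismatches(const float *a, const float *b, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            count++;
    return count;
}
```

Calling `count_mismatches` on the outputs of `accumulate_per_sample` and `accumulate_step_twice` should report zero, since both apply the same additions in the same order. Comparing against `accumulate_predoubled` shows whether, and where, the pre-doubled variant diverges for a given `h` and `step`.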