From patchwork Tue Dec 10 20:10:39 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Martin_Storsj=C3=B6?= <martin@martin.st>
X-Patchwork-Id: 16699
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
X-Original-To: patchwork@ffaux-bg.ffmpeg.org
Delivered-To: patchwork@ffaux-bg.ffmpeg.org
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by ffaux.localdomain (Postfix) with ESMTP id 0835844708F
	for <patchwork@ffaux-bg.ffmpeg.org>;
	Tue, 10 Dec 2019 22:10:49 +0200 (EET)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id E48B868B092;
	Tue, 10 Dec 2019 22:10:48 +0200 (EET)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mail-lj1-f196.google.com (mail-lj1-f196.google.com
	[209.85.208.196])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 8402768AFB3
	for <ffmpeg-devel@ffmpeg.org>; Tue, 10 Dec 2019 22:10:42 +0200 (EET)
Received: by mail-lj1-f196.google.com with SMTP id 21so21383311ljr.0
	for <ffmpeg-devel@ffmpeg.org>; Tue, 10 Dec 2019 12:10:42 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=martin-st.20150623.gappssmtp.com; s=20150623;
	h=from:to:subject:date:message-id;
	bh=e8lvckJWjDsMXMMVEUYTlIJG35VUuKxNBedfZ80v7S4=;
	b=O24mK0shLl5NHpgt3Gw3qSyj8vnxq5a4sSZVqGauxCs3yo8UM0OZsPxcphWNWLFfOi
	FlNz+LFs9j+38wWNFTFV4VaEs/3YJT/JfUnfprjPuvIRmEDH3DStU2d5bszVvn/aECva
	IHVWplG6WkLpvw7Tzmr4VE+Q0lVTWqNYDc89yzvpIs+QbbbECwwD59utKF6nhDPr8Epm
	xuFTQ4PAeIE7XYHyMVG8FBgOqhx5ljB3WKg1oODzex9lm4tsfrvHecbxzuRHPq0AjAaS
	ITFxnkNiWpu0kKIbdKu4+XFLTOQKR6t0QW7oPI6HCjOFbl1jXYEDSuDhxUxis+UTAx80
	j6Rw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:from:to:subject:date:message-id;
	bh=e8lvckJWjDsMXMMVEUYTlIJG35VUuKxNBedfZ80v7S4=;
	b=EKDVeEzbdvBEII5GBcOCoPV4Cqy+LimLfdARlL7XSED+nnWCLKEQ1ELN9Ryhi4pwD0
	5mYakHVHqF+6ZjHDDlHiXy/ueyiEid+aYYzCEch9t86ex8bTHUOcl5YJ0zp0hINqul6v
	SmxMV7VZgC6950ssp/rmkPrjarzttWS8TbVgHQ+rAng6Ut945Etf019VRhfsN6OELsM1
	onWFBU6vMN/Y9EERSgFGkT2eIGw33eNz5jwLKbDEb20RecBEz5tap5lNoeO1xOzDLdd1
	M7zSzsXe6NyEnkR9iQWzJTuhP+jVVe/Yo5UuFgJAt+hI0eS579VFxKhbVvkVeEgIn/Wr
	BoJQ==
X-Gm-Message-State: APjAAAVUcGGyfFMkz6xLoKgaFgHg89r0BAOC7N5Jue3D/sppP+EqFJsf
	AQ0HE80d5sT1bztCRRsivO4KF+ssN0A=
X-Google-Smtp-Source: 
 APXvYqxO/4x8pKeQqYIA8T3oieeHlyJSKWivvFsNYc5gWBUP63//96QWVlozdSFdiK9Wc8IU7RFLDw==
X-Received: by 2002:a2e:9899:: with SMTP id
	b25mr21162793ljj.70.1576008641688;
	Tue, 10 Dec 2019 12:10:41 -0800 (PST)
Received: from localhost.localdomain (dsl-tkubng21-58c01c-243.dhcp.inet.fi.
	[88.192.28.243]) by smtp.gmail.com with ESMTPSA id
	r15sm2361652ljk.3.2019.12.10.12.10.41 for <ffmpeg-devel@ffmpeg.org>
	(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
	Tue, 10 Dec 2019 12:10:41 -0800 (PST)
From: =?UTF-8?q?Martin=20Storsj=C3=B6?= <martin@martin.st>
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 10 Dec 2019 22:10:39 +0200
Message-Id: <20191210201040.22050-1-martin@martin.st>
X-Mailer: git-send-email 2.17.1
Subject: [FFmpeg-devel] [PATCH 1/2] checkasm: aacpsdsp: Tolerate extra
	intermediate precision in stereo_interpolate
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches
	<ffmpeg-devel@ffmpeg.org>
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

The stereo_interpolate functions add h_step to the values h
BUF_SIZE times. Within the stereo_interpolate C functions, the
values h (h0-h3, h00-h13) are declared as local float variables,
but the compiler is free to keep them in a register with extra
precision.

If the accumulation is rounded to 32 bit float precision after
each step, the least significant bits of h_step are rounded away
on every addition and the sum can deviate, affecting the end
result by more than the currently set EPS.

By clearing roughly the log2(BUF_SIZE) lowest bits of h_step, we
make sure that the accumulation doesn't differ significantly,
regardless of any extra precision in the accumulating
register/variable.

This fixes the aacpsdsp checkasm test when built with clang for
mingw/x86_32.
---
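For illustration only (not part of the patch): a minimal standalone
sketch of the effect described above. The helper names and the
iteration count used in the note below are hypothetical and are not
taken from checkasm; the sketch assumes that float is IEEE 754
binary32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: clear the `bits` lowest bits of the binary32
 * representation of f (assumes float is IEEE 754 binary32). */
static float clear_low_bits(float f, int bits)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    u &= 0xffffffffU << bits;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/* Add step n times; the volatile store forces the running sum to be
 * rounded to float precision after every addition. */
static float accumulate_float(float step, int n)
{
    volatile float acc = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        acc = acc + step;
    return acc;
}

/* Add step n times into a wider accumulator, standing in for a
 * register that keeps extra intermediate precision. */
static double accumulate_double(float step, int n)
{
    double acc = 0.0;
    int i;
    for (i = 0; i < n; i++)
        acc += step;
    return acc;
}
```

With 14 bits cleared, a step has at most 10 significant mantissa
bits, so each of up to 4096 partial sums (starting from zero) needs
at most 22 bits and is exact in float; accumulate_float() and
accumulate_double() then return the same value. With a nonzero
starting value, as in the test, the results are only expected to be
close, not identical.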
 tests/checkasm/aacpsdsp.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tests/checkasm/aacpsdsp.c b/tests/checkasm/aacpsdsp.c
index ea68b39fa9..2ceef4341f 100644
--- a/tests/checkasm/aacpsdsp.c
+++ b/tests/checkasm/aacpsdsp.c
@@ -17,6 +17,7 @@
  */
 
 #include "libavcodec/aacpsdsp.h"
+#include "libavutil/intfloat.h"
 
 #include "checkasm.h"
 
@@ -34,6 +35,16 @@
 
 #define EPS 0.005
 
+static void clear_less_significant_bits(INTFLOAT *buf, int len, int bits)
+{
+    int i;
+    for (i = 0; i < len; i++) {
+        union av_intfloat32 u = { .f = buf[i] };
+        u.i &= (0xffffffff << bits);
+        buf[i] = u.f;
+    }
+}
+
 static void test_add_squares(void)
 {
     LOCAL_ALIGNED_16(INTFLOAT, dst0, [BUF_SIZE]);
@@ -198,6 +209,13 @@ static void test_stereo_interpolate(PSDSPContext *psdsp)
 
             randomize((INTFLOAT *)h, 2 * 4);
             randomize((INTFLOAT *)h_step, 2 * 4);
+            // Clear the least significant 14 bits of h_step, i.e. roughly
+            // log2(BUF_SIZE) bits. h_step is accumulated BUF_SIZE times
+            // into a float variable which may or may not have extra
+            // intermediate precision; with the low bits cleared, the
+            // accumulation gives the same result regardless of any
+            // extra precision in the accumulator.
+            clear_less_significant_bits((INTFLOAT *)h_step, 2 * 4, 14);
 
             call_ref(l0, r0, h, h_step, BUF_SIZE);
             call_new(l1, r1, h, h_step, BUF_SIZE);