From patchwork Sat Jul 1 00:10:24 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ivan Kalvachev
X-Patchwork-Id: 4170
From: Ivan Kalvachev
Date: Sat, 1 Jul 2017 03:10:24 +0300
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [WIP][PATCH]v3 Opus Pyramid Vector Quantization Search in x86 SIMD asm
List-Id: FFmpeg development discussions and patches

First, I've removed the hacked formula.
While it seems to improve precision in the synthetic test,
it is not enough to avoid assigning pulses in the padded area.
Worse, it also interferes with the special case of
subtracting pulses from a y[i]==0 output.
Handling these cases requires combining two masks and
using "blendvps" instead of the faster "maxps".
Most of these problems come from the fact that the result of
the hacked formula is in the range -1.0 to 0.0,
where 0.0 is the perfect match.

Second, I fixed the all_float rounding mode; I had it swapped in v2.

Third, I reordered some code.
Moved the branch for an exact match in the pre-search so that it is taken first.
Load the constants needed in the pulse search only when they will be used.
Use a macro for conditional loading/relabeling of the constants.
Use a similar idea for loading and using the PIC register.
Use "smartalign" on NASM, to avoid 15 consecutive 1-byte NOPs.
YASM is smart by default.

I attach a second patch, which is a slightly modified version of
atomnuker's code. It sums the distortions of
the C and SIMD implementations.

In my tests approx#2 seems to be pretty close to the C version,
sometimes better, sometimes worse.
If you find a file with a dramatic difference, I'm interested. :D

If there are no more issues,
v4 will be cleaned up and ready for review.

Best Regards.

From a1005984a6e144fcd1701b4b71e9de5a17ebbca9 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev
Date: Sat, 1 Jul 2017 00:34:22 +0300
Subject: [PATCH 2/4] Code to measure the total distortion sum of the reference and dsp implementation.

---
 libavcodec/opus_pvq.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 70 insertions(+), 3 deletions(-)

diff --git a/libavcodec/opus_pvq.c b/libavcodec/opus_pvq.c
index 6c7504296d..3ad995e7a4 100644
--- a/libavcodec/opus_pvq.c
+++ b/libavcodec/opus_pvq.c
@@ -420,10 +420,77 @@ static uint32_t celt_alg_quant(OpusRangeCoder *rc, float *X, uint32_t N, uint32_
     celt_exp_rotation(X, N, blocks, K, spread, 1);
 
     {
-        START_TIMER
-        gain /= sqrtf(pvq->pvq_search(X, y, K, N));
-        STOP_TIMER("pvq_search");
+        static double adist1 = 0.0f;
+        static double adist2 = 0.0f;
+        static int cnt,cntlg;
+        float xn1[256];
+        float xn2[256];
+        int ysec[256];
+
+        float sq1 = 0.0f;
+        float sq2 = 0.0f;
+
+        float dist1 = 0.0f;
+        float dist2 = 0.0f;
+
+        int sum1 = 0, sum2 = 0;
+
+        // START_TIMER
+        gain /= sqrtf(pvq->pvq_search(X, y, K, N));
+        // STOP_TIMER("pvq_search");
+
+        ppp_pvq_search_c(X, ysec, K, N);
+
+
+        for (int i = 0; i < N; i++) {
+            sq1 += y[i]*y[i];
+            sq2 += ysec[i]*ysec[i];
+            sum1 += FFABS(y[i]);
+            sum2 += FFABS(ysec[i]);
+        }
+
+        if (sum1 != K) {
+            printf("\nERROR! sum Sy=%i K=%i N=%d \n", sum1, K, N);
+            printf(" X = ");
+            for(int i=0; i=2*cntlg){
+            cntlg=cnt;
+            printf("\nruns = %d\n", cnt);
+            printf("Distortion1 = %f\n", adist1);
+            printf("Distortion2 = %f\n", adist2);
+        }
     }
 
+what:
     celt_encode_pulses(rc, y, N, K);
     celt_normalize_residual(y, X, N, gain);
 
-- 
2.13.0