From patchwork Sat Aug 19 15:10:51 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ivan Kalvachev X-Patchwork-Id: 4750 Delivered-To: ffmpegpatchwork@gmail.com Received: by 10.103.46.211 with SMTP id u202csp2002625vsu; Sat, 19 Aug 2017 08:11:03 -0700 (PDT) X-Received: by 10.223.143.18 with SMTP id p18mr5945951wrb.268.1503155463148; Sat, 19 Aug 2017 08:11:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1503155463; cv=none; d=google.com; s=arc-20160816; b=H7y4GqPS33C8Dgvcm2TXYDk2NTZ+ABn5R+GJxW9bLO8R8nB2+lZVrSTpIRFOLMOnTt 1WTqsyZaLLUj9A01oQACm5Xc4ynhGtQi26qfUJBzqI+xdH8QdPYVr5rjABDLlxqnuXmr fbWGGDz8+ixxzbDOWYe2g1lBEIYWIk08BpL+yoQRJKAoi6b/u9KNgwKjQ+f+v9WC+kAa XkDJHdT2oQ04pfWS2+lGDhywTg50KraALRAdk9MOCN71r0s7wGWjWWqWKsTAyF9/xaE0 NCVrsLyOHPPZe/7LT/9rAsQvHI3c0c3Aql8FcpuYoIx2gWkRckH4JzDCfUkqXKWuqth9 LCAw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence:subject:to :message-id:date:from:mime-version:dkim-signature:delivered-to :arc-authentication-results; bh=8eB3ulHCvKEx07ivGePos3uiZAROVqU/j6uNj/DtRjo=; b=rH8GTj24XDAoQqHxsubgThL+kyWMgcUmt6KruQHRtpOioTTKuvy9RRZ79VBEtVOMqh XJbSi51cxP7PrwcBl2O9+sfDki9rOIaBBhiESBZxAXLAqG7aNZ0bBZN+8aEU4oZQLb/c UnMPiZgcsR0C/uVXsokJxyTZIU5Q7v/SvKI4CFgwSMr+8ts9tp4YlZFqHlmZeLJIChdC G7aQYGWY1AXg/KkYJSs5yHdZb6gppQAZWMRnd94L/wOdebGHv+P89C90OOdr8r23ty/M HH+WAu6qlO5D2WweQkvp18u/6vMS8SIWmL9Jf1NchUEBjjxcqvpfQJlsMSEZlMeHILwI YtFQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20161025 header.b=WR3y4Rs3; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=gmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id l197si3161229wma.183.2017.08.19.08.11.02; Sat, 19 Aug 2017 08:11:03 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20161025 header.b=WR3y4Rs3; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 70987689AFC; Sat, 19 Aug 2017 18:10:54 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-pg0-f54.google.com (mail-pg0-f54.google.com [74.125.83.54]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 0EAD5680A44 for ; Sat, 19 Aug 2017 18:10:48 +0300 (EEST) Received: by mail-pg0-f54.google.com with SMTP id u191so4301936pgc.2 for ; Sat, 19 Aug 2017 08:10:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=gCwfXyrBlQ/cTngCOsMSDcstyBYGj4z6JJNyCLT4xM8=; b=WR3y4Rs3Rw21VNEKnSdqqcQWpRsmpoxVGka58KOnNWBJXU7hNY5Bjo5SsGBnMrmCAk o7NPJwC19EjLZcx+/62TIatYv6qNdHW6eFI8JdI3GgCuvQRuzCIalnnvQjpzJs7Lz6B2 oeDsE3SDTl+pCrb5hxvluNlpk+CDPEp50f0UyheGODqJI9P1hueoFSQS4TVwfDBtruhD jnYkDMCqyKTWtw7y5lgvqY0QNUuskN8oZ9WAgvK2uhEEcB/XMV7tfVUVwruCnZcjY6J1 05SGFboIeTZ+bWvlAnwz9IlTAl/f/6gYm3vGa5Yme6U0c3hdmf64SMfWczaT7dpvLzt9 sVtQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=gCwfXyrBlQ/cTngCOsMSDcstyBYGj4z6JJNyCLT4xM8=; b=BPOGES8du5sUZv/bhFrxn1/e7d6Yih9//wlHk/smsdPnpGfmEyEgeXrZYSirhtdBEn oV555ivUeNszDQPdG3y83oimlGJ41vznfUu5HouzexAWIeBVglGNmcCKnFurn6YYj28e AlA9VNMKlKf79228eJ/wRrF6HeqQArky5nCDcsfeedLFWrHl/sm6D7I4Gi2IyUGq4DW9 FBC6Ziq/70bWy6JFA7zc+MtRdvWNh+gibHYNp6WNZpDBssb4i+gVCQSz9aYePckE60m8 B7ElEO06l/MJHfRB2Nccj3ElzDr8oyllqC9AuZJ+wMCnMo5jK0QEAhBC+KEYYpfOEcDK tFwg== X-Gm-Message-State: AHYfb5iZydKd72yrBBZzuqVYSAU5Xil5aKZNXbvPUyZ92+j2izHWvm0U c43zh45YQ+jSnld6QVTDwm5k4HMHeg== X-Received: by 10.98.112.132 with SMTP id l126mr11879289pfc.181.1503155452849; Sat, 19 Aug 2017 08:10:52 -0700 (PDT) MIME-Version: 1.0 Received: by 10.100.168.77 with HTTP; Sat, 19 Aug 2017 08:10:51 -0700 (PDT) From: Ivan Kalvachev Date: Sat, 19 Aug 2017 18:10:51 +0300 Message-ID: To: FFmpeg development discussions and patches Subject: [FFmpeg-devel] [PATCH] opus_pvq_search: Restore the proper use of conditional define and simplify the function name suffix handling X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Using named define properly documents the code paths. It also avoids passing additional numbered arguments through multiple levels of macro templates. The suffix handling is done by concatenation, like in other asm functions and avoid having two separate "cglobal" defines. --- I have to point few things out. commit f386dd70acdc81d42d6bcb885d2889634cdf45b7 "opus_pvq_search: only use rsqrtps approximation on CPUs with avx" was rushed hack job: 1. It broke compilation with yasm. 2. Its subject says the opposite of what the patch does. 3. The changes were numerous and intrusive, obfuscating code, while the same result could have been achieved with a single new line. I did request revert of this commit on irc and in ffmpeg-cvs-log mail, I did request a patch to be sent to ffmpeg-dev-eng, to discuss the best way to make the change. I did oppose the follow up commit 3c99523a2864af729a8576c3fffe81fb884fa0d5, that tried to fix the compilation, by redoing the intrusive changes once again, and making new changes that affect the C code too. (Thus making revert a lot more messy.) I find it extremely offensive that I as author of the code and FFmpeg developer was totally ignored, without providing any technical reasons and with something that sums up to "I like my code more". This breaks good faith, COC and the written rules. I'm not going to request punishment, reverts, public lynching... I don't want drama. I'm sending this patch to make the code as I think it should have been done from the start and I do ask you to push it without further bikeshedding. That would be enough of an apology. From 18324e5535dd0c928a3ec2e8f25babc583b031d5 Mon Sep 17 00:00:00 2001 From: Ivan Kalvachev Date: Sat, 19 Aug 2017 14:29:40 +0300 Subject: [PATCH] opus_pvq_search: Restore the proper use of conditional define and simplify the function name suffix handling. Using named define properly documents the code paths. It also avoids passing additional numbered arguments through multiple levels of macro templates. The suffix handling is done by concatenation, like in other asm functions and avoid having two separate "cglobal" defines. Signed-off-by: Ivan Kalvachev --- libavcodec/x86/opus_pvq_search.asm | 37 ++++++++++++++++++------------------- 1 file changed, 18 insertions(+), 19 deletions(-) diff --git a/libavcodec/x86/opus_pvq_search.asm b/libavcodec/x86/opus_pvq_search.asm index 8cf040465d..5c1e6d6174 100644 --- a/libavcodec/x86/opus_pvq_search.asm +++ b/libavcodec/x86/opus_pvq_search.asm @@ -82,7 +82,7 @@ SECTION .text %endif %endmacro -%macro PULSES_SEARCH 2 ; %1 - add or sub, %2 - use approximation +%macro PULSES_SEARCH 1 ; m6 Syy_norm ; m7 Sxy_norm addps m6, mm_const_float_0_5 ; Syy_norm += 1.0/2 @@ -96,17 +96,17 @@ align 16 movaps m4, [tmpY + r4] ; y[i] movaps m5, [tmpX + r4] ; X[i] -%if %2 + %if USE_APPROXIMATION == 1 xorps m0, m0 cmpps m0, m0, m5, 4 ; m0 = (X[i] != 0.0) -%endif + %endif addps m4, m6 ; m4 = Syy_new = y[i] + Syy_norm addps m5, m7 ; m5 = Sxy_new = X[i] + Sxy_norm -%if %2 + %if USE_APPROXIMATION == 1 andps m5, m0 ; if(X[i] == 0) Sxy_new = 0; Prevent aproximation error from setting pulses in array padding. -%endif + %endif %else movaps m5, [tmpY + r4] ; m5 = y[i] @@ -119,7 +119,7 @@ align 16 andps m5, m0 ; (0 0 %%add_pulses_loop: - PULSES_SEARCH add, %1 ; m6 Syy_norm ; m7 Sxy_norm + PULSES_SEARCH add ; m6 Syy_norm ; m7 Sxy_norm sub Kd, 1 jnz %%add_pulses_loop @@ -325,7 +320,7 @@ align 16 ; K - pulses > 0 align 16 %%remove_pulses_loop: - PULSES_SEARCH sub, %1 ; m6 Syy_norm ; m7 Sxy_norm + PULSES_SEARCH sub ; m6 Syy_norm ; m7 Sxy_norm add Kd, 1 jnz %%remove_pulses_loop @@ -376,11 +371,15 @@ align 16 ; On Skylake & Ryzen the division is much faster (around 11c/3), ; that makes the full precision code about 2% slower. ; Opus also does use rsqrt approximation in their intrinsics code. +%define USE_APPROXIMATION 1 + INIT_XMM sse2 -PVQ_FAST_SEARCH 1 +PVQ_FAST_SEARCH _approx INIT_XMM sse4 -PVQ_FAST_SEARCH 1 +PVQ_FAST_SEARCH _approx + +%define USE_APPROXIMATION 0 INIT_XMM avx -PVQ_FAST_SEARCH 0 +PVQ_FAST_SEARCH _exact -- 2.14.1