From patchwork Mon May 7 17:24:14 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Clément Bœsch
X-Patchwork-Id: 8839
From: Clément Bœsch
To: ffmpeg-devel@ffmpeg.org
Cc: Clément Bœsch
Date: Mon, 7 May 2018 19:24:14 +0200
Message-Id: <20180507172422.11003-3-u@pkh.me>
In-Reply-To: <20180507172422.11003-1-u@pkh.me>
References: <20180507172422.11003-1-u@pkh.me>
X-Mailer: git-send-email 2.17.0
Subject: [FFmpeg-devel] [PATCH v2 02/10] lavfi/nlmeans: add SIMD-friendly
 assumptions for compute_safe_ssd_integral_image

With these assumptions, the SIMD code will not have to deal with padding
itself. Overwriting in that function might have been possible, but it
would involve large overreads of the sources. Instead, we simply make
sure the width to process is always a multiple of 16. Additionally,
there must be some actual area to process, so that the SIMD code can do
its boundary checks after processing the first pixels.
---
 libavfilter/vf_nlmeans.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index d222d3913e..3f0a43ee72 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -157,6 +157,9 @@ static void compute_safe_ssd_integral_image_c(uint32_t *dst, int dst_linesize_32
 {
     int x, y;
 
+    /* SIMD-friendly assumptions allowed here */
+    av_assert2(!(w & 0xf) && w >= 16 && h >= 1);
+
     for (y = 0; y < h; y++) {
         uint32_t acc = dst[-1] - dst[-dst_linesize_32 - 1];
 
@@ -257,9 +260,16 @@ static void compute_ssd_integral_image(uint32_t *ii, int ii_linesize_32,
     // to compare the 2 sources pixels
     const int startx_safe = FFMAX(s1x, s2x);
     const int starty_safe = FFMAX(s1y, s2y);
-    const int endx_safe   = FFMIN(s1x + w, s2x + w);
+    const int u_endx_safe = FFMIN(s1x + w, s2x + w); // unaligned
     const int endy_safe   = FFMIN(s1y + h, s2y + h);
 
+    // deduce the safe area width and height
+    const int safe_pw = (u_endx_safe - startx_safe) & ~0xf;
+    const int safe_ph = endy_safe - starty_safe;
+
+    // adjusted end x position of the safe area after width of the safe area gets aligned
+    const int endx_safe = startx_safe + safe_pw;
+
     // top part where only one of s1 and s2 is still readable, or none at all
     compute_unsafe_ssd_integral_image(ii, ii_linesize_32,
                                       0, 0,
@@ -273,24 +283,25 @@ static void compute_ssd_integral_image(uint32_t *ii, int ii_linesize_32,
                                       0, starty_safe,
                                       src, linesize,
                                       offx, offy, e, w, h,
-                                      startx_safe, endy_safe - starty_safe);
+                                      startx_safe, safe_ph);
 
     // main and safe part of the integral
     av_assert1(startx_safe - s1x >= 0); av_assert1(startx_safe - s1x < w);
     av_assert1(starty_safe - s1y >= 0); av_assert1(starty_safe - s1y < h);
     av_assert1(startx_safe - s2x >= 0); av_assert1(startx_safe - s2x < w);
     av_assert1(starty_safe - s2y >= 0); av_assert1(starty_safe - s2y < h);
-    compute_safe_ssd_integral_image_c(ii + starty_safe*ii_linesize_32 + startx_safe, ii_linesize_32,
-                                      src + (starty_safe - s1y) * linesize + (startx_safe - s1x), linesize,
-                                      src + (starty_safe - s2y) * linesize + (startx_safe - s2x), linesize,
-                                      endx_safe - startx_safe, endy_safe - starty_safe);
+    if (safe_pw && safe_ph)
+        compute_safe_ssd_integral_image_c(ii + starty_safe*ii_linesize_32 + startx_safe, ii_linesize_32,
+                                          src + (starty_safe - s1y) * linesize + (startx_safe - s1x), linesize,
+                                          src + (starty_safe - s2y) * linesize + (startx_safe - s2x), linesize,
+                                          safe_pw, safe_ph);
 
     // right part of the integral
     compute_unsafe_ssd_integral_image(ii, ii_linesize_32,
                                       endx_safe, starty_safe,
                                       src, linesize,
                                       offx, offy, e, w, h,
-                                      ii_w - endx_safe, endy_safe - starty_safe);
+                                      ii_w - endx_safe, safe_ph);
 
     // bottom part where only one of s1 and s2 is still readable, or none at all
     compute_unsafe_ssd_integral_image(ii, ii_linesize_32,
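
For readers following the arithmetic, here is a minimal standalone sketch of how the safe width gets rounded down to a multiple of 16 and how the leftover columns fall through to the unsafe path. The `safe_span` struct and the `align_safe_span()` helper are hypothetical names introduced only for illustration; they are not part of vf_nlmeans.c. Only the `& ~0xf` masking and the `startx_safe + safe_pw` adjustment mirror the patch.

```c
#include <assert.h>

/* Hypothetical illustration type, not part of vf_nlmeans.c. */
struct safe_span {
    int safe_pw;   /* safe width, rounded down to a multiple of 16 */
    int endx_safe; /* first column handed over to the unsafe path  */
    int tail;      /* unaligned columns dropped from the safe area */
};

/* Hypothetical helper mirroring the computation added by the patch.
 * Assumes u_endx_safe >= startx_safe (the unaligned safe width is not
 * negative), which the surrounding asserts in the patch imply. */
static struct safe_span align_safe_span(int startx_safe, int u_endx_safe)
{
    struct safe_span s;

    assert(u_endx_safe >= startx_safe);

    /* clear the 4 low bits: round the width down to a multiple of 16 */
    s.safe_pw   = (u_endx_safe - startx_safe) & ~0xf;
    /* the aligned safe area ends here; the right unsafe part starts here */
    s.endx_safe = startx_safe + s.safe_pw;
    /* 0..15 columns that now belong to the right unsafe part */
    s.tail      = u_endx_safe - s.endx_safe;
    return s;
}
```

For example, a safe area spanning columns 3 to 100 is 97 columns wide, so `safe_pw` becomes 96 and the last column moves to the right unsafe part. A span narrower than 16 columns yields `safe_pw == 0`, which is the case the `if (safe_pw && safe_ph)` guard skips.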