From patchwork Sun May 6 11:40:53 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
X-Patchwork-Submitter: Clément Bœsch
X-Patchwork-Id: 8805
From: Clément Bœsch
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 6 May 2018 13:40:53 +0200
Message-Id: <20180506114100.4223-3-u@pkh.me>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180506114100.4223-1-u@pkh.me>
References: <20180506114100.4223-1-u@pkh.me>
Subject: [FFmpeg-devel] [PATCH 2/9] lavfi/nlmeans: add SIMD-friendly assumptions for compute_safe_ssd_integral_image
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Clément Bœsch

SIMD code will not have to deal with padding itself. Overwriting in that
function might have been possible, but it would involve large overreads
of the sources. Instead, we simply make sure the width to process is
always a multiple of 16. Additionally, there must be some actual area to
process, so that the SIMD code can perform its boundary checks after
processing the first pixels.
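
To illustrate the width rounding described above, here is a minimal sketch in C. The struct `safe_span` and the helper `align_safe_span` are hypothetical names used only for this illustration; they are not part of the patch, which computes the same values inline in compute_ssd_integral_image().

```c
#include <assert.h>

/* Hypothetical container for the illustration; not part of vf_nlmeans.c. */
struct safe_span {
    int start;    /* first column of the safe area */
    int end;      /* one past the last column handled by the safe kernel */
    int width;    /* safe width, rounded down to a multiple of 16 */
    int leftover; /* columns left for the "right part" unsafe pass */
};

/* Round the safe width down to a multiple of 16, using the same masking
 * as the patch: (u_endx_safe - startx_safe) & ~0xf. */
static struct safe_span align_safe_span(int startx_safe, int u_endx_safe)
{
    struct safe_span s;

    /* Assumption of this sketch: the unaligned end is not before the start. */
    assert(u_endx_safe >= startx_safe);

    s.start    = startx_safe;
    s.width    = (u_endx_safe - startx_safe) & ~0xf;
    s.end      = s.start + s.width;
    s.leftover = u_endx_safe - s.end;
    return s;
}
```

With an unaligned span of 37 columns starting at column 3, the safe width becomes 32 and 5 columns are left over. With a span shorter than 16 columns the width becomes 0, which is the case the patch guards against with `if (safe_pw && safe_ph)`; those columns are then covered by the "right part" call, whose width is `ii_w - endx_safe`.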
---
 libavfilter/vf_nlmeans.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index d222d3913e..21f981a605 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -157,6 +157,9 @@ static void compute_safe_ssd_integral_image_c(uint32_t *dst, int dst_linesize_32
 {
     int x, y;
 
+    /* SIMD-friendly assumptions allowed here */
+    av_assert2(!(w & 0xf) && w >= 16 && h >= 1);
+
     for (y = 0; y < h; y++) {
         uint32_t acc = dst[-1] - dst[-dst_linesize_32 - 1];
 
@@ -257,9 +260,16 @@ static void compute_ssd_integral_image(uint32_t *ii, int ii_linesize_32,
     // to compare the 2 sources pixels
     const int startx_safe = FFMAX(s1x, s2x);
     const int starty_safe = FFMAX(s1y, s2y);
-    const int endx_safe   = FFMIN(s1x + w, s2x + w);
+    const int u_endx_safe = FFMIN(s1x + w, s2x + w); // unaligned
     const int endy_safe   = FFMIN(s1y + h, s2y + h);
 
+    // deduce the safe area width and height
+    const int safe_pw = (u_endx_safe - startx_safe) & ~0xf;
+    const int safe_ph = endy_safe - starty_safe;
+
+    // adjusted end x position of the safe area after width of the safe area gets aligned
+    const int endx_safe = startx_safe + safe_pw;
+
     // top part where only one of s1 and s2 is still readable, or none at all
     compute_unsafe_ssd_integral_image(ii, ii_linesize_32,
                                       0, 0,
@@ -273,24 +283,25 @@ static void compute_ssd_integral_image(uint32_t *ii, int ii_linesize_32,
                                       0, starty_safe,
                                       src, linesize,
                                       offx, offy, e, w, h,
-                                      startx_safe, endy_safe - starty_safe);
+                                      startx_safe, safe_ph);
 
     // main and safe part of the integral
     av_assert1(startx_safe - s1x >= 0); av_assert1(startx_safe - s1x < w);
     av_assert1(starty_safe - s1y >= 0); av_assert1(starty_safe - s1y < h);
     av_assert1(startx_safe - s2x >= 0); av_assert1(startx_safe - s2x < w);
     av_assert1(starty_safe - s2y >= 0); av_assert1(starty_safe - s2y < h);
-    compute_safe_ssd_integral_image_c(ii + starty_safe*ii_linesize_32 + startx_safe, ii_linesize_32,
-                                      src + (starty_safe - s1y) * linesize + (startx_safe - s1x), linesize,
-                                      src + (starty_safe - s2y) * linesize + (startx_safe - s2x), linesize,
-                                      endx_safe - startx_safe, endy_safe - starty_safe);
+    if (safe_pw && safe_ph)
+        dsp->compute_safe_ssd_integral_image(ii + starty_safe*ii_linesize_32 + startx_safe, ii_linesize_32,
+                                             src + (starty_safe - s1y) * linesize + (startx_safe - s1x), linesize,
+                                             src + (starty_safe - s2y) * linesize + (startx_safe - s2x), linesize,
+                                             safe_pw, safe_ph);
 
     // right part of the integral
     compute_unsafe_ssd_integral_image(ii, ii_linesize_32,
                                       endx_safe, starty_safe,
                                       src, linesize,
                                       offx, offy, e, w, h,
-                                      ii_w - endx_safe, endy_safe - starty_safe);
+                                      ii_w - endx_safe, safe_ph);
 
     // bottom part where only one of s1 and s2 is still readable, or none at all
     compute_unsafe_ssd_integral_image(ii, ii_linesize_32,
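
As a rough illustration of what the new assertion permits, the following scalar C sketch walks each row in blocks of 16 pixels and checks the loop bounds only after the first block and the first row have been processed. The function name `ssd_integral_blocks16` is hypothetical, and plain `assert()` stands in for `av_assert2()`. This is not the SIMD implementation from the patch series, only a sketch of the loop shape that the assumptions (`w` a non-zero multiple of 16, `h >= 1`) make valid. As in the C reference function, `dst` is assumed to have one readable row above it and one readable column to its left.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a 16-pixel-wide SIMD kernel. */
static void ssd_integral_blocks16(uint32_t *dst, int dst_linesize_32,
                                  const uint8_t *s1, int linesize1,
                                  const uint8_t *s2, int linesize2,
                                  int w, int h)
{
    int y = 0;

    /* Same conditions as the av_assert2() added by the patch. */
    assert(!(w & 0xf) && w >= 16 && h >= 1);

    do {
        /* Row sum accumulated to the left of the safe area. */
        uint32_t acc = dst[-1] - dst[-dst_linesize_32 - 1];
        int x = 0;

        do {
            /* One "vector" worth of pixels; no tail handling is needed
             * because w is a multiple of 16. */
            for (int i = 0; i < 16; i++) {
                const int d = s1[x + i] - s2[x + i];
                acc += d * d;
                dst[x + i] = dst[x + i - dst_linesize_32] + acc;
            }
            x += 16;
        } while (x < w); /* boundary check after the first block */

        s1  += linesize1;
        s2  += linesize2;
        dst += dst_linesize_32;
    } while (++y < h);   /* boundary check after the first row */
}
```

Because the caller only invokes the safe kernel when both `safe_pw` and `safe_ph` are non-zero, the do/while form never reads or writes outside the safe area.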