From patchwork Fri Feb 1 02:45:24 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jun Zhao
X-Patchwork-Id: 11938
From: Jun Zhao
To: ffmpeg-devel@ffmpeg.org
Cc: Jun Zhao
Date: Fri, 1 Feb 2019 10:45:24 +0800
Message-Id: <1548989124-27074-1-git-send-email-mypopydev@gmail.com>
X-Mailer: git-send-email 1.7.1
Subject: [FFmpeg-devel] [PATCH V2] lavfi/vf_nlmeans: Improve the performance for nlmeans
List-Id: FFmpeg development discussions and patches

Remove pdiff_lut_scale from nlmeans and increase the weight_lut table
size from 2^9 to 800000 entries. With the larger table, nlmeans_slice()
can index weight_lut directly with the patch difference instead of
scaling it by pdiff_lut_scale first. This improves performance by about
12% (for 1080P pictures).

Use the profiling command like:

perf stat -a -d -r 5 ./ffmpeg -i input -an -vf nlmeans=s=30 -vframes 10 \
-f null /dev/null

without this change:
     s=1.0 (default value)  63s
     s=30.0                 72s

after this change:
     s=1.0 (default value)  56s
     s=30.0                 63s

Reviewed-by: Carl Eugen Hoyos
Signed-off-by: Jun Zhao
---
 libavfilter/vf_nlmeans.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index 82e779c..72eb819 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -43,8 +43,7 @@ struct weighted_avg {
     float sum;
 };
 
-#define WEIGHT_LUT_NBITS 9
-#define WEIGHT_LUT_SIZE (1<<WEIGHT_LUT_NBITS)
+#define WEIGHT_LUT_SIZE  (800000) // need to > 300 * 300 * log(255)
 
 typedef struct NLMeansContext {
     const AVClass *class;
@@ -63,7 +62,6 @@ typedef struct NLMeansContext {
     struct weighted_avg *wa;                    // weighted average of every pixel
     ptrdiff_t wa_linesize;                      // linesize for wa in struct size unit
     float weight_lut[WEIGHT_LUT_SIZE];          // lookup table mapping (scaled) patch differences to their associated weights
-    float pdiff_lut_scale;                      // scale factor for patch differences before looking into the LUT
     uint32_t max_meaningful_diff;               // maximum difference considered (if the patch difference is too high we ignore the pixel)
     NLMeansDSPContext dsp;
 } NLMeansContext;
@@ -401,8 +399,7 @@ static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs
                     const uint32_t patch_diff_sq = e - d - b + a;
 
                     if (patch_diff_sq < s->max_meaningful_diff) {
-                        const unsigned weight_lut_idx = patch_diff_sq * s->pdiff_lut_scale;
-                        const float weight = s->weight_lut[weight_lut_idx]; // exp(-patch_diff_sq * s->pdiff_scale)
+                        const float weight = s->weight_lut[patch_diff_sq]; // exp(-patch_diff_sq * s->pdiff_scale)
                         wa[x].total_weight += weight;
                         wa[x].sum += weight * src[x];
                     }
@@ -527,10 +524,9 @@ static av_cold int init(AVFilterContext *ctx)
 
     s->pdiff_scale = 1. / (h * h);
     s->max_meaningful_diff = -log(1/255.) / s->pdiff_scale;
-    s->pdiff_lut_scale = 1./s->max_meaningful_diff * WEIGHT_LUT_SIZE;
-    av_assert0((s->max_meaningful_diff - 1) * s->pdiff_lut_scale < FF_ARRAY_ELEMS(s->weight_lut));
+    av_assert0((s->max_meaningful_diff - 1) < FF_ARRAY_ELEMS(s->weight_lut));
     for (i = 0; i < WEIGHT_LUT_SIZE; i++)
-        s->weight_lut[i] = exp(-i / s->pdiff_lut_scale * s->pdiff_scale);
+        s->weight_lut[i] = exp(-i * s->pdiff_scale);
 
     CHECK_ODD_FIELD(research_size, "Luma research window");
     CHECK_ODD_FIELD(patch_size,    "Luma patch");