From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 29 Oct 2021 17:19:02 +0200
Message-Id: <20211029151903.1078367-6-onemda@gmail.com>
In-Reply-To: <20211029151903.1078367-1-onemda@gmail.com>
References: <20211029151903.1078367-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH 6/7] avfilter/vf_nlmeans: split wa struct

This will make x86 SIMD simpler and faster.

Signed-off-by: Paul B Mahol
---
 libavfilter/vf_nlmeans.c | 43 +++++++++++++++++++++++-----------------
 libavfilter/vf_nlmeans.h |  8 ++------
 2 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index 93a14bcf19..dee1f68101 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -52,8 +52,9 @@ typedef struct NLMeansContext {
     uint32_t *ii;                               // integral image starting after the 0-line and 0-column
     int ii_w, ii_h;                             // width and height of the integral image
     ptrdiff_t ii_lz_32;                         // linesize in 32-bit units of the integral image
-    struct weighted_avg *wa;                    // weighted average of every pixel
-    ptrdiff_t wa_linesize;                      // linesize for wa in struct size unit
+    float *total_weight;                        // total weight for every pixel
+    float *sum;                                 // weighted sum for every pixel
+    int linesize;                               // sum and total_weight linesize
     float *weight_lut;                          // lookup table mapping (scaled) patch differences to their associated weights
     uint32_t max_meaningful_diff;               // maximum difference considered (if the patch difference is too high we ignore the pixel)
     NLMeansDSPContext dsp;
@@ -307,9 +308,10 @@ static int config_input(AVFilterLink *inlink)
     s->ii = s->ii_orig + s->ii_lz_32 + 1;
 
     // allocate weighted average for every pixel
-    s->wa_linesize = inlink->w;
-    s->wa = av_malloc_array(s->wa_linesize, inlink->h * sizeof(*s->wa));
-    if (!s->wa)
+    s->linesize = inlink->w;
+    s->total_weight = av_malloc_array(inlink->w, inlink->h * sizeof(*s->total_weight));
+    s->sum = av_malloc_array(inlink->w, inlink->h * sizeof(*s->sum));
+    if (!s->total_weight || !s->sum)
         return AVERROR(ENOMEM);
 
     return 0;
@@ -329,7 +331,8 @@ static void compute_weights_line_c(const uint32_t *const iia,
                                    const uint32_t *const iid,
                                    const uint32_t *const iie,
                                    const uint8_t *const src,
-                                   struct weighted_avg *wa,
+                                   float *total_weight,
+                                   float *sum,
                                    const float *const weight_lut,
                                    int max_meaningful_diff,
                                    int startx, int endx)
@@ -371,8 +374,8 @@ static void compute_weights_line_c(const uint32_t *const iia,
         const uint32_t patch_diff_sq = FFMIN(e - d - b + a, max_meaningful_diff);
         const float weight = weight_lut[patch_diff_sq]; // exp(-patch_diff_sq * s->pdiff_scale)
 
-        wa[x].total_weight += weight;
-        wa[x].sum += weight * src[x];
+        total_weight[x] += weight;
+        sum[x] += weight * src[x];
     }
 }
 
@@ -397,13 +400,14 @@ static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs
 
     for (int y = starty; y < endy; y++) {
         const uint8_t *const src = td->src + y*src_linesize;
-        struct weighted_avg *wa = s->wa + y*s->wa_linesize;
+        float *total_weight = s->total_weight + y*s->linesize;
+        float *sum = s->sum + y*s->linesize;
         const uint32_t *const iia = ii;
         const uint32_t *const iib = ii + dist_b;
         const uint32_t *const iid = ii + dist_d;
         const uint32_t *const iie = ii + dist_e;
 
-        dsp->compute_weights_line(iia, iib, iid, iie, src, wa,
+        dsp->compute_weights_line(iia, iib, iid, iie, src, total_weight, sum,
                                   weight_lut, max_meaningful_diff,
                                   td->startx, td->endx);
         ii += s->ii_lz_32;
@@ -413,19 +417,20 @@ static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs
 
 static void weight_averages(uint8_t *dst, ptrdiff_t dst_linesize,
                             const uint8_t *src, ptrdiff_t src_linesize,
-                            struct weighted_avg *wa, ptrdiff_t wa_linesize,
+                            float *total_weight, float *sum, ptrdiff_t linesize,
                             int w, int h)
 {
     for (int y = 0; y < h; y++) {
         for (int x = 0; x < w; x++) {
             // Also weight the centered pixel
-            wa[x].total_weight += 1.f;
-            wa[x].sum += 1.f * src[x];
-            dst[x] = av_clip_uint8(wa[x].sum / wa[x].total_weight + 0.5f);
+            total_weight[x] += 1.f;
+            sum[x] += 1.f * src[x];
+            dst[x] = av_clip_uint8(sum[x] / total_weight[x] + 0.5f);
         }
         dst += dst_linesize;
         src += src_linesize;
-        wa += wa_linesize;
+        total_weight += linesize;
+        sum += linesize;
     }
 }
 
@@ -440,7 +445,8 @@ static int nlmeans_plane(AVFilterContext *ctx, int w, int h, int p, int r,
     /* focus an integral pointer on the centered image (s1) */
     const uint32_t *centered_ii = s->ii + e*s->ii_lz_32 + e;
 
-    memset(s->wa, 0, s->wa_linesize * h * sizeof(*s->wa));
+    memset(s->total_weight, 0, s->linesize * h * sizeof(*s->total_weight));
+    memset(s->sum, 0, s->linesize * h * sizeof(*s->sum));
 
     for (int offy = -r; offy <= r; offy++) {
         for (int offx = -r; offx <= r; offx++) {
@@ -466,7 +472,7 @@ static int nlmeans_plane(AVFilterContext *ctx, int w, int h, int p, int r,
     }
 
     weight_averages(dst, dst_linesize, src, src_linesize,
-                    s->wa, s->wa_linesize, w, h);
+                    s->total_weight, s->sum, s->linesize, w, h);
 
     return 0;
 }
@@ -556,7 +562,8 @@ static av_cold void uninit(AVFilterContext *ctx)
     NLMeansContext *s = ctx->priv;
     av_freep(&s->weight_lut);
     av_freep(&s->ii_orig);
-    av_freep(&s->wa);
+    av_freep(&s->total_weight);
+    av_freep(&s->sum);
 }
 
 static const AVFilterPad nlmeans_inputs[] = {
diff --git a/libavfilter/vf_nlmeans.h b/libavfilter/vf_nlmeans.h
index d0d0056163..cd1ee7c0bf 100644
--- a/libavfilter/vf_nlmeans.h
+++ b/libavfilter/vf_nlmeans.h
@@ -22,11 +22,6 @@
 #include
 #include
 
-struct weighted_avg {
-    float total_weight;
-    float sum;
-};
-
 typedef struct NLMeansDSPContext {
     void (*compute_safe_ssd_integral_image)(uint32_t *dst, ptrdiff_t dst_linesize_32,
                                             const uint8_t *s1, ptrdiff_t linesize1,
@@ -37,7 +32,8 @@ typedef struct NLMeansDSPContext {
                                  const uint32_t *const iid,
                                  const uint32_t *const iie,
                                  const uint8_t *const src,
-                                 struct weighted_avg *wa,
+                                 float *total_weight,
+                                 float *sum,
                                  const float *const weight_lut,
                                  int max_meaningful_diff,
                                  int startx, int endx);
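
For readers who want to see the layout change in isolation, the following is a minimal standalone sketch of the planar accumulation this patch switches to. It is not FFmpeg code: the helper names accumulate_line() and normalize_line() and the per-pixel weight array are hypothetical, and the clamp is a plain-C stand-in for av_clip_uint8(). The point it tries to show is that once total_weight and sum are separate float arrays, consecutive pixels sit next to each other in memory, so a vector unit could presumably load, add and store several of them at once without de-interleaving struct fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the inner loop of compute_weights_line_c():
 * accumulate one candidate weight per pixel into two planar float arrays. */
static void accumulate_line(float *total_weight, float *sum,
                            const uint8_t *neighbor, const float *weight,
                            int w)
{
    for (int x = 0; x < w; x++) {
        total_weight[x] += weight[x];               /* contiguous floats */
        sum[x]          += weight[x] * neighbor[x]; /* contiguous floats */
    }
}

/* Mirrors the per-pixel step of weight_averages(): add the centered pixel
 * with weight 1, then divide and round to the nearest 8-bit value. */
static void normalize_line(uint8_t *dst, const uint8_t *center,
                           float *total_weight, float *sum, int w)
{
    for (int x = 0; x < w; x++) {
        total_weight[x] += 1.f;
        sum[x]          += 1.f * center[x];
        assert(total_weight[x] >= 1.f); /* holds when weights are non-negative */

        const float v = sum[x] / total_weight[x] + 0.5f;
        /* plain-C clamp used here in place of av_clip_uint8() */
        dst[x] = v < 0.f ? 0 : v > 255.f ? 255 : (uint8_t)v;
    }
}
```

A caller would zero both arrays once per plane, call accumulate_line() for every search offset, and call normalize_line() once at the end, which is the same order of operations as nlmeans_plane() in the patch.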