From patchwork Fri Oct 29 15:19:01 2021
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 31255
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 29 Oct 2021 17:19:01 +0200
Message-Id: <20211029151903.1078367-5-onemda@gmail.com>
In-Reply-To: <20211029151903.1078367-1-onemda@gmail.com>
References: <20211029151903.1078367-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH 5/7] avfilter/vf_nlmeans: refactor line
 processing in preparation for x86 SIMD assembly

Signed-off-by: Paul B Mahol
---
 libavfilter/vf_nlmeans.c | 109 ++++++++++++++++++++++-----------------
 libavfilter/vf_nlmeans.h |  14 +++++
 2 files changed, 77 insertions(+), 46 deletions(-)

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index af165c861c..93a14bcf19 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -38,11 +38,6 @@
 #include "vf_nlmeans.h"
 #include "video.h"
 
-struct weighted_avg {
-    float total_weight;
-    float sum;
-};
-
 typedef struct NLMeansContext {
     const AVClass *class;
     int nb_planes;
@@ -329,6 +324,58 @@ struct thread_data {
     int p;
 };
 
+static void compute_weights_line_c(const uint32_t *const iia,
+                                   const uint32_t *const iib,
+                                   const uint32_t *const iid,
+                                   const uint32_t *const iie,
+                                   const uint8_t *const src,
+                                   struct weighted_avg *wa,
+                                   const float *const weight_lut,
+                                   int max_meaningful_diff,
+                                   int startx, int endx)
+{
+    for (int x = startx; x < endx; x++) {
+        /*
+         * M is a discrete map where every entry contains the sum of all the entries
+         * in the rectangle from the top-left origin of M to its coordinate. In the
+         * following schema, "i" contains the sum of the whole map:
+         *
+         * M = +----------+-----------------+----+
+         *     |          |                 |    |
+         *     |          |                 |    |
+         *     |         a|                b|   c|
+         *     +----------+-----------------+----+
+         *     |          |                 |    |
+         *     |          |                 |    |
+         *     |          |        X        |    |
+         *     |          |                 |    |
+         *     |         d|                e|   f|
+         *     +----------+-----------------+----+
+         *     |          |                 |    |
+         *     |         g|                h|   i|
+         *     +----------+-----------------+----+
+         *
+         * The sum of the X box can be calculated with:
+         *    X = e-d-b+a
+         *
+         * See https://en.wikipedia.org/wiki/Summed_area_table
+         *
+         * The compute*_ssd functions compute the integral image M where every entry
+         * contains the sum of the squared difference of every corresponding pixels of
+         * two input planes of the same size as M.
+         */
+        const uint32_t a = iia[x];
+        const uint32_t b = iib[x];
+        const uint32_t d = iid[x];
+        const uint32_t e = iie[x];
+        const uint32_t patch_diff_sq = FFMIN(e - d - b + a, max_meaningful_diff);
+        const float weight = weight_lut[patch_diff_sq]; // exp(-patch_diff_sq * s->pdiff_scale)
+
+        wa[x].total_weight += weight;
+        wa[x].sum += weight * src[x];
+    }
+}
+
 static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
 {
     NLMeansContext *s = ctx->priv;
@@ -346,50 +393,19 @@ static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs
     const int dist_d = dist_b * s->ii_lz_32;
     const int dist_e = dist_d + dist_b;
     const float *const weight_lut = s->weight_lut;
+    NLMeansDSPContext *dsp = &s->dsp;
 
     for (int y = starty; y < endy; y++) {
-        const uint8_t *src = td->src + y*src_linesize;
+        const uint8_t *const src = td->src + y*src_linesize;
         struct weighted_avg *wa = s->wa + y*s->wa_linesize;
-        for (int x = td->startx; x < td->endx; x++) {
-            /*
-             * M is a discrete map where every entry contains the sum of all the entries
-             * in the rectangle from the top-left origin of M to its coordinate. In the
-             * following schema, "i" contains the sum of the whole map:
-             *
-             * M = +----------+-----------------+----+
-             *     |          |                 |    |
-             *     |          |                 |    |
-             *     |         a|                b|   c|
-             *     +----------+-----------------+----+
-             *     |          |                 |    |
-             *     |          |                 |    |
-             *     |          |        X        |    |
-             *     |          |                 |    |
-             *     |         d|                e|   f|
-             *     +----------+-----------------+----+
-             *     |          |                 |    |
-             *     |         g|                h|   i|
-             *     +----------+-----------------+----+
-             *
-             * The sum of the X box can be calculated with:
-             *    X = e-d-b+a
-             *
-             * See https://en.wikipedia.org/wiki/Summed_area_table
-             *
-             * The compute*_ssd functions compute the integral image M where every entry
-             * contains the sum of the squared difference of every corresponding pixels of
-             * two input planes of the same size as M.
-             */
-            const uint32_t a = ii[x];
-            const uint32_t b = ii[x + dist_b];
-            const uint32_t d = ii[x + dist_d];
-            const uint32_t e = ii[x + dist_e];
-            const uint32_t patch_diff_sq = FFMIN(e - d - b + a, max_meaningful_diff);
-            const float weight = weight_lut[patch_diff_sq]; // exp(-patch_diff_sq * s->pdiff_scale)
-
-            wa[x].total_weight += weight;
-            wa[x].sum += weight * src[x];
-        }
+        const uint32_t *const iia = ii;
+        const uint32_t *const iib = ii + dist_b;
+        const uint32_t *const iid = ii + dist_d;
+        const uint32_t *const iie = ii + dist_e;
+
+        dsp->compute_weights_line(iia, iib, iid, iie, src, wa,
+                                  weight_lut, max_meaningful_diff,
+                                  td->startx, td->endx);
         ii += s->ii_lz_32;
     }
     return 0;
@@ -493,6 +509,7 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *in)
 void ff_nlmeans_init(NLMeansDSPContext *dsp)
 {
     dsp->compute_safe_ssd_integral_image = compute_safe_ssd_integral_image_c;
+    dsp->compute_weights_line = compute_weights_line_c;
 
     if (ARCH_AARCH64)
         ff_nlmeans_init_aarch64(dsp);
diff --git a/libavfilter/vf_nlmeans.h b/libavfilter/vf_nlmeans.h
index 0a9aab2928..d0d0056163 100644
--- a/libavfilter/vf_nlmeans.h
+++ b/libavfilter/vf_nlmeans.h
@@ -22,11 +22,25 @@
 #include
 #include
 
+struct weighted_avg {
+    float total_weight;
+    float sum;
+};
+
 typedef struct NLMeansDSPContext {
     void (*compute_safe_ssd_integral_image)(uint32_t *dst, ptrdiff_t dst_linesize_32,
                                             const uint8_t *s1, ptrdiff_t linesize1,
                                             const uint8_t *s2, ptrdiff_t linesize2,
                                             int w, int h);
+    void (*compute_weights_line)(const uint32_t *const iia,
+                                 const uint32_t *const iib,
+                                 const uint32_t *const iid,
+                                 const uint32_t *const iie,
+                                 const uint8_t *const src,
+                                 struct weighted_avg *wa,
+                                 const float *const weight_lut,
+                                 int max_meaningful_diff,
+                                 int startx, int endx);
 } NLMeansDSPContext;
 
 void ff_nlmeans_init(NLMeansDSPContext *dsp);
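
Illustration only, not part of the patch: a minimal standalone sketch of the summed-area-table lookup (X = e-d-b+a) that compute_weights_line_c performs for each x. The names W, H, STRIDE, build_integral and box_sum are hypothetical and do not exist in libavfilter. The sketch assumes an integral image padded with one zero row and one zero column, which may differ from the layout vf_nlmeans actually uses.

```c
#include <stdint.h>

/* Hypothetical sizes for the sketch: a W x H map and a padded integral image. */
enum { W = 4, H = 3, STRIDE = W + 1 };

/*
 * Fill ii (size (H + 1) * STRIDE) so that ii[(y + 1) * STRIDE + (x + 1)]
 * holds the sum of m over rows 0..y and columns 0..x. Row 0 and column 0
 * of ii are zero padding.
 */
static void build_integral(const uint32_t *m, uint32_t *ii)
{
    for (int x = 0; x <= W; x++)
        ii[x] = 0;
    for (int y = 0; y < H; y++) {
        ii[(y + 1) * STRIDE] = 0;
        for (int x = 0; x < W; x++)
            ii[(y + 1) * STRIDE + x + 1] = m[y * W + x]
                                         + ii[y * STRIDE + x + 1]
                                         + ii[(y + 1) * STRIDE + x]
                                         - ii[y * STRIDE + x];
    }
}

/*
 * Sum of m over rows y0..y1-1 and columns x0..x1-1, using the four corner
 * reads a, b, d, e from the schema: X = e - d - b + a.
 */
static uint32_t box_sum(const uint32_t *ii, int x0, int y0, int x1, int y1)
{
    const uint32_t a = ii[y0 * STRIDE + x0];
    const uint32_t b = ii[y0 * STRIDE + x1];
    const uint32_t d = ii[y1 * STRIDE + x0];
    const uint32_t e = ii[y1 * STRIDE + x1];
    return e - d - b + a;
}
```

For a 3x4 map holding 1..12 row by row, box_sum(ii, 1, 1, 3, 3) covers the entries 6, 7, 10 and 11. In the patch the four corners arrive as pre-offset row pointers (iia, iib, iid, iie), so the per-line function only indexes each of them by x.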