From patchwork Sat Sep 25 09:19:17 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 30565
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 25 Sep 2021 11:19:17 +0200
Message-Id: <20210925091917.25916-1-onemda@gmail.com>
X-Mailer: git-send-email 2.17.1
Subject: [FFmpeg-devel] [PATCH] avfilter/vf_avgblur: switch to faster algorithm

Signed-off-by: Paul B Mahol
---
 libavfilter/vf_avgblur.c | 311 +++++++++++++++++++++------------------
 1 file changed, 171 insertions(+), 140 deletions(-)

diff --git a/libavfilter/vf_avgblur.c b/libavfilter/vf_avgblur.c
index 3e222a43fa..a838285bb4 100644
--- a/libavfilter/vf_avgblur.c
+++ b/libavfilter/vf_avgblur.c
@@ -20,6 +20,7 @@
  * SOFTWARE.
  */
 
+#include "libavutil/avassert.h"
 #include "libavutil/imgutils.h"
 #include "libavutil/opt.h"
 #include "libavutil/pixdesc.h"
@@ -36,13 +37,15 @@ typedef struct AverageBlurContext {
     int planes;
 
     int depth;
+    int max;
+    int area;
     int planewidth[4];
     int planeheight[4];
-    float *buffer;
+    void *buffer;
+    uint16_t lut[256 * 256 * 256];
     int nb_planes;
 
-    int (*filter_horizontally)(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs);
-    int (*filter_vertically)(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs);
+    int (*filter[2])(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs);
 } AverageBlurContext;
 
 #define OFFSET(x) offsetof(AverageBlurContext, x)
@@ -60,124 +63,138 @@ AVFILTER_DEFINE_CLASS(avgblur);
 typedef struct ThreadData {
     int height;
     int width;
-    uint8_t *ptr;
-    int linesize;
+    const void *ptr;
+    void *dptr;
+    int linesize, dlinesize;
 } ThreadData;
 
-#define HORIZONTAL_FILTER(name, type) \
-static int filter_horizontally_##name(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)\
-{ \
-    AverageBlurContext *s = ctx->priv; \
-    ThreadData *td = arg; \
-    const int height = td->height; \
-    const int width = td->width; \
-    const int slice_start = (height * jobnr ) / nb_jobs; \
-    const int slice_end = (height * (jobnr+1)) / nb_jobs; \
-    const int radius = FFMIN(s->radius, width / 2); \
-    const int linesize = td->linesize / sizeof(type); \
-    float *buffer = s->buffer; \
-    const type *src; \
-    float *ptr; \
-    int y, x; \
- \
-    /* Filter horizontally along each row */ \
-    for (y = slice_start; y < slice_end; y++) { \
-        float acc = 0; \
-        int count = 0; \
- \
-        src = (const type *)td->ptr + linesize * y; \
-        ptr = buffer + width * y; \
- \
-        for (x = 0; x < radius; x++) { \
-            acc += src[x]; \
-        } \
-        count += radius; \
- \
-        for (x = 0; x <= radius; x++) { \
-            acc += src[x + radius]; \
-            count++; \
-            ptr[x] = acc / count; \
-        } \
- \
-        for (; x < width - radius; x++) { \
-            acc += src[x + radius] - src[x - radius - 1]; \
-            ptr[x] = acc / count; \
-        } \
- \
-        for (; x < width; x++) { \
-            acc -= src[x - radius]; \
-            count--; \
-            ptr[x] = acc / count; \
-        } \
-    } \
- \
-    return 0; \
+#define LUT_DIV(sum, area) (lut[(sum)])
+#define SLOW_DIV(sum, area) ((sum) / (area))
+
+#define FILTER(name, type, btype, lutunused, areaunused, lutdiv) \
+static int filter_##name(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs) \
+{ \
+    AverageBlurContext *s = ctx->priv; \
+    ThreadData *td = arg; \
+    areaunused const int area = s->area; \
+    lutunused const uint16_t *lut = s->lut; \
+    const int size_w = s->radius; \
+    const int size_h = s->radiusV; \
+    btype *col_sum = (btype *)s->buffer + size_w; \
+    const int dlinesize = td->dlinesize / sizeof(type); \
+    const int linesize = td->linesize / sizeof(type); \
+    const int height = td->height; \
+    const int width = td->width; \
+    const type *src = td->ptr; \
+    type *dst = td->dptr; \
+    btype sum = 0; \
+ \
+    for (int x = -size_w; x < 0; x++) { \
+        sum = src[0] * size_h; \
+        for (int y = 0; y <= size_h; y++) \
+            sum += src[y * linesize]; \
+        av_assert2(sum >= 0); \
+        col_sum[x] = sum; \
+    } \
+ \
+    for (int x = 0; x < width; x++) { \
+        sum = src[x] * size_h; \
+        for (int y = 0; y <= size_h; y++) \
+            sum += src[x + y * linesize]; \
+        av_assert2(sum >= 0); \
+        col_sum[x] = sum; \
+    } \
+ \
+    for (int x = width; x < width + size_w; x++) { \
+        sum = src[width - 1] * size_h; \
+        for (int y = 0; y <= size_h; y++) \
+            sum += src[width - 1 + y * linesize]; \
+        av_assert2(sum >= 0); \
+        col_sum[x] = sum; \
+    } \
+ \
+    sum = 0; \
+    for (int x = -size_w; x <= size_w; x++) \
+        sum += col_sum[x]; \
+    av_assert2(sum >= 0); \
+    dst[0] = lutdiv(sum, area); \
+ \
+    for (int x = 1; x < width; x++) { \
+        sum = sum - col_sum[x - size_w - 1] + col_sum[x + size_w]; \
+        av_assert2(sum >= 0); \
+        dst[x] = lutdiv(sum, area); \
+    } \
+ \
+    src = td->ptr; \
+    src += linesize; \
+    dst += dlinesize; \
+ \
+    for (int y = 1; y < height; y++) { \
+        const int syp = FFMIN(size_h, height - y - 1) * linesize; \
+        const int syn = FFMIN(y, size_h + 1) * linesize; \
+ \
+        sum = 0; \
+ \
+        for (int x = -size_w; x < 0; x++) \
+            col_sum[x] += src[0 + syp] - src[0 - syn]; \
+ \
+        for (int x = 0; x < width; x++) \
+            col_sum[x] += src[x + syp] - src[x - syn]; \
+ \
+        for (int x = width; x < width + size_w; x++) \
+            col_sum[x] += src[width - 1 + syp] - src[width - 1 - syn]; \
+ \
+        for (int x = -size_w; x <= size_w; x++) \
+            sum += col_sum[x]; \
+        av_assert2(sum >= 0); \
+        dst[0] = lutdiv(sum, area); \
+ \
+        for (int x = 1; x < width; x++) { \
+            sum = sum - col_sum[x - size_w - 1] + col_sum[x + size_w]; \
+            av_assert2(sum >= 0); \
+            dst[x] = lutdiv(sum, area); \
+        } \
+ \
+        src += linesize; \
+        dst += dlinesize; \
+    } \
+ \
+    return 0; \
 }
 
-HORIZONTAL_FILTER(8, uint8_t)
-HORIZONTAL_FILTER(16, uint16_t)
-
-#define VERTICAL_FILTER(name, type) \
-static int filter_vertically_##name(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs) \
-{ \
-    AverageBlurContext *s = ctx->priv; \
-    ThreadData *td = arg; \
-    const int height = td->height; \
-    const int width = td->width; \
-    const int slice_start = (width * jobnr ) / nb_jobs; \
-    const int slice_end = (width * (jobnr+1)) / nb_jobs; \
-    const int radius = FFMIN(s->radiusV, height / 2); \
-    const int linesize = td->linesize / sizeof(type); \
-    type *buffer = (type *)td->ptr; \
-    const float *src; \
-    type *ptr; \
-    int i, x; \
- \
-    /* Filter vertically along each column */ \
-    for (x = slice_start; x < slice_end; x++) { \
-        float acc = 0; \
-        int count = 0; \
- \
-        src = s->buffer + x; \
- \
-        for (i = 0; i < radius; i++) { \
-            acc += src[0]; \
-            src += width; \
-        } \
-        count += radius; \
- \
-        src = s->buffer + x; \
-        ptr = buffer + x; \
-        for (i = 0; i + radius < height && i <= radius; i++) { \
-            acc += src[(i + radius) * width]; \
-            count++; \
-            ptr[i * linesize] = acc / count; \
-        } \
- \
-        for (; i < height - radius; i++) { \
-            acc += src[(i + radius) * width] - src[(i - radius - 1) * width]; \
-            ptr[i * linesize] = acc / count; \
-        } \
- \
-        for (; i < height; i++) { \
-            acc -= src[(i - radius) * width]; \
-            count--; \
-            ptr[i * linesize] = acc / count; \
-        } \
-    } \
- \
-    return 0; \
-}
+FILTER(lut8, uint8_t, int32_t, , av_unused, LUT_DIV)
+FILTER(lut16, uint16_t, int64_t, , av_unused, LUT_DIV)
+
+FILTER(slow8, uint8_t, int32_t, av_unused, , SLOW_DIV)
+FILTER(slow16, uint16_t, int64_t, av_unused, , SLOW_DIV)
+
+static void build_lut(AVFilterContext *ctx, int max)
+{
+    AverageBlurContext *s = ctx->priv;
+    const int area = (2 * s->radiusV + 1) * (2 * s->radius + 1);
+
+    s->area = area;
+    if (max * area >= FF_ARRAY_ELEMS(s->lut))
+        return;
+
+    for (int i = 0, j = 0, k = 0; i < max * area; i++, j++) {
+        if (j == area) {
+            k++;
+            j = 0;
+        }
 
-VERTICAL_FILTER(8, uint8_t)
-VERTICAL_FILTER(16, uint16_t)
+        s->lut[i] = k;
+    }
+}
 
 static int config_input(AVFilterLink *inlink)
 {
+    AVFilterContext *ctx = inlink->dst;
     const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(inlink->format);
-    AverageBlurContext *s = inlink->dst->priv;
+    AverageBlurContext *s = ctx->priv;
 
     s->depth = desc->comp[0].depth;
+    s->max = 1 << s->depth;
     s->planewidth[1] = s->planewidth[2] = AV_CEIL_RSHIFT(inlink->w, desc->log2_chroma_w);
     s->planewidth[0] = s->planewidth[3] = inlink->w;
     s->planeheight[1] = s->planeheight[2] = AV_CEIL_RSHIFT(inlink->h, desc->log2_chroma_h);
@@ -185,21 +202,20 @@ static int config_input(AVFilterLink *inlink)
 
     s->nb_planes = av_pix_fmt_count_planes(inlink->format);
 
-    s->buffer = av_malloc_array(inlink->w, inlink->h * sizeof(*s->buffer));
+    s->buffer = av_calloc(inlink->w + (1024 * 2 + 1), 4 * ((s->depth + 7) / 8));
     if (!s->buffer)
         return AVERROR(ENOMEM);
 
-    if (s->radiusV <= 0) {
+    if (s->radiusV <= 0)
         s->radiusV = s->radius;
-    }
 
-    if (s->depth == 8) {
-        s->filter_horizontally = filter_horizontally_8;
-        s->filter_vertically = filter_vertically_8;
-    } else {
-        s->filter_horizontally = filter_horizontally_16;
-        s->filter_vertically = filter_vertically_16;
-    }
+    s->filter[0] = s->depth <= 8 ? filter_lut8 : filter_lut16;
+    s->filter[1] = s->depth <= 8 ? filter_slow8 : filter_slow16;
+
+    s->radius = FFMIN(s->planewidth[1] / 2, s->radius);
+    s->radiusV = FFMIN(s->planeheight[1] / 2, s->radiusV);
+
+    build_lut(ctx, s->max);
 
     return 0;
 }
@@ -209,19 +225,16 @@ static void averageiir2d(AVFilterContext *ctx, AVFrame *in, AVFrame *out, int pl
     AverageBlurContext *s = ctx->priv;
     const int width = s->planewidth[plane];
     const int height = s->planeheight[plane];
-    const int nb_threads = ff_filter_get_nb_threads(ctx);
+    const int slow = (s->max * s->area) >= FF_ARRAY_ELEMS(s->lut);
     ThreadData td;
 
     td.width = width;
     td.height = height;
     td.ptr = in->data[plane];
     td.linesize = in->linesize[plane];
-    ff_filter_execute(ctx, s->filter_horizontally, &td,
-                      NULL, FFMIN(height, nb_threads));
-    td.ptr = out->data[plane];
-    td.linesize = out->linesize[plane];
-    ff_filter_execute(ctx, s->filter_vertically, &td,
-                      NULL, FFMIN(width, nb_threads));
+    td.dptr = out->data[plane];
+    td.dlinesize = out->linesize[plane];
+    s->filter[slow](ctx, &td, 0, 0);
 }
 
 static int query_formats(AVFilterContext *ctx)
@@ -259,16 +272,12 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *in)
     AVFrame *out;
     int plane;
 
-    if (av_frame_is_writable(in)) {
-        out = in;
-    } else {
-        out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
-        if (!out) {
-            av_frame_free(&in);
-            return AVERROR(ENOMEM);
-        }
-        av_frame_copy_props(out, in);
+    out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
+    if (!out) {
+        av_frame_free(&in);
+        return AVERROR(ENOMEM);
     }
+    av_frame_copy_props(out, in);
 
     for (plane = 0; plane < s->nb_planes; plane++) {
         const int height = s->planeheight[plane];
@@ -285,11 +294,33 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *in)
         averageiir2d(ctx, in, out, plane);
     }
 
-    if (out != in)
-        av_frame_free(&in);
+    av_frame_free(&in);
     return ff_filter_frame(outlink, out);
 }
 
+static int process_command(AVFilterContext *ctx, const char *cmd, const char *args,
+                           char *res, int res_len, int flags)
+{
+    AverageBlurContext *s = ctx->priv;
+    const int area = s->area;
+    int ret;
+
+    ret = ff_filter_process_command(ctx, cmd, args, res, res_len, flags);
+    if (ret < 0)
+        return ret;
+
+    if (s->radiusV <= 0)
+        s->radiusV = s->radius;
+
+    s->radius = FFMIN(s->planewidth[1] / 2, s->radius);
+    s->radiusV = FFMIN(s->planeheight[1] / 2, s->radiusV);
+
+    if (area != (2 * s->radiusV + 1) * (2 * s->radius + 1))
+        build_lut(ctx, s->max);
+
+    return 0;
+}
+
 static av_cold void uninit(AVFilterContext *ctx)
 {
     AverageBlurContext *s = ctx->priv;
@@ -322,6 +353,6 @@ const AVFilter ff_vf_avgblur = {
     .query_formats = query_formats,
     FILTER_INPUTS(avgblur_inputs),
     FILTER_OUTPUTS(avgblur_outputs),
-    .flags = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC | AVFILTER_FLAG_SLICE_THREADS,
-    .process_command = ff_filter_process_command,
+    .flags = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC,
+    .process_command = process_command,
 };
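
Reviewer note: the running-sum idea behind the new FILTER macro, together with the table lookup done by LUT_DIV, can be sketched in one dimension. This is a minimal illustration under stated assumptions and not the code from the patch: the names build_div_lut, clampi, box_blur_row and box_blur_row_naive are invented here, only 8-bit samples are handled, and edges are assumed to be replicated. The patch applies the same window update to per-column sums so that each output row costs a constant amount of work per pixel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: lut[s] == s / area for every sum s in [0, max * area). */
static uint16_t *build_div_lut(int max, int area)
{
    uint16_t *lut = malloc((size_t)max * area * sizeof(*lut));

    if (!lut)
        return NULL;
    for (int i = 0, j = 0, k = 0; i < max * area; i++, j++) {
        if (j == area) { /* every `area` consecutive sums share one quotient */
            k++;
            j = 0;
        }
        lut[i] = (uint16_t)k;
    }
    return lut;
}

/* Clamp an index into [0, n - 1]; this replicates the edge samples. */
static int clampi(int i, int n)
{
    return i < 0 ? 0 : i >= n ? n - 1 : i;
}

/* Running-sum box blur of one row; the window is 2 * radius + 1 samples wide. */
static void box_blur_row(const uint8_t *src, uint8_t *dst, int n, int radius,
                         const uint16_t *lut)
{
    int sum = 0;

    /* Full window sum for the first output sample. */
    for (int i = -radius; i <= radius; i++)
        sum += src[clampi(i, n)];
    dst[0] = (uint8_t)lut[sum];

    for (int x = 1; x < n; x++) {
        /* Add the sample entering the window, drop the one leaving it. */
        sum += src[clampi(x + radius, n)] - src[clampi(x - radius - 1, n)];
        dst[x] = (uint8_t)lut[sum];
    }
}

/* Reference: recompute the whole window for every output sample. */
static void box_blur_row_naive(const uint8_t *src, uint8_t *dst, int n, int radius)
{
    const int area = 2 * radius + 1;

    for (int x = 0; x < n; x++) {
        int sum = 0;

        for (int i = -radius; i <= radius; i++)
            sum += src[clampi(x + i, n)];
        dst[x] = (uint8_t)(sum / area);
    }
}
```

With max = 256 the largest possible window sum is 255 * area, which is below max * area, so every lookup stays inside the table. Calling box_blur_row and box_blur_row_naive on the same row should give identical output, which is a cheap way to check the sliding update.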