From: Jun Zhao
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 3 Oct 2019 09:53:16 +0800
Message-Id: <1570067596-19224-2-git-send-email-mypopydev@gmail.com>
In-Reply-To: <1570067596-19224-1-git-send-email-mypopydev@gmail.com>
Subject: [FFmpeg-devel] [PATCH V1 2/2] lavfi/hqdn3d: add slice thread optimization

Enable slice threading in hqdn3d with one job per plane. Because the
planes are now processed concurrently, the s->line scratch buffer,
previously shared by all three planes, is split into one buffer per
plane (s->line[3]).

Tested with the following command on a 1080p YUV420P video:

    ffmpeg -i 1080p.mp4 -an -vf hqdn3d -f null /dev/null

This improves performance by about 30% in the 1080p YUV420P case
(from 110 fps to 143 fps). The patch also passes the framemd5 check
and FATE.
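Why the per-plane split is safe: each of the three jobs passed to ctx->internal->execute() touches only its own plane, its own frame_prev[job_nr] and, after this patch, its own line[job_nr] scratch row. The following is a minimal stand-alone sketch of that pattern, not FFmpeg code. Assumptions: execute_jobs(), blur_plane() and struct planes are hypothetical names; one pthread per job stands in for libavfilter's slice-thread pool; and a simple [1 2 1] horizontal blur stands in for the hqdn3d spatial/temporal filter.

```c
/* Hypothetical sketch, not FFmpeg code: an execute()-style runner that
 * starts one job per plane, and a job that (like the patched do_denoise())
 * touches only its own plane and its own scratch row. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NB_PLANES 3

/* Same argument order as a libavfilter slice job: (ctx, arg, job_nr, n_jobs). */
typedef int (*job_fn)(void *ctx, void *arg, int job_nr, int n_jobs);

struct job_slot {
    job_fn fn;
    void *ctx, *arg;
    int job_nr, n_jobs, ret, started;
    pthread_t tid;
};

static void *run_slot(void *opaque)
{
    struct job_slot *s = opaque;
    s->ret = s->fn(s->ctx, s->arg, s->job_nr, s->n_jobs);
    return NULL;
}

/* Runs jobs 0..n_jobs-1 concurrently (one pthread each) and waits for all.
 * Returns 0, or the first nonzero job return value. */
static int execute_jobs(void *ctx, job_fn fn, void *arg, int n_jobs)
{
    struct job_slot slots[NB_PLANES];
    int ret = 0;

    assert(n_jobs > 0 && n_jobs <= NB_PLANES);
    for (int i = 0; i < n_jobs; i++) {
        slots[i] = (struct job_slot){ .fn = fn, .ctx = ctx, .arg = arg,
                                      .job_nr = i, .n_jobs = n_jobs };
        slots[i].started =
            pthread_create(&slots[i].tid, NULL, run_slot, &slots[i]) == 0;
        if (!slots[i].started)
            run_slot(&slots[i]); /* thread creation failed: run inline */
    }
    for (int i = 0; i < n_jobs; i++) {
        if (slots[i].started)
            pthread_join(slots[i].tid, NULL);
        if (!ret)
            ret = slots[i].ret;
    }
    return ret;
}

/* Three 8-bit planes with per-plane sizes and per-plane scratch rows. */
struct planes {
    uint8_t *data[NB_PLANES];
    int width[NB_PLANES], height[NB_PLANES];
    uint16_t *line[NB_PLANES]; /* one row buffer per plane, as in the patch */
};

/* Job body: in-place [1 2 1]/4 horizontal blur of plane job_nr.
 * The row is saved to line[] first because row[] is overwritten. */
static int blur_plane(void *ctx, void *arg, int job_nr, int n_jobs)
{
    struct planes *p = arg;
    int w = p->width[job_nr], h = p->height[job_nr];
    uint16_t *line = p->line[job_nr];

    (void)ctx;
    (void)n_jobs;
    for (int y = 0; y < h; y++) {
        uint8_t *row = p->data[job_nr] + (size_t)y * w;
        for (int x = 0; x < w; x++)
            line[x] = row[x];
        for (int x = 0; x < w; x++) {
            int l = line[x > 0 ? x - 1 : x];     /* clamp at left edge */
            int r = line[x < w - 1 ? x + 1 : x]; /* clamp at right edge */
            row[x] = (uint8_t)((l + 2 * line[x] + r + 2) / 4);
        }
    }
    return 0;
}
```

Calling execute_jobs(NULL, blur_plane, &p, 3) on a struct planes whose three line pointers are distinct buffers blurs all planes concurrently. If every job instead shared one row buffer, as the single uint16_t *line did before this patch, concurrent jobs would overwrite each other's saved rows; that is the reason for the change to uint16_t *line[3].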
Signed-off-by: Jun Zhao
---
 libavfilter/vf_hqdn3d.c | 55 +++++++++++++++++++++++++++++++++-------------
 libavfilter/vf_hqdn3d.h | 2 +-
 2 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/libavfilter/vf_hqdn3d.c b/libavfilter/vf_hqdn3d.c
index d6c14bb..1530c40 100644
--- a/libavfilter/vf_hqdn3d.c
+++ b/libavfilter/vf_hqdn3d.c
@@ -223,7 +223,9 @@ static av_cold void uninit(AVFilterContext *ctx)
     av_freep(&s->coefs[1]);
     av_freep(&s->coefs[2]);
     av_freep(&s->coefs[3]);
-    av_freep(&s->line);
+    av_freep(&s->line[0]);
+    av_freep(&s->line[1]);
+    av_freep(&s->line[2]);
     av_freep(&s->frame_prev[0]);
     av_freep(&s->frame_prev[1]);
     av_freep(&s->frame_prev[2]);
@@ -271,9 +273,11 @@ static int config_input(AVFilterLink *inlink)
     s->vsub = desc->log2_chroma_h;
     s->depth = desc->comp[0].depth;
 
-    s->line = av_malloc_array(inlink->w, sizeof(*s->line));
-    if (!s->line)
-        return AVERROR(ENOMEM);
+    for (i = 0; i < 3; i++) {
+        s->line[i] = av_malloc_array(inlink->w, sizeof(*s->line[i]));
+        if (!s->line[i])
+            return AVERROR(ENOMEM);
+    }
 
     for (i = 0; i < 4; i++) {
         s->coefs[i] = precalc_coefs(s->strength[i], s->depth);
@@ -287,14 +291,38 @@ static int config_input(AVFilterLink *inlink)
     return 0;
 }
 
+struct ThreadData {
+    AVFrame *in, *out;
+    int direct;
+};
+
+static int do_denoise(AVFilterContext *ctx, void *data, int job_nr, int n_jobs)
+{
+    struct ThreadData *td = data;
+    HQDN3DContext *s = ctx->priv;
+    AVFrame *out = td->out;
+    AVFrame *in = td->in;
+    int direct = td->direct;
+
+    denoise(s, in->data[job_nr], out->data[job_nr],
+            s->line[job_nr], &s->frame_prev[job_nr],
+            AV_CEIL_RSHIFT(in->width, (!!job_nr * s->hsub)),
+            AV_CEIL_RSHIFT(in->height, (!!job_nr * s->vsub)),
+            in->linesize[job_nr], out->linesize[job_nr],
+            s->coefs[job_nr ? CHROMA_SPATIAL : LUMA_SPATIAL],
+            s->coefs[job_nr ? CHROMA_TMP : LUMA_TMP]);
+
+    return 0;
+}
+
 static int filter_frame(AVFilterLink *inlink, AVFrame *in)
 {
     AVFilterContext *ctx = inlink->dst;
-    HQDN3DContext *s = ctx->priv;
     AVFilterLink *outlink = ctx->outputs[0];
 
     AVFrame *out;
-    int c, direct = av_frame_is_writable(in) && !ctx->is_disabled;
+    int direct = av_frame_is_writable(in) && !ctx->is_disabled;
+    struct ThreadData td;
 
     if (direct) {
         out = in;
@@ -308,15 +336,10 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *in)
         av_frame_copy_props(out, in);
     }
 
-    for (c = 0; c < 3; c++) {
-        denoise(s, in->data[c], out->data[c],
-                s->line, &s->frame_prev[c],
-                AV_CEIL_RSHIFT(in->width, (!!c * s->hsub)),
-                AV_CEIL_RSHIFT(in->height, (!!c * s->vsub)),
-                in->linesize[c], out->linesize[c],
-                s->coefs[c ? CHROMA_SPATIAL : LUMA_SPATIAL],
-                s->coefs[c ? CHROMA_TMP : LUMA_TMP]);
-    }
+    td.in = in;
+    td.out = out;
+    td.direct = direct;
+    ctx->internal->execute(ctx, do_denoise, &td, NULL, 3);
 
     if (ctx->is_disabled) {
         av_frame_free(&out);
@@ -370,5 +393,5 @@ AVFilter ff_vf_hqdn3d = {
     .query_formats = query_formats,
     .inputs = avfilter_vf_hqdn3d_inputs,
     .outputs = avfilter_vf_hqdn3d_outputs,
-    .flags = AVFILTER_FLAG_SUPPORT_TIMELINE_INTERNAL,
+    .flags = AVFILTER_FLAG_SUPPORT_TIMELINE_INTERNAL | AVFILTER_FLAG_SLICE_THREADS,
 };
diff --git a/libavfilter/vf_hqdn3d.h b/libavfilter/vf_hqdn3d.h
index 03a79a1..3279bbc 100644
--- a/libavfilter/vf_hqdn3d.h
+++ b/libavfilter/vf_hqdn3d.h
@@ -31,7 +31,7 @@ typedef struct HQDN3DContext {
     const AVClass *class;
     int16_t *coefs[4];
-    uint16_t *line;
+    uint16_t *line[3];
     uint16_t *frame_prev[3];
     double strength[4];
     int hsub, vsub;
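Each job's plane size comes from AV_CEIL_RSHIFT(in->width, (!!job_nr * s->hsub)) and the matching height expression: job 0 (luma) is not shifted, while jobs 1 and 2 (chroma) are divided by 2^hsub and 2^vsub, rounding up. The small sketch below reproduces that arithmetic. It uses a local ceil_rshift() that is assumed to be equivalent to AV_CEIL_RSHIFT for non-negative inputs, and plane_size() is likewise a hypothetical helper rather than an FFmpeg function.

```c
#include <assert.h>

/* Local stand-in for AV_CEIL_RSHIFT: ceil(a / 2^b) for a >= 0.
 * a is a frame dimension here, far below INT_MAX, so a + 2^b cannot overflow. */
static int ceil_rshift(int a, int b)
{
    assert(a >= 0 && b >= 0 && b < 30);
    return (a + (1 << b) - 1) >> b;
}

/* Size of plane `plane` (0 = luma, 1 and 2 = chroma), mirroring the
 * !!job_nr * s->hsub and !!job_nr * s->vsub shifts used in do_denoise(). */
static void plane_size(int w, int h, int hsub, int vsub, int plane,
                       int *pw, int *ph)
{
    *pw = ceil_rshift(w, !!plane * hsub);
    *ph = ceil_rshift(h, !!plane * vsub);
}
```

For the 1080p YUV420P test case (hsub = vsub = 1), this gives 1920x1080 for the luma job and 960x540 for each chroma job, so the luma job holds two thirds of the samples. If cost is roughly proportional to sample count, the three per-plane jobs cannot finish sooner than the luma job, which caps the ideal speedup near 1.5x. The reported 110 to 143 fps (about 1.3x) is consistent with that bound.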