From patchwork Mon May 31 07:55:15 2021
X-Patchwork-Submitter: Anton Khirnov
X-Patchwork-Id: 28010
From: Anton Khirnov
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 31 May 2021 09:55:15 +0200
Message-Id: <20210531075515.19544-24-anton@khirnov.net>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210531075515.19544-1-anton@khirnov.net>
References: <20210531075515.19544-1-anton@khirnov.net>
Subject: [FFmpeg-devel] [PATCH 24/24] lavfi/vf_scale: implement slice threading

---
 libavfilter/vf_scale.c | 182 +++++++++++++++++++++++++++++++----------
 1 file changed, 141 insertions(+), 41 deletions(-)

diff --git a/libavfilter/vf_scale.c b/libavfilter/vf_scale.c
index cdd7c4da0d..87317393bd 100644
--- a/libavfilter/vf_scale.c
+++ b/libavfilter/vf_scale.c
@@ -106,8 +106,16 @@ enum EvalMode {
 
 typedef struct ScaleContext {
     const AVClass *class;
-    struct SwsContext *sws;     ///< software scaler context
-    struct SwsContext *isws[2]; ///< software scaler context for interlaced material
+
+    /**
+     * Scaler contexts.
+     * [0] - progressive
+     * [1/2] - top/bottom fields
+     */
+    struct SwsContext *(*scalers)[3];
+    unsigned int nb_scalers;
+    int *scaler_res;
+
     AVDictionary *opts;
 
     /**
@@ -122,6 +130,7 @@ typedef struct ScaleContext {
     double param[2];            // sws params
 
     int hsub, vsub;             ///< chroma subsampling
+    int ohsub, ovsub;           ///< output chroma subsampling
     int slice_y;                ///< top of current output slice
     int input_is_pal;           ///< set to 1 if the input format is paletted
     int output_is_pal;          ///< set to 1 if the output format is paletted
@@ -153,6 +162,7 @@ typedef struct ScaleContext {
 
     int eval_mode;              ///< expression evaluation mode
+    int passthrough;
 } ScaleContext;
 
 const AVFilter ff_vf_scale2ref;
@@ -330,13 +340,11 @@ static av_cold int init_dict(AVFilterContext *ctx, AVDictionary **opts)
 
 static void scaler_free(ScaleContext *s)
 {
-    sws_freeContext(s->sws);
-    sws_freeContext(s->isws[0]);
-    sws_freeContext(s->isws[1]);
+    for (int i = 0; i < s->nb_scalers; i++)
+        for (int j = 0; j < 3; j++)
+            sws_freeContext(s->scalers[i][j]);
 
-    s->sws = NULL;
-    s->isws[0] = NULL;
-    s->isws[1] = NULL;
+    av_freep(&s->scalers);
 }
 
 static av_cold void uninit(AVFilterContext *ctx)
@@ -346,6 +354,7 @@ static av_cold void uninit(AVFilterContext *ctx)
     av_expr_free(scale->h_pexpr);
     scale->w_pexpr = scale->h_pexpr = NULL;
     scaler_free(scale);
+    av_freep(&scale->scaler_res);
     av_dict_free(&scale->opts);
 }
 
@@ -522,19 +531,28 @@ static int config_props(AVFilterLink *outlink)
 
     scaler_free(scale);
 
-    if (inlink0->w == outlink->w &&
-        inlink0->h == outlink->h &&
-        !scale->out_color_matrix &&
-        scale->in_range == scale->out_range &&
-        inlink0->format == outlink->format)
-        ;
-    else {
-        struct SwsContext **swscs[3] = {&scale->sws, &scale->isws[0], &scale->isws[1]};
-        int i;
-
-        for (i = 0; i < 3; i++) {
+    scale->passthrough = inlink0->w == outlink->w &&
+                         inlink0->h == outlink->h &&
+                         !scale->out_color_matrix &&
+                         scale->in_range == scale->out_range &&
+                         inlink0->format == outlink->format;
+
+    if (!scale->passthrough) {
+        int nb_scalers = ff_filter_get_nb_threads(ctx);
+
+        scale->scalers = av_mallocz_array(nb_scalers, 3 * sizeof(struct SwsContext*));
+        if (!scale->scalers)
+            return AVERROR(ENOMEM);
+
+        ret = av_reallocp_array(&scale->scaler_res, nb_scalers, sizeof(*scale->scaler_res));
+        if (ret < 0)
+            return ret;
+
+        for (int i = 0; i < 3; i++) {
+        for (int t = 0; t < nb_scalers; t++) {
             int in_v_chr_pos = scale->in_v_chr_pos, out_v_chr_pos = scale->out_v_chr_pos;
-            struct SwsContext **s = swscs[i];
+            struct SwsContext **s = &scale->scalers[t][i];
+
             *s = sws_alloc_context();
             if (!*s)
                 return AVERROR(ENOMEM);
@@ -580,9 +598,29 @@ static int config_props(AVFilterLink *outlink)
 
             if ((ret = sws_init_context(*s, NULL, NULL)) < 0)
                 return ret;
+
+            /* do not multithread error-diffusion dithering */
+            if (i == 0 && t == 0) {
+                const AVOption *opt;
+                int64_t dither;
+
+                av_opt_get_int(*s, "sws_dither", 0, &dither);
+                opt = av_opt_find2(*s, "ed", "sws_dither", 0, 0, NULL);
+                if (!opt)
+                    return AVERROR_BUG;
+
+                if (dither == opt->default_val.i64) {
+                    av_log(ctx, AV_LOG_WARNING, "Error-diffusion dithering is "
+                           "used, conversion will be single-threaded.\n");
+                    nb_scalers = 1;
+                }
+            }
+        }
+
             if (!scale->interlaced)
                 break;
         }
+        scale->nb_scalers = nb_scalers;
     }
 
     if (inlink0->sample_aspect_ratio.num){
@@ -625,7 +663,8 @@ static int request_frame_ref(AVFilterLink *outlink)
     return ff_request_frame(outlink->src->inputs[1]);
 }
 
-static int scale_slice(ScaleContext *scale, AVFrame *out_buf, AVFrame *cur_pic, struct SwsContext *sws, int y, int h, int mul, int field)
+static int scale_slice(ScaleContext *scale, AVFrame *out_buf, AVFrame *cur_pic, struct SwsContext *sws,
+                       int y, int h, int mul, int field, int dst)
 {
     const uint8_t *in[4];
     uint8_t *out[4];
@@ -633,9 +672,10 @@ static int scale_slice(ScaleContext *scale, AVFrame *out_buf, AVFrame *cur_pic,
     int i;
 
     for (i=0; i<4; i++) {
-        int vsub= ((i+1)&2) ? scale->vsub : 0;
-        ptrdiff_t in_offset = ((y>>vsub)+field) * cur_pic->linesize[i];
-        ptrdiff_t out_offset = field * out_buf->linesize[i];
+        int vsub = ((i+1)&2) ? scale->vsub : 0;
+        int ovsub = ((i+1)&2) ? scale->ovsub : 0;
+        ptrdiff_t in_offset = (((y * !dst) >> vsub) + field) * cur_pic->linesize[i];
+        ptrdiff_t out_offset = (((y * dst) >> ovsub) + field) * out_buf->linesize[i];
         in_stride[i] = cur_pic->linesize[i] * mul;
         out_stride[i] = out_buf->linesize[i] * mul;
         in[i] = FF_PTR_ADD(cur_pic->data[i], in_offset);
@@ -646,17 +686,57 @@ static int scale_slice(ScaleContext *scale, AVFrame *out_buf, AVFrame *cur_pic,
     if (scale->output_is_pal)
         out[1] = out_buf->data[1];
 
+    if (dst)
+        return sws_scale_dst_slice(sws, in, in_stride,
+                                   out, out_stride, y / mul, h);
+
     return sws_scale(sws, in, in_stride, y/mul, h,
                      out,out_stride);
 }
 
+typedef struct ScaleThreadData {
+    AVFrame *frame_in;
+    AVFrame *frame_out;
+    int scaler_idx;
+} ScaleThreadData;
+
+static int scaler_res(ScaleContext *scale)
+{
+    for (int i = 0; i < scale->nb_scalers; i++)
+        if (scale->scaler_res[i] < 0)
+            return scale->scaler_res[i];
+    return 0;
+}
+
+static int scale_job(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    ScaleContext *scale = ctx->priv;
+    ScaleThreadData *td = arg;
+    int stride_mul = 1 << (td->scaler_idx > 0);
+    int first_field = td->scaler_idx == 1;
+    int picture_height = (td->frame_out->height + first_field) / stride_mul;
+    int slice_height = FFALIGN(FFMAX((picture_height + nb_jobs - 1) / nb_jobs, 1),
+                               1 << scale->ovsub);
+    int slice_start = jobnr * slice_height;
+    int slice_end = FFMIN((jobnr + 1) * slice_height, picture_height);
+
+    if (slice_start < slice_end) {
+        scale_slice(scale, td->frame_out, td->frame_in,
+                    scale->scalers[jobnr][td->scaler_idx], slice_start,
+                    slice_end - slice_start, stride_mul, td->scaler_idx == 2, 1);
+    }
+
+    return 0;
+}
+
 static int scale_frame(AVFilterLink *link, AVFrame *in, AVFrame **frame_out)
 {
     AVFilterContext *ctx = link->dst;
     ScaleContext *scale = ctx->priv;
     AVFilterLink *outlink = ctx->outputs[0];
     AVFrame *out;
-    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(link->format);
+    const AVPixFmtDescriptor *desc  = av_pix_fmt_desc_get(link->format);
+    const AVPixFmtDescriptor *odesc = av_pix_fmt_desc_get(outlink->format);
     char buf[32];
     int ret;
     int in_range;
@@ -723,13 +803,15 @@ static int scale_frame(AVFilterLink *link, AVFrame *in, AVFrame **frame_out)
     }
 
 scale:
-    if (!scale->sws) {
+    if (!scale->nb_scalers) {
         *frame_out = in;
         return 0;
     }
 
     scale->hsub = desc->log2_chroma_w;
     scale->vsub = desc->log2_chroma_h;
+    scale->ohsub = odesc->log2_chroma_w;
+    scale->ovsub = odesc->log2_chroma_h;
 
     out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
     if (!out) {
@@ -755,7 +837,7 @@ scale:
         int in_full, out_full, brightness, contrast, saturation;
         const int *inv_table, *table;
 
-        sws_getColorspaceDetails(scale->sws, (int **)&inv_table, &in_full,
+        sws_getColorspaceDetails(scale->scalers[0][0], (int **)&inv_table, &in_full,
                                  (int **)&table, &out_full,
                                  &brightness, &contrast, &saturation);
 
@@ -773,17 +855,14 @@ scale:
         if (scale->out_range != AVCOL_RANGE_UNSPECIFIED)
             out_full = (scale->out_range == AVCOL_RANGE_JPEG);
 
-        sws_setColorspaceDetails(scale->sws, inv_table, in_full,
+        for (int i = 0; i < 3; i++)
+        for (int j = 0; j < scale->nb_scalers; j++) {
+            if (!scale->scalers[j][i])
+                continue;
+            sws_setColorspaceDetails(scale->scalers[j][i], inv_table, in_full,
                                  table, out_full,
                                  brightness, contrast, saturation);
-        if (scale->isws[0])
-            sws_setColorspaceDetails(scale->isws[0], inv_table, in_full,
-                                     table, out_full,
-                                     brightness, contrast, saturation);
-        if (scale->isws[1])
-            sws_setColorspaceDetails(scale->isws[1], inv_table, in_full,
-                                     table, out_full,
-                                     brightness, contrast, saturation);
+        }
 
         out->color_range = out_full ? AVCOL_RANGE_JPEG : AVCOL_RANGE_MPEG;
     }
@@ -793,10 +872,22 @@ scale:
               (int64_t)in->sample_aspect_ratio.den * outlink->w * link->h,
               INT_MAX);
 
+    memset(scale->scaler_res, 0, scale->nb_scalers * sizeof(*scale->scaler_res));
+
     if (scale->interlaced>0 || (scale->interlaced<0 && in->interlaced_frame)) {
-        ret = scale_slice(scale, out, in, scale->isws[0], 0, (link->h+1)/2, 2, 0);
-        if (ret >= 0)
-            ret = scale_slice(scale, out, in, scale->isws[1], 0, link->h /2, 2, 1);
+        ScaleThreadData td = {
+            .scaler_idx = 1,
+            .frame_in   = in,
+            .frame_out  = out,
+        };
+
+        ctx->internal->execute(ctx, scale_job, &td, scale->scaler_res, scale->nb_scalers);
+
+        if (scaler_res(scale) >= 0) {
+            td.scaler_idx = 2;
+            memset(scale->scaler_res, 0, scale->nb_scalers * sizeof(*scale->scaler_res));
+            ctx->internal->execute(ctx, scale_job, &td, scale->scaler_res, scale->nb_scalers);
+        }
     } else if (scale->nb_slices) {
         int i, slice_h, slice_start, slice_end = 0;
         const int nb_slices = FFMIN(scale->nb_slices, link->h);
@@ -804,14 +895,22 @@ scale:
             slice_start = slice_end;
             slice_end = (link->h * (i+1)) / nb_slices;
             slice_h = slice_end - slice_start;
-            ret = scale_slice(scale, out, in, scale->sws, slice_start, slice_h, 1, 0);
+            ret = scale_slice(scale, out, in, scale->scalers[0][0], slice_start, slice_h, 1, 0, 0);
             if (ret < 0)
                 break;
         }
     } else {
-        ret = scale_slice(scale, out, in, scale->sws, 0, link->h, 1, 0);
+        ScaleThreadData td = {
+            .scaler_idx = 0,
+            .frame_in   = in,
+            .frame_out  = out,
+        };
+
+        ctx->internal->execute(ctx, scale_job, &td, scale->scaler_res, scale->nb_scalers);
     }
 
+    ret = scaler_res(scale);
+
     av_frame_free(&in);
     if (ret < 0)
        av_frame_free(frame_out);
@@ -984,6 +1083,7 @@ const AVFilter ff_vf_scale = {
     .inputs          = avfilter_vf_scale_inputs,
     .outputs         = avfilter_vf_scale_outputs,
     .process_command = process_command,
+    .flags           = AVFILTER_FLAG_SLICE_THREADS,
 };
 
 static const AVClass scale2ref_class = {
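
For illustration only (nothing below is part of the patch): the slice partitioning in scale_job() above is what makes the jobs independent. Each job gets a band of output rows whose height is rounded up to the output chroma alignment (1 << ovsub), and jobs whose band starts past the end of the picture simply do nothing. A minimal standalone sketch of that arithmetic, with FFALIGN/FFMAX/FFMIN copied so it compiles on its own and a hypothetical print_slices() helper:

/* Standalone sketch of the slice partitioning used by scale_job().
 * The macros mirror libavutil's FFALIGN/FFMAX/FFMIN; print_slices() is
 * a hypothetical helper for this example, not code from the patch. */
#include <stdio.h>

#define FFALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))
#define FFMAX(a, b)   ((a) > (b) ? (a) : (b))
#define FFMIN(a, b)   ((a) < (b) ? (a) : (b))

static void print_slices(int out_height, int nb_jobs, int ovsub, int scaler_idx)
{
    /* same arithmetic as scale_job(): the interlaced passes (scaler_idx 1/2)
     * operate on half-height fields with doubled strides */
    int stride_mul     = 1 << (scaler_idx > 0);
    int first_field    = scaler_idx == 1;
    int picture_height = (out_height + first_field) / stride_mul;
    int slice_height   = FFALIGN(FFMAX((picture_height + nb_jobs - 1) / nb_jobs, 1),
                                 1 << ovsub);

    for (int jobnr = 0; jobnr < nb_jobs; jobnr++) {
        int slice_start = jobnr * slice_height;
        int slice_end   = FFMIN((jobnr + 1) * slice_height, picture_height);

        if (slice_start < slice_end)
            printf("job %d: output rows [%d, %d)\n", jobnr, slice_start, slice_end);
        else
            printf("job %d: nothing to do\n", jobnr);
    }
}

int main(void)
{
    /* e.g. 720 output rows, 4:2:0 output (ovsub = 1), 8 jobs */
    print_slices(720, 8, 1, 0); /* progressive pass */
    print_slices(720, 8, 1, 1); /* top-field pass   */
    return 0;
}

Rounding slice_height up to 1 << ovsub keeps every band boundary on an output chroma row, so each per-thread SwsContext writes whole chroma lines; in the field passes picture_height is halved, which is why stride_mul doubles the line strides passed to scale_slice().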