From patchwork Fri Feb 4 09:31:14 2022
X-Patchwork-Submitter: Victoria Zhislina
X-Patchwork-Id: 34117
From: Victoria Zhislina
To: ffmpeg-devel@ffmpeg.org
Cc: Victoria Zhislina
Date: Fri, 4 Feb 2022 12:31:14 +0300
Message-Id: <20220204093114.1502-1-Victoria.Zhislina@intel.com>
Subject: [FFmpeg-devel] [PATCH] zscale video filter performance optimization 4x

Optimizations: FFmpeg slice threading is implemented by splitting each
frame into slices, and zimg_filter_graph_build is moved from per-frame
processing to the filter initialization phase. Compared with the
original version, video downscaling and color conversion run up to 4x
faster on a 64-core Intel Xeon and 3x faster on an
i7-6700K (4 cores with HT) Signed-off-by: Victoria Zhislina --- libavfilter/vf_zscale.c | 779 ++++++++++++++++++++++------------------ 1 file changed, 433 insertions(+), 346 deletions(-) diff --git a/libavfilter/vf_zscale.c b/libavfilter/vf_zscale.c index 1288c5efc1..1a2de1fe21 100644 --- a/libavfilter/vf_zscale.c +++ b/libavfilter/vf_zscale.c @@ -1,6 +1,7 @@ /* * Copyright (c) 2015 Paul B Mahol - * + * * 2022 Victoria Zhislina, Intel Corporation - performance optimization + * This file is part of FFmpeg. * * FFmpeg is free software; you can redistribute it and/or @@ -44,6 +45,8 @@ #include "libavutil/imgutils.h" #define ZIMG_ALIGNMENT 32 +#define MIN_TILESIZE 64 +#define MAX_THREADS 64 static const char *const var_names[] = { "in_w", "iw", @@ -113,13 +116,14 @@ typedef struct ZScaleContext { int force_original_aspect_ratio; - void *tmp; - size_t tmp_size; + void *tmp[MAX_THREADS]; //separate for each thread; + int nb_threads; + int slice_h; zimg_image_format src_format, dst_format; zimg_image_format alpha_src_format, alpha_dst_format; zimg_graph_builder_params alpha_params, params; - zimg_filter_graph *alpha_graph, *graph; + zimg_filter_graph *alpha_graph[MAX_THREADS], *graph[MAX_THREADS]; enum AVColorSpace in_colorspace, out_colorspace; enum AVColorTransferCharacteristic in_trc, out_trc; @@ -128,10 +132,167 @@ typedef struct ZScaleContext { enum AVChromaLocation in_chromal, out_chromal; } ZScaleContext; + +typedef struct ThreadData { + const AVPixFmtDescriptor *desc, *odesc; + AVFrame *in, *out; +} ThreadData; + +static int convert_chroma_location(enum AVChromaLocation chroma_location) +{ + switch (chroma_location) { + case AVCHROMA_LOC_UNSPECIFIED: + case AVCHROMA_LOC_LEFT: + return ZIMG_CHROMA_LEFT; + case AVCHROMA_LOC_CENTER: + return ZIMG_CHROMA_CENTER; + case AVCHROMA_LOC_TOPLEFT: + return ZIMG_CHROMA_TOP_LEFT; + case AVCHROMA_LOC_TOP: + return ZIMG_CHROMA_TOP; + case AVCHROMA_LOC_BOTTOMLEFT: + return ZIMG_CHROMA_BOTTOM_LEFT; + case AVCHROMA_LOC_BOTTOM: + 
return ZIMG_CHROMA_BOTTOM; + } + return ZIMG_CHROMA_LEFT; +} + +static int convert_matrix(enum AVColorSpace colorspace) +{ + switch (colorspace) { + case AVCOL_SPC_RGB: + return ZIMG_MATRIX_RGB; + case AVCOL_SPC_BT709: + return ZIMG_MATRIX_709; + case AVCOL_SPC_UNSPECIFIED: + return ZIMG_MATRIX_UNSPECIFIED; + case AVCOL_SPC_FCC: + return ZIMG_MATRIX_FCC; + case AVCOL_SPC_BT470BG: + return ZIMG_MATRIX_470BG; + case AVCOL_SPC_SMPTE170M: + return ZIMG_MATRIX_170M; + case AVCOL_SPC_SMPTE240M: + return ZIMG_MATRIX_240M; + case AVCOL_SPC_YCGCO: + return ZIMG_MATRIX_YCGCO; + case AVCOL_SPC_BT2020_NCL: + return ZIMG_MATRIX_2020_NCL; + case AVCOL_SPC_BT2020_CL: + return ZIMG_MATRIX_2020_CL; + case AVCOL_SPC_CHROMA_DERIVED_NCL: + return ZIMG_MATRIX_CHROMATICITY_DERIVED_NCL; + case AVCOL_SPC_CHROMA_DERIVED_CL: + return ZIMG_MATRIX_CHROMATICITY_DERIVED_CL; + case AVCOL_SPC_ICTCP: + return ZIMG_MATRIX_ICTCP; + } + return ZIMG_MATRIX_UNSPECIFIED; +} + +static int convert_trc(enum AVColorTransferCharacteristic color_trc) +{ + switch (color_trc) { + case AVCOL_TRC_UNSPECIFIED: + return ZIMG_TRANSFER_UNSPECIFIED; + case AVCOL_TRC_BT709: + return ZIMG_TRANSFER_709; + case AVCOL_TRC_GAMMA22: + return ZIMG_TRANSFER_470_M; + case AVCOL_TRC_GAMMA28: + return ZIMG_TRANSFER_470_BG; + case AVCOL_TRC_SMPTE170M: + return ZIMG_TRANSFER_601; + case AVCOL_TRC_SMPTE240M: + return ZIMG_TRANSFER_240M; + case AVCOL_TRC_LINEAR: + return ZIMG_TRANSFER_LINEAR; + case AVCOL_TRC_LOG: + return ZIMG_TRANSFER_LOG_100; + case AVCOL_TRC_LOG_SQRT: + return ZIMG_TRANSFER_LOG_316; + case AVCOL_TRC_IEC61966_2_4: + return ZIMG_TRANSFER_IEC_61966_2_4; + case AVCOL_TRC_BT2020_10: + return ZIMG_TRANSFER_2020_10; + case AVCOL_TRC_BT2020_12: + return ZIMG_TRANSFER_2020_12; + case AVCOL_TRC_SMPTE2084: + return ZIMG_TRANSFER_ST2084; + case AVCOL_TRC_ARIB_STD_B67: + return ZIMG_TRANSFER_ARIB_B67; + case AVCOL_TRC_IEC61966_2_1: + return ZIMG_TRANSFER_IEC_61966_2_1; + } + return ZIMG_TRANSFER_UNSPECIFIED; +} + +static int 
convert_primaries(enum AVColorPrimaries color_primaries) +{ + switch (color_primaries) { + case AVCOL_PRI_UNSPECIFIED: + return ZIMG_PRIMARIES_UNSPECIFIED; + case AVCOL_PRI_BT709: + return ZIMG_PRIMARIES_709; + case AVCOL_PRI_BT470M: + return ZIMG_PRIMARIES_470_M; + case AVCOL_PRI_BT470BG: + return ZIMG_PRIMARIES_470_BG; + case AVCOL_PRI_SMPTE170M: + return ZIMG_PRIMARIES_170M; + case AVCOL_PRI_SMPTE240M: + return ZIMG_PRIMARIES_240M; + case AVCOL_PRI_FILM: + return ZIMG_PRIMARIES_FILM; + case AVCOL_PRI_BT2020: + return ZIMG_PRIMARIES_2020; + case AVCOL_PRI_SMPTE428: + return ZIMG_PRIMARIES_ST428; + case AVCOL_PRI_SMPTE431: + return ZIMG_PRIMARIES_ST431_2; + case AVCOL_PRI_SMPTE432: + return ZIMG_PRIMARIES_ST432_1; + case AVCOL_PRI_JEDEC_P22: + return ZIMG_PRIMARIES_EBU3213_E; + } + return ZIMG_PRIMARIES_UNSPECIFIED; +} + +static int convert_range(enum AVColorRange color_range) +{ + switch (color_range) { + case AVCOL_RANGE_UNSPECIFIED: + case AVCOL_RANGE_MPEG: + return ZIMG_RANGE_LIMITED; + case AVCOL_RANGE_JPEG: + return ZIMG_RANGE_FULL; + } + return ZIMG_RANGE_LIMITED; +} + +static enum AVColorRange convert_range_from_zimg(enum zimg_pixel_range_e color_range) +{ + switch (color_range) { + case ZIMG_RANGE_LIMITED: + return AVCOL_RANGE_MPEG; + case ZIMG_RANGE_FULL: + return AVCOL_RANGE_JPEG; + } + return AVCOL_RANGE_UNSPECIFIED; +} + static av_cold int init(AVFilterContext *ctx) { ZScaleContext *s = ctx->priv; int ret; + int i; + + for (i = 0; i < MAX_THREADS; i++) { + s->tmp[i] = NULL; + s->graph[i] = NULL; + s->alpha_graph[i] = NULL; + } if (s->size_str && (s->w_expr || s->h_expr)) { av_log(ctx, AV_LOG_ERROR, @@ -194,11 +355,153 @@ static int query_formats(AVFilterContext *ctx) return ff_formats_ref(ff_make_format_list(pixel_fmts), &ctx->outputs[0]->incfg.formats); } +static void format_init(zimg_image_format *format, AVFrame *frame, const AVPixFmtDescriptor *desc, + int colorspace, int primaries, int transfer, int range, int location) +{ + format->width = 
frame->width; + format->height = frame->height; + format->subsample_w = desc->log2_chroma_w; + format->subsample_h = desc->log2_chroma_h; + format->depth = desc->comp[0].depth; + format->pixel_type = (desc->flags & AV_PIX_FMT_FLAG_FLOAT) ? ZIMG_PIXEL_FLOAT : desc->comp[0].depth > 8 ? ZIMG_PIXEL_WORD : ZIMG_PIXEL_BYTE; + format->color_family = (desc->flags & AV_PIX_FMT_FLAG_RGB) ? ZIMG_COLOR_RGB : ZIMG_COLOR_YUV; + format->matrix_coefficients = (desc->flags & AV_PIX_FMT_FLAG_RGB) ? ZIMG_MATRIX_RGB : colorspace == -1 ? convert_matrix(frame->colorspace) : colorspace; + format->color_primaries = primaries == -1 ? convert_primaries(frame->color_primaries) : primaries; + format->transfer_characteristics = transfer == -1 ? convert_trc(frame->color_trc) : transfer; + format->pixel_range = (desc->flags & AV_PIX_FMT_FLAG_RGB) ? ZIMG_RANGE_FULL : range == -1 ? convert_range(frame->color_range) : range; + format->chroma_location = location == -1 ? convert_chroma_location(frame->chroma_location) : location; +} + +static int print_zimg_error(AVFilterContext *ctx) +{ + char err_msg[1024]; + int err_code = zimg_get_last_error(err_msg, sizeof(err_msg)); + + av_log(ctx, AV_LOG_ERROR, "code %d: %s\n", err_code, err_msg); + + return AVERROR_EXTERNAL; +} + +static int graphs_build(AVFilterLink *inlink, AVFilterLink *outlink, const AVPixFmtDescriptor *desc, const AVPixFmtDescriptor *out_desc, + ZScaleContext *s) +{ + int ret; + int i; + size_t size; + zimg_image_format src_format; + zimg_image_format dst_format; + AVFrame *in, *out; + in = ff_get_video_buffer(inlink, inlink->w, inlink->h); + out = ff_get_video_buffer(outlink, outlink->w, outlink->h); + + zimg_image_format_default(&s->src_format, ZIMG_API_VERSION); + zimg_image_format_default(&s->dst_format, ZIMG_API_VERSION); + zimg_graph_builder_params_default(&s->params, ZIMG_API_VERSION); + + format_init(&s->src_format, in, desc, s->colorspace_in, + s->primaries_in, s->trc_in, s->range_in, s->chromal_in); + 
format_init(&s->dst_format, out, out_desc, s->colorspace, + s->primaries, s->trc, s->range, s->chromal); + + s->params.dither_type = s->dither; + s->params.cpu_type = ZIMG_CPU_AUTO; + s->params.resample_filter = s->filter; + s->params.resample_filter_uv = s->filter; + s->params.nominal_peak_luminance = s->nominal_peak_luminance; + s->params.allow_approximate_gamma = s->approximate_gamma; + s->params.filter_param_a = s->params.filter_param_a_uv = s->param_a; + s->params.filter_param_b = s->params.filter_param_b_uv = s->param_b; + + for (i = 0; i < s->nb_threads; i++) { + src_format = s->src_format; + dst_format = s->dst_format; + /* The input slice is specified through the active_region field, + unlike the output slice. + according to zimg requirements input and output slices should have even dimentions */ + src_format.active_region.width = in->width; + src_format.active_region.height = s->slice_h; + src_format.active_region.left = 0; + src_format.active_region.top = i * src_format.active_region.height; + //dst now is the single tile only!! 
+ dst_format.width = out->width; + dst_format.height = ((unsigned int)(out->height / s->nb_threads)) & 0xfffffffe; + + //the last slice could be higher than previous ones due to the slices division "tail" + if (i == (s->nb_threads - 1)) { + src_format.active_region.height = src_format.height - src_format.active_region.top; + dst_format.height = out->height - i * dst_format.height; + } + + if (s->graph[i]) { + zimg_filter_graph_free(s->graph[i]); + } + s->graph[i] = zimg_filter_graph_build(&src_format, &dst_format, &s->params); + if (!s->graph[i]) + return print_zimg_error(NULL); + + ret = zimg_filter_graph_get_tmp_size(s->graph[i], &size); + if (ret) + return print_zimg_error(NULL); + + if (s->tmp[i]) + av_freep(&s->tmp[i]); + s->tmp[i] = av_malloc(size); + if (!s->tmp[i]) + return AVERROR(ENOMEM); + } + if (desc->flags & AV_PIX_FMT_FLAG_ALPHA && out_desc->flags & AV_PIX_FMT_FLAG_ALPHA) { + zimg_image_format_default(&s->alpha_src_format, ZIMG_API_VERSION); + zimg_image_format_default(&s->alpha_dst_format, ZIMG_API_VERSION); + zimg_graph_builder_params_default(&s->alpha_params, ZIMG_API_VERSION); + + s->alpha_params.dither_type = s->dither; + s->alpha_params.cpu_type = ZIMG_CPU_AUTO; + s->alpha_params.resample_filter = s->filter; + + s->alpha_src_format.width = in->width; + s->alpha_src_format.height = s->slice_h; + s->alpha_src_format.depth = desc->comp[0].depth; + s->alpha_src_format.pixel_type = (desc->flags & AV_PIX_FMT_FLAG_FLOAT) ? ZIMG_PIXEL_FLOAT : desc->comp[0].depth > 8 ? ZIMG_PIXEL_WORD : ZIMG_PIXEL_BYTE; + s->alpha_src_format.color_family = ZIMG_COLOR_GREY; + + s->alpha_dst_format.depth = out_desc->comp[0].depth; + s->alpha_dst_format.pixel_type = (out_desc->flags & AV_PIX_FMT_FLAG_FLOAT) ? ZIMG_PIXEL_FLOAT : out_desc->comp[0].depth > 8 ? 
ZIMG_PIXEL_WORD : ZIMG_PIXEL_BYTE; + s->alpha_dst_format.color_family = ZIMG_COLOR_GREY; + + for (i = 0; i < s->nb_threads; i++) { + /* The input slice is specified through the active_region field, unlike the output slice. + according to zimg requirements input and output slices should have even dimentions */ + s->alpha_src_format.active_region.width = in->width; + s->alpha_src_format.active_region.height = s->slice_h; + s->alpha_src_format.active_region.left = 0; + s->alpha_src_format.active_region.top = i * s->src_format.active_region.height; + + s->alpha_dst_format.width = out->width; + s->alpha_dst_format.height = ((unsigned int)(out->height / s->nb_threads)) & 0xfffffffe; + + //the last slice could be higher than previous ones due to the slices division "tail" + if (i == (s->nb_threads - 1)) { + s->alpha_src_format.active_region.height = s->alpha_src_format.height - s->alpha_src_format.active_region.top; + s->alpha_dst_format.height = out->height - i * s->alpha_dst_format.height; + } + + if (s->alpha_graph[i]) { + zimg_filter_graph_free(s->alpha_graph[i]); + } + s->alpha_graph[i] = zimg_filter_graph_build(&s->alpha_src_format, &s->alpha_dst_format, &s->alpha_params); + if (!s->alpha_graph[i]) + return print_zimg_error(NULL); + } + } + return 0; +} + static int config_props(AVFilterLink *outlink) { AVFilterContext *ctx = outlink->src; AVFilterLink *inlink = outlink->src->inputs[0]; ZScaleContext *s = ctx->priv; + const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(inlink->format); const AVPixFmtDescriptor *out_desc = av_pix_fmt_desc_get(outlink->format); int64_t w, h; @@ -307,6 +610,16 @@ static int config_props(AVFilterLink *outlink) inlink->sample_aspect_ratio.num, inlink->sample_aspect_ratio.den, outlink->w, outlink->h, av_get_pix_fmt_name(outlink->format), outlink->sample_aspect_ratio.num, outlink->sample_aspect_ratio.den); + + + s->nb_threads = FFMIN(ff_filter_get_nb_threads(ctx), inlink->h / MIN_TILESIZE); + s->slice_h = ((unsigned int)(inlink->h / 
s->nb_threads)) & 0xfffffffe;// slice_h should be even + + //create graphs for each thread + ret = graphs_build(inlink, outlink, desc, out_desc, s); + if (ret < 0) + return ret; + return 0; fail: @@ -317,212 +630,15 @@ fail: return ret; } -static int print_zimg_error(AVFilterContext *ctx) -{ - char err_msg[1024]; - int err_code = zimg_get_last_error(err_msg, sizeof(err_msg)); - - av_log(ctx, AV_LOG_ERROR, "code %d: %s\n", err_code, err_msg); - - return AVERROR_EXTERNAL; -} - -static int convert_chroma_location(enum AVChromaLocation chroma_location) -{ - switch (chroma_location) { - case AVCHROMA_LOC_UNSPECIFIED: - case AVCHROMA_LOC_LEFT: - return ZIMG_CHROMA_LEFT; - case AVCHROMA_LOC_CENTER: - return ZIMG_CHROMA_CENTER; - case AVCHROMA_LOC_TOPLEFT: - return ZIMG_CHROMA_TOP_LEFT; - case AVCHROMA_LOC_TOP: - return ZIMG_CHROMA_TOP; - case AVCHROMA_LOC_BOTTOMLEFT: - return ZIMG_CHROMA_BOTTOM_LEFT; - case AVCHROMA_LOC_BOTTOM: - return ZIMG_CHROMA_BOTTOM; - } - return ZIMG_CHROMA_LEFT; -} - -static int convert_matrix(enum AVColorSpace colorspace) -{ - switch (colorspace) { - case AVCOL_SPC_RGB: - return ZIMG_MATRIX_RGB; - case AVCOL_SPC_BT709: - return ZIMG_MATRIX_709; - case AVCOL_SPC_UNSPECIFIED: - return ZIMG_MATRIX_UNSPECIFIED; - case AVCOL_SPC_FCC: - return ZIMG_MATRIX_FCC; - case AVCOL_SPC_BT470BG: - return ZIMG_MATRIX_470BG; - case AVCOL_SPC_SMPTE170M: - return ZIMG_MATRIX_170M; - case AVCOL_SPC_SMPTE240M: - return ZIMG_MATRIX_240M; - case AVCOL_SPC_YCGCO: - return ZIMG_MATRIX_YCGCO; - case AVCOL_SPC_BT2020_NCL: - return ZIMG_MATRIX_2020_NCL; - case AVCOL_SPC_BT2020_CL: - return ZIMG_MATRIX_2020_CL; - case AVCOL_SPC_CHROMA_DERIVED_NCL: - return ZIMG_MATRIX_CHROMATICITY_DERIVED_NCL; - case AVCOL_SPC_CHROMA_DERIVED_CL: - return ZIMG_MATRIX_CHROMATICITY_DERIVED_CL; - case AVCOL_SPC_ICTCP: - return ZIMG_MATRIX_ICTCP; - } - return ZIMG_MATRIX_UNSPECIFIED; -} - -static int convert_trc(enum AVColorTransferCharacteristic color_trc) -{ - switch (color_trc) { - case 
AVCOL_TRC_UNSPECIFIED: - return ZIMG_TRANSFER_UNSPECIFIED; - case AVCOL_TRC_BT709: - return ZIMG_TRANSFER_709; - case AVCOL_TRC_GAMMA22: - return ZIMG_TRANSFER_470_M; - case AVCOL_TRC_GAMMA28: - return ZIMG_TRANSFER_470_BG; - case AVCOL_TRC_SMPTE170M: - return ZIMG_TRANSFER_601; - case AVCOL_TRC_SMPTE240M: - return ZIMG_TRANSFER_240M; - case AVCOL_TRC_LINEAR: - return ZIMG_TRANSFER_LINEAR; - case AVCOL_TRC_LOG: - return ZIMG_TRANSFER_LOG_100; - case AVCOL_TRC_LOG_SQRT: - return ZIMG_TRANSFER_LOG_316; - case AVCOL_TRC_IEC61966_2_4: - return ZIMG_TRANSFER_IEC_61966_2_4; - case AVCOL_TRC_BT2020_10: - return ZIMG_TRANSFER_2020_10; - case AVCOL_TRC_BT2020_12: - return ZIMG_TRANSFER_2020_12; - case AVCOL_TRC_SMPTE2084: - return ZIMG_TRANSFER_ST2084; - case AVCOL_TRC_ARIB_STD_B67: - return ZIMG_TRANSFER_ARIB_B67; - case AVCOL_TRC_IEC61966_2_1: - return ZIMG_TRANSFER_IEC_61966_2_1; - } - return ZIMG_TRANSFER_UNSPECIFIED; -} - -static int convert_primaries(enum AVColorPrimaries color_primaries) -{ - switch (color_primaries) { - case AVCOL_PRI_UNSPECIFIED: - return ZIMG_PRIMARIES_UNSPECIFIED; - case AVCOL_PRI_BT709: - return ZIMG_PRIMARIES_709; - case AVCOL_PRI_BT470M: - return ZIMG_PRIMARIES_470_M; - case AVCOL_PRI_BT470BG: - return ZIMG_PRIMARIES_470_BG; - case AVCOL_PRI_SMPTE170M: - return ZIMG_PRIMARIES_170M; - case AVCOL_PRI_SMPTE240M: - return ZIMG_PRIMARIES_240M; - case AVCOL_PRI_FILM: - return ZIMG_PRIMARIES_FILM; - case AVCOL_PRI_BT2020: - return ZIMG_PRIMARIES_2020; - case AVCOL_PRI_SMPTE428: - return ZIMG_PRIMARIES_ST428; - case AVCOL_PRI_SMPTE431: - return ZIMG_PRIMARIES_ST431_2; - case AVCOL_PRI_SMPTE432: - return ZIMG_PRIMARIES_ST432_1; - case AVCOL_PRI_JEDEC_P22: - return ZIMG_PRIMARIES_EBU3213_E; - } - return ZIMG_PRIMARIES_UNSPECIFIED; -} - -static int convert_range(enum AVColorRange color_range) -{ - switch (color_range) { - case AVCOL_RANGE_UNSPECIFIED: - case AVCOL_RANGE_MPEG: - return ZIMG_RANGE_LIMITED; - case AVCOL_RANGE_JPEG: - return ZIMG_RANGE_FULL; 
- } - return ZIMG_RANGE_LIMITED; -} - -static enum AVColorRange convert_range_from_zimg(enum zimg_pixel_range_e color_range) -{ - switch (color_range) { - case ZIMG_RANGE_LIMITED: - return AVCOL_RANGE_MPEG; - case ZIMG_RANGE_FULL: - return AVCOL_RANGE_JPEG; - } - return AVCOL_RANGE_UNSPECIFIED; -} - -static void format_init(zimg_image_format *format, AVFrame *frame, const AVPixFmtDescriptor *desc, - int colorspace, int primaries, int transfer, int range, int location) -{ - format->width = frame->width; - format->height = frame->height; - format->subsample_w = desc->log2_chroma_w; - format->subsample_h = desc->log2_chroma_h; - format->depth = desc->comp[0].depth; - format->pixel_type = (desc->flags & AV_PIX_FMT_FLAG_FLOAT) ? ZIMG_PIXEL_FLOAT : desc->comp[0].depth > 8 ? ZIMG_PIXEL_WORD : ZIMG_PIXEL_BYTE; - format->color_family = (desc->flags & AV_PIX_FMT_FLAG_RGB) ? ZIMG_COLOR_RGB : ZIMG_COLOR_YUV; - format->matrix_coefficients = (desc->flags & AV_PIX_FMT_FLAG_RGB) ? ZIMG_MATRIX_RGB : colorspace == -1 ? convert_matrix(frame->colorspace) : colorspace; - format->color_primaries = primaries == -1 ? convert_primaries(frame->color_primaries) : primaries; - format->transfer_characteristics = transfer == - 1 ? convert_trc(frame->color_trc) : transfer; - format->pixel_range = (desc->flags & AV_PIX_FMT_FLAG_RGB) ? ZIMG_RANGE_FULL : range == -1 ? convert_range(frame->color_range) : range; - format->chroma_location = location == -1 ? 
convert_chroma_location(frame->chroma_location) : location; -} - -static int graph_build(zimg_filter_graph **graph, zimg_graph_builder_params *params, - zimg_image_format *src_format, zimg_image_format *dst_format, - void **tmp, size_t *tmp_size) -{ - int ret; - size_t size; - - zimg_filter_graph_free(*graph); - *graph = zimg_filter_graph_build(src_format, dst_format, params); - if (!*graph) - return print_zimg_error(NULL); - - ret = zimg_filter_graph_get_tmp_size(*graph, &size); - if (ret) - return print_zimg_error(NULL); - - if (size > *tmp_size) { - av_freep(tmp); - *tmp = av_malloc(size); - if (!*tmp) - return AVERROR(ENOMEM); - - *tmp_size = size; - } - - return 0; -} static int realign_frame(const AVPixFmtDescriptor *desc, AVFrame **frame) { AVFrame *aligned = NULL; - int ret = 0, plane; + int ret = 0, plane, planes; /* Realign any unaligned input frame. */ - for (plane = 0; plane < 3; plane++) { + planes = av_pix_fmt_count_planes(desc->nb_components); + for (plane = 0; plane < planes; plane++) { int p = desc->comp[plane].plane; if ((uintptr_t)(*frame)->data[p] % ZIMG_ALIGNMENT || (*frame)->linesize[p] % ZIMG_ALIGNMENT) { if (!(aligned = av_frame_alloc())) { @@ -554,6 +670,7 @@ fail: return ret; } + static void update_output_color_information(ZScaleContext *s, AVFrame *frame) { if (s->colorspace != -1) @@ -572,20 +689,61 @@ static void update_output_color_information(ZScaleContext *s, AVFrame *frame) frame->chroma_location = (int)s->dst_format.chroma_location + 1; } +static int filter_slice(AVFilterContext *ctx, void *data, int job_nr, int n_jobs) +{ + ThreadData *td = data; + int ret = 0; + int p; + int out_sampl; + ZScaleContext *s = ctx->priv; + + zimg_image_buffer_const src_buf = { ZIMG_API_VERSION }; + zimg_image_buffer dst_buf = { ZIMG_API_VERSION }; + int dst_tile_height = ((unsigned int)(td->out->height / n_jobs)) & 0xfffffffe; + out_sampl = FFMAX3(td->out->linesize[0], td->out->linesize[1], td->out->linesize[2]); + for (int i = 0; i < 3; i++) { + p = 
td->desc->comp[i].plane; + + src_buf.plane[i].data = td->in->data[p]; + src_buf.plane[i].stride = td->in->linesize[p]; + src_buf.plane[i].mask = -1; + + p = td->odesc->comp[i].plane; + dst_buf.plane[i].data = td->out->data[p] + td->out->linesize[p] * dst_tile_height * td->out->linesize[p] / out_sampl * job_nr; + dst_buf.plane[i].stride = td->out->linesize[p]; + dst_buf.plane[i].mask = -1; + } + ret = zimg_filter_graph_process(s->graph[job_nr], &src_buf, &dst_buf, s->tmp[job_nr], 0, 0, 0, 0); + if (ret) + return print_zimg_error(ctx); + + if (td->desc->flags & AV_PIX_FMT_FLAG_ALPHA && td->odesc->flags & AV_PIX_FMT_FLAG_ALPHA) { + src_buf.plane[0].data = td->in->data[3]; + src_buf.plane[0].stride = td->in->linesize[3]; + src_buf.plane[0].mask = -1; + + dst_buf.plane[0].data = td->out->data[3] + td->out->linesize[3] * dst_tile_height * job_nr; + dst_buf.plane[0].stride = td->out->linesize[3]; + dst_buf.plane[0].mask = -1; + + ret = zimg_filter_graph_process(s->alpha_graph[job_nr], &src_buf, &dst_buf, s->tmp[job_nr], 0, 0, 0, 0); + if (ret) + return print_zimg_error(ctx); + } + return 0; +} + static int filter_frame(AVFilterLink *link, AVFrame *in) { - ZScaleContext *s = link->dst->priv; - AVFilterLink *outlink = link->dst->outputs[0]; + AVFilterContext *ctx = link->dst; + ZScaleContext *s = ctx->priv; + AVFilterLink *outlink = ctx->outputs[0]; const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(link->format); const AVPixFmtDescriptor *odesc = av_pix_fmt_desc_get(outlink->format); - zimg_image_buffer_const src_buf = { ZIMG_API_VERSION }; - zimg_image_buffer dst_buf = { ZIMG_API_VERSION }; char buf[32]; - int ret = 0, plane; + int ret = 0; AVFrame *out = NULL; - - if ((ret = realign_frame(desc, &in)) < 0) - goto fail; + ThreadData td; if (!(out = ff_get_video_buffer(outlink, outlink->w, outlink->h))) { ret = AVERROR(ENOMEM); @@ -596,19 +754,24 @@ static int filter_frame(AVFilterLink *link, AVFrame *in) out->width = outlink->w; out->height = outlink->h; - if( in->width 
!= link->w - || in->height != link->h - || in->format != link->format - || s->in_colorspace != in->colorspace - || s->in_trc != in->color_trc - || s->in_primaries != in->color_primaries - || s->in_range != in->color_range - || s->out_colorspace != out->colorspace - || s->out_trc != out->color_trc - || s->out_primaries != out->color_primaries - || s->out_range != out->color_range - || s->in_chromal != in->chroma_location - || s->out_chromal != out->chroma_location) { + //we need to use this filter if something is different for an input and output only + //otherwise - just copy the input frame to the output + if ((link->w != outlink->w) || + (link->h != outlink->h) || + (s->src_format.chroma_location != s->dst_format.chroma_location)|| + (s->src_format.color_family !=s->dst_format.color_family)|| + (s->src_format.color_primaries !=s->dst_format.color_primaries)|| + (s->src_format.depth !=s->dst_format.depth)|| + (s->src_format.matrix_coefficients !=s->dst_format.matrix_coefficients)|| + (s->src_format.field_parity !=s->dst_format.field_parity)|| + (s->src_format.pixel_range !=s->dst_format.pixel_range)|| + (s->src_format.pixel_type !=s->dst_format.pixel_type)|| + (s->src_format.transfer_characteristics !=s->dst_format.transfer_characteristics) + ){ + + if ((ret = realign_frame(desc, &in)) < 0) + goto fail; + snprintf(buf, sizeof(buf)-1, "%d", outlink->w); av_opt_set(s, "w", buf, 0); snprintf(buf, sizeof(buf)-1, "%d", outlink->h); @@ -618,128 +781,50 @@ static int filter_frame(AVFilterLink *link, AVFrame *in) link->dst->inputs[0]->w = in->width; link->dst->inputs[0]->h = in->height; - if ((ret = config_props(outlink)) < 0) - goto fail; - - zimg_image_format_default(&s->src_format, ZIMG_API_VERSION); - zimg_image_format_default(&s->dst_format, ZIMG_API_VERSION); - zimg_graph_builder_params_default(&s->params, ZIMG_API_VERSION); - - s->params.dither_type = s->dither; - s->params.cpu_type = ZIMG_CPU_AUTO; - s->params.resample_filter = s->filter; - 
s->params.resample_filter_uv = s->filter; - s->params.nominal_peak_luminance = s->nominal_peak_luminance; - s->params.allow_approximate_gamma = s->approximate_gamma; - s->params.filter_param_a = s->params.filter_param_a_uv = s->param_a; - s->params.filter_param_b = s->params.filter_param_b_uv = s->param_b; - - format_init(&s->src_format, in, desc, s->colorspace_in, - s->primaries_in, s->trc_in, s->range_in, s->chromal_in); - format_init(&s->dst_format, out, odesc, s->colorspace, - s->primaries, s->trc, s->range, s->chromal); - update_output_color_information(s, out); - - ret = graph_build(&s->graph, &s->params, &s->src_format, &s->dst_format, - &s->tmp, &s->tmp_size); - if (ret < 0) - goto fail; - - s->in_colorspace = in->colorspace; - s->in_trc = in->color_trc; - s->in_primaries = in->color_primaries; - s->in_range = in->color_range; + + s->in_colorspace = in->colorspace; + s->in_trc = in->color_trc; + s->in_primaries = in->color_primaries; + s->in_range = in->color_range; s->out_colorspace = out->colorspace; - s->out_trc = out->color_trc; - s->out_primaries = out->color_primaries; - s->out_range = out->color_range; - - if (desc->flags & AV_PIX_FMT_FLAG_ALPHA && odesc->flags & AV_PIX_FMT_FLAG_ALPHA) { - zimg_image_format_default(&s->alpha_src_format, ZIMG_API_VERSION); - zimg_image_format_default(&s->alpha_dst_format, ZIMG_API_VERSION); - zimg_graph_builder_params_default(&s->alpha_params, ZIMG_API_VERSION); - - s->alpha_params.dither_type = s->dither; - s->alpha_params.cpu_type = ZIMG_CPU_AUTO; - s->alpha_params.resample_filter = s->filter; - - s->alpha_src_format.width = in->width; - s->alpha_src_format.height = in->height; - s->alpha_src_format.depth = desc->comp[0].depth; - s->alpha_src_format.pixel_type = (desc->flags & AV_PIX_FMT_FLAG_FLOAT) ? ZIMG_PIXEL_FLOAT : desc->comp[0].depth > 8 ? 
ZIMG_PIXEL_WORD : ZIMG_PIXEL_BYTE;
-            s->alpha_src_format.color_family = ZIMG_COLOR_GREY;
-
-            s->alpha_dst_format.width = out->width;
-            s->alpha_dst_format.height = out->height;
-            s->alpha_dst_format.depth = odesc->comp[0].depth;
-            s->alpha_dst_format.pixel_type = (odesc->flags & AV_PIX_FMT_FLAG_FLOAT) ? ZIMG_PIXEL_FLOAT : odesc->comp[0].depth > 8 ? ZIMG_PIXEL_WORD : ZIMG_PIXEL_BYTE;
-            s->alpha_dst_format.color_family = ZIMG_COLOR_GREY;
-
-            zimg_filter_graph_free(s->alpha_graph);
-            s->alpha_graph = zimg_filter_graph_build(&s->alpha_src_format, &s->alpha_dst_format, &s->alpha_params);
-            if (!s->alpha_graph) {
-                ret = print_zimg_error(link->dst);
-                goto fail;
-            }
-        }
-    }
-
-    update_output_color_information(s, out);
-
-    av_reduce(&out->sample_aspect_ratio.num, &out->sample_aspect_ratio.den,
-              (int64_t)in->sample_aspect_ratio.num * outlink->h * link->w,
-              (int64_t)in->sample_aspect_ratio.den * outlink->w * link->h,
-              INT_MAX);
-
-    for (plane = 0; plane < 3; plane++) {
-        int p = desc->comp[plane].plane;
-        src_buf.plane[plane].data = in->data[p];
-        src_buf.plane[plane].stride = in->linesize[p];
-        src_buf.plane[plane].mask = -1;
-
-        p = odesc->comp[plane].plane;
-        dst_buf.plane[plane].data = out->data[p];
-        dst_buf.plane[plane].stride = out->linesize[p];
-        dst_buf.plane[plane].mask = -1;
-    }
-
-    ret = zimg_filter_graph_process(s->graph, &src_buf, &dst_buf, s->tmp, 0, 0, 0, 0);
-    if (ret) {
-        ret = print_zimg_error(link->dst);
-        goto fail;
-    }
-
-    if (desc->flags & AV_PIX_FMT_FLAG_ALPHA && odesc->flags & AV_PIX_FMT_FLAG_ALPHA) {
-        src_buf.plane[0].data = in->data[3];
-        src_buf.plane[0].stride = in->linesize[3];
-        src_buf.plane[0].mask = -1;
-
-        dst_buf.plane[0].data = out->data[3];
-        dst_buf.plane[0].stride = out->linesize[3];
-        dst_buf.plane[0].mask = -1;
-
-        ret = zimg_filter_graph_process(s->alpha_graph, &src_buf, &dst_buf, s->tmp, 0, 0, 0, 0);
-        if (ret) {
-            ret = print_zimg_error(link->dst);
-            goto fail;
-        }
-    } else if (odesc->flags & AV_PIX_FMT_FLAG_ALPHA) {
-        int x, y;
-
-        if (odesc->flags & AV_PIX_FMT_FLAG_FLOAT) {
-            for (y = 0; y < out->height; y++) {
-                for (x = 0; x < out->width; x++) {
-                    AV_WN32(out->data[3] + x * odesc->comp[3].step + y * out->linesize[3],
-                            av_float2int(1.0f));
+        s->out_trc = out->color_trc;
+        s->out_primaries = out->color_primaries;
+        s->out_range = out->color_range;
+
+        av_reduce(&out->sample_aspect_ratio.num, &out->sample_aspect_ratio.den,
+                  (int64_t)in->sample_aspect_ratio.num * outlink->h * link->w,
+                  (int64_t)in->sample_aspect_ratio.den * outlink->w * link->h,
+                  INT_MAX);
+
+        td.in = in;
+        td.out = out;
+        td.desc = desc;
+        td.odesc = odesc;
+
+        ff_filter_execute(ctx, filter_slice, &td, NULL, s->nb_threads);
+
+        if ((!(desc->flags & AV_PIX_FMT_FLAG_ALPHA)) && (odesc->flags & AV_PIX_FMT_FLAG_ALPHA) ){
+            int x, y;
+            if (odesc->flags & AV_PIX_FMT_FLAG_FLOAT) {
+                for (y = 0; y < out->height; y++) {
+                    for (x = 0; x < out->width; x++) {
+                        AV_WN32(out->data[3] + x * odesc->comp[3].step + y * out->linesize[3],
+                                av_float2int(1.0f));
+                    }
                 }
+            } else {
+                for (y = 0; y < outlink->h; y++)
+                    memset(out->data[3] + y * out->linesize[3], 0xff, outlink->w);
             }
-        } else {
-            for (y = 0; y < outlink->h; y++)
-                memset(out->data[3] + y * out->linesize[3], 0xff, outlink->w);
         }
     }
-
+    else {
+        /*no need for any filtering */
+        ret = av_frame_copy(out, in);
+        if (ret < 0)
+            return ret;
+    }
 fail:
     av_frame_free(&in);
     if (ret) {
@@ -753,11 +838,12 @@ fail:
 static av_cold void uninit(AVFilterContext *ctx)
 {
     ZScaleContext *s = ctx->priv;
-
-    zimg_filter_graph_free(s->graph);
-    zimg_filter_graph_free(s->alpha_graph);
-    av_freep(&s->tmp);
-    s->tmp_size = 0;
+    int i;
+    for (i = 0; i < s->nb_threads; i++) {
+        if (s->tmp[i]) av_freep(&s->tmp[i]);
+        if (s->graph[i]) zimg_filter_graph_free(s->graph[i]);
+        if (s->alpha_graph[i]) zimg_filter_graph_free(s->alpha_graph[i]);
+    }
 }
 
 static int process_command(AVFilterContext *ctx, const char *cmd, const char *args,
@@ -941,4 +1027,5 @@ const AVFilter ff_vf_zscale = {
     FILTER_OUTPUTS(avfilter_vf_zscale_outputs),
     FILTER_QUERY_FUNC(query_formats),
     .process_command = process_command,
+    .flags = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC | AVFILTER_FLAG_SLICE_THREADS,
 };