From patchwork Sat Jul 4 13:17:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: hanishkvc X-Patchwork-Id: 20801 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id 34AF44498AB for ; Sat, 4 Jul 2020 16:19:33 +0300 (EEST) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 19C6E68B4B2; Sat, 4 Jul 2020 16:19:33 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-pj1-f52.google.com (mail-pj1-f52.google.com [209.85.216.52]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id CDD0268B464 for ; Sat, 4 Jul 2020 16:19:25 +0300 (EEST) Received: by mail-pj1-f52.google.com with SMTP id cm21so5589380pjb.3 for ; Sat, 04 Jul 2020 06:19:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=oOCrAqbZlwMEKiTEhmO9Mtxfesr4Xe5LSV9XP39BQhg=; b=b2MtaHY5fhXxAFkQyEheYlviHDgps7NA3FxKtCRblu1GagF3zfjESsl+PiilEmfg1s tilDj3+xqRLcasDmn/jO5DNNs4+BzOSOLhctM9jU2ccEqTZKLTIqkm4Ss6DjwhP7t/ia pJWAcdBCV21d04TQ9oWa1y9r/8riiiaWpy3da7gyoeJy5A5eqmN+ujGtoVW/kWH7t2jt CRHiD+YrJOEyeIt964NX+PRCrRpqLmJjope+JNyunmGUiJhtKUIJmp34Se+bk5a43gz7 vsSKABHCBNRU9Z9e8GsaqtNl/GfKRKGAeN799aT23EGxQg4MESR8hAo2ciPyEgFT5sTk ydbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=oOCrAqbZlwMEKiTEhmO9Mtxfesr4Xe5LSV9XP39BQhg=; b=W5AH0zVYAZxSHIAkA9Bt7P26DvJ2vv2qHwSkcSxCxOwERNEBSfkisf24blDstanNVd U3T7K7/r3EasiGOW2jeXU061i7P7IdqRw2Am2FQbDe6andqr0vPndSafvGu+YJ9rKxWQ 16psHf6eBySz4l4YvMxWFy86oiXepkwo8Y1u9NqgphWEGHbcHk6QDvVTyHzR57CZlxy4 Z674RYXCuQ0s8O6gFMOpH+lQJ+ovEMG7dBSrK2ucSyfid9xckEM62eeo4/z53meUC27l BleQ92E4/2PjZUX/ChqVdTh+Feh/683jMxug0Nkoq7mUV2EtHdQrWM37WgMnua052KCq cT8Q== X-Gm-Message-State: AOAM530XFJNGspQCBxtY5fhbVAYzr0Cads/zWJmpqjHY3IaIJelyycJs 0Dyb1UmS2YuLkw9dUzKJy3mX9kDU X-Google-Smtp-Source: ABdhPJyOgLVqLLxpIVLH09+gS4v/1hel8nix4xR49O8eLA3RUyhnEfIY5PdnQ+4dscGVQR69F+KdUw== X-Received: by 2002:a17:90a:2465:: with SMTP id h92mr12225292pje.26.1593868763586; Sat, 04 Jul 2020 06:19:23 -0700 (PDT) Received: from localhost.localdomain ([122.179.70.80]) by smtp.gmail.com with ESMTPSA id b191sm14657507pga.13.2020.07.04.06.19.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 04 Jul 2020 06:19:22 -0700 (PDT) From: hanishkvc To: ffmpeg-devel@ffmpeg.org Date: Sat, 4 Jul 2020 18:47:17 +0530 Message-Id: <20200704131717.49428-6-hanishkvc@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200704131717.49428-1-hanishkvc@gmail.com> References: <20200704131717.49428-1-hanishkvc@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v06 5/5] fbdetile videofilter cpu based framebuffer detiling X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: hanishkvc Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" This adds a video filter called fbdetile, which allows the user to detile framebuffer layout into a linear layout, if required. It uses the fbtile helper routines to achieve the detiling. This is useful, if a) the user doesnt want to apply detiling when capturing some content/framebuffer which is tiled. OR b) the user already has tiled content with them. OR c) a developer wants to experiment with tiled data. --- Changelog | 1 + doc/filters.texi | 78 +++++++++++++ libavfilter/Makefile | 1 + libavfilter/allfilters.c | 1 + libavfilter/vf_fbdetile.c | 238 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 319 insertions(+) create mode 100644 libavfilter/vf_fbdetile.c diff --git a/Changelog b/Changelog index 6174770ce1..a4e098f94f 100644 --- a/Changelog +++ b/Changelog @@ -2,6 +2,7 @@ Entries are sorted chronologically from oldest to youngest within each release, releases are sorted from youngest to oldest. version : +- fbdetile cpu based framebuffer layout detiling video filter - hwdownload framebuffer layout detiling (Intel tile-x|y|yf layouts) - hwcontext_drm detiles non linear layouts, if possible - kmsgrab GetFB2 format_modifier, if user doesnt specify diff --git a/doc/filters.texi b/doc/filters.texi index c783e059c2..4ff8b7edc4 100644 --- a/doc/filters.texi +++ b/doc/filters.texi @@ -12229,6 +12229,84 @@ It accepts the following optional parameters: The number of the CUDA device to use @end table +@anchor{fbdetile} +@section fbdetile + +Detiles the Framebuffer tile layout into a linear layout using CPU. + +It currently supports conversion from Intel legacy tile-x and tile-y as well +as the newer Intel tile-yf layouts into a linear layout. This is useful if +one is using kmsgrab and hwdownload to capture a screen which is using one +of these non-linear layouts. + +NOTE: It also provides a generic detiling logic, which can be easily configured +to detile many different tiling schemes if required, in future. The same is +used for detiling the intel tile-yf layout. Also sample configuration to handle +intel tile-x and tile-y using generic detile logic is also shown for reference, +in the code. + +Currently it expects the data to be a 32bit RGB based pixel format. However +the logic doesnt do any pixel format conversion or so. Later will be enabling +16bit RGB data also, as the logic is transparent to it at one level. + +One could either insert this into the filter chain while capturing itself, +or else, if it is slowing things down or so, then one could instead insert +it into the filter chain during playback or transcoding or so. + +It supports the following optional parameters + +@table @option +@item type +Specify which detiling conversion to apply. The supported values are +@table @var +@item 0 +Dont do detiling. +@item 1 +Auto detect detile logic to apply (supported in vf_hwdownload, not in vf_fbdetile). +@item 2 +intel tile-x to linear conversion (the default). +@item 3 +intel tile-y to linear conversion. +@item 4 +intel tile-yf to linear conversion. +@end table +@end table + +If one wants to convert during capture itself, one could do +@example +ffmpeg -f kmsgrab -i - -vf "hwdownload,format=bgr0,fbdetile" OUTPUT +@end example + +However if one wants to convert after the tiled data has been already captured +@example +ffmpeg -i INPUT -vf "fbdetile" OUTPUT +@end example +@example +ffplay -i INPUT -vf "fbdetile" +@end example + +NOTE: While transcoding a test 1080p h264 stream, with 276 frames, below was +the average times taken by the different detile logics. +@example +rm out.mp4; time ./ffmpeg -i input.mp4 out.mp4 +rm out.mp4; time ./ffmpeg -i input.mp4 -vf fbdetile=2 out.mp4 +rm out.mp4; time ./ffmpeg -i input.mp4 -vf fbdetile=3 out.mp4 +rm out.mp4; time ./ffmpeg -i input.mp4 -vf fbdetile=4 out.mp4 +@end example +@table @option +@item with no fbdetile filter +it took ~7.28 secs, i5-8th Gen +it took ~10.1 secs, i7-7th Gen +@item with fbdetile=2 filter, Intel Tile-X +it took ~8.69 secs, i5-8th Gen +it took ~13.3 secs, i7-7th Gen +@item with fbdetile=3 filter, Intel Tile-Y +it took ~9.20 secs. i5-8th Gen +it took ~13.5 secs. i7-7th Gen +@item with fbdetile=4 filter, Intel Tile-Yf +it took ~13.8 secs. i7-7th Gen +@end table + @section hqx Apply a high-quality magnification filter designed for pixel art. This filter diff --git a/libavfilter/Makefile b/libavfilter/Makefile index 5123540653..bdb0c379ae 100644 --- a/libavfilter/Makefile +++ b/libavfilter/Makefile @@ -280,6 +280,7 @@ OBJS-$(CONFIG_HWDOWNLOAD_FILTER) += vf_hwdownload.o OBJS-$(CONFIG_HWMAP_FILTER) += vf_hwmap.o OBJS-$(CONFIG_HWUPLOAD_CUDA_FILTER) += vf_hwupload_cuda.o OBJS-$(CONFIG_HWUPLOAD_FILTER) += vf_hwupload.o +OBJS-$(CONFIG_FBDETILE_FILTER) += vf_fbdetile.o OBJS-$(CONFIG_HYSTERESIS_FILTER) += vf_hysteresis.o framesync.o OBJS-$(CONFIG_IDET_FILTER) += vf_idet.o OBJS-$(CONFIG_IL_FILTER) += vf_il.o diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c index 1183e40267..f8dceb2a88 100644 --- a/libavfilter/allfilters.c +++ b/libavfilter/allfilters.c @@ -265,6 +265,7 @@ extern AVFilter ff_vf_hwdownload; extern AVFilter ff_vf_hwmap; extern AVFilter ff_vf_hwupload; extern AVFilter ff_vf_hwupload_cuda; +extern AVFilter ff_vf_fbdetile; extern AVFilter ff_vf_hysteresis; extern AVFilter ff_vf_idet; extern AVFilter ff_vf_il; diff --git a/libavfilter/vf_fbdetile.c b/libavfilter/vf_fbdetile.c new file mode 100644 index 0000000000..bfc28da465 --- /dev/null +++ b/libavfilter/vf_fbdetile.c @@ -0,0 +1,238 @@ +/* + * Copyright (c) 2020 HanishKVC + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/** + * @file + * Detile the Frame buffer's tile layout using the cpu + * Currently it supports detiling of following layouts + * legacy Intel Tile-X + * legacy Intel Tile-Y + * newer Intel Tile-Yf + * More tiling layouts can be easily supported by adding configuration data + * for the generic detile logic, wrt the required tiling schemes. + * + */ + +/* + * ToThink|Check: Optimisations + * + * Does gcc setting used by ffmpeg allows memcpy | stringops inlining, + * loop unrolling, better native matching instructions, additional + * optimisations, ... + * + * Does gcc map to optimal memcpy logic, based on the situation it is + * used in i.e like + * based on size of transfer, alignment, architecture, etc + * a suitable combination of inlining and or rep movsb and or + * simd load/store and or unrolling and or ... + * + * If not, may be look at vector_size or intrinsics or appropriate arch + * and cpu specific inline asm or ... + * + */ + +/* + * Performance check results on i7-7500u + * TileYf, TileGX, TileGY using detile_generic_opti + * This mainly impacts TileYf, due to its deeper subtiling + * Without opti, its TSCCnt rises to aroun 11.XYM + * Run Type : Type : Seconds Max, Min : TSCCnt Min, Max + * Non filter run: : 10.11s, 09.96s : + * fbdetile=2 run: TileX : 13.45s, 13.20s : 05.95M, 06.10M + * fbdetile=3 run: TileY : 13.50s, 13.39s : 06.22M, 06.39M + * fbdetile=4 run: TileYf : 13.75s, 13.63s : 09.82M, 09.90M + * fbdetile=5 run: TileGX : 13.70s, 13.32s : 06.15M, 06.24M + * fbdetile=6 run: TileGY : 14.12s, 13.57s : 08.75M, 09.10M + */ + +#include "libavutil/avassert.h" +#include "libavutil/imgutils.h" +#include "libavutil/opt.h" +#include "libavutil/fbtile.h" +#include "avfilter.h" +#include "formats.h" +#include "internal.h" +#include "video.h" + +// Use Optimised detile_generic or the Simpler but more fine grained one +#define DETILE_GENERIC_OPTI 1 +// Enable printing of the tile walk +#undef DEBUG_FBTILE +// Print time taken by detile using performance counter +#if ARCH_X86 +#define DEBUG_PERF 1 +#else +#undef DEBUG_PERF +#endif + +#ifdef DEBUG_PERF +#include +uint64_t perfTime = 0; +int perfCnt = 0; +#endif + +typedef struct FBDetileContext { + const AVClass *class; + int width, height; + int type; +} FBDetileContext; + +#define OFFSET(x) offsetof(FBDetileContext, x) +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM +static const AVOption fbdetile_options[] = { + { "type", "set framebuffer tile|format_modifier conversion type", OFFSET(type), AV_OPT_TYPE_INT, {.i64=TILE_INTELX}, 0, TILE_NONE_END-1, FLAGS, "type" }, + { "None", "Dont detile", 0, AV_OPT_TYPE_CONST, {.i64=TILE_NONE}, INT_MIN, INT_MAX, FLAGS, "type" }, + { "Auto", "Auto detect tile conversion type, NotImplemented", 0, AV_OPT_TYPE_CONST, {.i64=TILE_AUTO}, INT_MIN, INT_MAX, FLAGS, "type" }, + { "intelx", "Intel Tile-X layout", 0, AV_OPT_TYPE_CONST, {.i64=TILE_INTELX}, INT_MIN, INT_MAX, FLAGS, "type" }, + { "intely", "Intel Tile-Y layout", 0, AV_OPT_TYPE_CONST, {.i64=TILE_INTELY}, INT_MIN, INT_MAX, FLAGS, "type" }, + { "intelyf", "Intel Tile-Yf layout", 0, AV_OPT_TYPE_CONST, {.i64=TILE_INTELYF}, INT_MIN, INT_MAX, FLAGS, "type" }, + { "intelgx", "Intel Tile-X layout, GenericDetile", 0, AV_OPT_TYPE_CONST, {.i64=TILE_INTELGX}, INT_MIN, INT_MAX, FLAGS, "type" }, + { "intelgy", "Intel Tile-Y layout, GenericDetile", 0, AV_OPT_TYPE_CONST, {.i64=TILE_INTELGY}, INT_MIN, INT_MAX, FLAGS, "type" }, + { NULL } +}; + +AVFILTER_DEFINE_CLASS(fbdetile); + +static av_cold int init(AVFilterContext *ctx) +{ + FBDetileContext *fbdetile = ctx->priv; + + if (fbdetile->type == TILE_NONE) { + av_log(ctx, AV_LOG_INFO, "init: Wont detile, pass through\n"); + } else if (fbdetile->type == TILE_AUTO) { + av_log(ctx, AV_LOG_WARNING, "init: Auto detile mode detect, not supported, pass through\n"); + fbdetile->type = TILE_NONE; + } else if (fbdetile->type == TILE_INTELX) { + av_log(ctx, AV_LOG_INFO, "init: Intel tile-x to linear\n"); + } else if (fbdetile->type == TILE_INTELY) { + av_log(ctx, AV_LOG_INFO, "init: Intel tile-y to linear\n"); + } else if (fbdetile->type == TILE_INTELYF) { + av_log(ctx, AV_LOG_INFO, "init: Intel tile-yf to linear\n"); + } else if (fbdetile->type == TILE_INTELGX) { + av_log(ctx, AV_LOG_INFO, "init: Intel tile-x to linear, using generic detile\n"); + } else if (fbdetile->type == TILE_INTELGY) { + av_log(ctx, AV_LOG_INFO, "init: Intel tile-y to linear, using generic detile\n"); + } else { + av_log(ctx, AV_LOG_ERROR, "init: Unknown Tile format specified, shouldnt reach here\n"); + } + fbdetile->width = 1920; + fbdetile->height = 1080; + return 0; +} + +static int query_formats(AVFilterContext *ctx) +{ + AVFilterFormats *fmts_list; + + fmts_list = ff_make_format_list(fbtilePixFormats); + if (!fmts_list) + return AVERROR(ENOMEM); + return ff_set_common_formats(ctx, fmts_list); +} + +static int config_props(AVFilterLink *inlink) +{ + AVFilterContext *ctx = inlink->dst; + FBDetileContext *fbdetile = ctx->priv; + + fbdetile->width = inlink->w; + fbdetile->height = inlink->h; + av_log(ctx, AV_LOG_INFO, "config_props: %d x %d\n", fbdetile->width, fbdetile->height); + + return 0; +} + + +static int filter_frame(AVFilterLink *inlink, AVFrame *in) +{ + AVFilterContext *ctx = inlink->dst; + FBDetileContext *fbdetile = ctx->priv; + AVFilterLink *outlink = ctx->outputs[0]; + AVFrame *out; + + if (fbdetile->type == TILE_NONE) + return ff_filter_frame(outlink, in); + + out = ff_get_video_buffer(outlink, outlink->w, outlink->h); + if (!out) { + av_frame_free(&in); + return AVERROR(ENOMEM); + } + av_frame_copy_props(out, in); + +#ifdef DEBUG_PERF + uint64_t perfStart = __rdtsc(); +#endif + + detile_this(fbdetile->type, 0, fbdetile->width, fbdetile->height, + out->data[0], out->linesize[0], + in->data[0], in->linesize[0], 4); + +#ifdef DEBUG_PERF + uint64_t perfEnd = __rdtsc(); + perfTime += (perfEnd - perfStart); + perfCnt += 1; +#endif + + av_frame_free(&in); + return ff_filter_frame(outlink, out); +} + +static av_cold void uninit(AVFilterContext *ctx) +{ +#ifdef DEBUG_PERF + if (perfCnt == 0) + perfCnt = 1; + av_log(ctx, AV_LOG_INFO, "uninit:perf: AvgTSCCnt %ld\n", perfTime/perfCnt); +#endif +} + +static const AVFilterPad fbdetile_inputs[] = { + { + .name = "default", + .type = AVMEDIA_TYPE_VIDEO, + .config_props = config_props, + .filter_frame = filter_frame, + }, + { NULL } +}; + +static const AVFilterPad fbdetile_outputs[] = { + { + .name = "default", + .type = AVMEDIA_TYPE_VIDEO, + }, + { NULL } +}; + +AVFilter ff_vf_fbdetile = { + .name = "fbdetile", + .description = NULL_IF_CONFIG_SMALL("Detile Framebuffer using CPU"), + .priv_size = sizeof(FBDetileContext), + .init = init, + .uninit = uninit, + .query_formats = query_formats, + .inputs = fbdetile_inputs, + .outputs = fbdetile_outputs, + .priv_class = &fbdetile_class, + .flags = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC, +}; + +// vim: set expandtab sts=4: //