From patchwork Fri May 14 08:47:02 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Fu, Ting"
X-Patchwork-Id: 27770
From: Ting Fu
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 14 May 2021 16:47:02 +0800
Message-Id: <20210514084702.21273-3-ting.fu@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20210514084702.21273-1-ting.fu@intel.com>
References: <20210514084702.21273-1-ting.fu@intel.com>
Subject:
[FFmpeg-devel] [PATCH 3/3] libavfilter: vf_drawtext filter support draw text with detection bounding boxes in side_data

This feature can be used with dnn detection by setting vf_drawtext's option
text_source=side_data_detection_bboxes, for example:
./ffmpeg -i face.jpeg -vf dnn_detect=dnn_backend=openvino:model=face-detection-adas-0001.xml:\
input=data:output=detection_out:labels=face-detection-adas-0001.label,drawbox=box_source=\
side_data_detection_bboxes,drawtext=text_source=side_data_detection_bboxes:fontcolor=green:\
fontsize=40, -y face_detect.jpeg

Please note, the default fontsize of vf_drawtext is 12, which may be too small to be seen
clearly.

Signed-off-by: Ting Fu
---
 doc/filters.texi          |  8 ++++
 libavfilter/vf_drawtext.c | 77 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 79 insertions(+), 6 deletions(-)

diff --git a/doc/filters.texi b/doc/filters.texi
index f2ac8c4cc8..d10e6de03d 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -10788,6 +10788,14 @@ parameter @var{text}.
 
 If both @var{text} and @var{textfile} are specified, an error is thrown.
 
+@item text_source
+Text source should be set as side_data_detection_bboxes if you want to use text data in
+detection bboxes of side data.
+
+If text source is set, @var{text} and @var{textfile} will be ignored and still use
+text data in detection bboxes of side data. So please do not use this parameter
+if you are not sure about the text source.
+
 @item reload
 If set to 1, the @var{textfile} will be reloaded before each frame.
 Be sure to update it atomically, or it may be read partially, or even fail.
diff --git a/libavfilter/vf_drawtext.c b/libavfilter/vf_drawtext.c
index 7ea057b812..382d589e26 100644
--- a/libavfilter/vf_drawtext.c
+++ b/libavfilter/vf_drawtext.c
@@ -55,6 +55,7 @@
 #include "libavutil/time_internal.h"
 #include "libavutil/tree.h"
 #include "libavutil/lfg.h"
+#include "libavutil/detection_bbox.h"
 #include "avfilter.h"
 #include "drawutils.h"
 #include "formats.h"
@@ -199,6 +200,8 @@ typedef struct DrawTextContext {
     int tc24hmax; ///< 1 if timecode is wrapped to 24 hours, 0 otherwise
     int reload; ///< reload text file for each frame
     int start_number; ///< starting frame number for n/frame_num var
+    char *text_source_string; ///< the string to specify text data source
+    enum AVFrameSideDataType text_source;
 #if CONFIG_LIBFRIBIDI
     int text_shaping; ///< 1 to shape the text before drawing it
 #endif
@@ -246,6 +249,7 @@ static const AVOption drawtext_options[]= {
     { "alpha", "apply alpha while rendering", OFFSET(a_expr), AV_OPT_TYPE_STRING, { .str = "1" }, .flags = FLAGS },
     {"fix_bounds", "check and fix text coords to avoid clipping", OFFSET(fix_bounds), AV_OPT_TYPE_BOOL, {.i64=0}, 0, 1, FLAGS},
     {"start_number", "start frame number for n/frame_num variable", OFFSET(start_number), AV_OPT_TYPE_INT, {.i64=0}, 0, INT_MAX, FLAGS},
+    {"text_source", "the source of text", OFFSET(text_source_string), AV_OPT_TYPE_STRING, {.str=NULL}, 0, 1, FLAGS },
 
 #if CONFIG_LIBFRIBIDI
     {"text_shaping", "attempt to shape text before drawing", OFFSET(text_shaping), AV_OPT_TYPE_BOOL, {.i64=1}, 0, 1, FLAGS},
@@ -690,6 +694,16 @@ out:
 }
 #endif
 
+static enum AVFrameSideDataType text_source_string_parse(const char *text_source_string)
+{
+    av_assert0(text_source_string);
+    if (!strcmp(text_source_string, "side_data_detection_bboxes")) {
+        return AV_FRAME_DATA_DETECTION_BBOXES;
+    } else {
+        return AVERROR(EINVAL);
+    }
+}
+
 static av_cold int init(AVFilterContext *ctx)
 {
     int err;
@@ -731,9 +745,28 @@ static av_cold int init(AVFilterContext *ctx)
         s->text = av_strdup("");
     }
 
+    if (s->text_source_string) {
+        s->text_source = text_source_string_parse(s->text_source_string);
+        if ((int)s->text_source < 0) {
+            av_log(ctx, AV_LOG_ERROR, "Error text source: %s\n", s->text_source_string);
+            return AVERROR(EINVAL);
+        }
+    }
+
+    if (s->text_source == AV_FRAME_DATA_DETECTION_BBOXES) {
+        if (s->text) {
+            av_log(ctx, AV_LOG_WARNING, "Multiple texts provided, will use text_source only\n");
+            av_free(s->text);
+        }
+        s->text = av_mallocz(AV_DETECTION_BBOX_LABEL_NAME_MAX_SIZE *
+                             (AV_NUM_DETECTION_BBOX_CLASSIFY + 1));
+        if (!s->text)
+            return AVERROR(ENOMEM);
+    }
+
     if (!s->text) {
         av_log(ctx, AV_LOG_ERROR,
-               "Either text, a valid file or a timecode must be provided\n");
+               "Either text, a valid file, a timecode or text source must be provided\n");
         return AVERROR(EINVAL);
     }
 
@@ -1440,10 +1473,15 @@ continue_on_invalid2:
 
     s->var_values[VAR_LINE_H] = s->var_values[VAR_LH] = s->max_glyph_h;
 
-    s->x = s->var_values[VAR_X] = av_expr_eval(s->x_pexpr, s->var_values, &s->prng);
-    s->y = s->var_values[VAR_Y] = av_expr_eval(s->y_pexpr, s->var_values, &s->prng);
-    /* It is necessary if x is expressed from y */
-    s->x = s->var_values[VAR_X] = av_expr_eval(s->x_pexpr, s->var_values, &s->prng);
+    if (s->text_source == AV_FRAME_DATA_DETECTION_BBOXES) {
+        s->var_values[VAR_X] = s->x;
+        s->var_values[VAR_Y] = s->y;
+    } else {
+        s->x = s->var_values[VAR_X] = av_expr_eval(s->x_pexpr, s->var_values, &s->prng);
+        s->y = s->var_values[VAR_Y] = av_expr_eval(s->y_pexpr, s->var_values, &s->prng);
+        /* It is necessary if x is expressed from y */
+        s->x = s->var_values[VAR_X] = av_expr_eval(s->x_pexpr, s->var_values, &s->prng);
+    }
 
     update_alpha(s);
     update_color_with_alpha(s, &fontcolor , s->fontcolor );
@@ -1511,6 +1549,21 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *frame)
     AVFilterLink *outlink = ctx->outputs[0];
     DrawTextContext *s = ctx->priv;
     int ret;
+    const AVDetectionBBoxHeader *header = NULL;
+    const AVDetectionBBox *bbox;
+    AVFrameSideData *sd;
+    int loop = 1;
+
+    if (s->text_source == AV_FRAME_DATA_DETECTION_BBOXES) {
+        sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES);
+        if (sd) {
+            header = (AVDetectionBBoxHeader *)sd->data;
+            loop = header->nb_bboxes;
+        } else {
+            av_log(s, AV_LOG_WARNING, "No detection bboxes.\n");
+            return ff_filter_frame(outlink, frame);
+        }
+    }
 
     if (s->reload) {
         if ((ret = load_textfile(ctx)) < 0) {
@@ -1536,7 +1589,19 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *frame)
     s->var_values[VAR_PKT_SIZE] = frame->pkt_size;
     s->metadata = frame->metadata;
 
-    draw_text(ctx, frame, frame->width, frame->height);
+    for (int i = 0; i < loop; i++) {
+        if (header) {
+            bbox = av_get_detection_bbox(header, i);
+            strcpy(s->text, bbox->detect_label);
+            for (int j = 0; j < bbox->classify_count; j++) {
+                strcat(s->text, ", ");
+                strcat(s->text, bbox->classify_labels[j]);
+            }
+            s->x = bbox->x;
+            s->y = bbox->y - s->fontsize;
+        }
+        draw_text(ctx, frame, frame->width, frame->height);
+    }
 
     av_log(ctx, AV_LOG_DEBUG, "n:%d t:%f text_w:%d text_h:%d x:%d y:%d\n",
            (int)s->var_values[VAR_N], s->var_values[VAR_T],