From: Ting Fu
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 6 May 2021 16:46:10 +0800
Message-Id: <20210506084610.23487-4-ting.fu@intel.com>
In-Reply-To: <20210506084610.23487-1-ting.fu@intel.com>
References: <20210506084610.23487-1-ting.fu@intel.com>
Subject:
[FFmpeg-devel] [PATCH V2 4/4] dnn/vf_dnn_detect: add tensorflow output parse support

The model used for testing is an official TensorFlow model from the GitHub
repo. Please refer to
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md
to download the detection model you need.
For example, local testing was carried out with
'ssd_mobilenet_v2_coco_2018_03_29.tar.gz', using one dog image from
https://github.com/tensorflow/models/blob/master/research/object_detection/test_images/image1.jpg

The testing command is:
./ffmpeg -i image1.jpg -vf dnn_detect=dnn_backend=tensorflow:input=image_tensor:output=\
"num_detections&detection_scores&detection_classes&detection_boxes":model=ssd_mobilenet_v2_coco.pb,\
showinfo -f null -

The result should be similar to the following:
[Parsed_showinfo_1 @ 0x33e65f0] side data - detection bounding boxes:
[Parsed_showinfo_1 @ 0x33e65f0] source: ssd_mobilenet_v2_coco.pb
[Parsed_showinfo_1 @ 0x33e65f0] index: 0, region: (382, 60) -> (1005, 593), label: 18, confidence: 9834/10000.
[Parsed_showinfo_1 @ 0x33e65f0] index: 1, region: (12, 8) -> (328, 549), label: 18, confidence: 8555/10000.
[Parsed_showinfo_1 @ 0x33e65f0] index: 2, region: (293, 7) -> (682, 458), label: 1, confidence: 8033/10000.
[Parsed_showinfo_1 @ 0x33e65f0] index: 3, region: (342, 0) -> (690, 325), label: 1, confidence: 5878/10000.

There are two dog boxes (label 18) with scores 98.34% & 85.55% and two
person boxes (label 1) with scores 80.33% & 58.78%.

Signed-off-by: Ting Fu
---
 libavfilter/vf_dnn_detect.c | 95 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 94 insertions(+), 1 deletion(-)

diff --git a/libavfilter/vf_dnn_detect.c b/libavfilter/vf_dnn_detect.c
index 7d39acb653..818b53a052 100644
--- a/libavfilter/vf_dnn_detect.c
+++ b/libavfilter/vf_dnn_detect.c
@@ -48,6 +48,9 @@ typedef struct DnnDetectContext {
 #define FLAGS AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM
 static const AVOption dnn_detect_options[] = {
     { "dnn_backend", "DNN backend", OFFSET(backend_type), AV_OPT_TYPE_INT, { .i64 = 2 }, INT_MIN, INT_MAX, FLAGS, "backend" },
+#if (CONFIG_LIBTENSORFLOW == 1)
+    { "tensorflow", "tensorflow backend flag", 0, AV_OPT_TYPE_CONST, { .i64 = 1 }, 0, 0, FLAGS, "backend" },
+#endif
 #if (CONFIG_LIBOPENVINO == 1)
     { "openvino", "openvino backend flag", 0, AV_OPT_TYPE_CONST, { .i64 = 2 }, 0, 0, FLAGS, "backend" },
 #endif
@@ -59,7 +62,7 @@ static const AVOption dnn_detect_options[] = {
 
 AVFILTER_DEFINE_CLASS(dnn_detect);
 
-static int dnn_detect_post_proc(AVFrame *frame, DNNData *output, uint32_t nb, AVFilterContext *filter_ctx)
+static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
 {
     DnnDetectContext *ctx = filter_ctx->priv;
     float conf_threshold = ctx->confidence;
@@ -136,6 +139,96 @@ static int dnn_detect_post_proc(AVFrame *frame, DNNData *output, uint32_t nb, AV
     return 0;
 }
 
+static int dnn_detect_post_proc_tf(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
+{
+    DnnDetectContext *ctx = filter_ctx->priv;
+    int proposal_count;
+    float conf_threshold = ctx->confidence;
+    float *conf, *position, *label_id, x0, y0, x1, y1;
+    int nb_bboxes = 0;
+    AVFrameSideData *sd;
+    AVDetectionBBox *bbox;
+    AVDetectionBBoxHeader *header;
+
+    proposal_count = *(float *)(output[0].data);
+    conf = output[1].data;
+    position = output[3].data;
+    label_id = output[2].data;
+
+    sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES);
+    if (sd) {
+        av_log(filter_ctx, AV_LOG_ERROR, "already have dnn bounding boxes in side data.\n");
+        return -1;
+    }
+
+    for (int i = 0; i < proposal_count; ++i) {
+        if (conf[i] < conf_threshold)
+            continue;
+        nb_bboxes++;
+    }
+
+    if (nb_bboxes == 0) {
+        av_log(filter_ctx, AV_LOG_VERBOSE, "nothing detected in this frame.\n");
+        return 0;
+    }
+
+    header = av_detection_bbox_create_side_data(frame, nb_bboxes);
+    if (!header) {
+        av_log(filter_ctx, AV_LOG_ERROR, "failed to create side data with %d bounding boxes\n", nb_bboxes);
+        return -1;
+    }
+
+    av_strlcpy(header->source, ctx->dnnctx.model_filename, sizeof(header->source));
+
+    for (int i = 0; i < proposal_count; ++i) {
+        y0 = position[i * 4];
+        x0 = position[i * 4 + 1];
+        y1 = position[i * 4 + 2];
+        x1 = position[i * 4 + 3];
+
+        bbox = av_get_detection_bbox(header, i);
+
+        if (conf[i] < conf_threshold) {
+            continue;
+        }
+
+        bbox->x = (int)(x0 * frame->width);
+        bbox->w = (int)(x1 * frame->width) - bbox->x;
+        bbox->y = (int)(y0 * frame->height);
+        bbox->h = (int)(y1 * frame->height) - bbox->y;
+
+        bbox->detect_confidence = av_make_q((int)(conf[i] * 10000), 10000);
+        bbox->classify_count = 0;
+
+        if (ctx->labels && label_id[i] < ctx->label_count) {
+            av_strlcpy(bbox->detect_label, ctx->labels[(int)label_id[i]], sizeof(bbox->detect_label));
+        } else {
+            snprintf(bbox->detect_label, sizeof(bbox->detect_label), "%d", (int)label_id[i]);
+        }
+
+        nb_bboxes--;
+        if (nb_bboxes == 0) {
+            break;
+        }
+    }
+    return 0;
+}
+
+static int dnn_detect_post_proc(AVFrame *frame, DNNData *output, uint32_t nb, AVFilterContext *filter_ctx)
+{
+    DnnDetectContext *ctx = filter_ctx->priv;
+    DnnContext *dnn_ctx = &ctx->dnnctx;
+    switch (dnn_ctx->backend_type) {
+    case DNN_OV:
+        return dnn_detect_post_proc_ov(frame, output, filter_ctx);
+    case DNN_TF:
+        return dnn_detect_post_proc_tf(frame, output, filter_ctx);
+    default:
+        avpriv_report_missing_feature(filter_ctx, "Current dnn backend do not support detect filter\n");
+        return AVERROR(EINVAL);
+    }
+}
+
 static void free_detect_labels(DnnDetectContext *ctx)
 {
     for (int i = 0; i < ctx->label_count; i++) {
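
Review note: the TensorFlow path above reads four output tensors (detection count, scores, classes, and boxes stored as normalized [ymin, xmin, ymax, xmax]) and scales each kept box to pixel coordinates. The patch fetches the side-data slot with the proposal index i, which relies on every detection above the threshold appearing before any rejected one. The following is a minimal standalone sketch of the same parsing that instead uses a separate write index, so it does not depend on that ordering. PixelBox and parse_tf_detections are hypothetical names for illustration only and are not part of FFmpeg.

```c
/* Hypothetical stand-in for AVDetectionBBox; NOT the FFmpeg type. */
typedef struct PixelBox {
    int x, y, w, h;   /* top-left corner and size, in pixels */
    int label;        /* class id as reported by the model */
    int conf_num;     /* confidence numerator over 10000 */
} PixelBox;

/*
 * Hypothetical helper (assumption, not FFmpeg API).
 * Converts TensorFlow detection tensors into pixel boxes.
 * boxes holds 4 floats per proposal in [ymin, xmin, ymax, xmax] order,
 * each normalized to the range [0, 1].
 * Returns the number of boxes written to out (at most out_cap).
 */
static int parse_tf_detections(int proposal_count,
                               const float *scores,
                               const float *classes,
                               const float *boxes,
                               float threshold,
                               int width, int height,
                               PixelBox *out, int out_cap)
{
    int n = 0; /* write index, advanced only for kept detections */

    for (int i = 0; i < proposal_count && n < out_cap; i++) {
        const float *p = &boxes[i * 4];
        PixelBox *b;

        if (scores[i] < threshold)
            continue;

        b = &out[n++];
        b->x = (int)(p[1] * width);
        b->y = (int)(p[0] * height);
        b->w = (int)(p[3] * width)  - b->x;
        b->h = (int)(p[2] * height) - b->y;
        b->label    = (int)classes[i];
        b->conf_num = (int)(scores[i] * 10000);
    }
    return n;
}
```

With this layout a kept proposal at index 2 lands in output slot 1 when proposal 1 is rejected, so the number of slots requested from av_detection_bbox_create_side_data would always match the slots actually written.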