From patchwork Thu May  6 08:46:10 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Fu, Ting" <ting.fu@intel.com>
X-Patchwork-Id: 27612
Delivered-To: ffmpegpatchwork2@gmail.com
Received: by 2002:a6b:6109:0:0:0:0:0 with SMTP id v9csp1098450iob;
        Thu, 6 May 2021 01:56:43 -0700 (PDT)
X-Google-Smtp-Source: 
 ABdhPJynLmxeyRWoQD6G4UkBOsQDobvJUJN80wstG6DgysOmV5fX5jjLURTLviijvvI92QW2PPSb
X-Received: by 2002:a05:6402:1a2f:: with SMTP id
 be15mr3840292edb.207.1620291403207;
        Thu, 06 May 2021 01:56:43 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1620291403; cv=none;
        d=google.com; s=arc-20160816;
        b=jSh9Kh/PsWBUWNqo87HcBSXo3UabOruxyyQVxiTgnJEPunpj3Ej1mPuDmYN392sgIj
         rpTQezsbIZQTU9eQv64Ef2gEkAe+pe6ZXufGOADmsEuPOCZJSLNpSMImUgzUK5wOcJ+Q
         4/KlfDgpGt/ayDhHMOZJdPg/XYzLyappWygHI4YhaVcjpuvowN/TVPTgBZfs5sEYis5W
         FQGkhzYc5ZnIHly1gwjVrXmGGTwW2R9cl8/OparFOJVch1MarrgCZZbBGuZpVhz8For+
         +YyXwVxq0LIoWdl8g79RJ/qKaSEoh3qomlQNXsB5Ad2hmwO4u9/B6cTr7v2JgROpFt9F
         7f7A==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=sender:errors-to:content-transfer-encoding:mime-version:reply-to
         :list-subscribe:list-help:list-post:list-archive:list-unsubscribe
         :list-id:precedence:subject:references:in-reply-to:message-id:date
         :to:from:ironport-sdr:ironport-sdr:delivered-to;
        bh=JNCtRzTD5qlR+ulMJNMdrT9v2ETcj0LbWb9wpqQ7Frg=;
        b=zcbmd4sewi489PIy4XZ72FY6iDA2VWy77928cx7y1BdbQd7CPi2712ffVfOwBAc8Gv
         Pc9JWSpLZZ85CEqWWbZEpji8QV4jV3f8ATplPHl7v7nK4U95UQO2lqxGW9BvM0X8rZac
         kFb7KVLz5dSqzSA+wC4xAbzgkYR+3G0cUGeLXbLLQ1gAj4dWuEe6F7JH9rdlggMvK4Gc
         7vVhBQ7aEkxgsMr+r8oTNPU4JnApmNelQEdrAwm5KgytQtTQxLbJ5U1B10RAcGh92fxT
         p6wruFRjDaiT3jp4y6GkloAJ6+TafeBQJYFo8yrN+MQlpLwYoMxTTdGr1RjXTsvdoKwy
         iWLw==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
 designates 79.124.17.100 as permitted sender)
 smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
        by mx.google.com with ESMTP id l7si2248952ejk.583.2021.05.06.01.56.42;
        Thu, 06 May 2021 01:56:43 -0700 (PDT)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
 designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
 designates 79.124.17.100 as permitted sender)
 smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 6E5E06808E6;
	Thu,  6 May 2021 11:56:19 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 8B6506807AA
 for <ffmpeg-devel@ffmpeg.org>; Thu,  6 May 2021 11:56:11 +0300 (EEST)
IronPort-SDR: 
 01DgP0q8a2C93okCdM2t0abUFpKpnDaHwq362mmF6Tz7GVXVSIA+91SoHCgzzVnKdiY9QLFyNb
 8LFSFlOjKFug==
X-IronPort-AV: E=McAfee;i="6200,9189,9975"; a="177977607"
X-IronPort-AV: E=Sophos;i="5.82,277,1613462400"; d="scan'208";a="177977607"
Received: from orsmga005.jf.intel.com ([10.7.209.41])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 06 May 2021 01:55:55 -0700
IronPort-SDR: 
 TzaRl45vcFV120WM2aueGRvDUKFIPLTGTv1Y3fEwL+kVQGvp3F5ZQ6ys5WxqlA9zhkb8bRk/a7
 Muqg2abfvRsQ==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.82,277,1613462400"; d="scan'208";a="607740318"
Received: from semmer-ubuntu.sh.intel.com ([10.239.159.83])
 by orsmga005.jf.intel.com with ESMTP; 06 May 2021 01:55:54 -0700
From: Ting Fu <ting.fu@intel.com>
To: ffmpeg-devel@ffmpeg.org
Date: Thu,  6 May 2021 16:46:10 +0800
Message-Id: <20210506084610.23487-4-ting.fu@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20210506084610.23487-1-ting.fu@intel.com>
References: <20210506084610.23487-1-ting.fu@intel.com>
Subject: [FFmpeg-devel] [PATCH V2 4/4] dnn/vf_dnn_detect: add tensorflow
 output parse support
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
MIME-Version: 1.0
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
X-TUID: 53ztXwTksQzZ

The test model is an official TensorFlow model from the GitHub repo; please refer to
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md
to download the detection model you need.
For example, local testing was carried out with 'ssd_mobilenet_v2_coco_2018_03_29.tar.gz', using
the dog image at
https://github.com/tensorflow/models/blob/master/research/object_detection/test_images/image1.jpg

The test command is:
./ffmpeg -i image1.jpg -vf dnn_detect=dnn_backend=tensorflow:input=image_tensor:output=\
"num_detections&detection_scores&detection_classes&detection_boxes":model=ssd_mobilenet_v2_coco.pb,\
showinfo -f null -

The output should look similar to the following:
[Parsed_showinfo_1 @ 0x33e65f0]   side data - detection bounding boxes:
[Parsed_showinfo_1 @ 0x33e65f0] source: ssd_mobilenet_v2_coco.pb
[Parsed_showinfo_1 @ 0x33e65f0] index: 0,       region: (382, 60) -> (1005, 593), label: 18, confidence: 9834/10000.
[Parsed_showinfo_1 @ 0x33e65f0] index: 1,       region: (12, 8) -> (328, 549), label: 18, confidence: 8555/10000.
[Parsed_showinfo_1 @ 0x33e65f0] index: 2,       region: (293, 7) -> (682, 458), label: 1, confidence: 8033/10000.
[Parsed_showinfo_1 @ 0x33e65f0] index: 3,       region: (342, 0) -> (690, 325), label: 1, confidence: 5878/10000.

There are two dog boxes (label 18) with scores 98.34% & 85.55% and two person boxes (label 1) with scores 80.33% & 58.78%.

Signed-off-by: Ting Fu <ting.fu@intel.com>
---
 libavfilter/vf_dnn_detect.c | 95 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 94 insertions(+), 1 deletion(-)
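
Note for reviewers: the sketch below illustrates how dnn_detect_post_proc_tf maps
one entry of the detection_boxes tensor onto pixel coordinates. It assumes each box
is four normalized floats in [ymin, xmin, ymax, xmax] order, as read by the patch.
PixelBox and tf_box_to_pixels are hypothetical names used only for this
illustration; they are not part of FFmpeg or of this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the x/y/w/h fields of AVDetectionBBox. */
typedef struct PixelBox {
    int x, y, w, h;
} PixelBox;

/*
 * Convert one normalized box to pixel units.
 * box points at four floats laid out as [ymin, xmin, ymax, xmax]
 * (assumed layout, matching the indexing used in the patch).
 */
static PixelBox tf_box_to_pixels(const float *box, int width, int height)
{
    PixelBox out;

    assert(box != NULL && width > 0 && height > 0);

    /* Scale the top-left corner, then derive width and height from
     * the scaled bottom-right corner, truncating toward zero. */
    out.x = (int)(box[1] * width);
    out.w = (int)(box[3] * width) - out.x;
    out.y = (int)(box[0] * height);
    out.h = (int)(box[2] * height) - out.y;
    return out;
}
```

For instance, a box of { 0.25, 0.5, 0.75, 1.0 } on a 1000x800 frame yields
x=500, y=200, w=500, h=400, which showinfo would print as the region
(500, 200) -> (1000, 600).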

diff --git a/libavfilter/vf_dnn_detect.c b/libavfilter/vf_dnn_detect.c
index 7d39acb653..818b53a052 100644
--- a/libavfilter/vf_dnn_detect.c
+++ b/libavfilter/vf_dnn_detect.c
@@ -48,6 +48,9 @@ typedef struct DnnDetectContext {
 #define FLAGS AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM
 static const AVOption dnn_detect_options[] = {
     { "dnn_backend", "DNN backend",                OFFSET(backend_type),     AV_OPT_TYPE_INT,       { .i64 = 2 },    INT_MIN, INT_MAX, FLAGS, "backend" },
+#if (CONFIG_LIBTENSORFLOW == 1)
+    { "tensorflow",  "tensorflow backend flag",    0,                        AV_OPT_TYPE_CONST,     { .i64 = 1 },    0, 0, FLAGS, "backend" },
+#endif
 #if (CONFIG_LIBOPENVINO == 1)
     { "openvino",    "openvino backend flag",      0,                        AV_OPT_TYPE_CONST,     { .i64 = 2 },    0, 0, FLAGS, "backend" },
 #endif
@@ -59,7 +62,7 @@ static const AVOption dnn_detect_options[] = {
 
 AVFILTER_DEFINE_CLASS(dnn_detect);
 
-static int dnn_detect_post_proc(AVFrame *frame, DNNData *output, uint32_t nb, AVFilterContext *filter_ctx)
+static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
 {
     DnnDetectContext *ctx = filter_ctx->priv;
     float conf_threshold = ctx->confidence;
@@ -136,6 +139,96 @@ static int dnn_detect_post_proc(AVFrame *frame, DNNData *output, uint32_t nb, AV
     return 0;
 }
 
+static int dnn_detect_post_proc_tf(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
+{
+    DnnDetectContext *ctx = filter_ctx->priv;
+    int proposal_count;
+    float conf_threshold = ctx->confidence;
+    float *conf, *position, *label_id, x0, y0, x1, y1;
+    int nb_bboxes = 0;
+    AVFrameSideData *sd;
+    AVDetectionBBox *bbox;
+    AVDetectionBBoxHeader *header;
+
+    proposal_count = *(float *)(output[0].data);
+    conf           = output[1].data;
+    position       = output[3].data;
+    label_id       = output[2].data;
+
+    sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES);
+    if (sd) {
+        av_log(filter_ctx, AV_LOG_ERROR, "already have dnn bounding boxes in side data.\n");
+        return -1;
+    }
+
+    for (int i = 0; i < proposal_count; ++i) {
+        if (conf[i] < conf_threshold)
+            continue;
+        nb_bboxes++;
+    }
+
+    if (nb_bboxes == 0) {
+        av_log(filter_ctx, AV_LOG_VERBOSE, "nothing detected in this frame.\n");
+        return 0;
+    }
+
+    header = av_detection_bbox_create_side_data(frame, nb_bboxes);
+    if (!header) {
+        av_log(filter_ctx, AV_LOG_ERROR, "failed to create side data with %d bounding boxes\n", nb_bboxes);
+        return -1;
+    }
+
+    av_strlcpy(header->source, ctx->dnnctx.model_filename, sizeof(header->source));
+
+    for (int i = 0; i < proposal_count; ++i) {
+        y0 = position[i * 4];
+        x0 = position[i * 4 + 1];
+        y1 = position[i * 4 + 2];
+        x1 = position[i * 4 + 3];
+
+        bbox = av_get_detection_bbox(header, i);
+
+        if (conf[i] < conf_threshold) {
+            continue;
+        }
+
+        bbox->x = (int)(x0 * frame->width);
+        bbox->w = (int)(x1 * frame->width) - bbox->x;
+        bbox->y = (int)(y0 * frame->height);
+        bbox->h = (int)(y1 * frame->height) - bbox->y;
+
+        bbox->detect_confidence = av_make_q((int)(conf[i] * 10000), 10000);
+        bbox->classify_count = 0;
+
+        if (ctx->labels && label_id[i] < ctx->label_count) {
+            av_strlcpy(bbox->detect_label, ctx->labels[(int)label_id[i]], sizeof(bbox->detect_label));
+        } else {
+            snprintf(bbox->detect_label, sizeof(bbox->detect_label), "%d", (int)label_id[i]);
+        }
+
+        nb_bboxes--;
+        if (nb_bboxes == 0) {
+            break;
+        }
+    }
+    return 0;
+}
+
+static int dnn_detect_post_proc(AVFrame *frame, DNNData *output, uint32_t nb, AVFilterContext *filter_ctx)
+{
+    DnnDetectContext *ctx = filter_ctx->priv;
+    DnnContext *dnn_ctx = &ctx->dnnctx;
+    switch (dnn_ctx->backend_type) {
+    case DNN_OV:
+        return dnn_detect_post_proc_ov(frame, output, filter_ctx);
+    case DNN_TF:
+        return dnn_detect_post_proc_tf(frame, output, filter_ctx);
+    default:
+        avpriv_report_missing_feature(filter_ctx, "Current dnn backend does not support detect filter\n");
+        return AVERROR(EINVAL);
+    }
+}
+
 static void free_detect_labels(DnnDetectContext *ctx)
 {
     for (int i = 0; i < ctx->label_count; i++) {